[HN Gopher] I am endlessly fascinated with content tagging systems
       ___________________________________________________________________
        
       I am endlessly fascinated with content tagging systems
        
       Author : redbar0n
       Score  : 299 points
       Date   : 2022-10-18 15:07 UTC (7 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | fleddr wrote:
       | As many commenters have mentioned (as does the article)
       | hierarchical tags are a pain, if not an impossibility to get
       | right. Related tags, though, can be done on the cheap and are
       | surprisingly powerful, fun and cool under the right conditions.
       | 
       | Say you have a massive database of photos, each photo having
       | tags. As an example, we'll use the tag "United States", which is
       | used as a tag on 50,000 photos. Next, you go over each of those
       | 50,000 photos, check which other tags were used, and sort them by
       | occurrence.
       | 
       | This reveals useful and often surprising implicit relations
       | between tags. The relation can be of any type, hierarchical or
       | otherwise. It reveals relations never explicitly mapped or
       | maintained. It's organic, which kind of fits the philosophy of
       | tagging.
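The co-occurrence count described above can be sketched in a few lines of Python. This is only an illustration: the `related_tags` helper and the sample `photos` mapping are made up for the example and are not from the comment.

```python
from collections import Counter


def related_tags(photos, tag):
    """Count which other tags co-occur with `tag`, most frequent first."""
    counts = Counter()
    for tags in photos.values():
        if tag in tags:
            # Count every other tag used on a photo that carries `tag`.
            counts.update(t for t in tags if t != tag)
    return counts.most_common()


# Hypothetical sample data: photo id -> set of tags.
photos = {
    "p1": {"United States", "New York", "skyline"},
    "p2": {"United States", "New York", "food"},
    "p3": {"United States", "Grand Canyon"},
    "p4": {"France", "food"},
}

related = related_tags(photos, "United States")
```

With a real database the same count would more likely be a GROUP BY query over a photo/tag join table; the in-memory version just shows the idea.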
        
       | VectorLock wrote:
       | One example of an unexpectedly rich and deep tagging ontology is
       | the Danbooru "Anime" image board [NSFW]
       | https://danbooru.donmai.us/
        
         | TazeTSchnitzel wrote:
         | There is a safe-for-work, or at least _safer_ -for-work version
         | of the site: https://safebooru.donmai.us/
         | 
         | (It is of course based on the tagging system: every post is
         | tagged by its "safeness" level.)
        
         | system2 wrote:
         | I know this is not reddit. But why do you even know this
         | link and its tagging system...
        
           | VectorLock wrote:
           | I'm not scared away by things that might offend the
           | puritanically inclined and I'm interested in ontologies and
           | this is a fascinating one.
           | 
           | There was some drama about someone training a Stable-
           | Diffusion-alike by ripping their dataset that brought it to
           | my attention.
        
           | thrdbndndn wrote:
           | Danbooru is one of the most popular anime image boards.
           | 
           | Anyone who's into Anime (not just for hentai) probably knows.
        
         | thrdbndndn wrote:
         | Yeah, danbooru or similar image boards basically have all the
         | things talked about in this tweet thread.
         | 
         | They have tag aliases, meta-tags and so-called "tag
         | implications".
         | 
         | The last one is basically sub-tags but with more flexibility
         | and dead simple to implement: if A implies B, then tagging
         | an image with A will automatically tag it with B. So you can
         | tag "American Male Novelist", and then the system will
         | automatically add "American", "Male", "Novelist", "Writer",
         | etc. (after such implications were added).
         | 
         | It's much easier than Wikipedia's categories, but Wikipedia's
         | way is of course intentional because categories are meant to
         | have a stronger hierarchy than mere tags.
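A minimal sketch of how tag implications like these might be applied, assuming the implications are stored as a plain mapping from a tag to the tags it implies. The `expand_tags` function and the table below are hypothetical and are not Danbooru's actual implementation.

```python
def expand_tags(tags, implications):
    """Return `tags` plus everything they imply, following chains of implications."""
    result = set(tags)
    pending = list(tags)
    while pending:
        tag = pending.pop()
        for implied in implications.get(tag, ()):
            # Skipping tags we already have also protects against cycles.
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result


# Hypothetical implication table: tag -> tags it implies.
implications = {
    "American Male Novelist": {"American", "Male", "Novelist"},
    "Novelist": {"Writer"},
}

expanded = expand_tags({"American Male Novelist"}, implications)
```

Here "Writer" is reached indirectly through "Novelist", which is the kind of chaining the comment describes.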
        
       | subpar wrote:
       | I've done this professionally in a couple different settings,
       | from building topic classifiers for news events (it is sometimes
       | hard to know when one news event should stop and another start)
       | to creating tagging systems for audio recordings of group
       | conversations (where topics often merge in and out of each other,
       | often within a single sentence).
       | 
       | I'm currently working on classifying non-speech, non-musical
       | sound and it can be useful to piggyback on an existing knowledge
       | system, though they tend to be industry-specific. As an example,
       | Google's ontology for sound identification [1] is a nice starting
       | point for general classification, whereas the taxonomy [2] used
       | by the audio post-production industry (sound effects, foley, etc)
       | is structurally quite different (which isn't surprising, but it
       | sure is fun!). From a totally different field (electro-acoustic
       | composition), the work of Michel Chion and Pierre Schaeffer [3]
       | adds psychoacoustic elements to more traditional measurable
       | characteristics, i.e. how the sound is perceived and comprehended
       | is just as important as its medium of travel and its source. It
       | is helpful to see what others have done before you so you can
       | pick and choose elements of their work to incorporate into your
       | own.
       | 
       | 1: https://github.com/audioset/ontology
       | 
       | 2: https://docs.google.com/spreadsheets/d/1b2UhKpcOAE-
       | jd1edOsxC...
       | 
       | 3: [big pdf!]
       | https://monoskop.org/images/0/01/Chion_Michel_Guide_To_Sound...
        
       | polote wrote:
       | A big miss on the list is that words (and so tags) do not mean
       | the same thing to every person, and do not even mean the same
       | thing in different contexts.
        
       | heliophobicdude wrote:
       | My similar issue is with names in source code.
       | 
       | Fuzzy matching names and interrogating the contributor about the
       | changes being checked in. Questions to ask the contributor: are
       | the names similar to any of these other names? Is there an
       | opportunity to use the same name or are they different concepts?
       | 
       | Code grows and grows and becomes harder to grep if things are
       | named inconsistently.
        
       | pessimizer wrote:
       | > It gets even more complex if tags can have multiple parents,
       | like Wikipedia categories. "American Male Novelists" is a subtag
       | of "American Male Writers" and "American Novelists". Now we have
       | diamond problems, redundancy, a whole host of other edge cases.
       | 
       | I don't understand this problem. I would think that you would
       | have
       | 
       | tag:american
       | 
       | tag:male
       | 
       | tag:novelist
       | 
       | tag:writer,
       | 
       | and tag:novelist would itself be tagged as tag:writer, because
       | all novelists are writers.
        
       | openfuture wrote:
       | Why twitter man.. these questions are clearly important but there
       | is a space to discuss them
       | https://matrix.to/#/#datalisp:matrix.org
        
         | googlryas wrote:
         | I had to click like 5 links from that link in order to get to a
         | site which requires me to sign in before allowing me to see the
         | content. I still have no idea what I'm supposed to be seeing.
         | And no idea what the connection between "datalisp" and content
         | tagging systems is.
         | 
         | Maybe that's why twitter man?
        
         | hoherd wrote:
         | Seriously. I'm not a twitter fan, but even so, it's a short-
         | form medium. Why do people abuse it like this, especially with
         | great content? What's so bad about tweeting a link to a blog?
         | 
         | Anyhow, I use threadreaderapp to get through the frustrating
         | twitter UI and the ways that it is abused:
         | https://threadreaderapp.com/thread/1534301374166474752.html
        
           | modriano wrote:
           | > Why do people abuse it like this, especially with great
           | content?
           | 
           | Probably to get a wider audience to actually read and engage
           | with the ideas, and to crowdsource relevant information from
           | said audience.
           | 
           | > What's so bad about tweeting a link to a blog?
           | 
           | Probably an 80%+ reduction (total guess) in the number of
           | people who engage directly with the content and author.
        
       | asdff wrote:
       | I don't know how people deal with tags. It adds so much friction
       | to me. Naming tags, deciding what rules this tag is supposed to
       | have, deciding what stuff is tagged. I tried the firm approach of
       | being extremely discrete with tags and it took a lot of effort,
       | and I've tried the loose approach of tagging things if they are
       | even slightly related which imo defeated the whole purpose of
       | organizing things to make it easy to find them later if a lot of
       | tangentially related things share the same tags.
       | 
       | Folders seem a lot more straightforward for me at least, and if I
       | need something in two places at once, there's always ln -s
        
       | PaulHoule wrote:
       | Maybe it's the project I am working on but right now I see the
       | ideal search interface to be something like an OWL class axiom,
       | that is, I am searching for instances of a class that has the
       | following restrictions:
       | 
       | * subclass of Actor
       | * subclass of Singer
       | * has been in at least 7 movies
       | * was born after December 3, 1980
       | * has been married to at most 3 other people
       | 
       | these can be intersected, unioned, complemented, etc.
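One rough way to picture restrictions that can be intersected, unioned and complemented is as composable predicates. The sketch below is plain Python, not OWL: the `Person` record, the combinator names and the sample data are all assumptions made for illustration, and class membership is a flat set with no subclass reasoning.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str
    classes: frozenset  # flat class names, e.g. {"Actor", "Singer"}
    movies: int
    born: date
    spouses: int


def all_of(*preds):
    """Intersection: every restriction must hold."""
    return lambda p: all(pred(p) for pred in preds)


def any_of(*preds):
    """Union: at least one restriction must hold."""
    return lambda p: any(pred(p) for pred in preds)


def negate(pred):
    """Complement of a restriction."""
    return lambda p: not pred(p)


def is_a(cls):
    return lambda p: cls in p.classes


query = all_of(
    is_a("Actor"),
    is_a("Singer"),
    lambda p: p.movies >= 7,
    lambda p: p.born > date(1980, 12, 3),
    lambda p: p.spouses <= 3,
)

# Hypothetical sample data.
people = [
    Person("A", frozenset({"Actor", "Singer"}), 9, date(1985, 1, 1), 1),
    Person("B", frozenset({"Actor"}), 12, date(1990, 5, 5), 0),
    Person("C", frozenset({"Actor", "Singer"}), 3, date(1982, 2, 2), 2),
]
matches = [p.name for p in people if query(p)]
```

A real OWL reasoner would also infer class membership from axioms; this only shows how the set operations compose.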
        
         | somat wrote:
         | It sounds like what you want is SQL.
         | 
         | There is no good solution for the cultural problem that a
         | written language is somehow unsuitable for end users. But
         | personally I have spent way too many hours trying to make a
         | search interface only to realize at the end that not only is my
         | interface complicated and hard to use, it still has only a
         | fraction of the descriptive power a SQL query has. At times I
         | am tempted to make full use of the built-in database
         | permissions and let the user just type queries directly, but
         | this suggestion is always vetoed.
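To give a feel for the descriptive power being referred to, here is a small sketch using Python's built-in sqlite3 module. The `item_tags` table and its rows are invented for the example; the query asks for items tagged 'beach' but not 'family', which is awkward in many hand-built search forms and a one-liner with SQL set operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_tags (
        item TEXT NOT NULL,
        tag  TEXT NOT NULL,
        PRIMARY KEY (item, tag)
    );
    INSERT INTO item_tags VALUES
        ('photo1', 'beach'), ('photo1', 'sunset'),
        ('photo2', 'beach'), ('photo2', 'family'),
        ('photo3', 'sunset');
""")

# Set difference: everything tagged 'beach' minus everything tagged 'family'.
rows = conn.execute("""
    SELECT item FROM item_tags WHERE tag = 'beach'
    EXCEPT
    SELECT item FROM item_tags WHERE tag = 'family'
    ORDER BY item
""").fetchall()
result = [row[0] for row in rows]
```

Swapping EXCEPT for INTERSECT or UNION gives the "and" and "or" variants of the same search.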
        
       | system2 wrote:
       | I am increasingly hating twitter being used for blogging.
        
       | Archelaos wrote:
       | Those interested in the state of the art of professional tagging
       | systems in cultural heritage may want to look at the CIDOC
       | Conceptual Reference Model (CRM): https://www.cidoc-crm.org/
        
       | counttheforks wrote:
       | Anyone have a suggestion for a tagging filesystem that is
       | maintained? Or if not a filesystem, something that at least
       | works? I still feel like this is the best way to organize
       | personal photos and media, and while https://www.tagsistant.net/
       | is pretty good it hasn't been updated in 6 years and is fairly
       | buggy.
        
         | btrettel wrote:
         | I haven't tried it yet but https://tmsu.org/ is actively
         | maintained and looks nice.
        
         | somat wrote:
         | unix has tags, they are known as hardlinks.
        
         | hwayne wrote:
         | I just gave up and mimicked tags with symlinks and subfolders.
         | ie "foo" is tagged "todo" if there's a symlink to it in
         | "Tags/todo/".
         | 
         | It works surprisingly well, since I can manage it with standard
         | shell scripting.
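The symlink convention described above is easy to script. The comment mentions shell; the sketch below uses Python instead, assumes a POSIX-like system where symlinks are permitted, and the helper names `tag_file` and `files_with_tag` are hypothetical.

```python
import os
import tempfile


def tag_file(root, path, tag):
    """Tag `path` by placing a symlink to it in <root>/Tags/<tag>/."""
    tag_dir = os.path.join(root, "Tags", tag)
    os.makedirs(tag_dir, exist_ok=True)
    link = os.path.join(tag_dir, os.path.basename(path))
    if not os.path.islink(link):
        os.symlink(os.path.abspath(path), link)


def files_with_tag(root, tag):
    """List the names of everything tagged with `tag`."""
    tag_dir = os.path.join(root, "Tags", tag)
    if not os.path.isdir(tag_dir):
        return []
    return sorted(os.listdir(tag_dir))


# Demo in a throwaway directory: "foo" is tagged "todo".
root = tempfile.mkdtemp()
foo = os.path.join(root, "foo")
with open(foo, "w"):
    pass
tag_file(root, foo, "todo")
tagged = files_with_tag(root, "todo")
```

One limitation of this sketch is that two files with the same base name in different folders would collide inside a tag directory.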
        
         | comfypotato wrote:
         | Dr. Karl Voit did his dissertation designing a tag-based file
         | system. I don't know what the status is today, but the
         | dissertation itself may be a decent place to start your search.
         | 
         | https://karl-voit.at/tagstore/en/papers.shtml
        
         | greggman3 wrote:
         | MacOS has tags. Right click any file in finder, select
         | "Tags..."
         | 
         | No idea if they are implemented at a filesystem level but there
         | are various tools for finding things by tag
        
       | jrochkind1 wrote:
       | > I can't find anything on how to design and implement anymore
       | more than the barebones basics of a system.
       | 
       | All of this stuff (horse/horses etc) is extensively discussed,
       | maybe look under "taxonomy" or "ontology".
       | 
       | Now, whether you want to use any of those solutions or not or
       | find the discussion useful or not... if you aren't finding
       | anything about it at all, you aren't looking in the right places.
       | 
       | (I learned about it in librarian school)
        
         | avgcorrection wrote:
         | Librarians are the people that we (technologists) should learn
         | from. But all I see is programmers trying to invent things from
         | first principles.
        
           | jrochkind1 wrote:
           | Eh, as the librarian who wrote the post you're replying to...
           | I am actually ambivalent.
           | 
           | I wish librarianship as a field and industry were more what
           | I'd fantasize it should/could be, but it's not so much.
        
         | edflsafoiewq wrote:
         | Can you link some resources about it then?
        
           | meej wrote:
           | This is a good basic overview, goes beyond tagging/indexing,
           | was the textbook in LIS501 Information Organization and
           | Access at UIUC-GSLIS (now the iSchool at Illinois) in 2006:
           | 
           | https://mitpress.mit.edu/9780262512619/the-intellectual-
           | foun...
           | 
           | Controlled vocab standards:
           | 
           | https://www.niso.org/publications/ansiniso-z3919-2005-r2010
           | 
           | (this one is deprecated in favor of the one that follows)
           | 
           | https://www.niso.org/schemas/iso25964
           | 
           | https://www.w3.org/2004/02/skos/
           | 
           | The book we used in my thesaurus construction class at UIUC:
           | 
           | https://www.alastore.ala.org/content/essential-thesaurus-
           | con...
           | 
           | My favorite intro to semantic modeling with RDF/OWL/SPARQL:
           | 
           | http://workingontologist.org/
           | 
           | Topic Maps are dead but i still have a soft spot for them:
           | 
           | https://www.isotopicmaps.org/
           | 
           | I also recommend Heather Hedden, linked in jrochkind1's post.
        
           | Tomte wrote:
           | This is German, but I found it very good:
           | 
           | https://www.isi.hhu.de/fileadmin/redaktion/Fakultaeten/Philo.
           | ..
           | 
           | Books:
           | 
           | * Cataloging the World
           | 
           | * Organising Knowledge. Taxonomies, Knowledge and
           | Organisational Effectiveness
           | 
           | * The Intellectual Foundation of Information Organization
           | 
           | * The Oxford Guide to Library Research
        
           | jrochkind1 wrote:
           | I could, but honestly I'd just be googling "taxonomy". But ok
           | that's not entirely true, I know how to refine my search and
           | recognize when something is what I'm thinking of, from some
           | familiarity with the field.
           | 
           | (But if you want to look around, in addition to "taxonomy"
           | and "ontology", other good terms are "information
           | architecture" and "controlled vocabulary").
           | 
           | These are not things I have vetted, this is literally just me
           | googling and taking a quick skim...
           | 
           | https://blog.optimalworkshop.com/how-to-develop-a-
           | taxonomy-f...
           | 
           | https://www.uxbooth.com/articles/introduction-to-taxonomies/
           | 
           | https://www.nngroup.com/articles/taxonomy-101/
           | 
           | http://accidental-taxonomist.blogspot.com/2020/11/what-it-
           | th...
           | 
           | Or how about some textbooks:
           | 
           | https://narrowgaugebooks.indielite.org/book/9781627055802
           | 
           | https://www.hedden-information.com/accidental-taxonomist/
        
           | tantalor wrote:
           | https://en.wikipedia.org/wiki/Library_and_information_scienc.
           | ..
           | 
           | https://en.wikipedia.org/wiki/Tag_(metadata)
        
         | chaostheory wrote:
         | Yeah, the content for learning has been around for over a
         | decade or more.
         | 
         | Plus we have plenty of content for AI now
         | 
         | https://towardsdatascience.com/machine-learning-classifiers-...
        
         | samastur wrote:
         | The problem isn't knowing what the problem is (taxonomy and
         | ontology), but how to implement it effectively.
         | 
         | I've seen enough of Hillel's posts over the years that I am
         | fairly sure he is aware of taxonomy/ontology too.
        
         | pvg wrote:
         | _(I learned about it in librarian school)_
         | 
         | As the rest of us learned during the first tagging boom, the
         | librarian is the natural apex predator of tagging.
        
           | stinkytaco wrote:
           | I've been a librarian for more than 15 years and I can only
           | speak from personal experience when I say that I am the apex
           | predator of nothing. Every once in a while I will get it in
           | my head to systematize my personal knowledge base with a
           | controlled vocabulary and ontology and I just fall on my
           | face. I really want it for some twisted reason, though.
           | 
           | Turns out LC subject headings -- for all their failures --
           | are pretty good.
        
         | lofatdairy wrote:
         | To be fair to OP, the biggest hurdle in learning anything is
         | knowing what questions to ask. When you don't have ontology as
         | part of your vocabulary it's hard to find literature regarding,
         | say, "comparison of ontologies for user-generated text
         | content".
         | 
         | I suppose this flows back into library science, which is all
         | about systematizing where to look for answers to questions, but
         | I'm always astonished to find that there's oceans of literature
         | and research in questions I haven't even thought to ask.
        
           | vonseel wrote:
           | I think OP is referring to finding software-engineering
           | related design discussions surrounding tagging systems, but
           | yes, I'm sure there is a great depth of ontology material and
           | librarian knowledge that could add to software system
           | designs.
        
       | josefrichter wrote:
       | I was fascinated by ontologies 10 years ago. Since then, I've
       | been studying the human brain, only to realize that this is an
       | effort to basically build a software version of the human brain.
       | Maybe it's possible, but it's definitely not feasible in 99.9% of
       | cases. The closest thing we have is some machine learning
       | approaches.
        
       | pphysch wrote:
       | My current solution to this problem is just putting a JSONB
       | column in relevant tables. GIN indexes do the heavy lifting as
       | needed.
       | 
       | This lets us implement arbitrary, queryable ontologies on top of
       | the data without requiring further database instrumentation
       | (aside from creating an index now and then).
        
       | flanked-evergl wrote:
       | Look at Wikidata, RDF and the semantic web. This is a fairly
       | well-solved problem that should not be solved differently again.
        
       | tinco wrote:
       | If you're at the point where you're adding hierarchies to your
       | tags, I think you're fighting a losing battle. At that point, why
       | not do what Google does and just make a BERT embedding? No way
       | you're going to manually achieve the full extent of complexity of
       | how humans group and describe things.
        
       | didip wrote:
       | If you don't want to think too hard, just funnel the tags
       | information into a search engine like Elasticsearch.
       | 
       | It already handles stemming, stop words, aliases, etc.
        
       | xg15 wrote:
       | I worked with the Wikipedia category system a few years ago, and
       | you could see the problems with hierarchical tagging systems
       | right in action back then. (Though it may have gotten better in
       | the meantime)
       | 
       | The system appeared simple: There were just two relations,
       | "Article A is a member of category B" and "Category X is a
       | subcategory of category Y".
       | 
       | However, in practice, the community was using this system to
       | represent a whole host of wildly different relationships between
       | items, often with different implications for what a category
       | actually applied to.
       | 
       | E.g., if A has a subcategory B, this could mean one of several
       | things: B might be an additional constraint on the items in A
       | ("American writers" -> "19th century American writers"), the
       | _things_ in B might be more specific than the things in A: (
       | "Writers" -> "Novelists"), A might apply to the _concept_ B, not
       | the things in B ( "Occupations" -> "Writers") or A might refer to
       | the _category_ B ( "Categories with more than 100 entries" ->
       | "Writers") and on and on...
       | 
       | Of course those different aspects could even be combined. E.g.
       | "Categories with more than 100 entries" might have a child
       | "Categories with more than 100 entries in need of review", which
       | represents a constraint but might itself contain fewer than 100
       | entries...
       | 
       | The basic question "Is item X in category Y" becomes impossible
       | to answer generally, because there is no clear indication if a
       | category only applies to its direct children or to all of its
       | descendants or only to the subcategories themselves.
       | 
       | I'm sure there are sophisticated ontological systems which would
       | allow users to specify all those different relationships
       | separately. I'm also pretty sure that users would become sloppy
       | after a short time or would disagree which particular
       | relationship to use in a particular situation...
        
         | crazygringo wrote:
         | I encountered the same problem a few years ago and indeed
         | realized that using categories to understand what type of
         | article a thing was (person? subject? event?) was utterly
         | useless, for the reasons you describe.
         | 
         | On the other hand, I discovered that infoboxes (the data in the
         | top-right box on most pages) were generally extremely reliable,
         | if frustrating to parse.
        
           | maxbond wrote:
           | The infoboxes are created from a query to Wikidata, which you
           | can query yourself! No scraping necessary!
           | https://query.wikidata.org/
           | 
           | You'll want to learn SPARQL, but if you know SQL it's not so
           | bad to pick up.
        
             | crazygringo wrote:
             | As far as I can tell, that is not the case, sadly.
             | 
             | Right now it appears that only 3,975 articles have
             | infoboxes auto-generated from Wikidata. [1] The wikitext
             | contains something like "{{Wikidata Infobox ...}}" instead
             | of just "{{Infobox ...}}".
             | 
             | If you look up a popular article like Barack Obama [2],
             | it's just a traditional hand-edited infobox. In fact, one
             | of the first lines of data says "Vice President = Joe
             | Biden", while the Wikidata entry for Barack Obama [3]
             | doesn't reference Biden anywhere -- so not only is the
             | Wikipedia infobox not generated from Wikidata, but Wikidata
             | isn't pulling all the relevant info from Wikipedia either.
             | 
             | Back when I had been working on my project, I'd hoped
             | Wikidata could be a solution but it was far too incomplete
             | and information was regularly out of date. Perhaps
             | (hopefully) it's better now, but it's clearly not being
             | used to power infoboxes yet except in a tiny number of
             | cases. (Which actually complicates things more now, since
             | anybody parsing Wikipedia infoboxes now has to deal
             | separately with the 3,975 ones that grab from Wikidata,
             | since none of the actual data is copied over into the
             | wikitext...)
             | 
             | [1] https://en.wikipedia.org/wiki/Category:Articles_with_in
             | fobox...
             | 
             | [2] https://en.wikipedia.org/wiki/Barack_Obama
             | 
             | [3] https://www.wikidata.org/wiki/Q76
        
             | matkoniecz wrote:
             | Wikidata is not a solution at all.
             | 
             | I recently ran into the same kind of problem in Wikidata.
             | 
             | https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Ont
             | o...
             | 
             | A typical problem is of the "light rail (Q1268865) is data
             | visualization (Q6504956)" kind - this specific one is fixed,
             | but there are many similar ones
             | 
             | https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive
             | /...
             | 
             | https://www.wikidata.org/wiki/Wikidata:Project_chat#Ontolog
             | y...
        
         | aaroninsf wrote:
         | An exactly analogous problem exists in the Collections
         | hierarchy at the Internet Archive, of uploaded/digitized
         | material (not the Wayback Machine web captures).
         | 
         | A single graph is applied locally with very different
         | semantics; and absent a distinct tagging system, collection
         | membership is sometimes used to mark material for treatment in
         | some way.
        
         | travisjungroth wrote:
         | The issue is that system has nodes and edges, but no concept of
         | distinct graphs. That leaves you trying to fit all notable
         | human knowledge onto a single graph, which is non-optimal.
         | Whether it's also a DAG, tree, or something else doesn't even
         | matter.
         | 
         | Ontologies are like languages. There is no _correct_ one. What
         | matters is how good a fit it is for the problem at hand and
         | that you're all using the same one! If half the people are
         | using Italian and half Spanish, it's going to be a disaster. I
         | wouldn't use APL to write a UI and I wouldn't architect a
         | computer system in Shipibo.
         | 
         | Similarly, if I'm bird watching, "Birds of Northern California"
         | is very useful. Organizing them by genus is less useful to me
         | in that moment, but it's not _wrong_.
        
           | moonchild wrote:
           | I don't think you necessarily need multiple graphs; just
           | labeled edges.
        
             | travisjungroth wrote:
             | You just need some way to interact with it as multiple
             | graphs. Some variation of labeled edges is probably the
             | best.
        
           | ok_dad wrote:
           | Isn't this literally just saying we need another layer of
           | categorization on top of the categorization layer?
        
             | all2 wrote:
             | Perhaps "adjacent to" rather than "on top of"? I've started
             | looking at this kind of problem in terms of DB queries or
             | set relations. Even "organization" can be a set relation if
             | there are the right bits of metadata in place.
        
             | travisjungroth wrote:
             | It's saying you need support for multiple types of
             | categories. You could use the same system to organize
             | itself. No need for a meta layer.
        
         | matkoniecz wrote:
         | I recently ran into the same kind of problem in Wikidata.
         | 
         | https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Onto...
         | 
         | A typical problem is of the "light rail (Q1268865) is data
         | visualization (Q6504956)" kind - this specific one is fixed,
         | but there are many similar ones
         | 
         | https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/...
         | 
         | https://www.wikidata.org/wiki/Wikidata:Project_chat#Ontology...
        
         | pessimizer wrote:
         | > I'm sure there are sophisticated ontological systems which
         | would allow users to specify all those different relationships
         | separately. I'm also pretty sure that users would become sloppy
         | after a short time or would disagree which particular
         | relationship to use in a particular situation
         | 
         | I think the problem is allowing users to freely tag, then.
         | There should be easily accessed guidelines about how each tag
         | should be used, and people who are constantly moving them,
         | correcting them, and updating usage guidelines.
         | 
         | We need the ability to implement governance systems on top of
         | web 2.0+ style content systems. People should be able to vote
         | for representatives (with any number of voting systems), create
         | committees, submit changes to be voted on, etc. Instead we
         | usually work based on hierarchical dictatorships or imagined
         | consensus. People need organizational management tools baked
         | into software, because organization of information depends on
         | it. Instead of proposing a new committee to come up with the
         | schema of everything, better tools that enable users to build
         | committees.
        
           | jrumbut wrote:
           | The fundamental tension in tagging systems, to me, is whether
           | tagging is a feature the software offers to the user or a
           | task the user performs to assist the software.
           | 
           | In the first case, you want freewheeling and tolerate
           | ontological inconsistencies because you want to offer
           | flexibility to users and will capture hard to quantify
           | emergent benefits (some made up examples: "try the tag
           | user233-favorite, I keep discovering awesome articles!", "the
           | physicist-needed tag has highlighted a lot of misinformation
           | surrounding quantum physics and relativity"). People use it
           | to the extent it is useful.
           | 
           | The other way, with formal semantics, governance (which you
           | made some very wise points about), etc allows the software to
           | reply to queries like "19th-century + Missouri + humorists"
           | in a performant and authoritative way. It's not really a
           | feature so much as it is a way to enable other features.
        
         | skyde wrote:
         | ("Occupations" -> "Writers") seems wrong. Why would you do this?
         | The same goes for ("Categories with more than 100 entries" ->
         | "Writers").
         | 
         | This seems like trying to put a tag on a category entity instead
         | of creating a tag hierarchy.
         | 
         | Those 2 should be stored using different relationship type
         | mechanisms.
         | 
         | ("categoryTag", <SourceTag>, <DestinationTag>)
         | 
         | ex: ("categoryTag", "Occupations", "Writers")
         | 
         | and
         | 
         | ("parentTag",<tagName1>, <tagName2>)
         | 
         | ex: ("parentTag", "American writers", "19th century American
         | writers")
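         | 
         | A minimal sketch of the two relationship types above, assuming
         | edges are stored as (relation_type, source, destination)
         | tuples. The "Writers" -> "American writers" edge and the
         | function names are illustrative additions, not part of the
         | original proposal.

```python
# Hypothetical relation store: each edge is (relation_type, source, destination).
RELATIONS = [
    ("categoryTag", "Occupations", "Writers"),
    ("parentTag", "Writers", "American writers"),  # assumed edge for the demo
    ("parentTag", "American writers", "19th century American writers"),
]


def children(relation_type, source, relations=RELATIONS):
    """Return direct destinations of `source` for one relation type only."""
    return [dst for rel, src, dst in relations
            if rel == relation_type and src == source]


def descendants(tag, relations=RELATIONS):
    """Walk only "parentTag" edges, so meta-categories never leak into the tree."""
    found, stack = [], [tag]
    while stack:
        for child in children("parentTag", stack.pop(), relations):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


print(descendants("Writers"))
print(descendants("Occupations"))  # empty: "categoryTag" edges are not followed
```

         | Because traversal filters on the relation type, "Occupations"
         | can describe "Writers" without becoming an ancestor of every
         | writer category.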
        
           | xg15 wrote:
           | Indeed. It's a bit like if a programming language was trying
           | to represent base classes and meta classes using the same
           | mechanism.
           | 
           | My guess is that no one realized the need for "meta"
           | categories when the system was implemented, so later the
           | existing hierarchy was simply co-opted instead of
           | implementing a new functionality for that use case.
           | 
           | As long as the categories are only used by human editors and
           | use is only within some small subcommunity, it can work quite
           | well. The problem starts if you want to combine categories
           | used by different communities or if you (or your program)
           | lack the domain knowledge to understand which nodes represent
           | "meta" categories.
           | 
           | As another poster said, the better approach to using Wikipedia
           | data for automated processing is to use infoboxes or the
           | explicitly machine-readable Wikidata repository. The category
           | system looks machine-readable on first glance but really
           | isn't.
        
         | layer8 wrote:
         | There are only two kinds of relation here, "subset of" and
         | "instance of" (aka "element of", type-token).
         | 
         | The category-category relations are intended to always be a
         | subset relation. The article-category relations are intended to
         | always be an instance-of relation.
         | 
         | - "19th century American writers" is a _subset_ of "American
         | writers".
         | 
         | => Both are a category, so no problem.
         | 
         | - "Novelists" is a _subset_ of "Writers".
         | 
         | => Both are a category, so no problem.
         | 
         | - "Writer(s)" is an _instance_ of "Occupation".
         | 
         | => Here the problem is that "Writers" is a category. It would
         | be okay if it was an article "Writer (occupation)".
         | 
         | - "Writers" is an _instance_ of "Categories with more than 100
         | entries".
         | 
         | => Here, again, the problem is that "Writers" is a category,
         | and having an instance-of relation between categories is not an
         | intended/supported use-case.
         | 
         | This could conceivably be solved by supporting an instance-of
         | relation between categories, in addition to the existing subset
         | (subcategory) relation. It could be called a meta-category
         | relation. Then you could have the category of occupation
         | categories.
         | 
         | Another way to put this is that categories have to be typed: a
         | category contains either (just) articles, or it contains (just)
         | categories. Subcategories then must match the type of their
         | supercategories and correspondingly must contain either
         | articles or categories.
         | 
         | Basically, Wikipedia's type system is not expressive enough to
         | allow everything people would want to express in it.
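         | 
         | The typed-category idea can be sketched roughly as follows.
         | This is an illustration only, not how MediaWiki works: the
         | `contains` label, the class names, and the "Occupation
         | categories" meta-category are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str


@dataclass
class Category:
    name: str
    contains: str  # hypothetical type label: "articles" or "categories"
    members: list = field(default_factory=list)        # instance-of relation
    subcategories: list = field(default_factory=list)  # subset-of relation

    def add_member(self, item):
        """Instance-of: the member's kind must match what this category holds."""
        kind = "articles" if isinstance(item, Article) else "categories"
        if kind != self.contains:
            raise TypeError(f"{self.name} contains {self.contains}, not {kind}")
        self.members.append(item)

    def add_subcategory(self, sub):
        """Subset-of: a subcategory must hold the same kind as its parent."""
        if sub.contains != self.contains:
            raise TypeError(f"{sub.name} and {self.name} hold different kinds")
        self.subcategories.append(sub)


writers = Category("Writers", contains="articles")
novelists = Category("Novelists", contains="articles")
occupations = Category("Occupation categories", contains="categories")

writers.add_subcategory(novelists)         # subset-of: both hold articles
writers.add_member(Article("Mark Twain"))  # instance-of: article in category
occupations.add_member(writers)            # meta-category: category as member
```

         | With this check, making "Writers" a subcategory of the
         | occupations meta-category raises a TypeError instead of
         | silently mixing the two relations.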
        
         | laszlokorte wrote:
         | Clearly the solution to all of this would be the category of
         | all those categories that do not contain themselves.
        
         | maratc wrote:
         | The problem might not be with hierarchical tagging systems, but
         | with the specific hierarchical tagging system they use at
         | Wikipedia.
         | 
         | Imagine another system with the following categories:
         | 
         | * People:ByOccupation:Creative:Writers
         | 
         | * Time:CommonEra:ByCentury:19
         | 
         | * Location:Earth:Americas:NorthAmerica:USA
         | 
         | In this scheme of things, e.g. Mark Twain would be tagged with
         | all three. "19th century American writers" (which includes Mark
         | Twain) would not be a _category_ but _a saved search_. (Other
         | saved searches -- which would also include Mark Twain -- would
         | be  "19th century people from Americas" or "Stuff from Planet
         | Earth").
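         | 
         | A rough sketch of "saved search instead of category", assuming
         | tags are colon-separated paths and a saved search is a list of
         | path prefixes that must all match. The second entry and its
         | "Europe:UK" path are made up for contrast.

```python
ITEMS = {
    "Mark Twain": {
        "People:ByOccupation:Creative:Writers",
        "Time:CommonEra:ByCentury:19",
        "Location:Earth:Americas:NorthAmerica:USA",
    },
    "Charles Dickens": {  # hypothetical second entry, for contrast
        "People:ByOccupation:Creative:Writers",
        "Time:CommonEra:ByCentury:19",
        "Location:Earth:Europe:UK",
    },
}

# Saved searches are just stored queries: every prefix must match some tag.
SAVED_SEARCHES = {
    "19th century American writers": [
        "People:ByOccupation:Creative:Writers",
        "Time:CommonEra:ByCentury:19",
        "Location:Earth:Americas:NorthAmerica:USA",
    ],
    "19th century people from Americas": [
        "People",
        "Time:CommonEra:ByCentury:19",
        "Location:Earth:Americas",
    ],
    "Stuff from Planet Earth": ["Location:Earth"],
}


def under(tag, prefix):
    """True if `tag` equals `prefix` or sits below it in the hierarchy."""
    return tag == prefix or tag.startswith(prefix + ":")


def run_saved_search(name, items=ITEMS):
    prefixes = SAVED_SEARCHES[name]
    return sorted(
        item for item, tags in items.items()
        if all(any(under(tag, prefix) for tag in tags) for prefix in prefixes)
    )


for name in SAVED_SEARCHES:
    print(name, "->", run_saved_search(name))
```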
        
       | errantmind wrote:
       | Instagram's tagging system was actually really effective at
       | categorizing content and discovery because each hashtag was
       | treated as a node in a (giant) graph, where each node has
       | multiple properties, including post count (number of posts using
       | a tag), 'velocity' (number of posts using a particular tag per
       | unit time), etc. I could write up a big post about it, as I made a
       | study of it when I created a web app for finding the most
       | relevant tags a few years ago.
       | 
       | All that to say there was a lot to their system and it worked
       | because users became aware that they were rewarded for using the
       | most relevant tags. Using irrelevant tags was punished. This
       | guided users towards using a mix of relevant popular and niche
       | tags to maximize their reach, which, in turn, further improved
       | the tagging system.
       | 
       | Instagram's tagging system isn't as important anymore as their
       | algorithm has deemphasized it, in favor of other methods for
       | classification and discovery, but there were a couple of golden
       | years where it worked very well. Most users still look back on
       | those years as the 'good times' even if they don't know exactly
       | why. I'd go so far as to say they ruined the app after they
       | deemphasized tags (and added way too many ads)
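       | 
       | For illustration, the two per-hashtag properties named above
       | (post count and velocity) could be computed like this. It is a
       | guess at the general shape, not Instagram's implementation; the
       | 24-hour window and the function name are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def tag_stats(posts, now, window=timedelta(hours=24)):
    """posts: iterable of (timestamp, tags).

    Returns {tag: (post_count, velocity)}, where velocity is the number of
    posts per hour inside the trailing window ending at `now`.
    """
    total, recent = Counter(), Counter()
    for timestamp, tags in posts:
        for tag in set(tags):  # count each tag once per post
            total[tag] += 1
            if now - window <= timestamp <= now:
                recent[tag] += 1
    hours = window.total_seconds() / 3600
    return {tag: (total[tag], recent[tag] / hours) for tag in total}


now = datetime(2022, 10, 18, 12, 0)
posts = [
    (now - timedelta(hours=1), ["sunset", "beach"]),
    (now - timedelta(days=2), ["sunset"]),
]
stats = tag_stats(posts, now)
print(stats)
```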
        
         | leksak wrote:
         | Please write that big post! Sounds interesting
        
       | PaulHoule wrote:
       | People look forward to a visit with the ontologist the way they
       | do a visit with the orthodontist.
        
       | [deleted]
        
       | at_a_remove wrote:
       | This seems like one of those Eternal Problems that people,
       | whether librarians, programmers, or hobbyists, stumble across,
       | think they'll make headway in, then discover that they've really
       | managed to progress just a few feet across a vast and hostile
       | surface of landmines, pitfalls, and lures. Each "obvious" step
       | (I'll have parent relations to define a context!) is only yet
       | another bargain with the Devil, who laughs at your precautions.
        
         | MisterBastahrd wrote:
         | I guess if you're really focused on it. I built a content
         | tagging system for an old employer that would attempt to guess
         | context based on keywords and associations but give the writer
         | of the content the final say in what's actually being tagged.
         | 
         | Sure, I could have spent a thousand hours refining it, but the
         | improvement would have been marginal and it still would need
         | human interaction.
        
           | csours wrote:
           | Was it used for content related to that particular business?
           | I think as long as you have _relatively_ limited variety, you
           | can make something that works well enough.
        
             | at_a_remove wrote:
             | Similarly, I think if you have a limited number of people
             | doing the classification, you can also make a good shot at
             | it.
        
         | eastbound wrote:
         | Tagging's pain is that it's a problem that is easy enough that
         | you can come up with plenty of ideas without prior knowledge.
         | Its bane is that it is, in this sense, similar to bikeshedding.
         | Everyone can have an opinion about it; fortunately, it's only
         | appealing to people who enjoy exploring problems.
        
       | contextfree wrote:
       | "Advice: don't let the tag predicates refer to other tags"
       | 
       | But then how would I search by the tag of all tags that do not
       | tag themselves???
        
       | ok_dad wrote:
       | I like how they worked out an advanced tagging system's
       | requirements from a ~dozen tweets, starting with the most basic
       | tagging system and working up through a tag hierarchy to a tree
       | to a DAG, and then even talked about K/V tags, etc.
        
       | somat wrote:
       | My (Chomskyish) hierarchy of tag systems goes something like:
       | 
       | tagged data
       | 
       | key=value tagged data
       | 
       | hierarchically tagged data (we just found the Unix
       | filesystem!)
       | 
       | hierarchical key = value tagged data (oh damn, it's LDAP, we dug
       | too deep.)
        
       | micromacrofoot wrote:
       | This reminds me of a talk from Clay Shirky about categorization
       | and general ontology. It's interesting to read in hindsight,
       | because it's from when recommendation algorithms were in their
       | infancy.
       | 
       | Warning PDF:
       | https://ia800203.us.archive.org/10/items/Ontology_is_Overrat...
       | 
       | > This is what we're starting to see with del.icio.us, with
       | Flickr, with systems that are allowing for and aggregating tags.
       | The signal benefit of these systems is that they don't recreate
       | the structured, hierarchical categorization so often forced onto
       | us by our physical systems. Instead, we're dealing with a
       | significant break -- by letting users tag URLs and then
       | aggregating those tags, we're going to be able to build alternate
       | organizational systems, systems that, like the Web itself, do a
       | better job of letting individuals create value for one another,
       | often without realizing it.
        
         | dadadad100 wrote:
         | Thank you for this link. I've been looking for a good
         | discussion of the browse vs search argument and this is very,
         | very good
        
       | joshu wrote:
       | there's a massive difference between tagging-for-self-recall and
       | tagging-for-other-recall. when i invented tagging the first was
       | paramount, but the latter has become dominant and has very
       | different design considerations
       | 
       | one interesting note: you can infer a bunch of hierarchical
       | information since people frequently tag from broader to more
       | specific, topicwise.
       | 
       | some things can be tagged by multiple people and you can thus
       | infer synonyms as well. this can thus be fixed in search.
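       | 
       | One way the ordering observation could be exploited, as a
       | sketch: count how often tag A is typed before tag B on the same
       | item, and treat A as broader when the order is never reversed.
       | The `min_support` threshold and the sample data are assumptions.

```python
from collections import Counter
from itertools import combinations


def broader_than(tagged_items, min_support=2):
    """Infer (broader, narrower) pairs from the order in which tags were typed.

    tagged_items: list of tag lists, each in the order the user entered them.
    """
    before = Counter()
    for tags in tagged_items:
        for earlier, later in combinations(tags, 2):
            before[(earlier, later)] += 1
    return sorted(
        (a, b) for (a, b), count in before.items()
        if count >= min_support and before[(b, a)] == 0
    )


bookmarks = [
    ["programming", "python", "django"],
    ["programming", "python"],
    ["python", "django"],
]
print(broader_than(bookmarks))
```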
        
         | milesskorpen wrote:
         | "When I invented tagging" is such a flex. But creating delicious
         | gives you some credible claims there.
        
           | joshu wrote:
           | I don't get to use it much these days
        
       | redbar0n wrote:
       | A very insightful thread by Hillel Wayne on content tagging
       | systems and their challenges.
       | 
       | Their ubiquitous use (in library and information sciences, and
       | popular social networks like Instagram, Twitter, and Pinterest),
       | their deceptive ease of implementation, and "obvious advantages"
       | over hierarchies/folders mean that almost every developer has
       | run (or will run) into them at one point or another.
       | 
       | Feel free to comment with good theory and case studies on tagging
       | systems. (It's especially interesting with good case studies for
       | how to model an advanced tag system in a graph database).
        
         | lmkg wrote:
         | > It's especially interesting with good case studies for how to
         | model an advanced tag system in a graph database
         | 
         | I wouldn't accuse it of being a _good_ tag system, nor a true
         | graph database, but one thing to look at is Semantic MediaWiki.
         | It's a MediaWiki extension which takes Categories as a
         | starting point, and extends it quite far with e.g. relations
         | and key-value pairs.
         | 
         | One interesting feature of Semantic MediaWiki is called
         | "Concepts" which are essentially "computed tags." They can be
         | used in place of Categories in most places, but while
         | Categories are set by editors on a piece of content, Concepts
         | are defined by a query against Categories or other properties.
         | This can help bridge gaps between different _types_ of tags
         | that represent different ways of thinking about the content.
        
       | tra3 wrote:
       | I've been dabbling in personal knowledge bases for a long time
       | now. I remember when I discovered tags -- I thought it was the
       | best thing ever. The first good implementation in the wild (for
       | me) was del.icio.us. Eventually I ran into all the problems that
       | the linked thread describes. "Movie" or "movies"? "Book" or
       | "books"?
       | 
       | In any case, I still think flat tag lists are better than a
       | directory tree structure ("Content/Movies" vs "movies, movie,
       | entertainment, science fiction, space travel, aliens").
       | 
       | A recent innovation that I'm enjoying is backlinks. I believe
       | Roam Research was the first major player that showed you related
       | entries via the links that you included, even though a similar
       | concept existed forever. Then you can generate clouds of
       | relationships and find concepts visually [0].
       | 
       | 0: https://noduslabs.com/cases/visualize-connections-notes-roam...
        
         | jazzyjackson wrote:
         | > backlinks
         | 
         | > recent innovation
         | 
         | Ted Nelson is rolling in his Xanadu
        
           | tra3 wrote:
           | 100% there was prior art to this, I was thinking
           | zettelkasten. Didn't know about Xanadu though!
        
       | NWoodsman wrote:
       | In my app, users apply a set of tags to a note, but then the app
       | automatically creates hierarchical associations in a tree. There
       | are an exponential number of associations between tags (at one
       | point the design was failing because it was trying to prebuild
       | 100k+ GUI items for these cross-referenced tags), so I had to
       | virtualize the intersection of tags at the exact moment a user
       | expands a tree item.
       | 
       | You cannot plan what tag search will lead you back to the data
       | you want, so every node in the graph must be bidirectional.
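       | 
       | A minimal sketch of computing a tree node's children only when
       | it is expanded, rather than prebuilding every tag combination.
       | The note data and the `expand` function are hypothetical; the
       | app's actual design is not described in more detail above.

```python
NOTES = {  # hypothetical notes, each with a flat set of tags
    "n1": {"work", "python"},
    "n2": {"work", "python", "postgres"},
    "n3": {"home", "recipes"},
}


def expand(path, notes=NOTES):
    """Compute one tree node's contents at the moment it is expanded.

    `path` is the tuple of tags from the root down to the node. Returns the
    tags that can refine the node further and the notes matching the path.
    """
    wanted = set(path)
    matching = {note_id for note_id, tags in notes.items() if wanted <= tags}
    child_tags = set().union(*(notes[note_id] for note_id in matching)) - wanted
    return sorted(child_tags), sorted(matching)


print(expand(()))                  # root: every tag, every note
print(expand(("work",)))           # children exist only for co-occurring tags
print(expand(("python", "work")))  # same notes, reached from the other direction
```

       | Since the path is treated as a set, any order of tags leads to
       | the same notes, which is one reading of the bidirectionality
       | requirement.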
        
       | emj wrote:
       | OpenStreetMap is map data that is basically coordinates with tags
       | on them and relations between those tags. I guess this is true
       | for most GIS software, but there is very little 2D map data that
       | cannot be described in the OSM tagging model.
       | 
       | You can never express everything with tags, you need stats and
       | metadata on metadata, documentation and a strong heterogeneity
       | which also need to be able to adapt to new ideas.
       | 
       | https://wiki.openstreetmap.org/wiki/Tags
       | https://wiki.openstreetmap.org/wiki/Map_features
        
         | matkoniecz wrote:
         | https://wiki.openstreetmap.org/wiki/Tagging_mailing_list (
         | https://lists.openstreetmap.org/pipermail/tagging/ ) is a
         | fascinating, hilarious and interesting place.
         | 
         | Basically it is about an endless attempt to classify at least
         | part of reality, in an organically growing worldwide project
         | based on a bunch of passionate obsessive hobbyists with overly
         | strong opinions.
         | 
         | With a bonus of a bunch of politics, confusion and passion.
         | 
         | https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_AP...
         | is likely of interest.
        
       | philip1209 wrote:
       | We spent a lot of time building tagging systems to organize
       | technology skills on https://www.moonlightwork.com.
       | 
       | The coolest part was training a collaborative filter on the tags.
       | So, when you add "Django" as a skill, it could recommend "Python"
       | as a related skill. This made for some refined user experiences.
       | 
       | Getting typeahead search right took a lot of refinement. Here is
       | some of the logic we ended up implementing over time:
       | 
       | 1. Exact matches get prioritized first (e.g. "Go")
       | 
       | 2. Abbreviation support (e.g., "AWS" for "Amazon web services"
       | or "ROR" for "Ruby on Rails")
       | 
       | 3. Names that start with the query should go before non-leading
       | matches (e.g., "Ru" should return "ruby" before "task runner")
       | 
       | 4. We tracked an "Aliases" column for each tag to enhance search.
       | So, "golang" was an alias for "go".
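       | 
       | The four rules above can be sketched as a single ranking
       | function. This is an approximation, not the site's code: it
       | assumes abbreviations are stored alongside aliases, and the tag
       | list is made up.

```python
TAGS = {  # hypothetical tag name -> aliases (abbreviations stored as aliases)
    "Go": ["golang"],
    "Ruby": [],
    "Ruby on Rails": ["ROR", "rails"],
    "Amazon Web Services": ["AWS"],
    "Task Runner": [],
    "Django": [],
}


def rank(name, aliases, query):
    """Lower rank sorts first; None means the tag does not match at all."""
    q = query.lower()
    lowered_name = name.lower()
    lowered_aliases = [alias.lower() for alias in aliases]
    if lowered_name == q:
        return 0  # rule 1: exact name match
    if q in lowered_aliases:
        return 1  # rules 2 and 4: exact abbreviation or alias match
    if lowered_name.startswith(q) or any(a.startswith(q) for a in lowered_aliases):
        return 2  # rule 3: leading match
    if q in lowered_name:
        return 3  # non-leading substring match
    return None


def typeahead(query, tags=TAGS):
    scored = [(rank(name, aliases, query), name) for name, aliases in tags.items()]
    return [name for _, name in sorted(s for s in scored if s[0] is not None)]


print(typeahead("ru"))
print(typeahead("go"))
print(typeahead("aws"))
```

       | The "go" query shows why exact matches need their own tier:
       | "Django" also contains "go" but must not outrank "Go".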
        
       | UltraViolence wrote:
        
       | dahdum wrote:
       | I adore tagging systems and have worked on them in several
       | different applications and implementations, but there are always
       | pitfalls and trade-offs, and it's possible to bury yourself.
       | 
       | Nowadays I nearly always store the assigned tags as an integer
       | array column in Postgres, then use the intarray extension to
       | handle the arbitrary boolean expression searches like
       | "((1|2)&(3)&(!5))". I still have a tags table that stores all the
       | metadata / hierarchy / rules, but for performance I don't use a
       | join table. This has solved most of my problems. Supertags just
       | expand to OR statements when I generate the expression.
       | Performance has been excellent even with large tables thanks to
       | pg indexing.
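       | 
       | A small sketch of generating such an expression, with supertags
       | expanded into OR groups. The schema in the comments, the
       | SUPERTAGS mapping, and the function names are assumptions and
       | not the commenter's actual code; `query_int`, the `@@` operator,
       | and `gin__int_ops` come from the intarray extension.

```python
# Hypothetical schema for context (SQL shown as comments, not executed here):
#   CREATE EXTENSION intarray;
#   CREATE TABLE photos (id bigint PRIMARY KEY, tag_ids integer[] NOT NULL);
#   CREATE INDEX photos_tag_ids_idx ON photos USING GIN (tag_ids gin__int_ops);
#   SELECT id FROM photos WHERE tag_ids @@ '(1|2)&3&!5'::query_int;

SUPERTAGS = {10: [1, 2]}  # assumption: supertag 10 covers tags 1 and 2


def term(tag_id, negate=False):
    """Render one tag as a query_int term, expanding supertags to an OR group."""
    members = SUPERTAGS.get(tag_id)
    if members:
        expr = "(" + "|".join(str(member) for member in members) + ")"
    else:
        expr = str(tag_id)
    return "!" + expr if negate else expr


def build_query(required=(), excluded=()):
    """AND together the required tags and the negated excluded tags."""
    parts = [term(tag_id) for tag_id in required]
    parts += [term(tag_id, negate=True) for tag_id in excluded]
    return "&".join(parts)


print(build_query(required=[10, 3], excluded=[5]))
```

       | The resulting string would be passed as a bound parameter and
       | cast to `query_int`, never interpolated into the SQL text.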
        
         | justinpage wrote:
         | Would you mind sharing a simple example that demonstrates this?
         | Sounds great!
        
         | srcreigh wrote:
         | Do you index arrays? What index type is that? Any tips?
         | 
         | I've used array column in PG before, haven't indexed arrays
         | though.
        
           | marcosdumay wrote:
           | AFAIK, postgres first got its reputation of high performance
           | because of array indexes.
           | 
           | People usually go with GIN indexes, which can be used for the
           | contains, overlaps, or equals comparisons.
        
         | RussianCow wrote:
         | The tradeoff here is that you lose the foreign key constraint,
         | correct? So if you delete a tag, there is no way for the
         | database to automatically remove all references to it. Or is
         | there some way to do this now?
        
           | [deleted]
        
           | brianwawok wrote:
           | Right. More like NoSQL FKs.
           | 
           | How high is the business risk if you have a random tag with
           | no name? Skip its display in the UI.
        
       | [deleted]
        
       | blueblob wrote:
       | A lot of the items described are problems in ontologies
        
         | edflsafoiewq wrote:
         | Yeah. A tag is a predicate. Sub-tags are implication (male
         | author => author). Tag aliases are equivalence (implication in
         | both directions).
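         | 
         | Read literally, that gives a small closure computation: start
         | from the tags on an item and keep applying implications until
         | nothing new appears. A sketch, with made-up rules beyond the
         | "male author" example:

```python
IMPLIES = {  # sub-tag -> tags it implies
    "male author": {"author"},
    "author": {"person"},  # hypothetical extra rule
}
ALIASES = [("sci-fi", "science fiction")]  # hypothetical alias pair


def build_rules(implies, aliases):
    """Aliases are equivalence, i.e. implication in both directions."""
    rules = {tag: set(targets) for tag, targets in implies.items()}
    for a, b in aliases:
        rules.setdefault(a, set()).add(b)
        rules.setdefault(b, set()).add(a)
    return rules


def closure(tags, rules):
    """Return every tag implied by `tags`; cycles are safe because of the set."""
    result, stack = set(tags), list(tags)
    while stack:
        for implied in rules.get(stack.pop(), ()):
            if implied not in result:
                result.add(implied)
                stack.append(implied)
    return result


RULES = build_rules(IMPLIES, ALIASES)
print(sorted(closure({"male author", "sci-fi"}, RULES)))
```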
        
       | roberthahn wrote:
       | I'm so happy to see people talk about this! I too am endlessly
       | fascinated with content tagging systems.
       | 
       | Hillel's thoughts are completely unsurprising to me so I guess
       | I've come to similar conclusions.
       | 
       | I do notice that we seem to care about different things though -
       | where Hillel appears to focus on tag types (and the
       | implementation challenges that go with that) I focus more on
       | human factors like what problem are we solving? for who? How do
       | we maintain relevance (and power) in tagging systems (and for
       | who?)
       | 
       | I'm of the opinion that tagging systems should not be made by the
       | few for the many but by each person for themselves. Which, of
       | course, sucks because that puts the onus on everyone who wants
       | tagged content to do their own work. But I believe the output of
       | that investment would be quite valuable and useful!
       | 
       | An easy example I could use might be recommendation engines.
       | Assume I have a database of tags (a tag cloud?), and I know you
       | have similar interests to me. If you also have a tag cloud, I
       | could input links to both of our tag clouds into a purpose-built
       | recommendation engine to discover new content I might not have
       | consumed yet.
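       | 
       | A toy version of that recommendation idea, assuming each
       | person's tag cloud maps a tag to the set of items they filed
       | under it. The scoring (count of shared tags pointing at an
       | item) is an arbitrary choice for illustration.

```python
def recommend(mine, theirs):
    """Suggest items the other person filed under tags we both use.

    mine / theirs: {tag: set of item ids}. Items I already have are skipped;
    items reachable through more shared tags rank higher.
    """
    shared_tags = mine.keys() & theirs.keys()
    seen = set().union(*mine.values())
    scores = {}
    for tag in shared_tags:
        for item in theirs[tag] - seen:
            scores[item] = scores.get(item, 0) + 1
    return sorted(scores, key=lambda item: (-scores[item], item))


my_cloud = {"tagging": {"a", "b"}, "databases": {"c"}}
your_cloud = {"tagging": {"b", "d"}, "databases": {"d", "e"}, "cooking": {"f"}}
print(recommend(my_cloud, your_cloud))
```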
        
         | jsemrau wrote:
         | > I could use might be recommendation engines. Assume I have a
         | database of tags (a tag cloud?), and I know you have similar
         | interests to me. If you also have a tag cloud
         | 
         | This was the first "naive" implementation on finclout. Every
         | post gets automatically scanned for ranked keywords and then
         | matched with other known entities about the post. We also
         | collect tags from the user and have users verify keyword
         | matches.
        
       | k__ wrote:
       | I don't know much about this topic.
       | 
       | The only thing I learned: if you think you have a taxonomy, then
       | you don't.
        
       | AtlasBarfed wrote:
       | Eh, the diamond problem and transitive issues don't exist because
       | what is being reduced to is simply a set and membership. If
       | expansions / aliases / synonyms / multi-membership produce
       | overlaps, who cares, it's a set of hashes. The overwrites only
       | represent wasted computation.
       | 
       | Really this is a simpler version of multiple inheritance. You
       | don't have the issue of conflicting method signatures and
       | implementations, only names.
       | 
       | The only danger is names meaning different things. You need your
       | tags to be relatively unique to the meaning.
        
       | cptcobalt wrote:
       | I can't wait for the author of this thread to discover the AO3
       | tagging system, which is, frankly, a masterpiece that
       | demonstrates how effective community management can lead to
       | _extremely good_ tagging and categorization, with very little
       | miscategorization.
       | 
       | https://www.wired.com/story/archive-of-our-own-fans-better-t...
       | 
       | https://archiveofourown.org/faq/tags
        
         | swyx wrote:
         | It's literally the third tweet in his thread.
        
         | account-5 wrote:
         | They mention it in the 4th post in the thread.
         | 
         | I never heard of it though, what's so good about it?
        
         | at_a_remove wrote:
         | The AO3 tagging system badly needs pruning. I hesitate to make
         | examples, as the specificity will serve as a "call out," but
         | quite a lot of authors throw in single-use, digressive tags as
         | some kind of commentary on their own work. Huge meandering
         | swaths of crap tags, and the people who make them ought to have
         | their permissions to create tags revoked.
        
           | PuppyTailWags wrote:
           | I kind of disagree with this. Tags are dual use in AO3,
           | specifically they serve as a way to find specific stories
           | with specific thematic or plot elements, but they
           | additionally serve as a free expression of the author because
           | it's the author who chooses which if any tags they want to use
           | to describe their piece. When an author gets to decide the
           | categories of a work, the categorization also becomes an
           | expression.
           | 
           | Consider the flavor of "Dead Dove: Do Not Eat" tag, which
           | serves both as an author's expression of warning the reader
           | and also a category of fanfic that is expected to have
           | transgressive elements. Just tagging, idk, "child
           | endangerment" completely misses the point of "Dead Dove: Do
           | Not Eat" comparatively.
        
             | at_a_remove wrote:
             | I will paraphrase this to avoid a callout, but "no
             | regenerating limbs those arms are toast sorry QA despises
             | them" is _not a useful tag_. (This is a mild example, I've
             | seen far worse)
             | 
             | First, it is a single-use tag. Tags are for _categories_,
             | not solo entries. Solo entries explode the tagspace to no
             | good end.
             | 
             | Second, that expression belongs in the summation of the
             | work, or just about anywhere else. Tags are for other
             | people to use to find similar works or for readers to look
             | for things based on their interests. Metadata is not for
             | artistic expression, unless you're one of those people who
             | believes that artists ought to be able to choose their own
             | Library of Congress call numbers and such, people who want
             | to include "elephant" in the metadata despite the work
             | having nothing to do with elephants.
        
               | PuppyTailWags wrote:
               | I think you're missing the point that, in AO3
               | specifically, tags are not solely metadata. Tags are also
               | artistic expression _in the context of AO3_. That's the
               | thing. AO3 doesn't function like the Library of Congress,
               | and there are no librarians that are independently
               | assigning categories to fanfic. An author can choose to
               | opt out of tags entirely, and people cannot put tags on
               | other people's fanfic even if it's relevant and would
               | benefit that work's findability. The simple mechanism of
               | the author having sole control of what tags they want to
               | apply to the work causes the act of tagging to also serve
               | the purpose of artistic expression-- this results in
               | spontaneous tags going from single-use to culturally
               | known, such as "no beta we die like men", and therefore I
               | think arguably useful _but only in the context of AO3_.
        
               | Ajedi32 wrote:
               | > An author can choose to opt out of tags entirely, and
               | people cannot put tags on other people's fanfic even if
               | it's relevant and would benefit that work's findability
               | 
               | Curious about how this doesn't render the entire system
               | near-useless? In my experience with other sites with
               | user-generated content that allow tagging, this decision
               | always makes the whole system way worse, because the OP
               | alone is almost never going to be aware of all possible
               | tags that are applicable to whatever it is they posted,
               | and will instead just take the first 3-5 words that pop
               | into their head and stick those in the tags field. The
               | end result is a tagging system that barely works; you can
               | search for a tag but you'll miss tons of stuff, and you
               | can filter out a tag but you'll still see tons of stuff
               | in that category. And if you ever find a hyper-specific
               | tag you really enjoy it'll only have like 5 items in it
               | even if there are hundreds or thousands it could be
               | applicable to.
               | 
               | Don't get me wrong, the wiki-style approach of just
               | letting anyone edit tags has its own issues, but it does
               | at least result in tags on everything being at least
               | mostly complete, and actually useful for finding what you
               | want (or filtering out things you don't want).
        
               | PuppyTailWags wrote:
               | > Curious about how this doesn't render the entire system
               | near-useless? In my experience with other sites with
               | user-generated content that allow tagging, this decision
               | always makes the whole system way worse, because the OP
               | alone is almost never going to be aware of all possible
               | tags that are applicable to whatever it is they posted,
               | and will instead just take the first 3-5 words that pop
               | into their head and stick those in the tags field.
               | 
               | A few things make this work brilliantly:
               | 
               | - authors are encouraged to tag as much as they want with
               | whatever they want
               | 
               | - tags have an autocompletion to help authors select tags
               | on keywords
               | 
               | - authors are prolific fanfic readers themselves and are
               | therefore usually extremely familiar with the tag system
               | 
               | - manual tag linking means searching for one tag will
               | also return results for all related or near-identical
               | tags, a linking which has an extremely high success rate
               | due to dedicated and extremely knowledgeable volunteers
               | 
               | This overall ends up being that authors use prolific
               | tags, and reuse prolific tags from others, and ultimately
               | search isn't strongly affected because the entire
               | readerbase is hyper-knowledgeable. Check out the
               | extremely specific fanfic-only "hanahaki disease" tag
               | description in ao3 and you'll quickly see that any
               | variety of related tags, with any level of
               | hyperspecificity (some tags have neither "hanahaki" nor
               | "disease"!), will appear searching for any of them,
               | including hanahaki disease in other languages!:
               | https://archiveofourown.org/tags/Hanahaki%20Disease
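               | 
               | The linking mechanism described above can be pictured
               | as a lookup from each author tag to a canonical tag.
               | This is only a sketch: the mapping, the works, and the
               | "coughing up flowers" synonym are invented, and AO3's
               | real implementation is not shown here.

```python
CANONICAL = {  # hypothetical volunteer-maintained links: author tag -> canonical tag
    "Hanahaki Disease": "Hanahaki Disease",
    "hanahaki": "Hanahaki Disease",
    "coughing up flowers": "Hanahaki Disease",
}

WORKS = {  # hypothetical works with author-chosen tags
    "work-1": ["hanahaki", "no beta we die like men"],
    "work-2": ["coughing up flowers"],
    "work-3": ["Fluff"],
}


def canonical(tag):
    """Unlinked tags (e.g. one-off commentary tags) simply map to themselves."""
    return CANONICAL.get(tag, tag)


def search(tag, works=WORKS):
    """Match works whose tags share a canonical form with the searched tag."""
    wanted = canonical(tag)
    return sorted(
        work for work, tags in works.items()
        if wanted in {canonical(t) for t in tags}
    )


print(search("Hanahaki Disease"))
print(search("hanahaki"))
```

               | A one-off tag never enters the mapping, so it neither
               | helps nor hurts a search for a linked tag.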
        
               | at_a_remove wrote:
               | Then tags in AO3 are just more of the text and not much
               | of a finding aid. You can't have both.
        
               | PuppyTailWags wrote:
               | Tags end up being an excellent finding aid due to the
               | strength of the community's tag linking, you see. So they
               | serve both purposes.
        
               | at_a_remove wrote:
               | "no regenerating limbs those arms are toast sorry QA
               | despises them" just isn't useful if I want to locate a
               | particular text, other than "I'm liable to get a Tumblr-
               | stink off of this crap."
               | 
               | And your defense of this is really ... _internal_ , as
               | in, this all looks like a lot of in-jokes to an outsider
               | who is new to AO3, or even new to a particular fandom. If
               | someone doesn't know the slang, the in-joke reference,
               | it's still unhelpful.
        
               | PuppyTailWags wrote:
               | > "no regenerating limbs those arms are toast sorry QA
               | despises them" just isn't useful if I want to locate a
               | particular text, other than "I'm liable to get a Tumblr-
               | stink off of this crap."
               | 
               | Yeah, but you're not looking for that tag, and that tag
               | wouldn't affect your search in any way. That's the thing.
               | You're approaching tags like they can only ever be
               | used one way, and yes they _can be that, and also other
               | things that don't affect your personal use_. So when you
               | search for your specific tag, all synonymous tags will
               | also appear, and all superfluous tags don't affect your
               | search. A one-off tag doesn't affect your ability to
               | search for multi-use tags.
               | 
               | EDIT: Additionally, the fact the tag exists has also
               | helpfully indicated to you that this is a fic you
               | probably don't want to read because of the author's
               | cultural hinting through their use of tags. You're
               | proving my point here-- the one-off tag doesn't affect
               | your ability to search for your specific fandom or
               | tropes, but also it allows you to pick flavors of fanfic
               | you want from that search because of your dislike of one-
               | off tags.
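
A minimal sketch of the tag-linking idea described above, assuming a hand-maintained alias table. The table contents, `canonical`, and `search` are hypothetical illustrations, not AO3's actual implementation. Linked synonyms resolve to one canonical tag, while an unlinked one-off tag never matches any search:

```python
# Hypothetical alias table: each free-form tag maps to one canonical tag.
# One-off tags that nobody has linked are simply absent from the table.
ALIASES = {
    "snarry": "Harry/Snape",
    "harry/snape": "Harry/Snape",
    "hanahaki": "Hanahaki Disease",
    "hanahaki disease": "Hanahaki Disease",
}

def canonical(tag):
    """Return the canonical form of a tag, or None for an unlinked one-off tag."""
    return ALIASES.get(tag.strip().lower())

def search(works, query):
    """Return titles of works carrying any tag synonymous with the query."""
    target = canonical(query)
    if target is None:
        return []
    return [title for title, tags in works.items()
            if any(canonical(t) == target for t in tags)]

# Made-up sample data for illustration only.
works = {
    "Fic A": ["snarry", "no regenerating limbs those arms are toast"],
    "Fic B": ["Harry/Snape"],
    "Fic C": ["hanahaki"],
}
print(search(works, "Harry/Snape"))
```

Searching for either spelling returns both linked works, and the one-off tag on "Fic A" plays no part in the lookup.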
        
               | at_a_remove wrote:
               | You have it backward: I found the fic through other means
               | entirely and eventually dropped it. When I encountered it
               | again on AO3 (it was a cross-post), I said "Oh, look at
               | those horrible tags." It was notable in the fact that I
               | said "I need to keep this one handy the next time I end
               | up having yet another conversation with someone about how
               | much tagging sucks on AO3." Because this isn't the first
               | time someone has brought it up to me.
               | 
               | They just crap up the results if I am searching for
               | "regeneration" or "limbs." If something is used more than
               | one way, yes, it _does_ affect my personal use because it
               | means "more stuff I have to filter through." When you
               | search, what you do not want is extraneous results.
               | That's the whole point of searching! And I guess my
               | library experience is showing, but AO3 just reeks of
               | amateur hour shenanigans. I predict that at some point
               | there will be a movement to clean up that kind of junk.
        
               | Tomte wrote:
               | > Tags are also artistic expression in the context of
               | AO3.
               | 
               | Seems to be similar on Tumblr.
        
         | rubinlinux wrote:
         | > The only system I know that does that is the fanfiction site
         | AO3, where teams of volunteers manually create aliases from,
         | say, "snarry" to "Harry/Snape"
         | 
         | They seem aware already.
        
         | taggingthrowaw wrote:
         | A taxonomy or hierarchical system sometimes also helps, eg. on
         | E621: https://e621.net/wiki_pages/23556 (NSFW if you scroll at
         | all or click anything).
        
       | endisneigh wrote:
       | Is there an optimal tagging system, performance wise? Seems like
       | there could be a database just for tagging.
        
       | jshandling wrote:
       | I set out building my first full-stack webapp [0] to make a
       | custom theme-based tagging/organizational system for musical
       | ideas. I did not initially realize all the hairy design choices
       | inherent in this domain, but have found it humbling and
       | educational.
       | 
       | Remaining features to be implemented include in-app audio
       | recording, editing, and custom labeling outside of the main tree
       | structured organizational system.
       | 
       | I'd appreciate any thoughts or suggestions if anyone cares to
       | take a look!
       | 
       | [0] https://www.soundseeker.app/
        
       | kortex wrote:
       | I think tag aliases are fine, but in my opinion, tags should not
       | have hierarchies. That is just opening the can of ontology worms,
       | and most systems are ill-equipped to deal with
       | ontologies...including ontological systems.
       | 
       | Tags are just dumb strings which label data. They are basically
       | KeyValues, where the value is just always equal to True. We don't
       | think of KVs as hierarchical unless they are explicitly a path
       | string, and in that case, they are forced to be a plain tree with
       | no cycles or diamonds.
        
         | bonaldi wrote:
         | Nothing you say is necessarily the case, and is dependent on
         | implementation. Take "value is just always equal to true",
         | well, no, not if your key is a predicate. "Color:red" is more
         | powerful than "#red" or "red:true", and "color:[lookup-ID-for-
         | red-concept]" is substantially more powerful than both.
        
         | pessimizer wrote:
         | Not having tag hierarchies doesn't fix the difficulty of
         | classification, it just handwaves it away. There will always
         | need to be (super)tags that are collections of other tags,
         | where it is a bug for an item that has a particular tag to not
         | also have another, related tag. The question should be _how_
         | you're going to handle that, not if you're going to handle it,
         | or you'll end up with a lot of broken tags of dubious
         | usefulness.
         | 
         | Tags are just dumb strings that label data, but tags are also
         | data. If I can't label tag:"red" a tag:"colored" in your
         | system, it's not great. It's not much better if I'm labeling
         | things tag:"colored-red" because if I'm doing that and there's
         | no central validation to add semantics to that relationship,
         | I'm going to end up with tag:"red" things, tag:"colored"
         | things, tag:"colored-red" things, and probably even tag:"color-
         | red" and tag:"red-color" things.
         | 
         | edit: what's so bad about cycles when it comes to a tag being
         | assigned another tag that has been assigned the original tag?
         | It's just a mutual implication. There's nothing wrong to me
         | with adding a single tag and seeing five more added
         | automatically. It means that you're building a knowledge base.
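
A rough sketch of tags-implying-tags with cycles allowed, assuming implications live in a simple lookup table. The table and the `expand` helper are hypothetical, and "crimson"/"scarlet" are made-up examples of a mutual implication. A visited set is enough to keep cycles from looping forever:

```python
from collections import deque

# Hypothetical implication table: a tag maps to the tags it implies.
# "crimson" and "scarlet" imply each other, forming a cycle on purpose.
IMPLIES = {
    "red": {"colored"},
    "crimson": {"red", "scarlet"},
    "scarlet": {"crimson"},
}

def expand(tags):
    """Return the given tags plus everything they imply, directly or transitively."""
    seen = set(tags)
    queue = deque(tags)
    while queue:
        tag = queue.popleft()
        for implied in IMPLIES.get(tag, ()):
            if implied not in seen:  # the visited set makes cycles harmless
                seen.add(implied)
                queue.append(implied)
    return seen

print(sorted(expand({"scarlet"})))
```

Adding the single tag "scarlet" pulls in "crimson", "red" and "colored" automatically, which is the "add one tag, see more appear" behaviour described above.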
        
         | zamubafoo wrote:
         | Optional forests of hierarchy trees are where it's at.
         | Essentially don't encode everything into one gigantic one.
         | 
         | Sometimes you know that users are going to tag `laptop` a bunch
         | and want that to also drag in `personal computer` (but not all
         | `PC`s are `laptop`s) or that `blue dress` is also a `dress` and
         | don't want to hard code special cases.
         | 
         | That said, if you are going to do this, then you must have it
         | controlled by an admin/moderator. Maybe allow for hierarchy
         | request submissions but have it moderated. There is at least
         | one public system where this just works to my knowledge and a
         | bunch of self-hosted ones as well.
        
         | goto11 wrote:
         | > They are basically KeyValues, where the value is just always
         | equal to True
         | 
         | That would be a set of values :-)
        
         | comfypotato wrote:
         | Org mode approaches this by making hierarchies and inheritance
         | optional. I personally like both, but I acknowledge (as was
         | mentioned in the tweets) that hierarchies can get to be very
         | convoluted if you don't work to maintain them sensibly.
        
           | AlanYx wrote:
           | What I like most about org mode tags is that regular
           | expressions can be subtags (or "members of a group tag" in
           | org mode lingo). So you can specify a hierarchy where the
           | parents have children you don't know in advance.
        
         | FpUser wrote:
         | >"I think tag aliases are fine, but in my opinion, tags should
         | not have hierarchies."
         | 
         | Many years ago I developed a proprietary database for a
         | media-related product. It was a NoSQL Entity-Attribute-Value
         | database where Attribute was basically a tag. Tags had no
         | hierarchy, but the query language allowed specifying a sequence
         | of attributes like Genre, Artist, Album, Title. When said
         | sequence was not empty, the result set would be a tree where
         | each level corresponded to an attribute position as defined in
         | the query.
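
One way such a query could work, sketched under the assumption that each item is a flat dict of attribute to value. `build_tree` and the sample tracks are hypothetical and are not the proprietary system described above. The hierarchy exists only in the query, never in the stored tags:

```python
def build_tree(items, attributes):
    """Group items into nested dicts, one level per attribute in the given order."""
    if not attributes:
        return list(items)  # empty sequence: plain flat result set
    first, rest = attributes[0], attributes[1:]
    groups = {}
    for item in items:
        groups.setdefault(item.get(first, "(none)"), []).append(item)
    return {value: build_tree(group, rest) for value, group in groups.items()}

# Made-up sample data for illustration only.
tracks = [
    {"Genre": "Jazz", "Artist": "A", "Album": "X", "Title": "One"},
    {"Genre": "Jazz", "Artist": "A", "Album": "X", "Title": "Two"},
    {"Genre": "Rock", "Artist": "B", "Album": "Y", "Title": "Three"},
]
tree = build_tree(tracks, ["Genre", "Artist"])
print(list(tree))          # top-level keys are the genres
print(list(tree["Jazz"]))  # next level is the artists within a genre
```

Changing the attribute order in the call (say, Artist before Genre) yields a different tree over the same flat data.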
        
         | NWoodsman wrote:
         | I understand your pain, but want to make you aware that LINQ
         | has become so powerful especially with lazy evaluation and
         | expression trees that hierarchical views of tags are really
         | basically simple and actually just one more method of
         | visualizing data...
        
       | PeterStuer wrote:
       | Look into AI systems from the 1960's and you will find Semantic
       | Networks. If you just need categories you can go with taxonomies
       | and folksonomies. If you want to (over?) formalize and describe
       | mainly non-agentive structure you look at ontologies.
        
       | aaviator42 wrote:
       | A few months ago I worked on some proof-of-concept code for
       | searching tagged data: https://github.com/aaviator42/Cha
       | 
       | I now work full-time in a role where part of my duties is
       | designing a content tagging system and its search
       | functionalities. It's very interesting and fun! Lots of puzzles.
       | 
       | How do you weigh different tags? How do you do fuzzy searching
       | ('city' should match with plural ('cities'), misspellings
       | ('citys'), etc)?
       | 
       | How do you program the system so that 'hotdog' is not matched
       | with 'hot' and 'dog'? What about synonyms? What about regional
       | terminology and synonym tables?
       | 
       | Then there's one-to-one and one-to-many and many-to-one mapping.
       | 
       | As a side project I'm also working on a beta public search engine
       | that I'll launch on HN sometime in the next year or so, where I'm
       | having similar puzzles.
        
         | cube2222 wrote:
         | > How do you program the system so that 'hotdog' is not matched
         | with 'hot' and 'dog'?
         | 
         | That sounds like a very good use case for word embeddings.
        
           | bell-cot wrote:
           | How do you deal with "hotdog" possibly being a noun (several
           | meanings), or proper noun (several meanings), or verb, or
           | interjection?
        
             | dmonitor wrote:
             | e621 frequently has to deal with characters with the same
             | name, or an artist with the same name as a character. they
             | just make ambiguous tags have a special syntax. so if bob
             | was an artist, but also had a character named bob, it would
             | just be bob_(bob) for the character and bob_(artist) for
             | the artist. and if someone tried to tag something as just
             | "bob" they would be told to be more specific. searching for
             | all bobs can be done with bob_(*).
             | 
             | so hotdog could have hotdog_(food), hotdog_(interjection),
             | and hot dogs (the animal) would be two tags: hot and dog.
             | 
             | it's not the cleanest solution, but it works well enough.
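
A small sketch of that convention, assuming tags are plain strings of the form name_(qualifier). The vocabulary and the helper names are hypothetical, not e621's code. The wildcard search uses Python's standard `fnmatch` module:

```python
from fnmatch import fnmatchcase

# Hypothetical tag vocabulary using the name_(qualifier) convention.
TAGS = ["bob_(bob)", "bob_(artist)", "hotdog_(food)",
        "hotdog_(interjection)", "hot", "dog"]

def candidates(name):
    """Return the qualified tags that share a bare name, like searching name_(*)."""
    return [t for t in TAGS if fnmatchcase(t, name + "_(*)")]

def validate(tag):
    """Accept a known tag; reject a bare name that has qualified variants."""
    if tag in TAGS:
        return tag
    options = candidates(tag)
    if options:
        raise ValueError(f"'{tag}' is ambiguous, pick one of: {', '.join(options)}")
    raise ValueError(f"unknown tag '{tag}'")

print(candidates("bob"))
```

Here `validate("bob")` raises and lists both qualified options, while `validate("hot")` passes because "hot" is an unambiguous tag in its own right.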
        
       | Tomte wrote:
       | Also great on the topic of tagging, with more information about
       | the AO3 scheme:
       | https://idlewords.com/talks/fan_is_a_tool_using_animal.htm
        
       | [deleted]
        
       | qwerty456127 wrote:
       | It is crazily sad that non-invasive (without embedding into the
       | file body) tagging is not standardized across OSes and file
       | systems. The only system to support tags I know is
       | KDE/Dolphin/Baloo; outside KDE tagging seemingly is supported
       | only by a handful of incompatible third-party apps.
       | 
       | Sadly I don't expect much progress to happen in this area. Almost
       | nobody cares about storing and organizing of files locally
       | nowadays.
       | 
       | I hope it is going to be done sooner or later (there isn't much
       | to do: just standardize some xattrs and something like RDF schema
       | to be used in an alternative FS stream + add support for these to
       | the standard file management and search tools, this is orders of
       | magnitude easier than implementing a new FS) but probably not
       | soon - it would take huge luck to get any resources allocated to
       | this.
        
       | terpimost wrote:
       | I was interested in that too. I stopped as soon as I realized
       | that any good search in a tagging system would be just a
       | full text search. E-commerce catalogs have detailed filters but I
       | think people use at most 2 properties in addition to simple name
       | input search.
        
       | wtf77 wrote:
       | I am endlessly fascinated by how twitter has now become a dumping
       | ground for complex topics that are difficult to read and follow.
       | But what happened to the old blogs?
        
         | throw10920 wrote:
         | Nothing has happened to them. I have a few hundred distinct
         | bookmarked blogs, if not over a thousand, and obviously my
         | bookmark collection is a tiny fraction of what actually exists.
         | They're still there.
        
         | hwayne wrote:
         | I have a really high standard for my blog posts. They go
         | through several rounds of rewrites, with feedback from friends,
         | before I'm happy with them. That plus the length (median ~2000
         | words) means that most of my blog posts take weeks or months to
         | write. I can hammer out a tweetstorm in 20 minutes.
         | 
         | (Also, tweets are a fun format! I want each tweet to be a
         | complete idea, which is hard when you have only 280
         | characters.)
        
         | [deleted]
        
         | dymk wrote:
         | It's lower effort to make a stream of consciousness post one
         | sentence at a time, and as a bonus, there's a built in audience
         | / discovery network where they're posting.
        
           | tuatoru wrote:
           | Lower effort for whom? Back when I were a lad, we were told
           | to write so that our readers did not have to work to
           | understand us. The point of writing is to be understood. Old
           | man yells at cloud.
        
             | tqi wrote:
             | I think it's helpful to keep in mind that with most of the
             | examples that get shared around, the choice for the author
             | was not a string of tweets vs blog post, but rather a
             | string of tweets vs not sharing at all.
        
             | dymk wrote:
             | Lower effort to the writer, obviously.
             | 
             | The point of posting on Twitter is not to be understood,
             | it's to be retweeted.
        
           | dylan604 wrote:
           | How does one go back and edit a stream of consciousness like
           | that into an actual coherent thought later though?
           | 
           | I was just having a conversation similar to this where it was
           | explained "this is just how people my age do things". While
           | attempting to avoid boomer/millennial tropes, this does make
           | me wonder how much different schooling is now vs then (hoping
           | to avoid those memes too).
           | 
           | I was always getting in trouble for just saying whatever came
           | to mind vs slowing down to think if it really needed to be
           | said or more specifically _how_ it was said.
        
       | labrador wrote:
       | I am too but I've given up. I've collected a lot of data over the
       | years and spent a lot of time trying to organize it so I can find
       | relevant connections. It's just too time consuming. I've decided
       | discerning relationships in unstructured data is where I want to
       | focus.
        
       | cpsns wrote:
       | I've written a tagging system from scratch for an existing system
       | and it was one of the most interesting things I've worked out. I
       | had total control over how it was implemented and I _think_ I
       | came up with a really nice, minimalist, scalable way to tag
       | things, and to search them.
        
         | rambambram wrote:
         | Care to elaborate? I'm also working on a categorization/tagging
         | system - albeit a simple one - and I find myself in a struggle
         | to keep it accessible enough to use on one hand and advanced
         | enough to actually add value on the other hand.
        
       | throwaway920102 wrote:
       | Empornium aka luminance has a great tagging system.
        
       | aaws11 wrote:
       | https://threadreaderapp.com/thread/1534301374166474752.html
        
       | ggm wrote:
       | Approximate date is the bugbear of photo tagging. EXIF and Dublin
       | core and vendors can't agree what to do. Camera manufacturers
       | don't care because at time of shot, date is fixed. The problem is
       | archival, scanned and copied predigital work.
        
       | dekervin wrote:
       | I hacked together a small extension to tag hacker news stories. A
       | small presentation here,
       | 
       | https://datum.alwaysdata.net/static/extension/index.html
       | 
       | With the js files for the extension.
       | 
       | The motivation to finish it partly came from this hn thread.
       | https://news.ycombinator.com/item?id=32970560
        
       | xcskier56 wrote:
       | The hierarchical nature of the information he's talking about
       | really reminds me of the ontologies and terminologies that are
       | used in healthcare to organize medical information. E.g.
       | Ibuprofen 10mg Tab < Ibuprofen < NSAID < ... < Therapeutic
       | Chemical.
       | 
       | This is a field that I'm only tangentially familiar with but it's
       | a fascinating discipline trying to group and manage all of the
       | different categories of healthcare data. You can use the RxNav
       | tool to look at the RxNorm terminology which is only 1 of many
       | terminology systems.
       | 
       | https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchT...
        
       | turnsout wrote:
       | This is the reason the Semantic Web never took off--people on the
       | internet can't even agree on what a "sandwich" is, let alone the
       | exact hierarchy of ontology.
       | 
       | This is an area where large language models have a role to play--
       | whatever you're hoping to achieve with user-generated tags can
       | probably be achieved with ML-powered associations or navigation.
       | And the potential benefit is that it could be tailored to each
       | user--so you're only surfacing "Hot Dogs" when certain users
       | click "Sandwich."
        
         | CabSauce wrote:
         | This is what we do for the most part. Two tiers of 'tags'. One
         | is curated and required, the other is an embedding.
        
         | CobrastanJorji wrote:
         | I thought that the cube rule of food generally settled the
         | sandwich debate. A hot dog is not a type of sandwich, being
         | surrounded on three sides. Instead, it is a type of taco.
        
           | turnsout wrote:
           | Haha yes, and I would hate to meet the Turing-complete
           | tagging system that could capture this nuance!
        
       | swyx wrote:
       | we have a big tagging problem where i work and yesterday I tried
       | using gpt3 to assist. worked well!
       | 
       | code and context:
       | https://github.com/airbytehq/airbyte/issues/17893
        
       | raffraffraff wrote:
       | I'd love to know what those prolific Spotify engineers think of
       | this.
       | 
       | That was a joke because Spotify doesn't let you tag music.
        
       | acchow wrote:
       | Sounds like they are trying to embed the search semantics in the
       | data storage. Why not treat search as a distinct problem?
        
       | taylorbuley wrote:
       | Pro tip: use stemming!
        
       | robg wrote:
       | Surprised no one has nailed a use case for semantic tags and
       | their associations. Python and snake doesn't require hierarchies
       | to differentiate from Python and coding. Why aren't co-
       | occurrences within and between content samples enough?
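
A quick sketch of the co-occurrence idea, assuming each content item is just a set of tags. `related_tags` and the sample items are made up for illustration. Counting which tags appear alongside a given tag separates the two senses of "python" without any hierarchy:

```python
from collections import Counter

def related_tags(items, tag):
    """Count how often every other tag appears alongside the given tag."""
    counts = Counter()
    for tags in items:
        if tag in tags:
            counts.update(t for t in set(tags) if t != tag)
    return counts.most_common()

# Made-up sample data for illustration only.
items = [
    {"python", "snake", "reptile"},
    {"python", "snake", "zoo"},
    {"python", "coding", "tutorial"},
    {"snake", "reptile"},
]
print(related_tags(items, "snake"))
print(related_tags(items, "coding"))
```

The neighbours of "snake" and of "coding" overlap only in "python", so the surrounding tags alone hint at which sense is meant.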
        
       | feoren wrote:
       | I'm surprised I haven't seen more discussion of how tags are an
       | entry point into plain-old data architecture. It should be
       | obvious that by the time you're using tags for queries like
       | "start-date: BEFORE 2022-03-01", you've created an inner-platform
       | where you're building a plain-old relational database on top of
       | your tags. Stop what you're doing and elevate "start date" out of
       | tag-land and into a more structured representation with more
       | application support.
       | 
       | Many enterprise databases add a memo field called "Comments" to
       | almost every table. Clients very often end up coming up with
       | their own guidelines about how to embed various information in
       | the comments fields that the primary structure is missing.
       | Looking over how clients are using the "comments" fields is a
       | great way to discover new things that should be formally
       | incorporated into the structure of your data architecture.
       | Similarly with tags.
       | 
       | Look at tags as a starting point for adding a bit of loose
       | structure to the frontiers of your data architecture. Mix them in
       | with more structured data architecture. Be ready to "graduate"
       | tags up to the next level of structure when it becomes
       | appropriate. Stop worrying about how to make tagging perfect and
       | embrace it for what it is: an easy way to get started on modeling
       | the parts of the domain that you haven't spent a long time
       | thinking about yet. A good way to understand how users want to
       | use your system. Something you're always revisiting, cleaning up,
       | and using as a source of inspiration. If you see some tags
       | getting out of hand, don't try to improve your tagging system;
       | instead take what those tags are trying to represent and add more
       | structured fields and queries for them. This pipeline of less to
       | more structure should be constantly playing out in a healthy,
       | evolving system.
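
To make the "graduation" step concrete, here is a hedged sketch assuming records carry free-form tags such as "start-date: 2022-02-14". The `graduate` helper, the record layout, and the sample data are hypothetical. The tag is pulled out into a typed field so that the date query becomes an ordinary comparison:

```python
from datetime import date

def graduate(tags, key="start-date"):
    """Split 'key: value' tags for one key out of a tag list into a typed field."""
    value, remaining = None, []
    for tag in tags:
        name, sep, raw = tag.partition(":")
        if sep and name.strip() == key:
            value = date.fromisoformat(raw.strip())  # expects YYYY-MM-DD
        else:
            remaining.append(tag)
    return value, remaining

# Made-up sample data for illustration only.
records = [
    {"id": 1, "tags": ["urgent", "start-date: 2022-02-14"]},
    {"id": 2, "tags": ["start-date: 2022-04-01"]},
    {"id": 3, "tags": ["urgent"]},
]
for r in records:
    r["start_date"], r["tags"] = graduate(r["tags"])

# "start-date: BEFORE 2022-03-01" is now a plain typed comparison.
early = [r["id"] for r in records
         if r["start_date"] is not None and r["start_date"] < date(2022, 3, 1)]
print(early)
```

The loose tags that remain ("urgent") stay in tag-land until they, too, earn a structured home.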
        
       | photochemsyn wrote:
       | One area that's illuminating is the effort to annotate the
       | results of whole-genome sequencing projects. Tagging stretches of
       | the genome which represent coherent units of some sort, and then
       | relating them to some functional capability of the organism, is
       | not at all a solved problem.
       | 
       | Here's an overview from 2011 where they're struggling to even get
       | a good tagging system up for single-celled microorganisms (a much
       | easier problem than multicellular genomes like humans):
       | 
       | https://pubmed.ncbi.nlm.nih.gov/22180819/
       | 
       | > "Highlights include the development of annotation assessment
       | tools, community acceptance of protein naming standards,
       | comparison of annotation resources to provide consistent
       | annotation, and improved tracking of the evidence used to
       | generate a particular annotation. The development of a set of
       | minimal standards, including the requirement for annotated
       | complete prokaryotic genomes to contain a full set of ribosomal
       | RNAs, transfer RNAs, and proteins encoding core conserved
       | functions, is an historic milestone."
        
       ___________________________________________________________________
       (page generated 2022-10-18 23:00 UTC)