[HN Gopher] I am endlessly fascinated with content tagging systems
___________________________________________________________________
I am endlessly fascinated with content tagging systems
Author : redbar0n
Score : 299 points
Date : 2022-10-18 15:07 UTC (7 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| fleddr wrote:
| As many commenters have mentioned (as does the article)
| hierarchical tags are a pain, if not an impossibility to get
| right. Related tags, though, can be done on the cheap and are
| surprisingly powerful, fun and cool under the right conditions.
|
| Say you have a massive database of photos, each photo having
| tags. As an example, we'll use the tag "United States", which is
| used on 50,000 photos. Next, you go over each of those 50,000
| photos, check which other tags were used, and sort them by
| occurrence.
|
| This reveals useful and often surprising implicit relations
| between tags. The relation can be of any type, hierarchical or
| otherwise. It reveals relations never explicitly mapped or
| maintained. It's organic, which kind of fits the philosophy of
| tagging.
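|
| A minimal sketch of the co-occurrence count described above. The
| photo data and the name related_tags are made up for illustration;
| a real system would run this over a database, not a dict.

```python
from collections import Counter

# Hypothetical sample data: photo id -> set of tags.
photos = {
    "p1": {"United States", "New York", "skyline"},
    "p2": {"United States", "New York", "bridge"},
    "p3": {"United States", "desert"},
    "p4": {"France", "Paris", "bridge"},
}

def related_tags(photos, tag, limit=10):
    """Rank the other tags used on photos that carry `tag`, most frequent first."""
    counts = Counter()
    for tags in photos.values():
        if tag in tags:
            # Count every co-occurring tag except the one we searched for.
            counts.update(tags - {tag})
    return counts.most_common(limit)

top = related_tags(photos, "United States")
```

| Here "New York" ranks first because it co-occurs twice; nobody had
| to declare any relation between the two tags.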
| VectorLock wrote:
| One example of an unexpectedly rich and deep tagging ontology is
| the Danbooru "Anime" image board [NSFW]
| https://danbooru.donmai.us/
| TazeTSchnitzel wrote:
| There is a safe-for-work, or at least _safer_ -for-work version
| of the site: https://safebooru.donmai.us/
|
| (It is of course based on the tagging system: every post is
| tagged by its "safeness" level.)
| system2 wrote:
| I know this is not reddit. But why do you even know this
| link and its tagging system...
| VectorLock wrote:
| I'm not scared away by things that might offend the
| puritanically inclined and I'm interested in ontologies and
| this is a fascinating one.
|
| There was some drama about someone training a Stable-
| Diffusion-alike by ripping their dataset that brought it to
| my attention.
| thrdbndndn wrote:
| Danbooru is one of the most popular anime image boards.
|
| Anyone who's into Anime (not just for hentai) probably knows.
| thrdbndndn wrote:
| Yeah, danbooru or similar image boards basically have all the
| things talked about in this tweet thread.
|
| They have tag aliases, meta-tags and so-called "tag
| implications".
|
| The last one is basically sub-tags but with more flexibility
| and dead simple to implement: if A implicates B, then tagging
| an image with A will automatically tag it with B. So you can
| tag "American Male Novelist", and then the system will
| automatically add "American", "Male", "Novelist", "Writer",
| etc. (after such implications were added).
|
| It's much easier than Wikipedia's categories, but Wikipedia's
| way is of course intentional, because categories are meant to
| have a stronger hierarchy than mere tags.
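|
| A rough sketch of how "if A implicates B" could be expanded, chains
| included. This is not Danbooru's actual code; the table and the
| name expand_tags are assumptions for illustration.

```python
# Hypothetical implication table: tagging with the key also adds each value.
IMPLICATIONS = {
    "American Male Novelist": {"American", "Male", "Novelist"},
    "Novelist": {"Writer"},
}

def expand_tags(tags, implications=IMPLICATIONS):
    """Return `tags` plus everything they imply, following chains and tolerating cycles."""
    result = set()
    pending = list(tags)
    while pending:
        tag = pending.pop()
        if tag in result:
            continue  # already expanded; this also stops implication cycles
        result.add(tag)
        pending.extend(implications.get(tag, ()))
    return result

expanded = expand_tags({"American Male Novelist"})
```

| "Writer" is picked up through "Novelist", so an implication only
| has to be declared once, at the most specific level it applies to.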
| subpar wrote:
| I've done this professionally in a couple different settings,
| from building topic classifiers for news events (it is sometimes
| hard to know when one news event should stop and another start)
| to creating tagging systems for audio recordings of group
| conversations (where topics often merge in and out of each other,
| often within a single sentence).
|
| I'm currently working on classifying non-speech, non-musical
| sound and it can be useful to piggyback on an existing knowledge
| system, though they tend to be industry-specific. As an example,
| Google's ontology for sound identification [1] is a nice starting
| point for general classification, whereas the taxonomy [2] used
| by the audio post-production industry (sound effects, foley, etc)
| is structurally quite different (which isn't surprising, but it
| sure is fun!). From a totally different field (electro-acoustic
| composition), the work of Michel Chion and Pierre Schaeffer [3]
| add psychoacoustic elements to more traditional measurable
| characteristics, i.e. how the sound is perceived and comprehended
| is just as important as its medium of travel and its source. It
| is helpful to see what others have done before you so you can
| pick and choose elements of their work to incorporate into your
| own.
|
| 1: https://github.com/audioset/ontology
|
| 2: https://docs.google.com/spreadsheets/d/1b2UhKpcOAE-
| jd1edOsxC...
|
| 3: [big pdf!]
| https://monoskop.org/images/0/01/Chion_Michel_Guide_To_Sound...
| polote wrote:
| A big miss on the list is that words (and so tags) do not mean
| the same thing to every person, and do not even mean the same
| thing in different contexts.
| heliophobicdude wrote:
| My similar issue is with names in source code.
|
| Fuzzy matching names and interrogating the contributor about the
| changes being checked in. Questions to ask the contributor: are
| the names similar to any of these other names? Is there an
| opportunity to use the same name or are they different concepts?
|
| Code grows and grows and becomes harder to grep if things are
| named inconsistently.
| pessimizer wrote:
| > It gets even more complex if tags can have multiple parents,
| like Wikipedia categories. "American Male Novelists" is a subtag
| of "American Male Writers" and "American Novelists". Now we have
| diamond problems, redundancy, a whole host of other edge cases.
|
| I don't understand this problem. I would think that you would
| have
|
| tag:american
|
| tag:male
|
| tag:novelist
|
| tag:writer,
|
| and tag:novelist would itself be tagged as tag:writer, because
| all novelists are writers.
| openfuture wrote:
| Why twitter man.. these questions are clearly important but there
| is a space to discuss them
| https://matrix.to/#/#datalisp:matrix.org
| googlryas wrote:
| I had to click like 5 links from that link in order to get to a
| site which requires me to sign in before allowing me to see the
| content. I still have no idea what I'm supposed to be seeing.
| And no idea what the connection between "datalisp" and content
| tagging systems is.
|
| Maybe that's why twitter man?
| hoherd wrote:
| Seriously. I'm not a twitter fan, but even so, it's a short-
| form medium. Why do people abuse it like this, especially with
| great content? What's so bad about tweeting a link to a blog?
|
| Anyhow, I use threadreaderapp to get through the frustrating
| twitter UI and the ways that it is abused:
| https://threadreaderapp.com/thread/1534301374166474752.html
| modriano wrote:
| > Why do people abuse it like this, especially with great
| content?
|
| Probably to get a wider audience to actually read and engage
| with the ideas, and to crowdsource relevant information from
| said audience.
|
| > What's so bad about tweeting a link to a blog?
|
| Probably an 80%+ reduction (total guess) in the number of
| people who engage directly with the content and author.
| asdff wrote:
| I don't know how people deal with tags. It adds so much friction
| to me. Naming tags, deciding what rules this tag is supposed to
| have, deciding what stuff is tagged. I tried the firm approach of
| being extremely discrete with tags and it took a lot of effort,
| and I've tried the loose approach of tagging things if they are
| even slightly related which imo defeated the whole purpose of
| organizing things to make it easy to find them later if a lot of
| tangentially related things share the same tags.
|
| Folders seem a lot more straightforward for me at least, and if I
| need something in two places at once, there's always ln -s
| PaulHoule wrote:
| Maybe it's the project I am working on but right now I see the
| ideal search interface to be something like an OWL class axiom,
| that is, I am searching for instances of a class that has the
| following restrictions:
|
| * subclass of Actor
|
| * subclass of Singer
|
| * has been in at least 7 movies
|
| * was born after December 3, 1980
|
| * has been married to at most 3 other people
|
| these can be intersected, unioned, complemented, etc.
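|
| One way to picture restrictions that can be intersected, unioned
| and complemented is as composable predicates. This is plain Python,
| not an OWL reasoner; Person, Restriction, is_a and the sample
| people are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Person:
    name: str
    classes: frozenset
    movies: int
    born: date
    marriages: int

class Restriction:
    """A predicate that composes with & (intersect), | (union), ~ (complement)."""
    def __init__(self, test):
        self.test = test
    def __call__(self, item):
        return self.test(item)
    def __and__(self, other):
        return Restriction(lambda x: self(x) and other(x))
    def __or__(self, other):
        return Restriction(lambda x: self(x) or other(x))
    def __invert__(self):
        return Restriction(lambda x: not self(x))

def is_a(cls):
    return Restriction(lambda p: cls in p.classes)

# The five restrictions from the comment above, intersected.
query = (
    is_a("Actor") & is_a("Singer")
    & Restriction(lambda p: p.movies >= 7)
    & Restriction(lambda p: p.born > date(1980, 12, 3))
    & Restriction(lambda p: p.marriages <= 3)
)

people = [
    Person("A", frozenset({"Actor", "Singer"}), 9, date(1985, 1, 1), 1),
    Person("B", frozenset({"Actor"}), 12, date(1990, 5, 5), 0),
    Person("C", frozenset({"Actor", "Singer"}), 8, date(1975, 3, 3), 2),
]
matches = [p.name for p in people if query(p)]
non_singers = [p.name for p in people if (~is_a("Singer"))(p)]
```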
| somat wrote:
| It sounds like what you want is SQL.
|
| There is no good solution for the cultural problem that a
| written language is somehow unsuitable for end users. But
| personally I have spent way too many hours trying to make a
| search interface only to realize at the end that not only is my
| interface complicated and hard to use, it still has only a
| fraction of the descriptive power a SQL query has. At times I
| am tempted to make full use of the built-in database
| permissions and let the user just type queries directly, but
| this suggestion is always vetoed.
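|
| For a sense of that descriptive power, here is a small sketch of a
| tag query in SQL, run through Python's sqlite3 module. The schema
| (item, item_tag) and the data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item_tag (item_id INTEGER REFERENCES item(id), tag TEXT NOT NULL);
    INSERT INTO item VALUES (1, 'a.jpg'), (2, 'b.jpg'), (3, 'c.jpg');
    INSERT INTO item_tag VALUES
        (1, 'beach'), (1, 'sunset'),
        (2, 'beach'),
        (3, 'beach'), (3, 'sunset'), (3, 'draft');
""")

# Items tagged both 'beach' and 'sunset' but not 'draft'.
# In SQLite a comparison evaluates to 0 or 1, so SUM counts matching rows.
rows = conn.execute("""
    SELECT i.name
    FROM item AS i
    JOIN item_tag AS t ON t.item_id = i.id
    GROUP BY i.id
    HAVING SUM(t.tag = 'beach') > 0
       AND SUM(t.tag = 'sunset') > 0
       AND SUM(t.tag = 'draft') = 0
    ORDER BY i.name
""").fetchall()
names = [name for (name,) in rows]
```

| Expressing "has X and Y but not Z" in a hand-built search form
| usually takes far more UI than these few lines of query.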
| aaron695 wrote:
| system2 wrote:
| I am increasingly hating twitter being used for blogging.
| Archelaos wrote:
| Those interested in the state of the art of professional tagging
| systems in cultural heritage may have a look into the CIDOC
| Conceptual Reference Model (CRM): https://www.cidoc-crm.org/
| counttheforks wrote:
| Anyone have a suggestion for a tagging filesystem that is
| maintained? Or if not a filesystem, something that at least
| works? I still feel like this is the best way to organize
| personal photos and media, and while https://www.tagsistant.net/
| is pretty good it hasn't been updated in 6 years and is fairly
| buggy.
| btrettel wrote:
| I haven't tried it yet but https://tmsu.org/ is actively
| maintained and looks nice.
| somat wrote:
| unix has tags, they are known as hardlinks.
| hwayne wrote:
| I just gave up and mimicked tags with symlinks and subfolders.
| ie "foo" is tagged "todo" if there's a symlink to it in
| "Tags/todo/".
|
| It works surprisingly well, since I can manage it with standard
| shell scripting.
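|
| A sketch of the same symlink layout, written in Python rather than
| shell so it can be run as-is on a POSIX system. The helper names
| are made up, and files with the same basename would collide under
| one tag.

```python
import os
import tempfile

def tag_file(root, path, tag):
    """Tag `path` by linking it from <root>/Tags/<tag>/."""
    tag_dir = os.path.join(root, "Tags", tag)
    os.makedirs(tag_dir, exist_ok=True)
    link = os.path.join(tag_dir, os.path.basename(path))
    if not os.path.lexists(link):  # tagging twice is a no-op
        os.symlink(os.path.abspath(path), link)

def files_with_tag(root, tag):
    """List the names linked under <root>/Tags/<tag>/, sorted."""
    tag_dir = os.path.join(root, "Tags", tag)
    return sorted(os.listdir(tag_dir)) if os.path.isdir(tag_dir) else []

with tempfile.TemporaryDirectory() as root:
    for name in ("foo", "bar"):
        with open(os.path.join(root, name), "w") as f:
            f.write(name)
    tag_file(root, os.path.join(root, "foo"), "todo")
    tag_file(root, os.path.join(root, "bar"), "todo")
    tag_file(root, os.path.join(root, "foo"), "urgent")
    todo = files_with_tag(root, "todo")
    urgent = files_with_tag(root, "urgent")
```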
| comfypotato wrote:
| Dr. Karl Voit did his dissertation designing a tag-based file
| system. I don't know what the status is today, but the
| dissertation itself may be a decent place to start your search.
|
| https://karl-voit.at/tagstore/en/papers.shtml
| greggman3 wrote:
| MacOS has tags. Right click any file in finder, select
| "Tags..."
|
| No idea if they are implemented at a filesystem level but there
| are various tools for finding things by tag
| jrochkind1 wrote:
| > I can't find anything on how to design and implement anymore
| more than the barebones basics of a system.
|
| All of this stuff (horse/horses etc) is extensively discussed,
| maybe look under "taxonomy" or "ontology".
|
| Now, whether you want to use any of those solutions or not or
| find the discussion useful or not... if you aren't finding
| anything about it at all, you aren't looking in the right places.
|
| (I learned about it in librarian school)
| avgcorrection wrote:
| Librarians are the people that we (technologists) should learn
| from. But all I see is programmers trying to invent things from
| first principles.
| jrochkind1 wrote:
| Eh, as the librarian who wrote the post you're replying to...
| I am actually ambivalent.
|
| I wish librarianship as a field and industry were more what
| I'd fantasize it should/could be, but it's not so much.
| edflsafoiewq wrote:
| Can you link some resources about it then?
| meej wrote:
| This is a good basic overview, goes beyond tagging/indexing,
| was the textbook in LIS501 Information Organization and
| Access at UIUC-GSLIS (now the iSchool at Illinois) in 2006:
|
| https://mitpress.mit.edu/9780262512619/the-intellectual-
| foun...
|
| Controlled vocab standards:
|
| https://www.niso.org/publications/ansiniso-z3919-2005-r2010
|
| (this one is deprecated in favor of the one that follows)
|
| https://www.niso.org/schemas/iso25964
|
| https://www.w3.org/2004/02/skos/
|
| The book we used in my thesaurus construction class at UIUC:
|
| https://www.alastore.ala.org/content/essential-thesaurus-
| con...
|
| My favorite intro to semantic modeling with RDF/OWL/SPARQL:
|
| http://workingontologist.org/
|
| Topic Maps are dead but i still have a soft spot for them:
|
| https://www.isotopicmaps.org/
|
| I also recommend Heather Hedden, linked in jrochkind1's post.
| Tomte wrote:
| This is German, but I found it very good:
|
| https://www.isi.hhu.de/fileadmin/redaktion/Fakultaeten/Philo.
| ..
|
| Books:
|
| * Cataloging the World
|
| * Organising Knowledge. Taxonomies, Knowledge and
| Organisational Effectiveness
|
| * The Intellectual Foundation of Information Organization
|
| * The Oxford Guide to Library Research
| jrochkind1 wrote:
| I could, but honestly I'd just be googling "taxonomy". But ok
| that's not entirely true, I know how to refine my search and
| recognize when something is what I'm thinking of, from some
| familiarity with the field.
|
| (But if you want to look around, in addition to "taxonomy"
| and "ontology", other good terms are "information
| architecture" and "controlled vocabulary").
|
| These are not things I have vetted, this is literally just me
| googling and taking a quick skim...
|
| https://blog.optimalworkshop.com/how-to-develop-a-
| taxonomy-f...
|
| https://www.uxbooth.com/articles/introduction-to-taxonomies/
|
| https://www.nngroup.com/articles/taxonomy-101/
|
| http://accidental-taxonomist.blogspot.com/2020/11/what-it-
| th...
|
| Or how about some textbooks:
|
| https://narrowgaugebooks.indielite.org/book/9781627055802
|
| https://www.hedden-information.com/accidental-taxonomist/
| tantalor wrote:
| https://en.wikipedia.org/wiki/Library_and_information_scienc.
| ..
|
| https://en.wikipedia.org/wiki/Tag_(metadata)
| chaostheory wrote:
| Yeah, the content for learning has been around for over a
| decade or more.
|
| Plus we have plenty of content for AI now
|
| https://towardsdatascience.com/machine-learning-classifiers-...
| samastur wrote:
| The problem isn't knowing what the problem is (taxonomy and
| ontology), but how to implement it effectively.
|
| I've seen enough of Hillel's posts over the years that I am
| fairly sure he is aware of taxonomy/ontology too.
| pvg wrote:
| _(I learned about it in librarian school)_
|
| As the rest of us learned during the first tagging boom, the
| librarian is the natural apex predator of tagging.
| stinkytaco wrote:
| I've been a librarian for more than 15 years and I can only
| speak from personal experience when I say that I am the apex
| predator of nothing. Every once in a while I will get it in
| my head to systematize my personal knowledge base with a
| controlled vocabulary and ontology and I just fall on my
| face. I really want it for some twisted reason, though.
|
| Turns out LC subject headings -- for all their failures --
| are pretty good.
| lofatdairy wrote:
| To be fair to OP, the biggest hurdle in learning anything is
| knowing what questions to ask. When you don't have ontology as
| part of your vocabulary it's hard to find literature regarding,
| say, "comparison of ontologies for user-generated text
| content".
|
| I suppose this flows back into library science, which is all
| about systematizing where to look for answers to questions, but
| I'm always astonished to find that there's oceans of literature
| and research in questions I haven't even thought to ask.
| vonseel wrote:
| I think OP is referring to finding software-engineering
| related design discussions surrounding tagging systems, but
| yes, I'm sure there is a great depth of ontology material and
| librarian knowledge that could add to software system
| designs.
| josefrichter wrote:
| I was fascinated by ontologies 10 years ago. Since then, I've
| been studying the human brain, only to realize that this is an
| effort to basically build a software version of the human brain.
| Maybe it's possible, but it's definitely not feasible in 99.9% of
| cases. The closest thing we have is some machine learning
| approaches.
| pphysch wrote:
| My current solution to this problem is just putting a JSONB
| column in relevant tables. GIN indexes do the heavy lifting as
| needed.
|
| This lets us implement arbitrary, queryable ontologies on top of
| the data without requiring further database instrumentation
| (aside from creating an index now and then).
| flanked-evergl wrote:
| Look at wikidata, RDF and semantic web. This is somewhat of a
| well-solved problem that should not be solved differently again.
| tinco wrote:
| If you're at the point where you're adding hierarchies to your
| tags, I think you're fighting a losing battle. At that point, why
| not do what Google does and just make a BERT embedding? No way
| you're going to manually achieve the full extent of complexity of
| how humans group and describe things.
| didip wrote:
| If you don't want to think too hard, just funnel the tag
| information into a search engine like Elasticsearch.
|
| It already handles stemming, stop words, aliases, etc.
| xg15 wrote:
| I worked with the Wikipedia category system a few years ago, and
| you could see the problems with hierarchical tagging systems
| right in action back then. (Though it may have gotten better in
| the meantime)
|
| The system appeared simple: There were just two relations,
| "Article A is a member of category B" and "Category X is a
| subcategory of category Y".
|
| However, in practice, the community was using this system to
| represent a whole host of wildly different relationships between
| items, often with different implications for what a category
| actually applied to.
|
| E.g., if A has a subcategory B, this could mean one of several
| things: B might be an additional constraint on the items in A
| ("American writers" -> "19th century American writers"), the
| _things_ in B might be more specific than the things in A: (
| "Writers" -> "Novelists"), A might apply to the _concept_ B, not
| the things in B ( "Occupations" -> "Writers") or A might refer to
| the _category_ B ( "Categories with more than 100 entries" ->
| "Writers") and on and on...
|
| Of course those different aspects could even be combined. E.g.
| "Categories with more than 100 entries" might have a child
| "Categories with more than 100 entries in need of review", which
| represents a constraint but might itself contain less than 100
| entries...
|
| The basic question "Is item X in category Y" becomes impossible
| to answer generally, because there is no clear indication if a
| category only applies to its direct children or to all of its
| descendants or only to the subcategories themselves.
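|
| A small sketch of why the answer depends on how you read the
| hierarchy. The two dicts mirror the two relations described above;
| the data and function names are hypothetical.

```python
# "Article A is a member of category B"
ARTICLE_IN = {"Mark Twain": {"19th century American writers"}}
# "Category X is a subcategory of category Y"
SUBCATEGORY_OF = {
    "19th century American writers": {"American writers"},
    "American writers": {"Writers"},
    "Writers": {"Occupations", "Categories with more than 100 entries"},
}

def ancestors(category):
    """All categories reachable by following 'is a subcategory of' upward."""
    seen = set()
    stack = [category]
    while stack:
        for parent in SUBCATEGORY_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def in_category(article, category, transitive):
    """Membership test; `transitive` decides whether ancestors count."""
    direct = ARTICLE_IN.get(article, set())
    if category in direct:
        return True
    return bool(transitive and any(category in ancestors(c) for c in direct))

direct_only = in_category("Mark Twain", "Writers", transitive=False)
with_ancestors = in_category("Mark Twain", "Writers", transitive=True)
nonsense = in_category("Mark Twain", "Occupations", transitive=True)
```

| Direct-only lookup misses that Mark Twain is a writer, while the
| transitive reading also concludes that he is an occupation.
| Neither rule is right for every edge, which is the problem.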
|
| I'm sure there are sophisticated ontological systems which would
| allow users to specify all those different relationships
| separately. I'm also pretty sure that users would become sloppy
| after a short time or would disagree which particular
| relationship to use in a particular situation...
| crazygringo wrote:
| I encountered the same problem a few years ago and indeed
| realized that using categories to understand what type of
| article a thing was (person? subject? event?) was utterly
| useless, for the reasons you describe.
|
| On the other hand, I discovered that infoboxes (the data in the
| top-right box on most pages) were generally extremely reliable,
| if frustrating to parse.
| maxbond wrote:
| The infoboxes are created from a query to Wikidata, which you
| can query yourself! No scraping necessary!
| https://query.wikidata.org/
|
| You'll want to learn SPARQL, but if you know SQL it's not so
| bad to pick up.
| crazygringo wrote:
| As far as I can tell, that is not the case, sadly.
|
| Right now it appears that only 3,975 articles have
| infoboxes auto-generated from Wikidata. [1] The wikitext
| contains something like "{{Wikidata Infobox ...}}" instead
| of just "{{Infobox ...}}".
|
| If you look up a popular article like Barack Obama [2],
| it's just a traditional hand-edited infobox. In fact, one
| of the first lines of data says "Vice President = Joe
| Biden", while the Wikidata entry for Barack Obama [3]
| doesn't reference Biden anywhere -- so not only is the
| Wikipedia infobox not generated from Wikidata, but Wikidata
| isn't pulling all the relevant info from Wikipedia either.
|
| Back when I had been working on my project, I'd hoped
| Wikidata could be a solution but it was far too incomplete
| and information was regularly out of date. Perhaps
| (hopefully) it's better now, but it's clearly not being
| used to power infoboxes yet except in a tiny number of
| cases. (Which actually complicates things more now, since
| anybody parsing Wikipedia infoboxes now has to deal
| separately with the 3,975 ones that grab from Wikidata,
| since none of the actual data is copied over into the
| wikitext...)
|
| [1] https://en.wikipedia.org/wiki/Category:Articles_with_in
| fobox...
|
| [2] https://en.wikipedia.org/wiki/Barack_Obama
|
| [3] https://www.wikidata.org/wiki/Q76
| matkoniecz wrote:
| Wikidata is not a solution at all.
|
| I recently ran into the same kind of problem in Wikidata.
|
| https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Ont
| o...
|
| A typical problem is of the "light rail (Q1268865) is data
| visualization (Q6504956)" kind. This specific one is fixed,
| but there are many similar ones.
|
| https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive
| /...
|
| https://www.wikidata.org/wiki/Wikidata:Project_chat#Ontolog
| y...
| aaroninsf wrote:
| An exactly analogous problem exists in the Collections
| hierarchy at the Internet Archive, of uploaded/digitized
| material (not the Wayback Machine web captures).
|
| A single graph is applied locally with very different
| semantics; and absent a distinct tagging system, collection
| membership is sometimes used to mark material for treatment in
| some way.
| travisjungroth wrote:
| The issue is that the system has nodes and edges, but no concept
| of distinct graphs. That leaves you trying to fit all notable
| human knowledge onto a single graph, which is non-optimal.
| Whether it's also a DAG, tree, or something else doesn't even
| matter.
|
| Ontologies are like languages. There is no _correct_ one. What
| matters is how good a fit it is for the problem at hand and
| that you're all using the same one! If half the people are
| using Italian and half Spanish, it's going to be a disaster. I
| wouldn't use APL to write a UI and I wouldn't architect a
| computer system in Shipibo.
|
| Similarly, if I'm bird watching, "Birds of Northern California"
| is very useful. Organizing them by genus is less useful to me
| in that moment, but it's not _wrong_.
| [deleted]
| moonchild wrote:
| I don't think you necessarily need multiple graphs; just
| labeled edges.
| travisjungroth wrote:
| You just need some way to interact with it as multiple
| graphs. Some variation of labeled edges is probably the
| best.
| ok_dad wrote:
| Isn't this literally just saying we need another layer of
| categorization on top of the categorization layer?
| all2 wrote:
| Perhaps "adjacent to" rather than "on top of"? I've started
| looking at this kind of problem in terms of DB queries or
| set relations. Even "organization" can be a set relation if
| there are the right bits of metadata in place.
| travisjungroth wrote:
| It's saying you need support for multiple types of
| categories. You could use the same system to organize
| itself. No need for a meta layer.
| matkoniecz wrote:
| I recently ran into the same kind of problem in Wikidata.
|
| https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Onto...
|
| A typical problem is of the "light rail (Q1268865) is data
| visualization (Q6504956)" kind. This specific one is fixed, but
| there are many similar ones.
|
| https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/...
|
| https://www.wikidata.org/wiki/Wikidata:Project_chat#Ontology...
| pessimizer wrote:
| > I'm sure there are sophisticated ontological systems which
| would allow users to specify all those different relationships
| separately. I'm also pretty sure that users would become sloppy
| after a short time or would disagree which particular
| relationship to use in a particular situation
|
| I think the problem is allowing users to freely tag, then.
| There should be easily accessed guidelines about how each tag
| should be used, and people who are constantly moving them,
| correcting them, and updating usage guidelines.
|
| We need the ability to implement governance systems on top of
| web 2.0+ style content systems. People should be able to vote
| for representatives (with any number of voting systems), create
| committees, submit changes to be voted on, etc. Instead we
| usually work based on hierarchical dictatorships or imagined
| consensus. People need organizational management tools baked
| into software, because organization of information depends on
| it. Instead of proposing a new committee to come up with the
| schema of everything, better tools that enable users to build
| committees.
| jrumbut wrote:
| The fundamental tension in tagging systems, to me, is whether
| tagging is a feature the software offers to the user or a
| task the user performs to assist the software.
|
| In the first case, you want freewheeling and tolerate
| ontological inconsistencies because you want to offer
| flexibility to users and will capture hard to quantify
| emergent benefits (some made up examples: "try the tag
| user233-favorite, I keep discovering awesome articles!", "the
| physicist-needed tag has highlighted a lot of misinformation
| surrounding quantum physics and relativity"). People use it
| to the extent it is useful.
|
| The other way, with formal semantics, governance (which you
| made some very wise points about), etc allows the software to
| reply to queries like "19th-century + Missouri + humorists"
| in a performant and authoritative way. It's not really a
| feature so much as it is a way to enable other features.
| skyde wrote:
| ("Occupations" -> "Writers") seems wrong. Why would you do this?
| Same for ("Categories with more than 100 entries" ->
| "Writers").
|
| This seems like trying to put a tag on a category entity instead
| of creating a tag hierarchy.
|
| Those two should be stored using different relationship type
| mechanisms.
|
| ("categoryTag", <SourceTag>, <DestinationTag>)
|
| ex: ("categoryTag", "Occupations", "Writers")
|
| and
|
| ("parentTag", <tagName1>, <tagName2>)
|
| ex: ("parentTag", "American writers", "19th century American
| writers")
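|
| A sketch of how such typed triples might be traversed so the two
| relations never mix. The ("parentTag", "Writers", "American
| writers") row is an extra assumption added to show a chain, and
| descendants is a made-up helper name.

```python
# Triples in the (relation, source, destination) shape sketched above.
TRIPLES = [
    ("parentTag", "American writers", "19th century American writers"),
    ("parentTag", "Writers", "American writers"),  # assumed, for the chain
    ("categoryTag", "Occupations", "Writers"),
]

def descendants(tag, relation, triples=TRIPLES):
    """Tags reachable from `tag` by following only edges labeled `relation`."""
    found = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        for rel, src, dst in triples:
            if rel == relation and src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found

sub_tags = descendants("Writers", "parentTag")
leaked = descendants("Occupations", "parentTag")
```

| Because traversal is restricted to one relation type, walking
| "parentTag" from "Occupations" finds nothing, so individual writers
| never end up classified as occupations.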
| xg15 wrote:
| Indeed. It's a bit like if a programming language was trying
| to represent base classes and meta classes using the same
| mechanism.
|
| My guess is that no one realized the need for "meta"
| categories when the system was implemented, so later the
| existing hierarchy was simply co-opted instead of
| implementing a new functionality for that use case.
|
| As long as the categories are only used by human editors and
| use is only within some small subcommunity, it can work quite
| well. The problem starts if you want to combine categories
| used by different communities or if you (or your program)
| lack the domain knowledge to understand which nodes represent
| "meta" categories.
|
| As another poster said, the better approach to using Wikipedia
| data for automated processing is using infoboxes or the
| explicitly machine-readable Wikidata repository. The category
| system looks machine-readable on first glance but really
| isn't.
| layer8 wrote:
| There are only two kinds of relation here, "subset of" and
| "instance of" (aka "element of", type-token).
|
| The category-category relations are intended to always be a
| subset relation. The article-category relations are intended to
| always be an instance-of relation.
|
| - "19th century American writers" is a _subset_ of "American
| writers".
|
| => Both are a category, so no problem.
|
| - "Novelists" is a _subset_ of "Writers".
|
| => Both are a category, so no problem.
|
| - "Writer(s)" is an _instance_ of "Occupation".
|
| => Here the problem is that "Writers" is a category. It would
| be okay if it was an article "Writer (occupation)".
|
| - "Writers" is an _instance_ of "Categories with more than 100
| entries".
|
| => Here, again, the problem is that "Writers" is a category,
| and having an instance-of relation between categories is not an
| intended/supported use-case.
|
| This could conceivably be solved by supporting an instance-of
| relation between categories, in addition to the existing subset
| (subcategory) relation. It could be called a meta-category
| relation. Then you could have the category of occupation
| categories.
|
| Another way to put this is that categories have to be typed: a
| category contains either (just) articles, or it contains (just)
| categories. Subcategories then must match the type of their
| supercategories and correspondingly must contain either
| articles or categories.
|
| Basically, Wikipedia's type system is not expressive enough to
| allow everything people would want to express in it.
| laszlokorte wrote:
| Clearly the solution to all of this would be the category of
| all those categories that do not contain themselves.
| maratc wrote:
| The problem might not be with hierarchical tagging systems, but
| with the specific hierarchical tagging system they use at
| Wikipedia.
|
| Imagine another system with the following categories:
|
| * People:ByOccupation:Creative:Writers
|
| * Time:CommonEra:ByCentury:19
|
| * Location:Earth:Americas:NorthAmerica:USA
|
| In this scheme of things, e.g. Mark Twain would be tagged with
| all three. "19th century American writers" (which includes Mark
| Twain) would not be a _category_ but _a saved search_. (Other
| saved searches -- which would also include Mark Twain -- would
| be "19th century people from Americas" or "Stuff from Planet
| Earth").
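|
| A sketch of the "saved search" idea, treating each tag as a
| colon-separated path and a search as a set of path prefixes. The
| Charles Dickens entry, the Europe path and the helper names are
| assumptions added for contrast.

```python
# Hypothetical items tagged with full hierarchical paths.
ITEMS = {
    "Mark Twain": {
        "People:ByOccupation:Creative:Writers",
        "Time:CommonEra:ByCentury:19",
        "Location:Earth:Americas:NorthAmerica:USA",
    },
    "Charles Dickens": {
        "People:ByOccupation:Creative:Writers",
        "Time:CommonEra:ByCentury:19",
        "Location:Earth:Europe:UK",
    },
}

def under(tag, prefix):
    """True if `tag` equals `prefix` or sits below it in the hierarchy."""
    return tag == prefix or tag.startswith(prefix + ":")

def saved_search(*prefixes):
    """A saved search is a set of path prefixes; every one must match some tag."""
    def run(items=ITEMS):
        return sorted(
            name for name, tags in items.items()
            if all(any(under(t, p) for t in tags) for p in prefixes)
        )
    return run

american_19c_writers = saved_search(
    "People:ByOccupation:Creative:Writers",
    "Time:CommonEra:ByCentury:19",
    "Location:Earth:Americas:NorthAmerica:USA",
)
people_19c_americas = saved_search(
    "People", "Time:CommonEra:ByCentury:19", "Location:Earth:Americas"
)
stuff_from_earth = saved_search("Location:Earth")
```

| Nothing is ever tagged "19th century American writers"; that
| grouping exists only as a query over the three independent axes.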
| errantmind wrote:
| Instagram's tagging system was actually really effective at
| categorizing content and discovery because each hashtag was
| treated as a node in a (giant) graph, where each node has
| multiple properties, including post count (number of posts using
| a tag), 'velocity' (number of posts using a particular tag per
| unit time), etc. I could write up a big post about it as I made a
| study of it when I created a web app for finding the most
| relevant tags a few years ago.
|
| All that to say there was a lot to their system and it worked
| because users became aware that they were rewarded for using the
| most relevant tags. Using irrelevant tags was punished. This
| guided users towards using a mix of relevant popular and niche
| tags to maximize their reach, which, in turn, further improved
| the tagging system.
|
| Instagram's tagging system isn't as important anymore as their
| algorithm has deemphasized it, in favor of other methods for
| classification and discovery, but there were a couple of golden
| years where it worked very well. Most users still look back on
| those years as the 'good times' even if they don't know exactly
| why. I'd go so far as to say they ruined the app after they
| deemphasized tags (and added way too many ads)
| leksak wrote:
| Please write that big post! Sounds interesting
| PaulHoule wrote:
| People look forward to a visit with the ontologist the way they
| do a visit with the orthodontist.
| [deleted]
| at_a_remove wrote:
| This seems like one of those Eternal Problems that people,
| whether librarians, programmers, or hobbyists, stumble across,
| think they'll make headway in, then discover that they've really
| managed to progress just a few feet across a vast and hostile
| surface of landmines, pitfalls, and lures. Each "obvious" step
| (I'll have parent relations to define a context!) is only yet
| another bargain with the Devil, who laughs at your precautions.
| MisterBastahrd wrote:
| I guess if you're really focused on it. I built a content
| tagging system for an old employer that would attempt to guess
| context based on keywords and associations but give the writer
| of the content the final say in what's actually being tagged.
|
| Sure, I could have spent a thousand hours refining it, but the
| improvement would have been marginal and it still would need
| human interaction.
| csours wrote:
| Was it used for content related to that particular business?
| I think as long as you have _relatively_ limited variety, you
| can make something that works well enough.
| at_a_remove wrote:
| Similarly, I think if you have a limited number of people
| doing the classification, you can also make a good shot at
| it.
| eastbound wrote:
| Tagging's pain is that it's a problem easy enough that you can
| come up with plenty of ideas without prior knowledge. Its bane is
| that it is, in this sense, similar to bikeshedding: everyone can
| have an opinion about it. Fortunately, it's only appealing to
| people who enjoy exploring problems.
| contextfree wrote:
| "Advice: don't let the tag predicates refer to other tags"
|
| But then how would I search by the tag of all tags that do not
| tag themselves???
| ok_dad wrote:
| I like how they worked out an advanced tagging system's
| requirements from a ~dozen tweets, starting with the most basic
| tagging system and working up through a tag hierarchy to a tree
| to a DAG, then even talking about K/V tags, etc.
| somat wrote:
| My (Chomskyish) hierarchy of tag systems goes something like:
|
| tagged data
|
| key=value tagged data
|
| hierarchically tagged data (we just found the unix
| filesystem!)
|
| hierarchical key = value tagged data (oh damn, it's ldap, we dug
| too deep.)
| micromacrofoot wrote:
| This reminds me of a talk from Clay Shirky about categorization
| and general ontology. It's interesting to read in hindsight,
| because it's from when recommendation algorithms were in their
| infancy.
|
| Warning PDF:
| https://ia800203.us.archive.org/10/items/Ontology_is_Overrat...
|
| > This is what we're starting to see with del.icio.us, with
| Flickr, with systems that are allowing for and aggregating tags.
| The signal benefit of these systems is that they don't recreate
| the structured, hierarchical categorization so often forced onto
| us by our physical systems. Instead, we're dealing with a
| significant break -- by letting users tag URLs and then
| aggregating those tags, we're going to be able to build alternate
| organizational systems, systems that, like the Web itself, do a
| better job of letting individuals create value for one another,
| often without realizing it.
| dadadad100 wrote:
| Thank you for this link. I've been looking for a good
| discussion of the browse vs search argument and this is very,
| very good
| joshu wrote:
| there's a massive difference between tagging-for-self-recall and
| tagging-for-other-recall. when i invented tagging the first was
| paramount, but the latter has become dominant and has very
| different design considerations
|
| one interesting note: you can infer a bunch of hierarchical
| information since people frequently tag from broader to more
| specific, topicwise.
|
| some things can be tagged by multiple people and you can thus
| infer synonyms as well. this can thus be fixed in search.
| milesskorpen wrote:
| "When I invented tagging" is such a flex. But creating delicious
| gives you some credible claims there.
| joshu wrote:
| I don't get to use it much these days
| redbar0n wrote:
| A very insightful thread by Hillel Wayne on content tagging
| systems and their challenges.
|
| Their ubiquitous use (in library and information sciences, and
| popular social networks like Instagram, Twitter, and Pinterest),
| their deceptive ease of implementation, and "obvious advantages"
| over hierarchies/folders, mean that almost every developer has
| (or will) run into them at one point or another.
|
| Feel free to comment with good theory and case studies on tagging
| systems. (It's especially interesting with good case studies for
| how to model an advanced tag system in a graph database).
| lmkg wrote:
| > It's especially interesting with good case studies for how to
| model an advanced tag system in a graph database
|
| I wouldn't accuse it of being a _good_ tag system, nor a true
| graph database, but one thing to look at is Semantic MediaWiki.
| It's a MediaWiki extension which takes Categories as a
| starting point, and extends it quite far with e.g. relations
| and key-value pairs.
|
| One interesting feature of Semantic MediaWiki is called
| "Concepts" which are essentially "computed tags." They can be
| used in place of Categories in most places, but while
| Categories are set by editors on a piece of content, Concepts
| are defined by a query against Categories or other properties.
| This can help bridge gaps between different _types_ of tags
| that represent different ways of thinking about the content.
| tra3 wrote:
| I've been dabbling in personal knowledge bases for a long time
| now. I remember when I discovered tags -- thought it was the
| best thing ever. The first good implementation in the wild (for
| me) was del.icio.us. Eventually I ran into all the problems that
| the linked thread describes. "Movie" or "movies"? "Book" or
| "books"?
|
| In any case, I still think flat tag lists are better than a
| directory tree structure ("Content/Movies" vs "movies, movie,
| entertainment, science fiction, space travel, aliens").
|
| A recent innovation that I'm enjoying is backlinks. I believe
| roam research was the first major player that showed you related
| entries via the links that you included, even though a similar
| concept existed forever. Then you can generate clouds of
| relationships and find concepts visually [0].
|
| 0: https://noduslabs.com/cases/visualize-connections-notes-
| roam...
| jazzyjackson wrote:
| > backlinks
|
| > recent innovation
|
| Ted Nelson is rolling in his Xanadu
| tra3 wrote:
| 100% there was prior art to this, I was thinking
| zettelkasten. Didn't know about Xanadu though!
| NWoodsman wrote:
| In my app, users apply a set of tags to a note, but then the app
| automatically creates hierarchical associations in a tree. There
| are an exponential number of associations between tags (at one
| point the design was failing because it was trying to prebuild 100k+
| GUI items for these cross-referenced tags), so I had to virtualize
| the intersection of tags at the exact moment a user expands a
| tree item.
|
| You cannot plan what tag search will lead you back to the data
| you want, so every node in the graph must be bidirectional.
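|
| A minimal sketch of that lazy expansion, assuming notes are stored as
| plain sets of tags. The class and method names are hypothetical, not
| the app's actual code: child tags are computed from an inverted index
| only at the moment a node is expanded, so nothing is prebuilt.

```python
from collections import defaultdict


class LazyTagTree:
    """Computes child tags only when a tree node is expanded."""

    def __init__(self, notes):
        # notes: mapping of note id -> set of tags (hypothetical data model)
        self.notes = {note_id: set(tags) for note_id, tags in notes.items()}
        self.index = defaultdict(set)  # inverted index: tag -> note ids
        for note_id, tags in self.notes.items():
            for tag in tags:
                self.index[tag].add(note_id)

    def notes_for(self, path):
        """Return ids of notes carrying every tag on the expanded path."""
        if not path:
            return set(self.notes)
        return set.intersection(*(self.index[tag] for tag in path))

    def expand(self, path):
        """Return child tags for a node, computed at expansion time."""
        children = set()
        for note_id in self.notes_for(path):
            children |= self.notes[note_id]
        # Tags already on the path are not offered again as children.
        return sorted(children - set(path))


tree = LazyTagTree({
    "n1": {"recipe", "vegan", "dinner"},
    "n2": {"recipe", "dessert"},
    "n3": {"vegan", "travel"},
})
print(tree.expand(["recipe"]))
print(tree.expand(["recipe", "vegan"]))
```

| Keeping both the note-to-tags map and the tag-to-notes index is one
| way to get lookups in both directions, so any tag can serve as the
| entry point back to a note.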
| emj wrote:
| Openstreetmap is map data that is basically coordinates with tags
| on them and relations between those tags. I guess this is true
| for most GIS software but there is very little 2D map data that
| can not be described in the OSM tagging model.
|
| You can never express everything with tags, you need stats and
| metadata on metadata, documentation and a strong heterogeneity
| which also need to be able to adapt to new ideas.
|
| https://wiki.openstreetmap.org/wiki/Tags
| https://wiki.openstreetmap.org/wiki/Map_features
| matkoniecz wrote:
| https://wiki.openstreetmap.org/wiki/Tagging_mailing_list (
| https://lists.openstreetmap.org/pipermail/tagging/ ) is a
| fascinating, hilarious and interesting place.
|
| Basically it is about an endless attempt to classify at least
| part of reality, in an organically growing worldwide project based
| on a bunch of passionate obsessive hobbyists with overly strong
| opinions.
|
| With a bonus of a bunch of politics, confusion and passion.
|
| https://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_AP...
| is likely of interest.
| philip1209 wrote:
| We spent a lot of time building tagging systems to organize
| technology skills on https://www.moonlightwork.com.
|
| The coolest part was training a collaborative filter on the tags.
| So, when you add "Django" as a skill, it could recommend "Python"
| as a related skill. This made for some refined user experiences.
|
| Getting typeahead search right took a lot of refinement. Here is
| some of the logic we ended up implementing over time:
|
| 1. Exact matches get prioritized first (e.g. "Go")
|
| 2. Abbreviations support (e.g., "AWS" for "Amazon web services"
| or "ROR" for "Ruby on Rails")
|
| 3. Names that start with the query should go before non-leading
| matches (e.g., "Ru" should return "ruby" before "task runner")
|
| 4. We tracked an "Aliases" column for each tag to enhance search.
| So, "golang" was an alias for "go".
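|
| The four rules above could be sketched roughly as follows. The tag
| schema (a "name" plus optional "aliases" that also hold
| abbreviations) and the shorter-name tie-breaker are assumptions made
| for illustration, not the logic that actually shipped.

```python
def rank_tags(query, tags):
    """Rank tags for a typeahead query.

    tags: list of dicts with "name" and optional "aliases" (which here
    also hold abbreviations); this schema is an assumption.
    """
    q = query.strip().lower()
    scored = []
    for tag in tags:
        name = tag["name"].lower()
        aliases = [a.lower() for a in tag.get("aliases", [])]
        if name == q:
            score = 0  # rule 1: exact name match
        elif q in aliases:
            score = 1  # rules 2 and 4: abbreviation or alias match
        elif name.startswith(q):
            score = 2  # rule 3: leading match
        elif q in name:
            score = 3  # non-leading match
        else:
            continue  # no match at all
        # Ties within a rule are broken by shorter name, then alphabetically.
        scored.append((score, len(name), name, tag["name"]))
    return [original for _, _, _, original in sorted(scored)]


TAGS = [
    {"name": "Task Runner"},
    {"name": "Ruby"},
    {"name": "Ruby on Rails", "aliases": ["ror", "rails"]},
    {"name": "Go", "aliases": ["golang"]},
    {"name": "Amazon Web Services", "aliases": ["aws"]},
    {"name": "Django"},
]

print(rank_tags("ru", TAGS))
print(rank_tags("go", TAGS))
```

| With this sample data, "ru" lists both Ruby tags ahead of "Task
| Runner", and "go" puts the exact match "Go" ahead of "Django".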
| UltraViolence wrote:
| dahdum wrote:
| I adore tagging systems and have worked on them in several
| different applications and implementations, but there are always
| pitfalls and trade-offs, and it's possible to bury yourself.
|
| Nowadays I nearly always store the assigned tags as an integer
| array column in Postgres, then use the intarray extension to
| handle the arbitrary boolean expression searches like
| "((1|2)&(3)&(!5))". I still have a tags table that stores all the
| metadata / hierarchy / rules, but for performance I don't use a
| join table. This has solved most of my problems. Supertags just
| expand to OR statements when I generate the expression.
| Performance has been excellent even with large tables thanks to
| pg indexing.
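|
| A minimal sketch of generating such an expression, assuming a
| hypothetical supertag table and hypothetical helper names. It only
| builds the query_int string; the SQL in the comments assumes a
| photos table with an int[] column called tag_ids, which is made up
| for this example.

```python
# Hypothetical supertag table: supertag id -> ids of the tags it covers.
SUPERTAGS = {10: [1, 2], 20: [3, 4, 5]}


def term(tag_id):
    """Render one tag id, expanding a supertag into an OR group."""
    members = SUPERTAGS.get(tag_id)
    if members is None:
        return str(tag_id)
    # Assumption: rows may carry the supertag id itself, so include it.
    return "(" + "|".join(str(m) for m in [tag_id, *members]) + ")"


def build_query(all_of=(), any_of=(), none_of=()):
    """Build a query_int string from required, optional and excluded ids."""
    parts = [term(t) for t in all_of]
    if any_of:
        parts.append("(" + "|".join(term(t) for t in any_of) + ")")
    parts.extend("!" + term(t) for t in none_of)
    return "&".join(parts)


# Intended use with the intarray extension (assumed schema):
#   CREATE EXTENSION intarray;
#   CREATE INDEX ON photos USING GIN (tag_ids gin__int_ops);
#   SELECT id FROM photos WHERE tag_ids @@ %s::query_int;
print(build_query(all_of=[3], any_of=[1, 2], none_of=[5]))
print(build_query(all_of=[10], none_of=[20]))
```

| An empty result string is not a valid query, so a caller would need
| to skip the filter entirely when no tags are selected.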
| justinpage wrote:
| Would you mind sharing a simple example that demonstrates this?
| Sounds great!
| srcreigh wrote:
| Do you index arrays? What index type is that? Any tips?
|
| I've used array column in PG before, haven't indexed arrays
| though.
| marcosdumay wrote:
| AFAIK, postgres first got its reputation of high performance
| because of array indexes.
|
| People usually go with GIN indexes, that can be used on the
| contains, overlaps or equals comparisons.
| RussianCow wrote:
| The tradeoff here is that you lose the foreign key constraint,
| correct? So if you delete a tag, there is no way for the
| database to automatically remove all references to it. Or is
| there some way to do this now?
| [deleted]
| brianwawok wrote:
| Right. More like NoSQL FKs.
|
| How high is the business risk if you have a random tag with
| no name? Skip its display in the UI.
| [deleted]
| blueblob wrote:
| A lot of the items described are problems in ontologies
| edflsafoiewq wrote:
| Yeah. A tag is a predicate. Sub-tags are implication (male
| author => author). Tag aliases are equivalence (implication in
| both directions).
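|
| Read that way, expanding a tag set is a transitive closure over the
| implication graph. A minimal sketch, with a hypothetical class name
| and made-up example tags apart from "male author" and "author":

```python
from collections import defaultdict


class TagRules:
    """Stores tag implications and expands tag sets through them."""

    def __init__(self):
        self.implies = defaultdict(set)  # tag -> tags it directly implies

    def add_implication(self, sub, parent):
        self.implies[sub].add(parent)

    def add_alias(self, a, b):
        # An alias is an implication in both directions.
        self.add_implication(a, b)
        self.add_implication(b, a)

    def expand(self, tags):
        """Return the tags plus everything they imply, transitively."""
        seen = set(tags)
        stack = list(seen)
        while stack:
            tag = stack.pop()
            for implied in self.implies.get(tag, ()):
                if implied not in seen:  # the seen-set makes cycles harmless
                    seen.add(implied)
                    stack.append(implied)
        return seen


rules = TagRules()
rules.add_implication("male author", "author")
rules.add_alias("sci-fi", "science fiction")
print(sorted(rules.expand({"male author", "sci-fi"})))
```

| Because the result is a plain set, reaching the same tag through two
| different paths costs only a redundant lookup.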
| roberthahn wrote:
| I'm so happy to see people talk about this! I too am endlessly
| fascinated with content tagging systems.
|
| Hillel's thoughts are completely unsurprising to me so I guess
| I've come to similar conclusions.
|
| I do notice that we seem to care about different things though -
| where Hillel appears to focus on tag types (and the
| implementation challenges that go with that) I focus more on
| human factors like what problem are we solving? for who? How do
| we maintain relevance (and power) in tagging systems (and for
| who?)
|
| I'm of the opinion that tagging systems should not be made by the
| few for the many but by each person for themselves. Which, of
| course, sucks because that puts the onus on everyone who wants
| tagged content to do their own work. But I believe the output of
| that investment would be quite valuable and useful!
|
| An easy example I could use might be recommendation engines.
| Assume I have a database of tags (a tag cloud?), and I know you
| have similar interests to me. If you also have a tag cloud, I
| could input links to both of our tag clouds into a purpose-built
| recommendation engine to discover new content I might not have
| consumed yet.
| jsemrau wrote:
| > I could use might be recommendation engines. Assume I have a
| database of tags (a tag cloud?), and I know you have similar
| interests to me. If you also have a tag cloud
|
| This was the first "naive" implementation on finclout. Every
| post gets automatically scanned for ranked keywords and then
| matched with other known entities about the post. We also
| collect tags from the user and have users verify keyword
| matches.
| k__ wrote:
| I don't know much about this topic.
|
| The only thing I learned: if you think you have a taxonomy, then
| you don't.
| AtlasBarfed wrote:
| Eh, the diamond problem and transitive issues don't exist because
| what is being reduced to is simply a set and membership. if
| expansions / aliases / synonyms / multi-membership produce
| overlaps, who cares, it's a set of hashes. The overwrites only
| represent wasted computation.
|
| Really this is a simpler version of multiple inheritance. You
| don't have the issue of conflicting method signatures and
| implementations, only names.
|
| The only danger is names meaning different things. You need your
| tags to be relatively unique to the meaning.
| cptcobalt wrote:
| I can't wait for the author of this thread to discover the AO3
| tagging system, which is, frankly, a masterpiece that
| demonstrates how effective community management can lead to
| _extremely good_ tagging and categorization, with very little
| miscategorization.
|
| https://www.wired.com/story/archive-of-our-own-fans-better-t...
|
| https://archiveofourown.org/faq/tags
| swyx wrote:
| it's literally the third tweet in his thread
| account-5 wrote:
| They mention it in the 4th post in the thread.
|
| I never heard of it though, what's so good about it?
| at_a_remove wrote:
| The AO3 tagging system badly needs pruning. I hesitate to make
| examples, as the specificity will serve as a "call out," but
| quite a lot of authors throw in single-use, digressive tags as
| some kind of commentary on their own work. Huge meandering
| swaths of crap tags, and the people who make them ought to have
| their permissions to create tags revoked.
| PuppyTailWags wrote:
| I kind of disagree with this. Tags are dual use in AO3,
| specifically they serve as a way to find specific stories
| with specific thematic or plot elements, but they
| additionally serve as a free expression of the author because
| it's the author who chooses which if any tags they want to use
| to describe their piece. When an author gets to decide the
| categories of a work, the categorization also becomes an
| expression.
|
| Consider the flavor of "Dead Dove: Do Not Eat" tag, which
| serves both as an author's expression of warning the reader
| and also a category of fanfic that is expected to have
| transgressive elements. Just tagging, idk, "child
| endangerment" completely misses the point of "Dead Dove: Do
| Not Eat" comparatively.
| at_a_remove wrote:
| I will paraphrase this to avoid a callout, but "no
| regenerating limbs those arms are toast sorry QA despises
| them" is _not a useful tag_. (This is a mild example, I've
| seen far worse)
|
| First, it is a single-use tag. Tags are for _categories_,
| not solo entries. Solo entries explode the tagspace to no
| good end.
|
| Second, that expression belongs in the summation of the
| work, or just about anywhere else. Tags are for other
| people to use to find similar works or for readers to look
| for things based on their interests. Metadata is not for
| artistic expression, unless you're one of those people who
| believes that artists ought to be able to choose their own
| Library of Congress call numbers and such, people who want
| to include "elephant" in the metadata despite the work
| having nothing to do with elephants.
| PuppyTailWags wrote:
| I think you're missing the point that, in AO3
| specifically, tags are not solely metadata. Tags are also
| artistic expression _in the context of AO3_. That's the
| thing. AO3 doesn't function like the Library of Congress,
| and there are no librarians that are independently
| assigning categories to fanfic. An author can choose to
| opt out of tags entirely, and people cannot put tags on
| other people's fanfic even if it's relevant and would
| benefit that work's findability. The simple mechanism of
| the author having sole control of what tags they want to
| apply to the work causes the act of tagging to also serve
| the purpose of artistic expression-- this results in
| spontaneous tags going from single-use to culturally
| known, such as "no beta we die like men", and therefore I
| think arguably useful _but only in the context of AO3_.
| Ajedi32 wrote:
| > An author can choose to opt out of tags entirely, and
| people cannot put tags on other people's fanfic even if
| it's relevant and would benefit that work's findability
|
| Curious about how this doesn't render the entire system
| near-useless? In my experience with other sites with
| user-generated content that allow tagging, this decision
| always makes the whole system way worse, because the OP
| alone is almost never going to be aware of all possible
| tags that are applicable to whatever it is they posted,
| and will instead just take the first 3-5 words that pop
| into their head and stick those in the tags field. The
| end result is a tagging system that barely works; you can
| search for a tag but you'll miss tons of stuff, and you
| can filter out a tag but you'll still see tons of stuff
| in that category. And if you ever find a hyper-specific
| tag you really enjoy it'll only have like 5 items in it
| even if there are hundreds or thousands it could be
| applicable to.
|
| Don't get me wrong, the wiki-style approach of just
| letting anyone edit tags has its own issues, but it does
| at least result in tags on everything being at least
| mostly complete, and actually useful for finding what you
| want (or filtering out things you don't want).
| PuppyTailWags wrote:
| > Curious about how this doesn't render the entire system
| near-useless? In my experience with other sites with
| user-generated content that allow tagging, this decision
| always makes the whole system way worse, because the OP
| alone is almost never going to be aware of all possible
| tags that are applicable to whatever it is they posted,
| and will instead just take the first 3-5 words that pop
| into their head and stick those in the tags field.
|
| A few things makes this work brilliantly:
|
| - authors are encouraged to tag as much as they want with
| whatever they want
|
| - tags have an autocompletion to help authors select tags
| on keywords
|
| - authors are prolific fanfic readers themselves and are
| therefore usually extremely familiar with the tag system
|
| - manual tag linking means searching for one tag will
| also return results for all related or near-identical
| tags, a linking which has an extremely high success rate
| due to dedicated and extremely knowledgeable volunteers
|
| This overall ends up being that authors use prolific
| tags, and reuse prolific tags from others, and ultimately
| search isn't strongly affected because the entire
| readerbase is hyper-knowledgeable. Check out the
| extremely specific fanfic-only "hanahaki disease" tag
| description in ao3 and you'll quickly see that any
| variety of related tags, with any level of
| hyperspecificity (some tags have neither "hanahaki" nor
| "disease"!), will appear searching for any of them,
| including hanahaki disease in other languages!:
| https://archiveofourown.org/tags/Hanahaki%20Disease
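|
| The linking step can be pictured as a synonym table consulted at
| search time. This is only a sketch under assumptions: the table
| contents, work ids and function names are hypothetical, and AO3's
| real implementation is not described in this thread.

```python
# Hypothetical synonym table of the kind volunteers would maintain by
# hand: each variant spelling points at one canonical tag.
CANONICAL = {
    "snarry": "Harry/Snape",
    "harry/snape": "Harry/Snape",
    "snape/harry": "Harry/Snape",
}

# Made-up works, each with the free-form tags its author chose.
WORKS = {
    "work-1": ["snarry", "no beta we die like men"],
    "work-2": ["Harry/Snape"],
    "work-3": ["Snape/Harry", "a one-off joke tag"],
    "work-4": ["Hanahaki Disease"],
}


def canonical(tag):
    """Map a tag to its canonical form; unlinked tags map to themselves."""
    return CANONICAL.get(tag.lower(), tag)


def search(tag):
    """Return works carrying the tag or any tag linked to the same canonical."""
    wanted = canonical(tag)
    return sorted(
        work for work, tags in WORKS.items()
        if any(canonical(t) == wanted for t in tags)
    )


print(search("snarry"))
print(search("Hanahaki Disease"))
```

| In this sketch an unlinked one-off tag simply maps to itself, so it
| never shows up in a search for a linked tag.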
| at_a_remove wrote:
| Then tags in AO3 are just more of the text and not much
| of a finding aid. You can't have both.
| PuppyTailWags wrote:
| Tags end up being an excellent finding aid due to the
| strength of the community's tag linking, you see. So they
| serve both purposes.
| at_a_remove wrote:
| "no regenerating limbs those arms are toast sorry QA
| despises them" just isn't useful if I want to locate a
| particular text, other than "I'm liable to get a Tumblr-
| stink off of this crap."
|
| And your defense of this is really ... _internal_ , as
| in, this all looks like a lot of in-jokes to an outsider
| who is new to AO3, or even new to a particular fandom. If
| someone doesn't know the slang, the in-joke reference,
| it's still unhelpful.
| PuppyTailWags wrote:
| > "no regenerating limbs those arms are toast sorry QA
| despises them" just isn't useful if I want to locate a
| particular text, other than "I'm liable to get a Tumblr-
| stink off of this crap."
|
| Yeah, but you're not looking for that tag, and that tag
| wouldn't affect your search in any way. That's the thing.
| You're approaching tags like they can only ever be
| used one way, and yes they _can be that, and also other
| things that don't affect your personal use_. So when you
| search for your specific tag, all synonymous tags will
| also appear, and all superfluous tags don't affect your
| search. A one-off tag doesn't affect your ability to
| search for multi-use tags.
|
| EDIT: Additionally, the fact the tag exists has also
| helpfully indicated to you that this is a fic you
| probably don't want to read because of the author's
| cultural hinting through their use of tags. You're
| proving my point here-- the one-off tag doesn't affect
| your ability to search for your specific fandom or
| tropes, but also it allows you to pick flavors of fanfic
| you want from that search because of your dislike of one-
| off tags.
| at_a_remove wrote:
| You have it backward: I found the fic through other means
| entirely and eventually dropped it. When I encountered it
| again on AO3 (it was a cross-post), I said "Oh, look at
| those horrible tags." It was notable in the fact that I
| said "I need to keep this one handy the next time I end
| up having yet another conversation with someone about how
| much tagging sucks on AO3." Because this isn't the first
| time someone has brought it up to me.
|
| They just crap up the results if I am searching for
| "regeneration" or "limbs." If something is used more than
| one way, yes, it _does_ affect my personal use because it
| means "more stuff I have to filter through." When you
| search, what you do not want is extraneous results.
| That's the whole point of searching! And I guess my
| library experience is showing, but AO3 just reeks of
| amateur hour shenanigans. I predict that at some point
| there will be a movement to clean up that kind of junk.
| Tomte wrote:
| > Tags are also artistic expression in the context of
| AO3.
|
| Seems to be similar on Tumblr.
| rubinlinux wrote:
| > The only system I know that does that is the fanfiction site
| AO3, where teams of volunteers manually create aliases from,
| say, "snarry" to "Harry/Snape"
|
| They seem aware already.
| taggingthrowaw wrote:
| A taxonomy or hierarchical system sometimes also helps, eg. on
| E621: https://e621.net/wiki_pages/23556 (NSFW if you scroll at
| all or click anything).
| endisneigh wrote:
| Is there an optimal tagging system, performance wise? Seems like
| there could be a database just for tagging.
| jshandling wrote:
| I set out building my first full-stack webapp [0] to make a
| custom theme-based tagging/organizational system for musical
| ideas. I did not initially realize all the hairy design choices
| inherent in this domain, but have found it humbling and
| educational.
|
| Remaining features to be implemented include in-app audio
| recording, editing, and custom labeling outside of the main tree
| structured organizational system.
|
| I'd appreciate any thoughts or suggestions if anyone cares to
| take a look!
|
| [0] https://www.soundseeker.app/
| kortex wrote:
| I think tag aliases are fine, but in my opinion, tags should not
| have hierarchies. That is just opening the can of ontology worms,
| and most systems are ill-equipped to deal with
| ontologies...including ontological systems.
|
| Tags are just dumb strings which label data. They are basically
| KeyValues, where the value is just always equal to True. We don't
| think of KVs as hierarchical unless they are explicitly a path
| string, and in that case, they are forced to be a plain tree with
| no cycles or diamonds.
| bonaldi wrote:
| Nothing you say is necessarily the case, and is dependent on
| implementation. Take "value is just always equal to true",
| well, no, not if your key is a predicate. "Color:red" is more
| powerful than "#red" or "red:true", and "color:[lookup-ID-for-
| red-concept]" is substantially more powerful than both.
| pessimizer wrote:
| Not having tag hierarchies doesn't fix the difficulty of
| classification, it just handwaves it away. There will always
| need to be (super)tags that are collections of other tags,
| where it is a bug for an item that has a particular tag to not
| also have another, related tag. The question should be _how_
| you're going to handle that, not if you're going to handle it,
| or you'll end up with a lot of broken tags of dubious
| usefulness.
|
| Tags are just dumb strings that label data, but tags are also
| data. If I can't label tag:"red" a tag:"colored" in your
| system, it's not great. It's not much better if I'm labeling
| things tag:"colored-red" because if I'm doing that and there's
| no central validation to add semantics to that relationship,
| I'm going to end up with tag:"red" things, tag:"colored"
| things, tag:"colored-red" things, and probably even tag:"color-
| red" and tag:"red-color" things.
|
| edit: what's so bad about cycles when it comes to a tag being
| assigned another tag that has been assigned the original tag?
| It's just a mutual implication. There's nothing wrong to me
| with adding a single tag and seeing five more added
| automatically. It means that you're building a knowledge base.
| zamubafoo wrote:
| Optional forests of hierarchy trees are where it's at.
| Essentially don't encode everything into one gigantic one.
|
| Sometimes you know that users are going to tag `laptop` a bunch
| and want that to also drag in `personal computer` (but not all
| `PC`s are `laptop`s) or that `blue dress` is also a `dress` and
| don't want to hard code special cases.
|
| That said, if you are going to do this, then you must have it
| controlled by an admin/moderator. Maybe allow for hierarchy
| request submissions but have it moderated. There is at least
| one public system where this just works to my knowledge and a
| bunch of self-hosted ones as well.
| goto11 wrote:
| > They are basically KeyValues, where the value is just always
| equal to True
|
| That would be a set of values :-)
| comfypotato wrote:
| Org mode approaches this by making hierarchies and inheritance
| optional. I personally like both, but I acknowledge (as was
| mentioned in the tweets) that hierarchies can get to be very
| convoluted if you don't work to maintain them sensibly.
| AlanYx wrote:
| What I like most about org mode tags is that regular
| expressions can be subtags (or "members of a group tag" in
| org mode lingo). So you can specify a hierarchy where the
| parents have children you don't know in advance.
| FpUser wrote:
| >"I think tag aliases are fine, but in my opinion, tags should
| not have hierarchies."
|
| Many years ago I developed a proprietary database for a
| media related product. It was a NoSQL Entity-Attribute-Value
| database where Attribute was basically a tag. Tags had no
| hierarchy but query language allowed to specify sequence of
| attributes like Genre, Artist, Album, Title. When said sequence
| was not empty the result set would be a tree where each level
| would correspond to an attribute position as defined in query.
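|
| A minimal sketch of turning flat attribute tags into a tree from an
| attribute sequence. The function name and the placeholder records
| are hypothetical; the original product's query language is not shown
| in this thread.

```python
def build_tree(items, attribute_order):
    """Nest flat attribute dicts into a tree, one level per attribute."""
    tree = {}
    for item in items:
        node = tree
        for attribute in attribute_order:
            # Items lacking an attribute are grouped under a placeholder key.
            value = item.get(attribute, "(unknown)")
            node = node.setdefault(value, {})
    return tree


# Placeholder records standing in for entity/attribute/value rows.
TRACKS = [
    {"Genre": "Jazz", "Artist": "Artist A", "Album": "Album 1", "Title": "Track 1"},
    {"Genre": "Jazz", "Artist": "Artist A", "Album": "Album 1", "Title": "Track 2"},
    {"Genre": "Jazz", "Artist": "Artist B", "Album": "Album 2", "Title": "Track 3"},
    {"Genre": "Rock", "Artist": "Artist C", "Album": "Album 3", "Title": "Track 4"},
]

by_genre = build_tree(TRACKS, ["Genre", "Artist", "Album", "Title"])
by_artist = build_tree(TRACKS, ["Artist", "Title"])
print(by_genre)
print(by_artist)
```

| The same flat records yield a different tree for each attribute
| sequence, so the hierarchy lives in the query and not in the tags.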
| NWoodsman wrote:
| I understand your pain, but want to make you aware that LINQ
| has become so powerful especially with lazy evaluation and
| expression trees that hierarchical views of tags is really
| basically simple and actually just one more method of
| visualizing data...
| PeterStuer wrote:
| Look into AI systems from the 1960's and you will find Semantic
| Networks. If you just need categories you can go with taxonomies
| and folksonomies. If you want to (over?) formalize and describe
| mainly non-agentive structure you look at ontologies.
| aaviator42 wrote:
| A few months ago I worked on some proof-of-concept code for
| searching tagged data: https://github.com/aaviator42/Cha
|
| I now work full-time in a role where part of my duties is
| designing a content tagging system and its search
| functionalities. It's very interesting and fun! Lots of puzzles.
|
| How do you weigh different tags? How do you do fuzzy searching
| ('city' should match with plural ('cities'), misspellings
| ('citys'), etc)?
|
| How do you program the system so that 'hotdog' is not matched
| with 'hot' and 'dog'? What about synonyms? What about regional
| terminology and synonym tables?
|
| Then there's one-to-one and one-to-many and many-to-one mapping.
|
| As a side project I'm also working on a beta public search engine
| that I'll launch on HN sometime in the next year or so, where I'm
| having similar puzzles.
| cube2222 wrote:
| > How do you program the system so that 'hotdog' is not matched
| with 'hot' and 'dog'?
|
| That sounds like a very good use case for word embeddings.
| bell-cot wrote:
| How do you deal with "hotdog" possibly being a noun (several
| meanings), or proper noun (several meanings), or verb, or
| interjection?
| dmonitor wrote:
| e621 frequently has to deal with characters with the same
| name, or an artist with the same name as a character. they
| just make ambiguous tags have a special syntax. so if bob
| was an artist, but also had a character named bob, it would
| just be bob_(bob) for the character and bob_(artist) for
| the artist. and if someone tried to tag something as just
| "bob" they would be told to be more specific. searching for
| all bobs can be done with bob_(*).
|
| so hotdog could have hotdog_(food), hotdog_(interjection),
| and hot dogs (the animal) would be two tags: hot and dog.
|
| it's not the cleanest solution, but it works well enough.
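|
| A rough sketch of that qualifier convention, assuming a flat list of
| known tags. The tag list and function names are hypothetical and
| this is not e621's code; the bob_(*) search is modelled with
| shell-style wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical tag vocabulary following the name_(qualifier) convention.
KNOWN_TAGS = [
    "bob_(bob)",
    "bob_(artist)",
    "hotdog_(food)",
    "hotdog_(interjection)",
    "hot",
    "dog",
]


def qualified_forms(name, known_tags=KNOWN_TAGS):
    """Return every known tag of the form name_(something)."""
    # Assumes name holds no wildcard characters such as * ? or [.
    return sorted(t for t in known_tags if fnmatchcase(t, name + "_(*)"))


def validate(tag, known_tags=KNOWN_TAGS):
    """Accept a known tag; reject a bare name that has qualified forms."""
    if tag in known_tags:
        return tag
    options = qualified_forms(tag, known_tags)
    if options:
        raise ValueError(f"'{tag}' is ambiguous; use one of {options}")
    raise ValueError(f"unknown tag '{tag}'")


print(qualified_forms("bob"))
print(validate("hot"))
```

| Here validate("bob") raises an error listing both qualified forms,
| which mirrors the "be more specific" prompt described above.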
| Tomte wrote:
| Also great on the topic of tagging, with more information about
| the AO3 scheme:
| https://idlewords.com/talks/fan_is_a_tool_using_animal.htm
| [deleted]
| qwerty456127 wrote:
| It is crazily sad that non-invasive (without embedding into the file
| body) tagging is not standardized across OSes and file systems.
| The only system to support tags I know is KDE/Dolphin/Baloo;
| outside KDE, tagging seemingly is supported only by a handful of
| incompatible 3rd-party apps.
|
| Sadly I don't expect much progress to happen in this area. Almost
| nobody cares about storing and organizing of files locally
| nowadays.
|
| I hope it is going to be done some day or later (there isn't much
| to do: just standardize some xattrs and something like RDF schema
| to be used in an alternative FS stream + add support for these to
| the standard file management and search tools, this is orders of
| magnitude easier than implementing a new FS) but probably not
| soon - it would be a huge luck to get any resources allocated to
| this.
| terpimost wrote:
| I was interested in that too. I stopped as soon as I
| realized that any good search in tagging system would be just a
| full text search. E-commerce catalogs have detailed filters but I
| think people use maximum 2 properties in addition to simple name
| input search
| wtf77 wrote:
| I am endlessly fascinated by how twitter has now become a dumping
| ground for complex topics that are difficult to read and follow.
| But what happened to the old blogs?
| throw10920 wrote:
| Nothing has happened to them. I have a few hundred distinct
| bookmarked blogs, if not over a thousand, and obviously my
| bookmark collection is a tiny fraction of what actually exists.
| They're still there.
| hwayne wrote:
| I have a really high standard for my blog posts. They go
| through several rounds of rewrites, with feedback from friends,
| before I'm happy with them. That plus the length (median ~2000
| words) means that most of my blog posts take weeks or months to
| write. I can hammer out a tweetstorm in 20 minutes.
|
| (Also, tweets are a fun format! I want each tweet to be a
| complete idea, which is hard when you have only 280
| characters.)
| [deleted]
| dymk wrote:
| It's lower effort to make a stream of consciousness post one
| sentence at a time, and as a bonus, there's a built in audience
| / discovery network where they're posting.
| tuatoru wrote:
| Lower effort for whom? Back when I were a lad, we were told
| to write so that our readers did not have to work to
| understand us. The point of writing is to be understood. Old
| man yells at cloud.
| tqi wrote:
| I think it's helpful to keep in mind that with most of the
| examples that get shared around, the choice for the author
| was not a string of tweets vs blog post, but rather a
| string of tweets vs not sharing at all.
| dymk wrote:
| Lower effort to the writer, obviously.
|
| The point of posting on Twitter is not to be understood,
| it's to be retweeted.
| dylan604 wrote:
| How does one go back and edit a stream of consciousness like
| that into an actual coherent thought later though?
|
| I was just having a conversation similar to this where it was
| explained "this is just how people my age do things". While
| attempting to avoid boomer/millennial tropes, this does make
| me wonder how much different schooling is now vs then (hoping
| to avoid those memes too).
|
| I was always getting in trouble for just saying whatever came
| to mind vs slowing down to think if it really needed to be
| said or more specifically _how_ it was said.
| labrador wrote:
| I am too but I've given up. I've collected a lot of data over the
| years and spent a lot of time trying to organize it so I can find
| relevant connections. It's just too time consuming. I've decided
| discerning relationships in unstructured data is where I want to
| focus.
| cpsns wrote:
| I've written a tagging system from scratch for an existing system
| and it was one of the most interesting things I've worked out. I
| had total control over how it was implemented and I _think_ I
| came up with a really nice, minimalist, scalable way to tag
| things, and to search them.
| rambambram wrote:
| Care to elaborate? I'm also working on a categorization/tagging
| system - albeit a simple one - and I find myself in a struggle
| to keep it accessible enough to use on one hand and advanced
| enough to actually add value on the other hand.
| throwaway920102 wrote:
| Empornium aka luminance has a great tagging system.
| aaws11 wrote:
| https://threadreaderapp.com/thread/1534301374166474752.html
| ggm wrote:
| Approximate date is the bugbear of photo tagging. EXIF and Dublin
| core and vendors can't agree what to do. Camera manufacturers
| don't care because at time of shot, date is fixed. The problem is
| archival, scanned and copied predigital work.
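|
| A minimal sketch of one way to tag an approximate date, assuming
| it is stored as an inclusive [earliest, latest] range instead of
| a single timestamp. The ApproxDate class and its method names are
| hypothetical; they are not part of EXIF or Dublin Core.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ApproxDate:
    """An uncertain date stored as an inclusive [earliest, latest] range."""
    earliest: date
    latest: date

    @classmethod
    def from_year(cls, year: int) -> "ApproxDate":
        # "Sometime in 1974" covers the whole calendar year.
        return cls(date(year, 1, 1), date(year, 12, 31))

    @classmethod
    def from_decade(cls, decade_start: int) -> "ApproxDate":
        # "The 1970s" covers 1970-01-01 through 1979-12-31.
        return cls(date(decade_start, 1, 1), date(decade_start + 9, 12, 31))

    def overlaps(self, other: "ApproxDate") -> bool:
        # Two ranges overlap unless one ends before the other starts.
        return self.earliest <= other.latest and other.earliest <= self.latest


scan = ApproxDate.from_decade(1970)   # a scanned print "from the 1970s"
query = ApproxDate.from_year(1974)    # a search for photos from 1974
print(scan.overlaps(query))
```

| With a range, a search for 1974 can still match a scan that is
| only known to be from the 1970s, which a single fixed date cannot
| express.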
| dekervin wrote:
| I hacked together a small extension to tag hacker news stories. A
| small presentation here,
|
| https://datum.alwaysdata.net/static/extension/index.html
|
| With the js files for the extension.
|
| The motivation to finish it partly came from this hn thread.
| https://news.ycombinator.com/item?id=32970560
| xcskier56 wrote:
| The hierarchical nature of the information he's talking about
| really reminds me of the ontologies and terminologies that are
| used in healthcare to organize medical information. E.g.
| Ibuprofen 10mg Tab < Ibuprofen < NSAID < ... < Therapeutic
| Chemical.
|
| This is a field that I'm only tangentially familiar with, but
| it's a fascinating discipline that tries to group and manage all
| of the different categories of healthcare data. You can use the
| RxNav tool to look at the RxNorm terminology, which is only 1 of
| many terminology systems.
|
| https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchT...
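|
| A rough sketch of how the "<" chain above could be walked, assuming
| a plain child-to-parent map. The PARENT table is hand-copied from
| the example and is not RxNorm data; the levels elided by "..." are
| left out rather than guessed. The function names are hypothetical.

```python
# Child -> parent links, hand-copied from the example in the comment.
# The levels above "NSAID" are elided there, so they are omitted here too.
PARENT = {
    "Ibuprofen 10mg Tab": "Ibuprofen",
    "Ibuprofen": "NSAID",
}


def ancestors(term: str) -> list[str]:
    """Return every broader term above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term: str, category: str) -> bool:
    """True if `term` is `category` itself or sits anywhere beneath it."""
    return term == category or category in ancestors(term)


print(ancestors("Ibuprofen 10mg Tab"))      # ['Ibuprofen', 'NSAID']
print(is_a("Ibuprofen 10mg Tab", "NSAID"))  # True
```

| A query for "NSAID" can then match items tagged only with the
| most specific term, without every item carrying every ancestor.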
| turnsout wrote:
| This is the reason the Semantic Web never took off--people on the
| internet can't even agree on what a "sandwich" is, let alone the
| exact hierarchy of ontology.
|
| This is an area where large language models have a role to play--
| whatever you're hoping to achieve with user-generated tags can
| probably be achieved with ML-powered associations or navigation.
| And the potential benefit is that it could be tailored to each
| user--so you're only surfacing "Hot Dogs" when certain users
| click "Sandwich."
| CabSauce wrote:
| This is what we do for the most part. Two tiers of 'tags'. One
| is curated and required, the other is an embedding.
| CobrastanJorji wrote:
| I thought that the cube rule of food generally settled the
| sandwich debate. A hot dog is not a type of sandwich, being
| surrounded on three sides. Instead, it is a type of taco.
| turnsout wrote:
| Haha yes, and I would hate to meet the Turing-complete
| tagging system that could capture this nuance!
| swyx wrote:
| we have a big tagging problem where i work and yesterday I tried
| using gpt3 to assist. worked well!
|
| code and context:
| https://github.com/airbytehq/airbyte/issues/17893
| raffraffraff wrote:
| I'd love to know what those prolific Spotify engineers think of
| this.
|
| That was a joke because Spotify doesn't let you tag music.
| acchow wrote:
| Sounds like they are trying to embed the search semantics in the
| data storage. Why not treat search as a distinct problem?
| taylorbuley wrote:
| Pro tip: use stemming!
| robg wrote:
| Surprised no one has nailed a use case for semantic tags and
| their associations. Python and snake doesn't require hierarchies
| to differentiate from Python and coding. Why aren't co-
| occurrences within and between content samples enough?
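|
| A small sketch of the co-occurrence idea, assuming each piece of
| content is just a set of flat tags. The ITEMS data and the
| function names are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical tagged items; each set is the tags on one piece of content.
ITEMS = [
    {"python", "snake", "reptile"},
    {"python", "snake", "reptile", "zoo"},
    {"python", "coding", "tutorial"},
    {"python", "coding", "tutorial", "django"},
]


def co_occurrences(items):
    """Count, for every tag, how often each other tag appears alongside it."""
    counts = defaultdict(Counter)
    for tags in items:
        for a, b in permutations(tags, 2):
            counts[a][b] += 1
    return counts


def related(items, *given, top=3):
    """Tags most often seen on items that carry all of the `given` tags."""
    wanted = set(given)
    tally = Counter()
    for tags in items:
        if wanted <= tags:  # item has every tag we asked for
            tally.update(tags - wanted)
    return tally.most_common(top)


counts = co_occurrences(ITEMS)
print(counts["python"]["snake"])            # 2
print(related(ITEMS, "python", "snake"))    # [('reptile', 2), ('zoo', 1)]
print(related(ITEMS, "python", "coding"))   # [('tutorial', 2), ('django', 1)]
```

| No hierarchy is declared anywhere; the second tag alone is enough
| to pull the two senses of "python" apart.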
| feoren wrote:
| I'm surprised I haven't seen more discussion of how tags are an
| entry point into plain-old data architecture. It should be
| obvious that by the time you're using tags for queries like
| "start-date: BEFORE 2022-03-01", you've created an inner-platform
| where you're building a plain-old relational database on top of
| your tags. Stop what you're doing and elevate "start date" out of
| tag-land and into a more structured representation with more
| application support.
|
| Many enterprise databases add a memo field called "Comments" to
| almost every table. Clients very often end up coming up with
| their own guidelines about how to embed various information in
| the comments fields that the primary structure is missing.
| Looking over how clients are using the "comments" fields is a
| great way to discover new things that should be formally
| incorporated into the structure of your data architecture.
| Similarly with tags.
|
| Look at tags as a starting point for adding a bit of loose
| structure to the frontiers of your data architecture. Mix them in
| with more structured data architecture. Be ready to "graduate"
| tags up to the next level of structure when it becomes
| appropriate. Stop worrying about how to make tagging perfect and
| embrace it for what it is: an easy way to get started on modeling
| the parts of the domain that you haven't spent a long time
| thinking about yet. A good way to understand how users want to
| use your system. Something you're always revisiting, cleaning up,
| and using as a source of inspiration. If you see some tags
| getting out of hand, don't try to improve your tagging system;
| instead take what those tags are trying to represent and add more
| structured fields and queries for them. This pipeline of less to
| more structure should be constantly playing out in a healthy,
| evolving system.
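|
| One possible shape for "graduating" a tag, as a sketch only. It
| assumes tags are free-form strings, some written as "key: value",
| and that dates use ISO format. The Record class, the field name
| start_date, and both helper functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Record:
    name: str
    tags: list[str] = field(default_factory=list)
    start_date: Optional[date] = None  # the "graduated" structured field


def key_usage(records):
    """Count how often each 'key: value' style tag key appears."""
    usage = Counter()
    for rec in records:
        for tag in rec.tags:
            key, sep, _ = tag.partition(":")
            if sep:
                usage[key.strip()] += 1
    return usage


def graduate_start_date(records):
    """Move 'start-date: YYYY-MM-DD' tags into the typed start_date field."""
    for rec in records:
        kept = []
        for tag in rec.tags:
            key, sep, value = tag.partition(":")
            if sep and key.strip() == "start-date":
                try:
                    rec.start_date = date.fromisoformat(value.strip())
                    continue  # parsed cleanly, so the tag is dropped
                except ValueError:
                    pass  # unparseable value: keep the tag for manual cleanup
            kept.append(tag)
        rec.tags = kept


records = [
    Record("alpha", ["urgent", "start-date: 2022-02-14"]),
    Record("beta", ["start-date: 2022-04-01"]),
    Record("gamma", ["start-date: next spring"]),
]
print(key_usage(records))  # shows which keys are popular enough to promote
graduate_start_date(records)
cutoff = date(2022, 3, 1)
early = [r.name for r in records if r.start_date and r.start_date < cutoff]
print(early)               # ['alpha']
```

| The "BEFORE 2022-03-01" query becomes an ordinary date comparison
| on a typed field, and tags that fail to parse stay behind as a
| to-do list for cleanup.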
| photochemsyn wrote:
| One area that's illuminating is the effort to annotate the
| results of whole-genome sequencing projects. Tagging stretches of
| the genome which represent coherent units of some sort, and then
| relating them to some functional capability of the organism, is
| not at all a solved problem.
|
| Here's an overview from 2011 where they're struggling to even get
| a good tagging system up for single-celled microorganisms (a much
| easier problem than multicellular genomes like humans):
|
| https://pubmed.ncbi.nlm.nih.gov/22180819/
|
| > "Highlights include the development of annotation assessment
| tools, community acceptance of protein naming standards,
| comparison of annotation resources to provide consistent
| annotation, and improved tracking of the evidence used to
| generate a particular annotation. The development of a set of
| minimal standards, including the requirement for annotated
| complete prokaryotic genomes to contain a full set of ribosomal
| RNAs, transfer RNAs, and proteins encoding core conserved
| functions, is an historic milestone."
___________________________________________________________________
(page generated 2022-10-18 23:00 UTC)