https://beepb00p.xyz/annotating.html

How to annotate literally everything

Comprehensive overview of existing tools, strategies and thoughts on
interacting with your data

Table of Contents

  * 1. Motivation
  * 2. Annotating web
      + Pocket
      + Instapaper
      + Wallabag
      + Hypothes.is
      + Grasp
      + Summary
  * 3. Annotating PDFs
      + Okular, Evince, Atril
      + Emacs: pdf-tools
      + Other Linux readers
      + Emacs: org-noter
      + Xournal
      + Hypothes.is (again)
      + Polar
      + Annotating on Android
      + Summary
  * 4. Annotating E-ink
      + Kindle
      + Kobo
      + Koreader
  * 5. Miscellaneous
      + Annotating paper books
      + Annotating plaintext
      + Annotating videos
      + Other notable mentions
      + Other tools
      + Hall of shame!
  * 6. What makes a good annotation system?
      + Comparison
  * 7. Using annotation data
      + Extracting reading stats
      + Searching in annotations
      + Providing TODO items
      + Spaced repetition
      + Life log
  * 8. --

TLDR: when I read I try to read actively, which for me mainly
involves using various tools to annotate content: highlighting and
leaving notes as I read. I've programmed data providers that parse
these annotations and provide a nice interface for interacting with
this data from other tools. My automated scripts use them to render
the annotations as human-readable, searchable plaintext and to
generate TODOs/spaced repetition items.

In this post I'm gonna elaborate on all of that: give some
motivation, review these tools (mainly focusing on open source and
thus extendable software) and share my vision of how they could work
in an ideal world. I won't try to convince you that my method of
reading and interacting with information is superior for you: it
doesn't have to be, and there are people out there more eloquent than
me who do that. I assume you want this too and are wondering about
the practical details.

1 Motivation

At some point in my life I realized I didn't remember most of the
books/papers/posts/videos I had consumed a few years before. For
brevity I'll just refer to all of this as 'content' from here on.

That bothered me increasingly until I bought a Kindle, which had
'highlight' functionality and a virtual keyboard, and I discovered
that it helped a lot with recall.

I've become increasingly obsessed with this, and these days the
ability to highlight when I read serves multiple purposes for me:

  * the very act of spending conscious effort on highlighting and
    commenting helps me remember better.
  * it's easier to recall content I've already read: I just skim
    through the highlights and refresh my memory.

    In particular, I often run into something on the internet that I
    remember reading before. If I have annotations for it, I can
    quickly go through them and restore the context.

  * it's easier to recommend content to other people because you can
    refer to specific moments or points you liked/disliked
  * it's got social value if highlights are visible to other people
    (e.g. Hypothesis, Medium, Goodreads)
  * it helps with book scoring. If I don't have any highlights, it
    probably means that the content was not interesting at all for
    me. Fiction books are not an exception: I tend to highlight use
    of language I liked, inspirational things, etc.
  * it serves as activity log if you are into #lifelogging.
  * you can populate your TODO list and step up your spaced
    repetition game.

I'm going to review some of the tools I have tried or am still using
and highlight their positive and negative aspects. If you're getting
impatient, you can skip straight to my comparison table.

2 Annotating web

Pocket

I won't write much about it for one reason, which is a big deal:
while you can highlight text, you can't leave notes. The nearest
functionality is 'recommending' a highlight while reading a comment,
but that's only displayed on your 'timeline'.

The Pocket API doesn't support exporting highlights either, or to be
precise, that part seems to be hidden. If you need it, you can use my
script where I hacked around it.

Also, interestingly enough, the Kobo reader has got Pocket
integration, but for some reason when you read Pocket articles on
Kobo, you can't highlight at all (let alone sync highlights with
Pocket). Not sure what the purpose of this integration is.

Pocket was acquired by Mozilla in 2017, which might be a good thing,
but so far their main focus seems to be readability features.

You can also read a rant raising similar issues to what I mentioned.

Instapaper

I won't go into Instapaper's readability capabilities (e.g. fonts and
article formatting) because it's not something I care much about, so
you might be better off googling that for yourself. Here I'll
concentrate on the annotating aspect. Here are a couple of recent
extensive comparisons of Instapaper and Pocket, which feature
screenshots and cover other aspects of Instapaper:

  * Read-It-Later App Showdown: Instapaper vs. Pocket; screenshot of
    annotation interface.
  * Instapaper vs. Pocket (2019 Comparison)

So, to read something in Instapaper, first you'll have to import the
article into it (to unclutter and optimize it for reading). Due to
this import process, you can only read and highlight in Instapaper's
app, and you can only see your highlights there as well, which is its
main limitation for me.

The only reason I'm using it at all is that its Android app has got
offline capabilities, so I export to Instapaper the things I want to
read on the tube, where I don't have a connection, and read/comment
while offline.

Mind that the free version of Instapaper has got a limit of 5 notes
per month. Personally, in the absence of good alternatives, I'm happy
to pay $3 per month for the premium version of such a decent product.

Instapaper has got a JSON API, through which you can access your
saved articles, comments and highlights. I'm using a fork of a Python
wrapper to access it. Highlights are only stored as text though (as
opposed to CSS/XPath locators), so there is no easy way to match them
against the original text apart from some sort of fuzzy search.
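
To illustrate what 'some sort of fuzzy search' could look like, here
is a minimal sketch using Python's standard difflib. It is not part
of Instapaper or of the wrapper mentioned above: the function name
locate_highlight and the 0.8 threshold are assumptions made for the
example.

```python
from difflib import SequenceMatcher


def locate_highlight(article: str, highlight: str, min_ratio: float = 0.8):
    """Return (start, end) offsets of the best fuzzy match, or None."""
    matcher = SequenceMatcher(None, article, highlight, autojunk=False)
    # The longest exact common block serves as a cheap anchor in the article.
    block = matcher.find_longest_match(0, len(article), 0, len(highlight))
    if block.size == 0:
        return None
    # Widen the anchor to a window as long as the whole highlight.
    start = max(0, block.a - block.b)
    end = min(len(article), start + len(highlight))
    window = article[start:end]
    # Accept the window only if it is similar enough to the highlight.
    ratio = SequenceMatcher(None, window, highlight, autojunk=False).ratio()
    return (start, end) if ratio >= min_ratio else None
```

The returned offsets are approximate: small differences in whitespace
or punctuation shift the window by a few characters, which is usually
good enough to find the surrounding paragraph.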

The search function does full text search in saved articles, but
doesn't let you restrict the search to highlights, and you can't
search in notes at all.

One red flag was in 2018, when Instapaper wasn't available in Europe
for a few months until they resolved GDPR issues. While I don't blame
Instapaper for it, this is the kind of thing that happens when you
don't own your data and use a closed source product.

Wallabag

Wallabag is the most mature open source/selfhosted read-it-later kind
of project I know of. Here's a review featuring some screenshots of
their web app and Android app.

It's very similar to Instapaper in that you have to import the
article into Wallabag in order to annotate it. I used it for a while
and only had some issues with importing articles heavy on
MathJax-backed LaTeX.

If you don't want to selfhost it, you can use wallabag.it hosting for
as little as 9 euros per year, with a two-week trial.

There is also an Android app, but sadly it lacks support for
highlighting.

I wish it had more attention from the community, and I might try to
work on Android annotation when I have more time.

Hypothes.is

Hypothesis is simply awesome and my favorite web annotation tool.
Its killer feature is that it embeds a bit of JS in the page to
provide an in-browser overlay, so you don't have to leave the page
you were reading and can highlight and add comments natively. They
use something cool called fuzzy anchoring to achieve this. That also
makes annotations resilient to document markup changes, and if an
annotation can't be located it is still shown in the metadata as
'orphaned', so you never lose your notes.

Another cool feature is that you can choose to make your annotations
public and see other people's annotations or create a private group
if you want to share them among specific people only.

To get a sense of it you can skim through the tutorial, which has
plenty of screenshots. I also strongly recommend checking it out in
action here: Annotation Is Now a Web Standard, or trying it on the
very page you're reading now.

You don't have to install anything or register, it's just a widget
embedded in the page, but do make sure to allow JS. You should see
yellow highlights and the sidebar on the right.

It's open source, can be selfhosted and they provide their own
service for free (but please consider donating to them!).

Since Hypothesis is powered by JavaScript, it actually works well in
modern Android browsers via a bookmarklet. It's not obvious from the
browser UI how to actually invoke the bookmarklet though:

  * for mobile Firefox, once you added a bookmarklet, to invoke it
    you need to tap on the address bar and click the bookmarklet.
  * for mobile Chrome, it's a bit more tedious but also possible.

One downside of this service is that you won't be able to annotate
while offline. I feel it's actually more of a problem with mobile
browsers in general than with Hypothesis though. While you could
potentially annotate offline without querying the API by keeping the
data in localStorage, it doesn't matter if you can't load the page in
the first place. Perhaps browsers could support that better.

Hypothesis has got a JSON API which gives access to your own and
other people's public annotations. I'm using the judell/Hypothesis
Python wrapper to access and back up this data.
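
For a rough idea of what talking to that API involves, here is a
minimal sketch using only the Python standard library rather than the
wrapper mentioned above. It assumes the public search endpoint at
api.hypothes.is, a personal API token, and a response with a "rows"
list whose items carry "uri", "text" and "target" selectors; the
function names are made up for the example, so check them against
the official API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_SEARCH = "https://api.hypothes.is/api/search"  # assumed endpoint


def fetch_annotations(user: str, token: str, limit: int = 50) -> dict:
    """Query the search endpoint for one user's annotations."""
    query = urllib.parse.urlencode(
        {"user": f"acct:{user}@hypothes.is", "limit": limit}
    )
    request = urllib.request.Request(
        f"{API_SEARCH}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(payload: dict) -> list:
    """Flatten a search response into (uri, quoted text, note) tuples."""
    result = []
    for row in payload.get("rows", []):
        quote = ""
        for target in row.get("target", []):
            for selector in target.get("selector", []):
                # The quote selector holds the exact highlighted text.
                if selector.get("type") == "TextQuoteSelector":
                    quote = selector.get("exact", "")
        result.append((row.get("uri", ""), quote, row.get("text", "")))
    return result
```

Dumping the output of summarize(fetch_annotations(...)) to a file on
a schedule is already a crude backup.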

Grasp

Shameless plug! If you just want to send annotations directly into a
plaintext (e.g. org-mode) file and don't really care about displaying
them within the original web page you can use my grasp browser addon
for that.

I typically use it for highlights that would be good candidates for
TODO items, e.g. something actionable like a piece of advice or
further reading.

It's not available for mobile yet, but perhaps on Android the native
select and share capabilities (e.g. into orgzly) make more sense
anyway.

Summary

Hypothes.is is a clear winner for me on desktop and I'm using
Instapaper for offline reading on Android.

3 Annotating PDFs

Small disclaimer: I don't own Mac/Windows/iPhone so have very little
idea what's going on in their world. Sorry! You can take a look at a
section I added with other people's suggestions.

The PDF format is a complicated beast, and its native annotations are
a whole different story from annotating the web.

First, its ISO standard is not freely available. Adobe's website has
got some sort of reference which is not the same as the standard, but
apparently close enough.

There are quite a few different kinds of PDF annotations, e.g. you
can see them here in section 12.5.6: Annotation Types or in the
Poppler source code. In addition to the Highlight and Text types
there are things like support for styling, underlines,
strikethroughs, and even (heaven forbid) sounds, movies and 3D.

Using native PDF annotations has one major drawback: you will have to
save the metadata back to the PDF file at some point. That mutates
the PDF, which has all kinds of nasty consequences:

  * at worst it's impossible due to DRM protection
  * it changes the file hash, which may break other tools, trigger
    unnecessary cloud syncs, etc.
  * you can't easily tell which documents have your private notes
    and which don't

I work around it by first making a copy of the file I'm about to
annotate and giving it an [annotated] prefix so I don't confuse it
with the original file.
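
This workaround is easy to script. The sketch below is one possible
way to do it, not the exact script I use; the function name
make_annotated_copy is made up for the example.

```python
import shutil
from pathlib import Path


def make_annotated_copy(pdf: Path) -> Path:
    """Copy `pdf` next to itself as '[annotated] <name>' and return the copy."""
    target = pdf.with_name(f"[annotated] {pdf.name}")
    if not target.exists():  # never overwrite a copy that may hold notes
        shutil.copy2(pdf, target)
    return target
```

The original stays untouched, so its hash doesn't change, and the
prefix makes it obvious which files carry private notes.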

Okular, Evince, Atril

These are probably the most widely used PDF readers. All of them use
the Poppler library for working with PDFs, which in particular does
the messy business of annotation handling.

All of them would let you view existing annotations, but there are
some nuances and limitations:

  * Atril (as of 1.20.3) allows you to add or edit popup notes only;
    other types of annotations aren't even displayed in the sidebar.
  * Evince (as of 3.32.0) only allows you to add or edit highlights
    or popup notes (no inline!). Here is an article with some
    screenshots (not much has changed since 2016).

    However, it's got a nasty bug (1, 2), a few years old by now,
    that doesn't let you save over the same file you're editing.
    That means that every time you want to persist your changes,
    you have to save to a new file and reopen the new copy. That
    makes it pretty unusable unless you only want to make a couple
    of changes.

  * Okular (as of 1.6.3) allows editing and adding pretty much every
    type of annotation that you would expect: highlights, popup and
    inline notes, freehand and more.

    Annotation process (screenshot) is pretty pleasant, hitting
    Ctrl-S results in saving the file you're working on without any
    problems.

    Okular has also got support for something called 'document
    archive', which saves the original document in a zip file along
    with metadata.xml. This allows you to annotate non-PDF files
    (e.g. DJVU), which is a very neat feature. It's obviously
    Okular-specific, though in theory it's possible to process
    metadata.xml with other tools.

    Search in Okular can't be restricted to annotations only, and
    while you can use normal PDF search for inline notes and
    highlights (along with the other text that happens to match), it
    doesn't work at all for popups.

    Even though Okular is part of KDE, there is no reason not to use
    it in other desktop environments: it's not that complicated in
    terms of UI, looks quite native in GTK, and a few extra
    dependencies are barely a problem these days.
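
Since an Okular document archive is described above as a zip file
containing metadata.xml, getting at the raw annotation data from
other tools could start with something like the sketch below. The
function name and the UTF-8 assumption are mine, and the sketch
deliberately doesn't interpret the XML, since its schema is
Okular-specific.

```python
import zipfile


def read_document_archive(archive_path: str):
    """Return (member names, raw metadata.xml text) of a document archive."""
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        # Assumption: the metadata is stored as UTF-8 encoded XML.
        metadata = archive.read("metadata.xml").decode("utf-8")
    return names, metadata
```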

Emacs: pdf-tools

Pdf-tools (as of 0.90) is a PDF viewer for Emacs that is meant to be
more efficient than the builtin one (in terms of rendering), but is
also capable of interacting with PDF metadata.

Here's a screenshot and a short screencast; the interesting stuff
starts somewhere around the 02:00 mark.

One big drawback is that to highlight and add new annotations you
still have to use the mouse, which loses half of the benefits of
using Emacs for me. It's also got a minor issue with displaying
inline annotation text in the 'Content' buffer and annotations list
(you can edit it if you click on it with your mouse though).

Other Linux readers

There are a few other apps I tried using, so I figured they're worth
mentioning.

  * mupdf (as of 1.14.0) is both a rendering library (claimed to be
    faster than poppler) and a PDF viewer. It's capable of displaying
    all types of highlights and annotations, but there is no way to
    add or edit them.

    The changelog mentions annotation editing, but in something
    called 'mupdf-gl', which doesn't seem to be available in Ubuntu.

  * zathura (as of 0.4.3) can use both poppler and mupdf backends,
    but suffers from the same problem: you can't edit or add
    highlights. It's pretty sad, because I like it as a viewer: it's
    minimalist and supports vi-style keybindings.

Emacs: org-noter

Org-noter (as of 1.3.0) allows you to annotate a PDF while keeping
the text annotations in a separate org file, which keeps track of PDF
locations in Org note properties. Here's a short demo.

For me the main drawback is that it doesn't let you highlight, which
I tend to do a lot.

Existing annotations in PDF can be imported via
org-noter-create-skeleton function (it didn't work for me for some
reason though, and I wasn't motivated enough to investigate).

Xournal

Xournal is different from the above PDF viewers, since it doesn't use
the annotation types described in the PDF standard and instead uses
its own tools.

It doesn't modify the original files and instead keeps an .xoj file
containing the metadata and pointing at the original PDF, so in that
sense it's pretty similar to Okular. Similarly, the result is
Xournal-specific and can't be viewed anywhere else unless you export
it to PDF before sharing (at which point your annotations would
basically become background images).

Hypothes.is (again)

Already mentioned in the previous section, it's also capable of
annotating PDFs via pdf.js.

Check out their guide, especially if you're using Chrome. Apart from
that, it's as easy as opening the PDF in your browser and activating
Hypothesis. It fingerprints the PDFs so you don't have to worry about
losing your annotations, and it's easy to collaborate with other
people.

It seems to work fast enough for big PDF books as well. However,
reading long things in a browser is generally not very convenient,
as you lose your reading position if you close the tab.

Polar

Polar is a new project which aims to be not just a reader, but a
'personal knowledge repository'.

  * supports highlights and comments
  * document repository, so you get an overview of all the stuff you
    ever read/commented. It also keeps track of your reading
    position.
  * the PDFs are fingerprinted, so you don't need to worry about
    moving them around your filesystem
  * ~.polar directory holds all the data, which makes it easy to
    share among your computers (e.g. via git, or if you keep it on
    Dropbox and symlink)
  * metadata is in well structured json files, which makes it easy to
    access from scripts
  * highlight locators keep matched text alongside the absolute
    coordinates, which leaves potential for matching against
    different editions of the PDF file
  * it's got a builtin flashcards engine. Personally, I'm too used to
    org-drill now, but that's a great way of introducing spaced
    repetition to people.
  * the author is very passionate about this project, invests a lot
    of effort and is quite ambitious

If you like it, please consider donating to them!

The only downside is that annotation format is Polar specific, so
it'd be hard to share with other people unless they are willing to
use Polar as well.

Annotating on Android

Similar disclaimer: I've never had an iPhone, so I have no idea what
people use. If anyone sends me a link to a decent overview, I'd be
happy to include it!

  * Adobe Reader

    Supports most reasonable ways of annotation: highlights, popup/
    inline comments, strikethrough, styling, etc. (screenshot).
    "Comment List" gives an overview of your document: screenshot.

    It offers Adobe Cloud and Dropbox integration, but I rely on
    Syncthing for syncing my stuff anyway.

  * Xodo

    Basically supports the same things that Adobe does.

    For me, Xodo wins by a very thin margin because its interface
    tends to be a bit denser and more 'material': interface,
    annotations list. Otherwise, it's virtually no different from
    Adobe Reader.

  * mupdf

    The F-droid description claims it supports annotation, but it
    couldn't display any of the existing ones in my pdf files. What's
    more, the app wasn't responsive on any long taps or my attempts
    to select text, let alone highlight or comment.

    Perhaps PDF 1.7 is too outdated? Something weird has been going
    on with the 'full' version, maybe this is somehow related (1, 2).

  * Pen&Pdf: I tried this one since it was open source and claimed to
    support annotation, but it didn't even manage to pick up any of
    the existing ones.

Summary

If you want the convenience of editing and viewing on phone and
working with other people, Okular wins on desktop and Adobe Reader/
Xodo could be used on your phone.

If you care about preserving the original PDF files and want
convenience in accessing the annotations programmatically, Polar is
the best.

4 Annotating E-ink

The two e-ink readers I know of that support highlights and notes
are Kindle (I used to own a Paperwhite 2) and Kobo (I own a Kobo Aura
One). Highlighting works as you would expect on an E-ink touchscreen
(long press and drag the selection), and you can leave notes by
typing on a virtual keyboard (somewhat laggy, but ok for up to a few
sentences). Perhaps the only differences are in how you can search
and access the annotations.

Kindle

Kindle stores bookmarks, notes and highlights in My Clippings.txt on
the device. The good thing about the format is that it's already
plaintext and fairly human readable, so you might be happy with that
alone. The format is a bit nasty for parsing (as you would expect
from something with .txt extension). Dates are locale dependent,
document locators may or may not have roman numerals, separators are
inconsistent at times, etc. When I was using Kindle I was just
copying the file from time to time, and you can set up some sort of
automatic copying when your device is connected similarly to what I'm
doing with Kobo.
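
To give a feel for the parsing involved, here is a minimal sketch of
a My Clippings.txt parser. It assumes the English-locale layout
(title line, a "- Your Highlight ... | Added on ..." line, a blank
line, then the text, with entries separated by a line of equals
signs); other locales and firmware versions differ, so entries that
don't match are skipped rather than guessed at. All names here are
made up for the example.

```python
import re
from typing import NamedTuple

SEPARATOR = "=========="
# Assumed English-locale metadata line, e.g.
# "- Your Highlight on page 12 | Location 180-182 | Added on Monday, ..."
META = re.compile(r"^- (?:Your )?(?P<kind>\w+)\b.*?\| Added on (?P<added>.+)$")


class Clipping(NamedTuple):
    title: str
    kind: str   # e.g. Highlight, Note, Bookmark
    added: str  # kept as raw text because dates are locale dependent
    text: str


def parse_clippings(raw: str) -> list:
    """Split the file on separator lines and parse each entry."""
    clippings = []
    # The file may contain byte order marks; drop them before splitting.
    for chunk in raw.replace("\ufeff", "").split(SEPARATOR):
        lines = [line.strip() for line in chunk.strip().splitlines()]
        if len(lines) < 2:
            continue
        match = META.match(lines[1])
        if match is None:
            continue  # unknown locale or layout: skip rather than guess
        text = "\n".join(line for line in lines[2:] if line)
        clippings.append(
            Clipping(lines[0], match["kind"], match["added"], text)
        )
    return clippings
```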

Kindle uploads your notes and highlights to Kindle Cloud Reader
(screenshot, screenshot), but it only works for stuff bought on the
Kindle store. Reportedly, people also have issues displaying their
highlights on Cloud Reader due to copyright restrictions.

Kindle also integrates with Goodreads, which synchronizes reading
progress and lets you selectively share annotations to Goodreads. But
that's also restricted to books bought from Amazon.

The search function is somewhat limited: you can search in the book
and it displays your highlights alongside the content it found in the
book, but you can't restrict the search to highlights. You can't
search in notes either. Funnily enough though, the My Clippings.txt
file can be opened on the Kindle itself (like any other txt file),
and then you can search in it. It's not super convenient, but better
than nothing. (I wasn't brave enough to try and see what happens if
you try to highlight in this file.)

Kobo

Stores all of its stuff in .kobo/KoboReader.sqlite on the device.

The database has got lots of cool stuff: in addition to highlights
and notes you can also access reading progress, time spent reading
and possibly some other interesting data I didn't manage to reverse
engineer yet. You can check out kobuddy, which is my attempt to
extract useful data from the database and provide a nicer high level
Python interface. It's also fairly straightforward to open it in
sqlitebrowser and play with your own queries.
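
As an example of such a query, the sketch below reads highlights and
notes with the standard sqlite3 module. It is not how kobuddy is
implemented; the table and column names (Bookmark, VolumeID, Text,
Annotation, DateCreated) are assumptions based on poking at the
database and may differ between firmware versions, so inspect your
own copy first.

```python
import sqlite3
from pathlib import Path


def read_kobo_annotations(db_path: str) -> list:
    """Return (book id, highlighted text, note, created) rows, oldest first."""
    query = """
        SELECT VolumeID, Text, Annotation, DateCreated
        FROM Bookmark
        ORDER BY DateCreated
    """
    # Open read-only so that the copy of the device database is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        return connection.execute(query).fetchall()
    finally:
        connection.close()
```

It's safer to run this against a copy of KoboReader.sqlite rather
than the file on the mounted device.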

Kobo doesn't seem to support cloud sync for annotations. I was
considering syncing the database wirelessly, as there are some SSH
modules for its firmware, but people report it may break wifi on it.

I'm using kobuddy as well to work around it.

There is an official Android app which lets you manage and annotate
books from the Kobo store and seems to sync progress between the
e-ink device and the phone. However, annotations don't sync between
Kobo and phone for me, and other people report the same experience:
1, 2, 3. Some claim it works on iPhones though.

Kobo lets you conveniently search over all of your highlights and
notes.

Koreader

Koreader is alternative open source software for Kindle, Kobo and
other E-ink devices.

It's got some very cool features, in particular support for most
common document formats, dictionary and Wikipedia lookups, and
various plugins.

It also supports highlighting, but note taking isn't supported yet
(as of v2019.06), although some progress is being made. I'd be keen
to try it once it's implemented!

5 Miscellaneous

Annotating paper books

So far, the only downside of using nice tools for annotating digital
content is that it has ruined the experience of reading paper books
for me.

Usually I don't own the books I read, so using a highlighter or
pencil would be just mean to the owner. Even if you own the book and
are okay with that, it's still not searchable and not easily
accessible, which feels very wrong to me.

To get around this I've tried a few tricks:

  * Take pictures of bits I'm interested in, perhaps highlight using
    image editor on the phone
  * Sticky notes are ok for commenting as long as you don't damage
    the book with the glue, but they don't help with highlighting
  * Using paper strips as an annotation overlay.

    This one I'm particularly proud of coming up with, as I haven't
    found anyone else doing that, and I rarely come up with useful
    meatspace things.

    This is how it looks in action: photo.

    Basically, before reading, I prepare a bunch of paper strips
    slightly longer than the page height, kinda like bookmarks. You
    will use a strip as a 'sidebar overlay' for writing notes and
    highlighting, so the width depends on your handwriting and how
    much you expect to write; I usually use something like 1/4 of
    the page.

    If you want to annotate the page, you'll align strip's bottom to
    the bottom of the page and mark lines you found interesting on
    this strip and write comments on it as well. You can also use the
    other side of the strip to annotate the other page.

    The downside of this is that for the annotations to make sense,
    you need a physical copy of the exact same book. Another one is
    that there are no automatic timestamps, which somewhat bothers
    my #lifelogging OCD. You can get around it by writing down the
    time as well, but that's quite distracting.

When I'm done with a book, I'd spend a bit of time digitizing
annotations and manually typing them into plaintext. Luckily, I don't
have to do that often.

Annotating plaintext

Often, I want to leave a quick comment on an org-mode item. I've got
a handy Emacs binding which appends a child note with a timestamp and
enters edit mode, so the whole process is smooth. If you're not using
org-mode you can still benefit from something similar: most modern
text editors let you bind snippets to hotkeys.

One big drawback of Org mode (and, I believe, of most outline/task
list formats) though is that if you insert a child outline item in
the middle of text, it structurally breaks the text in two parts, so
you have to append your comment to the end of the current outline
(which can potentially be very long). On the other hand, plain list
items, which you can insert in an arbitrary place, are very limited
and don't support most of the things outlines support, like tags,
timestamps, priorities etc.
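
The 'append a timestamped child note' operation is simple enough to
sketch outside of Emacs. The Python below is only an illustration of
the idea, not my actual Emacs binding: the function names are made
up, and it handles plain headings only (no tags or TODO keywords in
the heading it looks up).

```python
from datetime import datetime


def heading_level(line: str) -> int:
    """Return the outline level of an Org heading line, or 0 otherwise."""
    stars = line.split(" ", 1)[0]
    is_heading = (
        stars != "" and set(stars) == {"*"} and line.startswith(stars + " ")
    )
    return len(stars) if is_heading else 0


def append_child_note(org_text: str, heading: str, note: str, now: datetime) -> str:
    """Append a timestamped child heading at the end of `heading`'s subtree."""
    lines = org_text.splitlines()
    for index, line in enumerate(lines):
        level = heading_level(line)
        if level and line[level + 1:].strip() == heading:
            break
    else:
        raise ValueError(f"heading not found: {heading}")
    # The subtree ends at the next heading of the same or a higher level.
    end = len(lines)
    for position in range(index + 1, len(lines)):
        if 0 < heading_level(lines[position]) <= level:
            end = position
            break
    stamp = now.strftime("[%Y-%m-%d %a %H:%M]")  # Org inactive timestamp
    lines.insert(end, f"{'*' * (level + 1)} {stamp} {note}")
    return "\n".join(lines) + "\n"
```

Note that the new item goes to the end of the subtree, for exactly
the reason described above: inserting it in the middle would split
the parent's text in two.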

Annotating videos

Often when I watch lectures or some talks on Youtube or in VLC, I
want to leave a bookmark or write a note with a reference to a
specific timestamp. This is pretty much not possible apart from
opening your text editor and manually recording the position in
video. All the video annotation software I know of is more oriented
towards video editing/effects etc.

So, if I'm watching something in a browser, I normally end up using
grasp and manually typing the timestamp.

This is distracting, but even worse is that whatever you use has no
means of quickly jumping to the timestamp you recorded; you have to
move the slider to it manually.

There is no common standard that I know of for jumping to a certain
timestamp, either on the web or in desktop applications (e.g. via a
mime handler).

I'd say this is a somewhat unsolved problem, which is surprising
since presumably solving it could be helpful for lots of students.

Other notable mentions

Due to the lack of common standard for annotated content, some
services try to implement their own:

  * Medium. Highlights and annotation also serve a social function:
    when you read a Medium post you can see whether a certain bit of
    text was highlighted by other people (and how often).

    They don't tamper with browser selection, so you can still use
    external annotation tools like Hypothesis. However, judging by
    their API, there is no way to access your highlights. Anyway, I
    would encourage people not to use these especially if you only
    care about personal use, after all Medium is not the only source
    of information out there.

  * Genius. Annotated lyrics on genius.com look neat, however I'm
    not sure what its benefits over Hypothesis are.

    Also, while I was looking up about Genius, I've stumbled upon an
    unusual opinion opposing web annotation: how to block Genius
    annotations:

        Genius was functionally equivalent to forcing crude, violent,
        or hateful user comments onto a web site she created as a
        safe space to write about the sensitive work she does

    I can't agree with this, but I think it deserves to be mentioned.

Other tools

These are tools suggested by other people. I haven't had time to try
them all properly yet, but will list them nevertheless. If someone
sends me links to other people's experience with these, I'd be happy
to add them!

  * org-emms: org-mode link handler to start playback at a certain
    timestamp, could potentially be useful for video annotation:

    [[emms:/path/to/audio.mp3::1:10:45]]  Starts playback at 1 hr 10 min 45 sec.

  * Incremental video in SuperMemo: could be useful for video
    annotation. Suggested by Ypo.
  * Readwise: paid (30 day trial) web tool that allows you to import
    and sync your highlights from Kindle/Instapaper/Pocket/Medium/etc
    so you could review them later via daily emails or web app.
    They've got a nice blog where they are describing how to use
    spaced repetition and read actively.

    Suggested by a few people, in particular Daniel, one of the
    cofounders.

    UPD from [2019-12-21]: I gave it a try and it's actually pretty
    neat. They send you an email daily so you can practice spaced
    repetition with a single link click. Personally I'm too invested
    in all of my own infrastructure, but if you want to try out
    spaced repetition without extra hassle, I highly recommend
    checking it out!

  * Skim: open source PDF reader and note-taker for OS X. Uses xattrs
    and a separate .skim file to keep annotations.

    Recommendation from a follower.

  * Kontxt: paid, looks very similar to Hypothesis, also offers some
    sort of CMS. I couldn't find anything on it on Reddit/Hackernews,
    not sure if it's still under development.
  * Histre: also looks similar to Hypothesis in terms of web
    annotation, also enhances browser history. Here's HN discussion.
  * Hook productivity: paid macOS tool that allows you to interlink
    content between different apps.

    Suggested by Luc P. Beaudoin.

  * Worldbrain Memex: free, offline first tool that works similarly
    to Hypothesis as well and provides some browser history
    enhancements. I really need to look closer into that one!

    Recommended by Jay.

  * VideoAnt: looks like a video annotation tool. Suggested by philyg.
  * Microsoft OneNote: can be used for PDF annotation, but it seems
    that you have to import PDF in OneNote first, so it'll become
    locked in the app. However that allows extra features like using
    pen to annotate on top of PDF. In that regard it's similar to
    Xournal.

    Suggested by daok.

Hall of shame!

These are services that wouldn't let you select text. Not sure why
that happens: could be some sort of copyright restriction, being
assholes, or just accidental pointless restriction.

  * Facebook: Android app and mobile site prevent text selection.
  * Blinkist: Android app and website.

    You can't use native text selection, as Blinkist actively
    prevents it and forces you to use its own custom JS for
    highlighting. But their highlights suck: you can't leave a
    comment.

    In addition you can't even export your highlights, the best you
    can do seems to be syncing with Evernote, and perhaps then you
    can use Evernote API. I didn't bother with it.

    UPD (20190818) I actually managed to dump my highlight data
    before canceling my Blinkist subscription by using their
    (apparently private) API, here's the script.

6 What makes a good annotation system?

In my quest for the perfect annotation engine I've figured out certain
aspects that make, or would make, an annotation tool pleasant to use.

  * Uniform

    Highlighting a piece of content and leaving a comment are fairly
    straightforward operations, and you shouldn't have to think much
    about how exactly you do it and which program you use. Most
    current annotation engines are also somewhat tedious when it
    comes to working with existing annotations: adding more content
    to them, linking, etc.

    Solving this requires the tools being cross platform and cross
    format.

    Hypothes.is is a move in the right direction, but plenty of
    things are still missed out, from unsupported formats and
    working offline to paper books.

    While the current sad state of different tools/products for
    different forms of content is understandable, ideally annotating
    should be format agnostic, with some proper way of
    fingerprinting content.
    If humans can tell whether a novel published online as HTML and a
    paper novel are the same thing, so can software.

    A common standard (e.g. Web Annotation Data Model) is a good
    start, but even this one is little known and not widely adopted.

    Perhaps in the near future we could exploit existing (fairly
    robust) OCR technologies and augmented reality to develop a
    universal annotation tool, but so far that's a whole different
    ballpark.

  * Ease of interaction

    Annotations are meant to augment your limited memory
    capabilities, and using them should be as easy as retrieving
    information from your brain.

    While brain-computer interfaces are not quite there yet, even
    with existing technologies you can achieve that with as little as
    a few seconds of lag just by using plaintext representations,
    indexing and incremental search.

    Personally, I'm solving this problem via orger.

  * Separate metadata

    The annotation layer should be loosely coupled to the underlying
    content. Otherwise you become too dependent on specific tools,
    and it gets harder to keep track of your private data and to
    share data with other tools.

    For physical sources of information it matters even more;
    although they might decline completely in a few decades, who
    knows.

    Good examples of this approach are Polar and Hypothesis which
    keep the metadata in a well defined format with locators.

  * Data ownership and resilience

    If annotations make up an essential part of your knowledge, you
    want to be able to access them anytime.

    Ideally everything should work while fully offline without
    relying on any services.

    Currently it's not always feasible due to technical complications
    (e.g. having to selfhost), but this is a good value to pursue.

  * Social and collaborative

    Annotations are a valuable tool for collaborative learning and
    research, and improving tools can make these activities more
    pleasant.

    Blog comments seem to be somewhat in decline which is
    understandable since it's too annoying to register here and
    there. On the other hand, platforms like Facebook comments or
    Disqus are not very privacy friendly, don't give access to data
    stored (e.g. if Disqus disappears tomorrow so do comments in your
    blog), and are not very friendly towards people who do want to
    comment anonymously.

    Perhaps in some near future we could ditch all the internet
    commenting platforms and rely on annotation layer instead.
    Hypothesis basically lets you do that already; with a little
    work on design (the sidebar is not necessarily convenient for
    social commenting) it could serve that purpose.

    I also consider comments people write as projections of their
    minds and it would be great to give other people easier access to
    that to get to know each other better.

    It's hardly worth mentioning that one should be in control of
    whether the highlights they make are private or visible to
    everyone else.

  * Open source: not sure if that even needs justifying :)

    People have somewhat different requirements for their cognitive
    tools and it should be possible to hack them and fix annoying
    bugs. That also gives way more potential for integrating them
    with other services.
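
To make the "separate metadata" and fingerprinting points from the
list above a bit more concrete, here's a minimal sketch of an
annotation record that lives apart from the content it refers to.
Everything in it is made up for illustration: the field names, the
fingerprint helper and the quote-with-context locator (loosely
inspired by the text quote selectors in the Web Annotation Data
Model) are assumptions, not the format any of the tools above use.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Optional


def fingerprint(text: str) -> str:
    """Hash the text with case and whitespace normalised (hypothetical scheme)."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class Annotation:
    content_id: str   # fingerprint of the annotated content, not a file path
    exact: str        # the highlighted text itself
    prefix: str = ""  # a bit of context before the highlight
    suffix: str = ""  # a bit of context after the highlight
    comment: Optional[str] = None


def locate(annotation: Annotation, text: str) -> Optional[int]:
    """Return the offset of the highlight in text, or None if it can't be found."""
    needle = annotation.prefix + annotation.exact + annotation.suffix
    pos = text.find(needle)
    return None if pos == -1 else pos + len(annotation.prefix)
```

Since the record only stores a content fingerprint and a quote with
some context, the same annotation could in principle be re-attached
to an HTML page, a PDF or OCRed text of the same work. Real content
matching would need to be far more robust than a hash of normalised
text, of course.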

Comparison

I'm only listing tools that support proper highlighting and
commenting.

+------------------------------------------------------------------------------------------+
|            |   mobile    | fingerprinting |  search in  | separate | sharing  |   open   |
|            | annotations |                | annotations | metadata |          |  source  |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Instapaper | Y, offline  | n/a            | N           | N        | N        | N        |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Wallabag   | N           | n/a            | N           | N        | N        | Y        |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Hypothesis | Y           | Y              | Y           | Y        | Y, web   | Y        |
|            |             |                |             |          | API      |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Copy-paste | Y, offline  | N (manual)     | Y           | Y        | Y, file  | Y        |
|            |             |                |             |          | sync     |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Okular     | n/a         | N              | limited     | limited  | Y, file  | Y        |
|            |             |                |             |          | sync     |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Emacs      | n/a         | N              | N           | N        | Y, file  | Y        |
| pdf-tools  |             |                |             |          | sync     |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Emacs      | N           | N              | Y           | Y        | Y, file  | Y        |
| org-noter  |             |                |             |          | sync     |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Hypothesis | N           | Y              | Y           | Y        | Y, web   | Y        |
| (PDF)      |             |                |             |          | API      |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Xournal    | N           | N              | N           | Y        | Y, file  | Y        |
|            |             |                |             |          | sync     |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
|            | N, on       |                |             |          | Y, file  |          |
| Polar (v1) | roadmap     | Y              | N           | Y        | sync,    | Y        |
|            |             |                |             |          | cloud    |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Xodo/Adobe |             |                |             |          | Y, file  |          |
| Reader     | Y, offline  | N              | N           | N        | sync,    | N        |
|            |             |                |             |          | cloud    |          |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Kindle     | N           | N              | limited     | Y        | limited  | N, but   |
|            |             |                |             |          |          | koreader |
|------------+-------------+----------------+-------------+----------+----------+----------|
| Kobo       | N, broken   | N              | Y           | Y        | N, but   | N, but   |
|            |             |                |             |          | possible | koreader |
+------------------------------------------------------------------------------------------+

7 Using annotation data

Considering there are multiple tools I have to use, none of which is
fully capable of doing everything I would ideally want from an
annotation system, I've developed my own ways of getting closer to my
ideal. For that I've got some infrastructure set up.

Backups: I've already mentioned the script I'm using to back up the
Kobo database; for cloud services I'm running a bunch of daily cron
jobs that query APIs for data. Most of the job scripts are fairly
ad-hoc and just a matter of a GET query with a properly set OAuth
token, so perhaps not worth sharing, but let me know if you want
something specific. These files are always synced across all of my
devices, including my phone, so I always have access to them.

These files serve not just as a data backup, but also as the data
source for my tools. I only interact with these daily snapshots on
the filesystem rather than directly with the API. That helps to avoid
dealing with rate limiting and flakiness in the network connection or
the API itself, and makes it way faster to iterate and develop. The
only downside is that the data is not necessarily up to date, but
you can dump data more often to get around this; I would still highly
recommend preferring that to interacting with the API directly.
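
As a rough illustration of the snapshot approach (not my actual
code), here's a minimal sketch that picks the newest backup file from
a directory and parses it. It assumes the snapshots are JSON files
with a date in the name, so that sorting by name is sorting by time;
the function names and the file naming scheme are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def latest_snapshot(backup_dir: Path, pattern: str = "*.json") -> Path:
    """Pick the newest snapshot, assuming file names sort chronologically."""
    snapshots = sorted(backup_dir.glob(pattern))
    if not snapshots:
        raise FileNotFoundError(f"no snapshots matching {pattern} in {backup_dir}")
    return snapshots[-1]


def load_latest(backup_dir: Path) -> Any:
    """Parse the newest snapshot; no network access is involved."""
    with latest_snapshot(backup_dir).open(encoding="utf-8") as f:
        return json.load(f)
```

Anything built on top of this only ever touches local files, so it
keeps working offline and can't hit rate limits.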

I'm using a special Python package to access the data, which I called
my. It's always in my PYTHONPATH so I can use it from any script/tool
or REPL. It's got a bunch of different submodules, e.g. my.instapaper,
my.kobo, my.polar (and there are other modules as well).

I'm in the process of cleaning it up and documenting it; you can read
a draft here.

Extracting reading stats

As a specific example of how I use it: recently a friend asked me if
I could recommend them posts I found interesting on Slate Star Codex.
With a tiny Python script I was quickly able to give them some stats
on posts I read, so they could choose among them.

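from collections import Counter

import my.hypothesis

# annotated pages from Slate Star Codex
SSC = (p for p in my.hypothesis.get_pages() if 'slatestarcodex' in p.url)
# ten pages with the most highlights, as (url, count) pairs
top = Counter({p.url: len(p.highlights) for p in SSC}).most_common(10)
print(top)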

+----------------------------------------------------------------------------+
| http://slatestarcodex.com/2013/10/20/the-anti-reactionary-faq/        | 32 |
|-----------------------------------------------------------------------+----|
| https://slatestarcodex.com/2013/03/03/                                | 17 |
| reactionary-philosophy-in-an-enormous-planet-sized-nutshell/          |    |
|-----------------------------------------------------------------------+----|
| http://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/          | 16 |
|-----------------------------------------------------------------------+----|
| https://slatestarcodex.com/2014/03/17/                                |    |
| what-universal-human-experiences-are-you-missing-without-realizing-it | 16 |
| /                                                                     |    |
|-----------------------------------------------------------------------+----|
| http://slatestarcodex.com/2014/07/30/meditations-on-moloch/           | 12 |
|-----------------------------------------------------------------------+----|
| http://slatestarcodex.com/2015/04/21/                                 | 11 |
| universal-love-said-the-cactus-person/                                |    |
|-----------------------------------------------------------------------+----|
| http://slatestarcodex.com/2015/01/01/untitled/                        | 11 |
|-----------------------------------------------------------------------+----|
| https://slatestarcodex.com/2017/02/09/considerations-on-cost-disease/ | 10 |
|-----------------------------------------------------------------------+----|
| http://slatestarcodex.com/2013/04/25/                                 | 9  |
| in-defense-of-psych-treatment-for-attempted-suicide/                  |    |
|-----------------------------------------------------------------------+----|
| https://slatestarcodex.com/2014/09/30/                                | 9  |
| i-can-tolerate-anything-except-the-outgroup/                          |    |
+----------------------------------------------------------------------------+

Searching in annotations

I've got a bunch of scripts and a rendering tool which I named orger
(yep, haven't invested that much thought into naming). Basically,
these scripts take a specific data source as input and produce
org-mode output, e.g. render JSON backed up from Instapaper into an
instapaper.org file. That runs every few hours and keeps the contents
relatively up to date.

I chose org-mode as I was already used to its features, keybindings
and metadata. Also the hierarchy (e.g. book - highlight - comments)
fits naturally into the outline format. Not that it's a real
necessity, though: I feel that as long as it's searchable plaintext,
it's good enough.
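
To show what such rendering boils down to, here's a minimal sketch
that turns highlights into an org-mode outline. This is not orger's
actual code or API: the Page and Highlight types and the render_org
function are hypothetical stand-ins for whatever a data provider
returns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Highlight:
    text: str
    comment: Optional[str] = None


@dataclass
class Page:
    title: str
    url: str
    highlights: List[Highlight] = field(default_factory=list)


def render_org(pages: List[Page]) -> str:
    """Render pages as an org-mode outline: page -> highlight -> comment."""
    lines: List[str] = []
    for page in pages:
        lines.append(f"* [[{page.url}][{page.title}]]")
        for h in page.highlights:
            # collapse newlines so that every highlight stays on one heading line
            lines.append("** " + " ".join(h.text.split()))
            if h.comment:
                lines.append("*** " + " ".join(h.comment.split()))
    return "\n".join(lines) + "\n"
```

Keeping every highlight on a single line is a deliberate choice here:
it makes the output friendlier to line-based search tools.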

To search them, I've got a global keybinding, which invokes Emacs
with an incremental search prompt against the directory with rendered
org files, which lets me interact with them in a blink on my
computer. On Android I'm using the DocSearch indexer (sadly it's not
incremental, and the app is not open source, so I'm looking for an
alternative). I describe this extensively here.

Finally, I've got a Recoll indexer instance + web interface running
on my VPS; so if necessary I can access and search annotations via
the internet.
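
The nice thing about plaintext is that even without Emacs or Recoll
you can get surprisingly far. Here's a naive sketch of searching the
rendered files: a plain linear scan with case-insensitive substring
matching, not an index and not incremental. The search_org name and
the directory layout are assumptions for the sake of the example.

```python
from pathlib import Path
from typing import Iterator, Tuple


def search_org(root: Path, query: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for case-insensitive matches under root."""
    needle = query.lower()
    for path in sorted(root.rglob("*.org")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle in line.lower():
                    yield path, lineno, line.rstrip("\n")
```

For a few megabytes of notes a scan like this is typically fast
enough; a proper indexer only starts to matter as the data grows.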

Providing TODO items

While reading, I often encounter something I want to google, check or
read about later, or just come up with something actionable inspired
by what I'm reading. But I also don't want to interrupt my reading
and lose context. That especially matters while reading on an E-ink
device: getting distracted from the book, fetching your phone etc. is
really annoying.

So, as a workaround, I have programmed rules that pick out notes that
start with "TODO" or are marked with a "TODO" tag, and automatically
add them to my agenda. Later, when I see an item on the agenda, I
assign it a priority and reschedule/unschedule it depending on
importance.

Here's an example of me using the my.instapaper module for that.

I'm writing about it in more detail in the post about Orger.
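
The rule itself is simple enough to sketch in a few lines. The Note
type below is a hypothetical stand-in for whatever a data provider
returns, and the output format is just a plain org-mode TODO heading;
the real scripts linked above differ in the details.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Note:
    text: str
    tags: List[str] = field(default_factory=list)


def is_todo(note: Note) -> bool:
    """A note is actionable if it starts with TODO or carries a TODO tag."""
    return note.text.lstrip().startswith("TODO") or "TODO" in note.tags


def agenda_items(notes: Iterable[Note]) -> Iterator[str]:
    """Render matching notes as org-mode TODO headings."""
    for note in notes:
        if not is_todo(note):
            continue
        body = note.text.strip()
        if body.startswith("TODO"):
            # drop the marker itself, along with a following colon or space
            body = body[len("TODO"):].lstrip(" :")
        yield f"* TODO {body}"
```

Priorities and scheduling are deliberately left out: those are
assigned by hand later, when the item shows up in the agenda.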

Spaced repetition

It's kind of an extension of the previous use case: again, often you
want to send something straight into your spaced repetition queue
without having to remember to add it later.

I've got two rules for that:

  * if something is annotated with a certain marker ('drill' for me,
    comes from org-drill package name)
  * if it's only got one word highlighted, which is useful for
    memorizing foreign words

Here's how I'm using it for Kobo highlights.
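
The two rules above can be sketched as a single predicate. This is an
illustration only: the should_drill name and the idea of looking for
the marker as a separate word in the comment are assumptions, and the
script linked above is the real thing.

```python
from typing import Optional

DRILL_MARKER = "drill"


def should_drill(highlight: str, comment: Optional[str] = None) -> bool:
    """Apply both rules: explicit marker in the comment, or a one-word highlight."""
    marked = comment is not None and DRILL_MARKER in comment.lower().split()
    single_word = len(highlight.split()) == 1
    return marked or single_word
```

For example, should_drill("Schadenfreude") is true because of the
one-word rule, while a longer passage only qualifies if its comment
contains the marker.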

Life log

I'm a big fan of #lifelogging and all the timestamped highlights,
comments and reading progress from Kobo are an effortless (no manual
logging!) contribution to my personal timeline, which I render and
sync on my devices every few hours.

I sometimes use it when a conversation with other people comes to an
awkward silence, so I can recall something I was reading recently and
spark off an interesting (well at least for me) topic.

8 --

Some extra links:

  * Indieweb page on annotation, in particular with examples of silos

I'd be interested to know what you think, how you are managing your
annotations, or whether you need some help with your existing
workflow. Please also let me know if I missed any tools or features!

updates

  * 2019-12-02: added section with suggestions by other people.

#pkm #annotation #sr #tools
05 July 2019
Discussion:

  * hackernews
  * /r/opensource
  * /r/gwern

 me @twitter  me @github CC BY 4.0