[HN Gopher] Ghostwriter - use the reMarkable2 as an interface to...
___________________________________________________________________
Ghostwriter - use the reMarkable2 as an interface to vision-LLMs
Author : wonger_
Score : 182 points
Date : 2025-02-08 03:02 UTC (19 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| rpicard wrote:
| This is so cool. I'm going to try it this weekend.
|
| I've been playing with the idea of auto-creating tasks when I
| write todos, by emailing the PDF to an LLM.
|
| This just opened up a whole realm of better ways to accomplish
| that goal in real time.
| r2_pilot wrote:
| This worked pretty well when I did a proof of concept with
| Claude and the rMPP a couple of months ago. It even handled
| scheduling fuzzy times ("I want to do this sometime but I don't
| have any real time I want to do it, pick a time that doesn't
| conflict with my actually scheduled tasks"). All with minimal
| prompting. I just didn't have a decent workflow and did exactly
| what you considered: emailed the PDF. I should probably revisit
| this, but I haven't had the inclination since I just ignored the
| tasks anyway lol
| rpicard wrote:
| Ha, automating the doing of the task is the next step.
| awwaiid wrote:
| Let me know if you need any help, I think only one other person
| has tried to get this working. I'm over on the reMarkable
| discord server, https://discord.gg/u3P9sDW (linked from
| https://github.com/reHackable/awesome-reMarkable)
|
| Rust binary so should be easy to install. In theory :)
| rpicard wrote:
| Will do! My wife and I love Harry Potter so I'm motivated to
| show her my investment in the tablet actually got me Tom
| Riddle's diary.
|
| I don't use discord much but I'll find you somewhere around
| here!
| awwaiid wrote:
| I'm on at awwaiid@gmail.com and probably other places :)
|
| "proof" to partner of tablet investment value based on
| interactive fiction conversation == excellent strategy and
| nothing could go wrong
| owulveryck wrote:
| Awesome.
|
| I've wanted to implement this for months. You did a really
| good job.
| owulveryck wrote:
| At some point I wanted to turn goMarkableStream into an MCP
| (Model Context Protocol) server. I could get the screen, but
| without a hack I couldn't write the response back.
| awwaiid wrote:
| The trick here is to inject events as if they came from the
| user. The virtual keyboard works really reliably; you can see
| it over at
| https://github.com/awwaiid/ghostwriter/blob/main/src/keyboar...
| It is the equivalent of plugging in the reMarkable type-folio.
|
| The main limitation is that the reMarkable drawing app is very,
| very minimal: it doesn't let you place text at arbitrary screen
| locations and is instead a sort of weird overlay text area
| spanning the entire screen.
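|
| (Illustrative sketch only, not necessarily ghostwriter's actual
| keyboard.rs: one common Linux approach is to write input_event
| structs directly to an evdev device node, which the kernel treats
| like hardware input. The device path and key code below are
| placeholder assumptions.)
|
|     // Cargo.toml: libc = "0.2" (assumed dependency)
|     use std::fs::OpenOptions;
|     use std::io::Write;
|
|     #[repr(C)]
|     struct InputEvent {
|         time: libc::timeval, // zeroed; the kernel timestamps injected events
|         type_: u16,
|         code: u16,
|         value: i32,
|     }
|
|     const EV_SYN: u16 = 0x00;
|     const EV_KEY: u16 = 0x01;
|     const SYN_REPORT: u16 = 0;
|     const KEY_A: u16 = 30;
|
|     // Write one event to the device node.
|     fn emit(dev: &mut std::fs::File, type_: u16, code: u16, value: i32) -> std::io::Result<()> {
|         let ev = InputEvent {
|             time: libc::timeval { tv_sec: 0, tv_usec: 0 },
|             type_, code, value,
|         };
|         let bytes = unsafe {
|             std::slice::from_raw_parts(
|                 &ev as *const InputEvent as *const u8,
|                 std::mem::size_of::<InputEvent>(),
|             )
|         };
|         dev.write_all(bytes)
|     }
|
|     fn main() -> std::io::Result<()> {
|         // Hypothetical device node; the real one must be discovered on the tablet.
|         let mut dev = OpenOptions::new().write(true).open("/dev/input/event1")?;
|         emit(&mut dev, EV_KEY, KEY_A, 1)?; // key down
|         emit(&mut dev, EV_SYN, SYN_REPORT, 0)?;
|         emit(&mut dev, EV_KEY, KEY_A, 0)?; // key up
|         emit(&mut dev, EV_SYN, SYN_REPORT, 0)?;
|         Ok(())
|     }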
| awwaiid wrote:
| Thank you! Still a WIP, but a very fun learning / inspiration
| project. Got a bit of Rust jammed in there, a bit of device-
| constraint dancing, a bit of multi-LLM API normalization, a bit
| of spatial-vision-LLM education, etc.
| tony_francis wrote:
| Harry Potter Half-Blood Prince vibes. Interesting just how much
| the medium changes the feeling of interacting with a chat model.
| s2l wrote:
| Now if only the LLM response font were some handwritten style.
| satvikpendem wrote:
| That's definitely pretty easy to achieve: just change the
| font settings to use a particular handwritten-style font [0].
|
| [0] https://fonts.google.com/?categoryFilters=Calligraphy:%2FScr...
| memorydial wrote:
| That would be next-level immersion! You could probably
| achieve this by rendering the LLM's response using a
| handwritten font--maybe even train a model on your own
| handwriting to make it feel truly personal.
| dharma1 wrote:
| Script fonts don't really look like handwriting - too
| regular.
|
| But one of the early deep learning papers from Alex Graves
| does this really well with LSTMs -
| https://arxiv.org/abs/1308.0850
|
| Implementation - https://www.calligrapher.ai/
| awwaiid wrote:
| ooo -- thanks for the link!
| memorydial wrote:
| Actually if you figure that out please post it here!! I'd
| love to see that!
| wdb wrote:
| Like Apple Notes's Smart Script?
| awwaiid wrote:
| This uses LLM tools to pick between outputting an SVG or
| plugging in a virtual keyboard to type. The keyboard is much
| more reliable, and that's what you see in the screenshot.
|
| If nothing else it could use an SVG font that has
| handwriting; you'd need to bundle that for rendering via
| reSVG or use some other technique.
|
| But if I ever make a pen backend for reSVG then it would be
| even cooler: you would be able to see it trace out the
| letters.
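|
| (Purely illustrative: a tool schema of the general shape many LLM
| APIs accept, letting the model choose between returning an SVG and
| typing via the keyboard. The tool names and fields here are
| assumptions, not ghostwriter's actual schema.)
|
|     use serde_json::json; // serde_json = "1" (assumed dependency)
|
|     // Two tools the model can call: draw as SVG, or type text.
|     fn tool_definitions() -> serde_json::Value {
|         json!([
|             {
|                 "name": "draw_svg",
|                 "description": "Draw on the screen by returning an SVG document",
|                 "input_schema": {
|                     "type": "object",
|                     "properties": { "svg": { "type": "string" } },
|                     "required": ["svg"]
|                 }
|             },
|             {
|                 "name": "type_text",
|                 "description": "Type a reply using the virtual keyboard",
|                 "input_schema": {
|                     "type": "object",
|                     "properties": { "text": { "type": "string" } },
|                     "required": ["text"]
|                 }
|             }
|         ])
|     }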
| memorydial wrote:
| Exactly! There's something about handwriting that makes it feel
| more personal--like scribbling notes in the margins of a
| spellbook. The shift from typing to pen input definitely
| changes the vibe of interacting with AI.
| GeoAtreides wrote:
| erm, you mean harry potter tom riddle's horcrux diary, sure
|
| you know, the diary that wrote back to you and possessed your
| soul? that cursed diary?
| guax wrote:
| I wonder if it's better than the current version, where my soul
| gets possessed by YouTube Shorts for 40 minutes.
| hexomancer wrote:
| That's beside the point, but you are probably referring to Harry
| Potter and the Chamber of Secrets, not the Half-Blood Prince.
| cancelself wrote:
| @apple.com add to iPadOS Notes?
| newman314 wrote:
| I wonder if this can be abstracted to accept interaction from a
| Daylight too.
| complex1314 wrote:
| Really cool. Would this run on the reMarkable Paper Pro too?
| awwaiid wrote:
| Buy me one and I'll find out! hahahaha
|
| But also -- the main thing that might be different is the
| screenshot algorithm. I'm over on the reMarkable discord; if
| you want to take up a bit of Rust and give it a go then I'd be
| happy to (slowly/async) help!
| complex1314 wrote:
| :) Thanks! Been looking into learning Rust recently, so I'll
| keep that in mind if I get it off the ground.
| awwaiid wrote:
| Initially most of the Rust was written by Copilot or
| Sourcegraph's Cody; then I learned more and more Rust as I
| disagreed with the code-helper's taste and organization.
| Though I have a solid foundation in other programming
| languages, which accelerates the process ... it's still a
| weird way to learn a language, one that I'm getting used to
| and kinda like.
|
| That said, I based the memory capture on
| https://github.com/cloudsftp/reSnap/tree/latest which is a
| shell script that slurps the screen contents out of
| process-space device files. If you can find something like
| that which works on the rPP then I can blindly slap it in
| there and we can see what happens!
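|
| (Illustrative sketch of the general idea behind that style of
| capture: read raw framebuffer bytes out of /proc/<pid>/mem. The
| pid, address, resolution, and bytes-per-pixel are placeholders;
| discovering them is the device-specific part that reSnap actually
| handles.)
|
|     use std::fs::File;
|     use std::io::{Read, Seek, SeekFrom};
|
|     // Read width*height*bytes_per_pixel raw bytes from another
|     // process's memory at a known framebuffer address.
|     fn grab_framebuffer(
|         pid: u32,
|         addr: u64,
|         width: usize,
|         height: usize,
|         bytes_per_pixel: usize,
|     ) -> std::io::Result<Vec<u8>> {
|         let mut mem = File::open(format!("/proc/{pid}/mem"))?;
|         mem.seek(SeekFrom::Start(addr))?;
|         let mut buf = vec![0u8; width * height * bytes_per_pixel];
|         mem.read_exact(&mut buf)?;
|         Ok(buf)
|     }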
| 0xferruccio wrote:
| This is so cool! I love to see people hacking together apps for
| the reMarkable tablet
|
| I made a little app for reMarkable too and I shared it here some
| time back: https://digest.ferrucc.io/
| memorydial wrote:
| That's awesome! Love seeing the reMarkable get more
| functionality through creative hacks. Just checked out your app
| --what was the biggest challenge you faced while developing for
| the reMarkable?
| 0xferruccio wrote:
| I think the thing I really didn't like was the lack of an
| OAuth-like flow with fine-grained permissions.
|
| Basically, authentication with devices is "all-access" or "no-
| access". I would've liked it if a "write-only" or "add-only"
| API permission scope existed.
| pieterhg wrote:
| Blocked for AI reply @dang
| defrost wrote:
| Good catch, the last few pages of comment history are
| inhumanly insincere.
|
| https://news.ycombinator.com/threads?id=memorydial
|
| " @dang " isn't a thing, he doesn't watch for it - take
| credit and email him direct.
| kordlessagain wrote:
| Do you have proof this is true?
| awwaiid wrote:
| I might be biased because memorydial was complimentary to
| me ... but they SEEM like a human! Also I'm not all that
| opposed to robot participation in the scheme of things.
| Especially if they are nice to me or give good ideas :)
| loxias wrote:
| Most people don't correctly distinguish an em-dash from a
| hyphen. That jumps out at me. :)
| defrost wrote:
| He has commented on this.
|
| Retrieval is tricky as Algolia doesn't index '@' symbols:
|
| https://hn.algolia.com/?query=%40dang%20by%3Adang&sort=byDat...
| Ensign35 wrote:
| It's so great seeing these; they always make me want to play
| with developing apps for the reMarkable 2. Do you have any
| sources you can recommend? Thank you!
|
| edit: found the official developer website
| https://developer.remarkable.com/documentation
| 0xferruccio wrote:
| IMO the easiest way to play around is to use the reverse
| engineered APIs
|
| https://github.com/erikbrinkman/rmapi-js
| Ensign35 wrote:
| Much appreciated :+1:
| awwaiid wrote:
| https://github.com/reHackable/awesome-reMarkable is a great
| resource to get other resources, including getting onto the
| discord if you want some interactive conversations.
| t0bia_s wrote:
| How about this on Android-driven Onyx Boox e-readers? Would it
| be possible?
| awwaiid wrote:
| The limitations of the reMarkable meant that I took a
| screenshot and then injected input events to interact with the
| proprietary drawing app. Cross-app screenshots with the right
| permission are probably possible on Android; I'm not sure
| about injecting the drawing events.
|
| The other way to go would be to make a specific app. I just
| picked up an Apple Pencil and am thinking of porting the
| concepts to a web app which so far works surprisingly well ...
| but for a real solution it'd be better for this Agent to
| interact with existing apps.
| memorydial wrote:
| This is a brilliant use case--handwriting input combined with
| LLMs makes for a much more natural workflow. I wonder how well it
| handles messy handwriting and if fine-tuning on personal notes
| would improve recognition over time.
| r2_pilot wrote:
| I did this a few months ago with the reMarkable Paper Pro and
| Claude. It worked quite well even though my handwriting is
| pretty terrible. I even had a clunky workflow where I could
| just write down stuff I wanted to do, and roughly (or
| specifically) when I wanted to do it, and it was able to
| generate an iCal file I could load into my calendar.
| awwaiid wrote:
| Generally if I can read my handwriting then it can! It has no
| issues with that. Really the problem is more in spatial
| awareness -- it can't reliably draw an X in a box, let alone
| play tic-tac-toe or dots-and-boxes.
| chrismorgan wrote:
| > _Things that worked at least once:_
|
| I like it.
| awwaiid wrote:
| Top quality modern AI Eval!!!
| 8bithero wrote:
| Not to distract from the project, but if anyone is interested
| in e-ink tablets with LLMs, the ViWoods tablet might be of
| interest to you.
| Ensign35 wrote:
| Is this a Remarkable rebrand? Even the UI looks the same!
|
| edit: https://viwoods.com/ (based in Hong Kong)
|
| edit 2:
|
| It's a blatant copy of the Remarkable 2 for sure :/ LLM
| integration is interesting --> Remarkable are you listening?
| seethedeaduu wrote:
| Kinda unrelated, but should I go for a Kobo or the reMarkable?
| I mostly want to read papers and maybe take notes. How do they
| compare in terms of hackability and freedom?
| vessenes wrote:
| Love this! There are some vector diffusion models out there; why
| not use tool calling to outsource to one of those if the model
| decides to draw something? Then it could specify coordinate range
| and the prompt.
| awwaiid wrote:
| Two reasons. One, because I haven't gotten to it yet. Two... er
| no just the one reason! Do you have a particular one, ideally
| with a hosted API, that you recommend?
| 3abiton wrote:
| I own a Boox tablet (a full-fledged Android tablet with an
| e-ink screen), and this sort of thing would be perfect for it.
| I wonder if in 5 years mobile hardware will support something
| like that locally!
| xtiansimon wrote:
| For PDF paper readers, is the reMarkable's 11" size sufficient?
| I have the second-generation Sony DPT at 13", and it's a
| perfect viewing experience. But projects like this keep drawing
| me to the reMarkable product.
| pilotneko wrote:
| I have used the Remarkable 2 for papers, but it is slightly too
| small to read text comfortably. I'm also an active reader, so I
| miss the color highlighting. Annotations are excellent. For
| now, I'm sticking to reviewing papers in the Zotero application
| on my iPad.
| kordlessagain wrote:
| It's barely usable for PDFs
| freedomben wrote:
| Depends mostly on the font size in the PDF. For dense PDFs I
| agree, it's barely usable. For most PDFs though I'd call it
| "acceptable." If you have control over the font size (such as
| when you're converting some source material to PDF) you can
| make it an excellent reading experience IMHO.
| abawany wrote:
| I got the reMarkable Pro tablet recently and as a result was
| able to move on from my Sony DPT-S1 and reMarkable 2. The
| latter was nice for its hackability, but the Pro's screen size
| and color functionality have made it a great replacement.
| awwaiid wrote:
| Project author here -- happy to elaborate on anything; it's a
| continuous WIP project. The biggest insight has been the
| limitations of vision models in spatial awareness -- see
| https://github.com/awwaiid/ghostwriter/blob/main/evaluation_...
| for some sketchy examples of my rudimentary eval.
|
| Next top things:
|
| * Continue to build/extract into a yaml+shellscript agentic
| framework/tool
|
| * Continue exploring pre-segmenting or other methods of spatial
| awareness
|
| * Write a reSvg backend that sends actual pen-strokes instead of
| lots of dots
| rybosome wrote:
| This is a really cool effect. How do you envision this being
| used?
|
| Thinking about it as a product, I'd want a way to easily slip
| in and out of "LLM please respond" so it wasn't constantly
| trying to write back the moment I stopped the stylus - maybe
| I'd want a while to sketch and think, then restart a
| conversation. Or maybe for certain pages to be LLM-enabled, and
| others not.
|
| Does it require any sort of jailbreak to get SSH access to the
| device?
| awwaiid wrote:
| The reMarkable comes with root SSH out of the box, so
| installation here is scp'ing a Rust-built binary over and then
| ssh'ing in and running it. I haven't wrapped it in a
| start-on-boot service yet.
|
| It is triggered right now by finger-tapping in the upper-right
| corner, so you can ask it to respond to the current contents
| of the screen on demand. I think it would be cool to have
| another out-of-band communication channel, like voice, but
| this device has no microphone.
|
| Also right now it is one-shot, but on my long long TODO list
| is a second trigger that would _continue_ a back and forth
| multi-screenshot (like multi-page even) conversation.
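|
| (A minimal sketch of what such a corner-tap trigger could look
| like, assuming an evdev touchscreen node: watch multitouch
| coordinates and fire when a touch lands in the upper-right
| region. The device path, axis orientation, and thresholds are
| placeholders rather than the device's real values.)
|
|     use std::fs::File;
|     use std::io::Read;
|
|     #[repr(C)]
|     struct InputEvent {
|         time: libc::timeval,
|         type_: u16,
|         code: u16,
|         value: i32,
|     }
|
|     const EV_ABS: u16 = 0x03;
|     const ABS_MT_POSITION_X: u16 = 0x35;
|     const ABS_MT_POSITION_Y: u16 = 0x36;
|
|     fn main() -> std::io::Result<()> {
|         let mut dev = File::open("/dev/input/event2")?; // hypothetical touch device
|         let mut buf = [0u8; std::mem::size_of::<InputEvent>()];
|         let (mut x, mut y) = (0i32, 0i32);
|         loop {
|             dev.read_exact(&mut buf)?;
|             let ev: InputEvent =
|                 unsafe { std::ptr::read_unaligned(buf.as_ptr() as *const InputEvent) };
|             match (ev.type_, ev.code) {
|                 (EV_ABS, ABS_MT_POSITION_X) => x = ev.value,
|                 (EV_ABS, ABS_MT_POSITION_Y) => y = ev.value,
|                 _ => {}
|             }
|             // Placeholder "upper-right" region; a real version would
|             // also debounce and wait for the touch to lift.
|             if x > 1300 && y < 150 {
|                 println!("corner tap detected: trigger a response");
|             }
|         }
|     }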
| rybosome wrote:
| Ah great, I will definitely give this a try later then,
| thanks!
|
| I'm curious if this is becoming something that you are
| using in your own day-to-day, or if your focus right now is
| on building it?
|
| The context for my question is just a general interest in
| the transition to AI-enabled workflows. I know that I could
| be much more productive if I figured out how to integrate
| AI assistance into my workflows better.
| awwaiid wrote:
| Only building so far.
|
| The one use-case that is _close_ to ready-for-useful: I
| often take business meeting notes. In these notes I often
| write a T in a circle to indicate a TODO item. I am going
| to add a bit of config in there, basically "If you see a
| circle-T, then go add that to my todo list if it isn't
| already there. If you see a crossed-out circle-T then go
| mark it as done on the todo list."
|
| I got slightly distracted implementing this, working
| instead toward a pluggable "if you see X call X.sh"
| interface. Almost there though :)
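|
| (A minimal sketch of one possible shape for that pluggable
| interface: when the model reports a recognized symbol, look for a
| matching handler script and run it with the extracted text. The
| handlers/ directory and the "circle-T" name are hypothetical.)
|
|     use std::process::Command;
|
|     // Run ./handlers/<symbol>.sh with the extracted note text, if it exists.
|     fn dispatch(symbol: &str, payload: &str) -> std::io::Result<()> {
|         let script = format!("./handlers/{symbol}.sh");
|         if std::path::Path::new(&script).exists() {
|             let status = Command::new(&script).arg(payload).status()?;
|             println!("{script} exited with {status}");
|         } else {
|             println!("no handler for {symbol}, ignoring");
|         }
|         Ok(())
|     }
|
|     fn main() -> std::io::Result<()> {
|         // e.g. the vision model spotted a circled T next to "email Alice"
|         dispatch("circle-T", "email Alice")
|     }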
| loxias wrote:
| Wow! This is really cool! Really really cool! I imagine some
| sort of use where it's even more collaborative and not just
| "unadorned turn-by-turn".
|
| For example, maybe I'm taking notes involving words, simple
| math, and a diagram. Underline a key phrase and "the device"
| expands on the phrase in the margin. Maybe the device is
| diagramming, and I interrupt and correct it, crossing out some
| parts, and it understands and alters.
|
| Sorry, I know this is vague, I don't know precisely what I
| mean, but I do think that the combination of text (via some
| sort of handwriting recognition), stroke gestures, and a small
| iconography language with things enabled by LLMs probably opens
| up all sorts of new user interaction paradigms that I (and
| others) might be too set in our ways to think of immediately.
|
| I think there's a "mother of all demos" moment potentially
| coming soon with stuff like this, but I am NOT a UX designer
| and can't quite imagine it clearly enough. Maybe you can.
| awwaiid wrote:
| Yes! I have flashbacks to productive times standing in front
| of a whiteboard, alone or with others, doodling out thoughts
| and annotating them. When working with others I can usually
| talk to them, so we are also discussing as we are drawing and
| annotating. But also I've handed diagrams / equations to
| someone and then later they hand me back an annotated version
| -- that's interesting too.
| vendiddy wrote:
| I wish the reMarkable tablets weren't so locked down.
|
| It's one of my favorite pieces of hardware, and I wish there
| were more apps for it.
| thrtythreeforty wrote:
| Locked down? You can get a shell by ssh'ing to it. Call me when
| an iPad lets you do that...
| freedomben wrote:
| I agree, I definitely wouldn't call them "locked down." I do,
| however, think they could do a lot more to make it
| usable/hackable. This slightly undermines their cloud-service
| ambitions, but I think the hackability is what makes the
| reMarkable so ... well ... remarkable. Certainly that's why I
| bought one!
___________________________________________________________________
(page generated 2025-02-08 23:01 UTC)