[HN Gopher] ht: Headless Terminal
       ___________________________________________________________________
        
       ht: Headless Terminal
        
       Author : tosh
       Score  : 169 points
       Date   : 2024-06-02 07:53 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | m1keil wrote:
       | That's pretty interesting. What would be an example use case of
       | this?
        
         | npace12 wrote:
         | could be useful for llm tools using the command line
        
           | andyk wrote:
           | Yep this was my main reason for wanting it, though lots of
           | other good ideas in these HN comments.
        
             | m1keil wrote:
             | Ah, that makes sense, cheers.
        
         | telotortium wrote:
         | I'm guessing it's mostly useful for writing a terminal emulator
         | in the browser. In theory I think all input and output on the
         | terminal is done via text, which may contain control codes, so
         | you'd still have to write code to render the text, support
         | mouse input and selection, etc.
        
         | guyman50 wrote:
         | It looks like the use case is if you want to script the usage
         | of a tui program. In most cases it would be best to script the
         | operation yourself with sed rather than ht/nano. I could see
         | this being useful for scripting internal tui tools without
         | access to the source code.
        
         | craftkiller wrote:
         | My first thought was using it as a compatibility layer for
         | VT100-style CLI programs. Hypothetically, if we wanted to
         | finally replace VT100 emulation and move to some new legacy-
         | free protocol for terminals, we would need some sort of shim
         | for running legacy VT100 programs and displaying them in our
         | shiny new VT-next terminal. Similar to the `vt` program from
         | plan 9.
        
         | andyk wrote:
         | I shared the motivating use case for why Marcin and I built
         | this (LLM agents using terminals) in a different comment but I'll
         | also expand the readme to give examples of use cases.
        
       | pzmarzly wrote:
       | I think it is a really good idea to separate vt100 emulation
       | "backend" from the UI. Then all terminal emulators could use a
       | common implementation and just focus on displaying the text,
       | instead of emulating quirks of 50-year old devices.
       | 
       | Using JSON as RPC also seems like a good idea, especially when
       | SSH-ing in, as the language server protocol has shown.
       | 
       | That being said, I don't see this project doing anything (yet?)
       | about escape sequences (for colors, clickable links, mouse and
       | clipboard integrations, setting title and cwd, etc) and handling
       | shortcuts (e.g. translating ctrl-c to sigint). There are very few
       | terminal emulators that get all of that right (I think Kitty,
       | iTerm, WezTerm), so it would be great if this project could lead
       | towards more of them being written.
        
         | mananaysiempre wrote:
         | Is there an actually convincing simple protocol for a 2D
         | character buffer? (As opposed to a jumped-up typescript like
         | the DEC terminals.) I've tried looking at the IBM tradition,
         | and while it's certainly _different_ , I wouldn't say it's
         | better on this particular point.
         | 
         | (There is also the part where a VT100-style display is a bit
         | more than just a character buffer--there's rewrapping on
         | resize, for example, which IIUC can be on for some lines and
         | off for others--but let's assume we're willing to postpone
         | updating window contents until the resize is concluded so we
         | can afford to do that on the application/VT100-processor side.
         | Even though that feels like 1995.)
        
         | ku1ik wrote:
         | Most sequences used by CLI apps that are supported by popular
         | (widely used) terminal emulators are supported here, i.e.
         | there's good compatibility with VT100/VT220/VT520/etc.
         | 
         | At the moment there's no support for mouse and clipboard - ht
         | uses asciinema's avt for terminal emulation, where this was
         | never needed (although it may be added to avt if ht needs it).
         | 
         | Regarding the colors: internally there's full support for
         | standard indexed colors (palette) and RGB. The "getView" call
         | currently returns a plain text version of the screen buffer,
         | stripped of all color information (plain text is what andyk's
         | headlong project, which uses ht, needs), but we can easily make
         | it return color attributes for each cell or segment of text
         | (either by modifying "getView", or adding complementary
         | "getRichView" or sth like that).
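         | 
         | To make the "color attributes for each segment of text" idea
         | concrete, here is a minimal sketch of run-length grouping one
         | row of cells into styled segments. Cell, its fg field and
         | to_segments are hypothetical names for illustration only, not
         | part of ht or avt, and a real view would likely carry more
         | attributes than a single foreground color.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One screen cell (hypothetical shape, not ht's or avt's)."""
    char: str
    fg: Optional[int] = None  # palette index; None means default color


def to_segments(row: List[Cell]) -> List[Tuple[str, Optional[int]]]:
    """Merge adjacent cells sharing a foreground color into (text, fg) pairs."""
    return [
        ("".join(cell.char for cell in cells), fg)
        for fg, cells in groupby(row, key=lambda cell: cell.fg)
    ]


row = [Cell("o"), Cell("k"), Cell(" "), Cell("E", 1), Cell("R", 1), Cell("R", 1)]
print(to_segments(row))  # [('ok ', None), ('ERR', 1)]
```

         | Joining the text of all segments gives back the plain-text
         | row, so a color-aware view could stay a superset of the
         | plain one.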
        
       | Y_Y wrote:
       | How does this relate to classic tools like `expect`?
        
         | yjftsjthsd-h wrote:
         | For one, this looks like a client/server model instead of one
         | program directly controlling another.
        
         | blueflow wrote:
         | Also there is script(1).
        
         | Zambyte wrote:
         | I don't think you can reasonably use expect with TUIs, but you
         | can use this with TUIs.
        
           | ComputerGuru wrote:
           | fish-shell does for its integration tests, and yes, it's
           | brutal :)
        
           | andyk wrote:
           | I tried to contrast to `expect` in a couple of my other
           | responses, but yeah this is my sense too after looking
           | briefly at `expect` - that ht always transparently sets up a
           | terminal for you under the hood and you interact with that so
           | you can always grab a screenshot of any terminal UI.
           | 
           | I don't think `expect` is targeted at this use case (though I
           | am only learning about `expect` right now so could be wrong)
        
         | xuhu wrote:
         | "getView command allows obtaining a textual view of a terminal
         | window."
        
       | electric_mayhem wrote:
       | First impression: nifty.
       | 
       | Having sat with it a few minutes though, why would I choose this
       | over the classic:
       | 
       | https://core.tcl-lang.org/expect/index
       | 
       | or any of its analogs such as:
       | https://pkg.go.dev/github.com/google/goexpect
       | https://pexpect.readthedocs.io/en/stable/
       | https://www.rubydoc.info/gems/ruby_expect/1.6.0/RubyExpect/E...
       | 
       | edit: ah. it preserves the UI. This has potential.
        
         | andyk wrote:
         | thanks for surfacing `expect` to our attention. I'll add a
         | compare/contrast to the ht readme
        
       | MobiusHorizons wrote:
       | Reading the readme, I find myself wondering what problems this
       | solves for. The example of wrapping nano strikes me as
       | particularly odd, since editing files is already fairly easy to
       | do programmatically either via direct file operations or with
       | tools like sed. Aside from editors, most tools that offer a tui
       | also expose the functionality for programmatic access (typically
       | with some additional flags to the same binary). So it just
       | strikes me as the wrong abstraction to interact with most things.
       | 
       | The other potential goal I could imagine was automation. If that
       | is the case, I would recommend that the author make that clearer
       | in the examples and also to describe it in relation to `expect`,
       | which would probably be my go to tool for such use cases.
       | 
       | Whatever the case it does look like a fun project, even if I
       | don't see any cases where it would be the right tool for the job.
        
         | woodrowbarlow wrote:
         | to me this seems essentially like 'screen/tmux without the
         | multiplexing features' which is useful because most of us do
         | 'terminal multiplexing' via our window manager and we're really
         | just using screen because we want to detach the process from
         | the terminal session (e.g. as a glorified nohup wrapper).
         | another similar tool is `nq`.
        
           | andrewshadura wrote:
           | Try dtach
        
         | mariocesar wrote:
         | I had this issue of needing to control Docker containers in a
         | VPS, without sharing access to the server itself. It seems like
         | it will be easy to create a simple web service that can
         | communicate with the ht API, list my containers, show me the
         | stats, and restart containers if I want to. I can manage all
         | security in the web service.
         | 
         | This could be a nice case.
        
           | shanemhansen wrote:
           | That would work but wouldn't making http requests to the
           | docker socket itself be a little easier?
           | 
           | Example:
           | 
           |     curl --silent --unix-socket /run/user/1000/docker.sock http://v1.41/version
           | 
           | From: https://dev.to/smac89/curl-to-docker-through-sockets-1mhe
           | 
           | You could even do something like a reverse proxy to very
           | limited paths although I tend to think that would ultimately
           | be a bad idea and making your own http calls is probably
           | better.
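           | 
           | A rough Python equivalent of that curl call, using only the
           | standard library, as a sketch of "making your own http
           | calls". UnixHTTPConnection and docker_version are
           | hypothetical names, and the socket path is an assumption
           | that depends on how Docker is set up on the host.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" is only a placeholder for the Host header, much like
        # the arbitrary host name in the curl example.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        # Replace the default TCP connect with an AF_UNIX stream socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_version(socket_path: str) -> bytes:
    """GET /version from the daemon and return the raw response body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/version")
        return conn.getresponse().read()
    finally:
        conn.close()
```

           | Calling docker_version("/run/user/1000/docker.sock") would
           | mirror the curl example; a web service could wrap calls
           | like this behind its own authentication and filtering.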
        
             | mbreese wrote:
             | You probably don't want to expose that service to the
             | internet...
             | 
             | I see this as something like the console management for
             | VPS. Back in the day, I remember reading about how
             | prgmr.com had set up a console that you'd directly SSH into.
             | That's now this interface [1] (and a company name change),
             | but I could see how programmatically working with this
             | would be helpful.
             | 
             | [1] https://tornadovps.com/documentation/vps-console
        
               | johnmaguire wrote:
               | The comment you're replying to mounted the Docker daemon
               | as a local socket, accessible only on the machine. (It
               | exposes an HTTP server still.)
               | 
               | I don't see why one would be any more comfortable
               | exposing a shell to the internet than the Docker daemon.
               | It grants _more_ capabilities. Either should likely be
               | protected by authentication.
        
               | mbreese wrote:
               | My understanding was that there was a server running
               | Docker containers where the admin wanted to allow others
               | to control/start containers without giving them access to
               | the machine through a local login. The idea proposed was
               | to make the docker port accessible to the outside world
               | (authenticated, somehow).
               | 
               | I'm not sure I'd want to expose the Docker port to the
               | outside world (or outside of a strictly firewalled
               | subnet). Even if it is wrapped, this seems too dangerous
               | to me.
               | 
               | The service I talked about is not a shell. It's a command
               | line program that operates as the shell when you login
               | via SSH. Instead of bash/zsh/etc, this program runs
               | instead. The purpose is to give the VM admin access for
               | out-of-band management (serial console, reinstalling the
               | OS, etc). I'm a big fan of this approach, where you don't
               | necessarily get full access to the host, but you do have
               | enough access to do the work that's needed (and still SSH
               | encrypted). No more, no less. To me, this seems like a
               | great approach for something like restricted VPS or
               | container admin.
               | 
               | I ended up doing something similar for an SSH jump box a
               | few jobs back where you could set up some basic admin
               | things (like uploading SSH keys) using a CLI program that
               | was used as an SSH shell.
               | 
               | To bring it back to the original post -- like others, I
               | had a hard time seeing what the OP could be used for.
               | Until I thought about this OOB CLI. It would be great for
               | scripting access to something like this.
        
               | shanemhansen wrote:
               | I think I was very unclear. I didn't mount anything, the
               | docker daemon by default is accessible over http (over a
               | unix domain socket rather than the typical TCP).
               | 
               | I was proposing that this person's web app not do any sort
               | of subprocess automation via something like ht, and
               | instead take in requests and talk to the docker daemon on
               | the client's behalf, since that allows any sort of
               | authentication or filtering that needs to happen.
               | 
               | I wasn't really seriously proposing the straight reverse
               | proxy setup. That's one of those layer violations like
               | PostgREST that is either genius or lunacy. I haven't
               | figured out which one.
        
         | colinsane wrote:
         | here's a terrible script which runs as root on all my boxes (as
         | `redirect-tty /dev/tty1 unl0kr`):
         | https://git.uninsane.org/colin/nix-files/src/commit/9189f18c...
         | 
         | none of the Linux greeters meet all my needs, so i fall back to
         | `login`. but i still need a graphical program for actually
         | entering in my password -- particularly because some of my
         | devices don't have a physical keyboard (i.e. my phone). so i
         | take the output of a framebuffer-capable on-screen-keyboard [1]
         | and pipe that into `login`. but try actually doing that. try
         | `cat mypassword.txt | login MobiusHorizons`. it doesn't work:
         | `login` does some things on its stdin which only work on vtty.
         | so instead i run login on /dev/tty1, and pipe the password into
         | /dev/tty1 for the auth.
         | 
         | yes, this solution is terrible. a lot of things would make it
         | less terrible. i could fix one of the greeters to work the way
         | i need it (tried that). i could patch `login` (where it
         | probably won't ever be upstreamed). i could integrate the OSK
         | into the same input system the ttys use... or i could reach for
         | `ht`. everything except the last one is a day or more of work.
         | 
         | 1:
         | https://gitlab.com/postmarketOS/buffybox/-/tree/master/unl0k...
        
         | hitchstory wrote:
         | I wrote an integration testing framework which I wanted to
         | integrate with a tool _exactly_ like this so it could be used
         | to, e.g. test a command line app like vim.
         | 
         | Expect is what I tried to integrate with first. It falls over
         | quite quickly with any kind of app that does anything mildly
         | complicated with the terminal.
        
           | andyk wrote:
           | Interesting. When we decided to build ht we didn't compare it
           | to expect (which I hadn't heard of or used) but I'm comparing
           | the two now as they seem related.
           | 
           | How exactly did `expect` fall over?
           | 
           | From what I can tell, expect does not provide the
           | functionality of a stateful terminal server/client under the
           | hood for you so it isn't as easy to grab "text" screenshots
           | of a Terminal User Interface, which is one of the main
           | motivations behind ht (will update the readme to make this
           | main use-case more clear)
        
         | andyk wrote:
         | Hey, project lead here. I had a very specific use case in mind:
         | I'm playing with using LLM agent frameworks for software
         | engineering - like MemGPT, swe-agent, Langchain and my own
         | hobby project called headlong
         | (https://github.com/andyk/headlong). Headlong is focused on
         | making it easy for a human to edit the thought history of an
         | agent via a webapp. The longer term goal of headlong is
         | collecting large-ish human curated datasets that intermix
         | actions/observations/inner-thoughts and then use those data to
         | fine-tune models to see if we can improve their reasoning.
         | 
         | While working on headlong I tried out and implemented a variety
         | of 'tools' (i.e., functions) like editFile(), findFile(),
         | sendText(), checkTime(), searchWeb(), etc., which the agents
         | call using LLM function calling.
         | 
         | A bunch of these ended up being functions that interacted with
         | an underlying terminal. This is similar to how swe-agent works
         | actually.
         | 
         | But I figured instead of writing a bunch of functions that sit
         | between the LLM and the terminal, maybe let the LLM use a
         | terminal more like a human does, i.e., by "typing" input into
         | it and looking at snapshots of the current state of it. Needed
         | a way to get those stateful text snapshots though.
         | 
         | I first tried using tmux and also looked to see if any existing
         | libs provide the same functionality. Couldn't find anything so
         | teamed up with Marcin to design and make ht.
         | 
         | playing with the agent using the terminal directly has evolved
         | into a hypothesis that I've been exploring: the terminal may be
         | the "one tool to rule them all" - i.e., if an agent learns to
         | use a terminal well it can do most of what humans do on our
         | computers. Or maybe terminal + browser are the "two tools to
         | rule them all"?
         | 
         | Not sure how useful ht will be for other use cases, but maybe!
        
           | MobiusHorizons wrote:
           | This makes a lot of sense. I would call that out, because
           | it's really surprising out of context. Hopefully you can see
           | how unusual it would be to try to use human interfaces from
           | code when, in at least the majority of cases, programmatic
           | interfaces for each task already exist and would be much
           | less bug prone / finicky. I guess the analogy
           | would be choosing to use Webdriver to interact with a service
           | for which there is already an API.
        
         | wolrah wrote:
         | The immediate thought I had upon reading the description was
         | "this would be great for Minecraft servers".
         | 
         | Most of us running Minecraft servers on Linux have it wrapped
         | in screen or tmux because the CLI is the only way to issue
         | certain commands including stopping it properly.
         | 
         | This could provide an alternative.
        
         | tstack wrote:
         | As others have kinda alluded to, it could be useful for testing
         | TUI applications. I develop a logfile viewer for the terminal
         | (https://lnav.org) and have a similar application[1] for
         | testing, but it's a bit flaky. It produces/checks snapshots
         | like [2]. I think the problems I run into are more around
         | different versions of ncurses producing slightly different
         | outputs.
         | 
         | [1] - https://github.com/tstack/lnav/blob/master/test/scripty.cc
         | [2] - https://github.com/tstack/lnav/blob/master/test/tui-captures...
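         | 
         | A minimal sketch of the snapshot-checking idea, assuming the
         | terminal view arrives as a plain multi-line string.
         | diff_snapshot is a hypothetical helper, and stripping trailing
         | whitespace is just one possible way to tolerate harmless
         | rendering differences; it is not taken from lnav's tests.

```python
import difflib
from typing import List


def diff_snapshot(expected: str, actual: str) -> List[str]:
    """Return unified-diff lines between two text screenshots (empty if equal)."""
    # Trailing spaces are invisible on screen, so drop them on every line
    # before comparing.
    exp = [line.rstrip() for line in expected.splitlines()]
    act = [line.rstrip() for line in actual.splitlines()]
    return list(difflib.unified_diff(exp, act, "expected", "actual", lineterm=""))


for line in diff_snapshot("status: ok  \nrows: 3", "status: ok\nrows: 4"):
    print(line)
```

         | A test would then pass when the returned list is empty and
         | print the diff otherwise.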
        
       | hexsprite wrote:
       | Seems like it could be useful for e2e testing of command line
       | applications
        
         | cpendery wrote:
         | You could always try https://github.com/microsoft/tui-test too.
         | It could still use some more polishing on my part though
        
           | lostmsu wrote:
           | > npm i -D @microsoft/tui-test
        
       | pama wrote:
       | This could simplify RL on ncurses codes.
        
         | blueflow wrote:
         | What is RL referring to?
        
           | theblazehen wrote:
           | Potentially readline?
           | https://en.wikipedia.org/wiki/GNU_Readline
        
             | throwanem wrote:
             | More likely reinforcement learning, I think.
        
       | jauntywundrkind wrote:
       | Combined with nohup, this could probably be useful for
       | detaching/reattaching to long running processes across user
       | sessions?
       | 
       | I'm back to using tmux again, but for a while I was using another
       | program dtach, to start vim sessions that I could disconnect and
       | reattach to. Inside neovim I'd have a bunch of terminals &
       | buffers & what not, so it felt redundant having tmux also hosting
       | an even higher level environment.
       | 
       | Dtach is super super lightweight. Tmux is keeping copies of each
       | screen in memory, is doing some reprocessing. Dtach is basically
       | a pipe wrapping a program's input and output, data in, data
       | out.
       | 
       | I even wrote a little shell script to let me very quickly 'dta
       | my-proje' which will go autocomplete my-project name and either
       | open the existing dtach session in that project or go create
       | a dtach session with vim in it.
       | https://github.com/jauntywunderkind/dtachment/blob/master/dt...
       | 
       | It would be interesting to see something like dtach used for
       | automation or scripting, as it seems targeted for. The idea of
       | being able to relay around the input and output feels like it
       | should have some neat uses. There isn't really a protocol, afaik,
       | and there's definitely no retained state for a getView like ht.
       | But it should in many ways function similarly?
        
       | nxobject wrote:
       | Every legacy IBM mainframe application from the 60s with a web
       | interface would like to have a word... s/IBM 3270/VT100.
        
       | broknbottle wrote:
       | Hmm this may have some use cases for macOS automation where
       | there's no MDM.
        
       | sigmonsays wrote:
       | i dont see any way of actually typing key presses, like
       | modifiers.
       | 
       | This project looks pretty interesting, maybe i'm missing
       | something.
        
         | andyk wrote:
         | Include in your input json the ascii control character that the
         | keyboard combo would generate (e.g., \x03 for ctrl-c).
         | 
         | To send control-c to the terminal, for example, you'd send the
         | following JSON message to ht:
         | 
         |     { "type": "input", "payload": "\x03" }
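         | 
         | A minimal sketch of building that message from Python,
         | assuming ht accepts standard JSON (an assumption based on the
         | message shape above). Strict JSON has no \x escape, so
         | json.dumps writes the control character as \u0003, which
         | decodes to the same single character. input_message is a
         | hypothetical helper name.

```python
import json


def input_message(text: str) -> str:
    """Serialize an "input" message carrying raw terminal input."""
    # The "type" and "payload" keys mirror the message quoted above;
    # the helper itself is illustrative, not part of ht.
    return json.dumps({"type": "input", "payload": text})


CTRL_C = "\x03"  # ASCII ETX, the character a terminal sends for ctrl-c

line = input_message(CTRL_C)
print(line)  # {"type": "input", "payload": "\u0003"}
```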
        
         | ku1ik wrote:
         | To expand on what andyk wrote in the sibling comment:
         | 
         | Programs running in a terminal don't get individual components
         | of composite key presses such as ctrl+a, shift+b, so they don't
         | see "a with ctrl modifier" or "b with shift modifier". The
         | modifier keys are handled by the terminal emulator before
         | sending the key's ascii value to the program, modifying the
         | regular ascii letters appropriately. So when "a" (ascii value
         | 0x61) is pressed while holding shift, its ascii value is ...
         | shifted (down) by a constant 0x20, making it ascii 0x41, which
         | represents "A". Similarly with the ctrl key, which shifts down
         | the ascii value by 0x60, turning "a" into 0x01. So to send
         | "ctrl+d" you send input with a single byte of value 0x04 ("d"
         | ascii 0x64 minus 0x60). ht uses a PTY under the hood, and this
         | is how you send keyboard input into a program via a PTY. This
         | is kinda low level though, and there's definitely a possibility
         | of implementing a high level input method in ht, which would
         | parse a string such as "<ctrl+d>" and automatically turn it
         | into 0x04 before sending it to the process.
         | 
         | In other words, the way input in ht works right now was the
         | easiest, simplest way of implementing this to get it out the
         | door.
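         | 
         | A minimal sketch of what such a high level input method could
         | look like, using the 0x20 and 0x60 offsets described above.
         | encode_key and the "<modifier+key>" syntax are hypothetical,
         | not an existing ht feature, and only lowercase a-z with a
         | single ctrl or shift modifier is handled.

```python
def encode_key(spec: str) -> bytes:
    """Turn "<ctrl+d>" or "<shift+b>" into the byte a PTY expects.

    Anything not wrapped in angle brackets is passed through as ASCII.
    """
    if not (spec.startswith("<") and spec.endswith(">")):
        return spec.encode("ascii")
    modifier, _, key = spec[1:-1].partition("+")
    if len(key) != 1 or not ("a" <= key <= "z"):
        raise ValueError(f"unsupported key spec: {spec!r}")
    code = ord(key)
    if modifier == "ctrl":
        return bytes([code - 0x60])  # "d" 0x64 -> 0x04
    if modifier == "shift":
        return bytes([code - 0x20])  # "a" 0x61 -> 0x41 ("A")
    raise ValueError(f"unsupported modifier in {spec!r}")


print(encode_key("<ctrl+d>"))   # b'\x04'
print(encode_key("<shift+a>"))  # b'A'
```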
        
       | rank0 wrote:
       | Super cool! Also super difficult to secure if used server-side...
        
       | remram wrote:
       | I was wondering about this, is there a tmux API that is not
       | command-line? A way to access the tmux socket directly?
       | 
       | I would rather not multiply the number of tools I use, even
       | though ht might be more appropriate in a clean-room environment.
        
         | TickleSteve wrote:
         | libtmux https://github.com/tmux-python/libtmux
        
           | andyk wrote:
           | Oh this is cool. I looked at using tmux before we built ht
           | because I've used screen and tmux forever. I didn't find
           | libtmux though. Will def check it out.
        
           | remram wrote:
           | This is a cool wrapper, but it calls tmux on the command
           | line.
           | 
           | I'm really confused as to why the tmux protocol can't be used
           | directly, and entirely different systems like ht have to be
           | created.
        
         | jeroenjanssens wrote:
         | I've created tmuxr, an R package to manage tmux [0].
         | 
         | [0] https://github.com/jeroenjanssens/tmuxr
        
       | no-dr-onboard wrote:
       | seems really interesting for fuzzing.
        
       | doubloon wrote:
       | Would be awesome for ui testing.
        
       | andyk wrote:
       | andyk here. it's clear our readme is lacking use cases! adding
       | some now. When we introduced ht on twitter I gave a little more
       | context -- https://x.com/andykonwinski/status/1796589953205584234
       | -- but that should have been in the project readme.
       | 
       | Also a few people comparing to `expect`. I haven't used `expect`
       | before, but it looks very cool. Their docs/readme seem only
       | slightly more fleshed out than ours :-D Looks like the main way
       | to use expect is via:
       | 
       |     spawn ...
       |     expect ...
       |     send ...
       |     expect ...
       |     etc.
       | 
       | so, the expect syntax seems targeted more towards testing where
       | you simultaneously get the output from the underlying binary and
       | then check if it's what you expect (thus the name I guess). I
       | can't see if there is a way to just get the current terminal
       | "view" (aka text screenshot) via an expect command?
       | 
       | ht is more geared towards scripting (or otherwise
       | programmatically accessing) the terminal as a UI (aka Terminal
       | UI). So ht always runs a terminal for you and gives you access to
       | the current terminal state. Need to try out expect myself, but
       | from what I can tell, it doesn't seem to always transparently run
       | a Terminal for you.
       | 
       | There might already be some other existing tool that overlaps
       | with the ht functionality, but we couldn't find it when we looked
       | around a bunch before building ht.
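       | 
       | For contrast, the spawn/expect/send flow can be sketched in a
       | few lines of Python. This is an illustration of the pattern
       | only, not of how expect is implemented: it uses plain pipes
       | rather than a PTY, is POSIX-only (select on pipes), and matches
       | on the output stream instead of on a rendered screen. The
       | expect function and the stand-in child program are made up for
       | the example.

```python
import os
import select
import subprocess
import sys
import time


def expect(fd: int, needle: bytes, timeout: float = 5.0) -> bytes:
    """Block until `needle` appears in the child's output; return bytes read."""
    seen = b""
    deadline = time.monotonic() + timeout
    while needle not in seen:
        remaining = max(deadline - time.monotonic(), 0)
        if not select.select([fd], [], [], remaining)[0]:
            raise TimeoutError(f"never saw {needle!r}; got {seen!r}")
        chunk = os.read(fd, 4096)
        if not chunk:
            raise EOFError(f"child exited before {needle!r}; got {seen!r}")
        seen += chunk
    return seen


# spawn: a stand-in child that prompts, reads one line, and greets.
child = subprocess.Popen(
    [sys.executable, "-u", "-c", "print('name?'); print('hello', input())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
out_fd = child.stdout.fileno()

expect(out_fd, b"name?")             # expect the prompt
child.stdin.write(b"ht\n")           # send a reply
child.stdin.flush()
reply = expect(out_fd, b"hello ht")  # expect the greeting
child.stdin.close()
child.wait()
child.stdout.close()
print(reply.decode().strip())
```

       | Note that nothing here keeps track of cursor position or screen
       | contents, which is the part a terminal emulator adds.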
        
         | m0shen wrote:
         | `expect` is absolutely geared towards scripting, as it's an
         | extension of TCL. Though as far as getting a "current terminal
         | view" `expect` has `term_expect`:
         | https://core.tcl-lang.org/expect/file?name=example/term_expe...
        
         | metadat wrote:
         | Expect is The Original Way, and has been the standard since
         | before I learned to program more than 20 years ago. :-D
         | 
         | Expect is also extra cool because of `autoexpect':
         | 
         | > generate an Expect script from observing a (shell) session
         | 
         | https://manpages.ubuntu.com/manpages/focal/en/man1/autoexpec...
        
       ___________________________________________________________________
       (page generated 2024-06-04 23:00 UTC)