[HN Gopher] Show HN: Tuc - When cut doesn't cut it
       ___________________________________________________________________
        
       Announcing `tuc`, a utility similar to coreutils `cut`, but more
       powerful. It allows you to split text or bytes into parts and
       reassemble them in any order.  I always found `cut` very practical
       for some tasks where `sed` or `awk` were overkill or awkward to
       use, but I also felt the need for more features.  Some key
       differences from `cut`:
        
       - parts can be referenced by negative indexes
       - delimiters can be any number of characters long, or match
         a regex
       - it can split text into lines, and reassemble them
        
       Author : riquito
       Score  : 164 points
       Date   : 2022-06-13 14:55 UTC (8 hours ago)
        
       | ur-whale wrote:
       | Triple thumbs up for -e
       | 
       | Never understood why that was never added to cut in the first
       | place.
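For anyone without `tuc` at hand, a rough stand-in for a regex delimiter (what `-e` offers) is awk's `-F`: POSIX awk treats a field separator longer than one character as an extended regular expression. This is only a sketch, not tuc's implementation; `regex_field` is a hypothetical helper name, and fields are numbered from 1 as in awk.

```shell
# regex_field REGEX N: hypothetical helper. Print field N of each input
# line, splitting on an extended regular expression via awk -F.
regex_field() {
    awk -F "$1" -v n="$2" '{ print $n }'
}

# Split on runs of digits: "a1b22c" has fields "a", "b", "c".
printf 'a1b22c\n' | regex_field '[0-9]+' 2
```

Unlike tuc, this offers no negative indexes; awk expressions such as `$(NF-1)` would be needed for those.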
        
       | shreve wrote:
       | I am in love with this trend of replacing old school unix
       | utilities with new rust projects that stay just as fast or
       | faster, but increase the usability or feature set tenfold. rg,
       | fd, exa, bat, and now tuc.
        
         | carlhjerpe wrote:
         | I use rg, exa, bat, and zellij as my "rust replaced old stuff".
         | Zellij isn't yet as polished as I'd like it, but it's way more
         | intuitive than tmux.
        
           | apgwoz wrote:
           | The names of these utilities are bad. I have literally no
           | idea what any of them do. The same could be said for standard
           | unix utilities, true, but they have 50+ years (in some cases)
           | of brain bake in, and have the advantage of names that have
           | _some_ relation to their function (ls : list files :: exa :
           | "extract the list of files from a dirent?")
        
             | carlhjerpe wrote:
             | Does it really matter though? You can just alias them over
             | the originals or something close-by.
        
           | tomjakubowski wrote:
           | Usually you have to install tmux on a remote machine anyway,
           | so Zellij seems like a good one of these to try.
        
             | krylon wrote:
             | OpenBSD ships tmux in the base system. I would be very
             | pleased if more systems did this.
        
           | hultner wrote:
           | Oh god, is tmux considered "old stuff" now? I barely finished
           | replacing my screens.
        
             | carlhjerpe wrote:
             | I actually started with screen, but I only used it for
             | daemonising some foreground processes, I work in Zellij
             | daily.
        
             | tomjakubowski wrote:
             | It happens to all of us. Did you know Interstellar came out
             | eight years ago?
        
         | layer8 wrote:
         | The problem I see is that I don't see those consolidating
         | anytime soon into a new set of ubiquitous "core" utilities that
         | can be expected to be available everywhere.
        
           | humanwhosits wrote:
            | I guess it's just up to someone to set up the package on
            | Debian that contains all of these.
        
             | michaelmior wrote:
             | That doesn't solve the "available everywhere" problem. It
             | potentially would make the tools easier to install on
             | Debian-based systems, if you have root access. I'm not sure
             | any new set of tools will ever be available with the
             | ubiquity of coreutils in the next handful of years, if
             | ever.
        
               | jarbus wrote:
               | I agree. If these could be made installable without root,
               | and independent of the current version of glibc, then
               | there is huge potential to replace the older tools. I'd
               | love to use things like fd, but they don't work on older
               | servers without root and a newer version of glibc
        
               | tomjakubowski wrote:
               | They're already installable without root. Just put them
               | in ~/.local/bin. Most Rust binaries are portable and only
               | dynamically link libc.
        
               | jarbus wrote:
               | That's what I initially tried for fd, but unfortunately
               | glibc is required for it, and probably other utilities as
               | well.
        
               | homarp wrote:
               | fd has a musl version
               | 
               | https://github.com/sharkdp/fd/releases/download/v8.4.0/fd-mu...
        
               | roey2009 wrote:
               | What would prevent us from adding those to a Dockerfile?
        
               | jarbus wrote:
               | Docker not being installed on remote
        
               | layer8 wrote:
               | Local installation is still not really a practical
               | solution when you work e.g. on customer machines, on
               | machines of another team, or generally if you work on
               | many different machines. You still need to be aware of
               | the standard tools and know how to use them when needed.
               | 
               | I mean, I get it -- I used to locally install vim on
               | machines that only had vi, so that my muscle memory would
               | work when editing files. But it's not the same as a
               | core tool just being available by default so you don't
               | have to ever concern yourself with any alternatives.
        
               | fragmede wrote:
               | There's _moreutils_ for an older rethink of the same set
               | of utilities. I don't see why an _evenmoreutils_ wouldn't
               | eventually become popular enough to take hold? Probably
               | not as quickly as you'd like in today's world of instant
               | gratification, but we'll get there, eventually.
        
               | throwamon wrote:
               | Have you tried running them with Nix? I believe you can
               | use most things from it without root, but I'm not sure.
        
               | michaelmior wrote:
               | I suppose I shouldn't have said _only_ Debian. But having
               | tools easily installable is still quite different from
               | being able to reasonably assume they are already
               | installed.
        
               | moondev wrote:
               | This is why I like that you can "go run" golang programs
               | without installing them. Anywhere golang is installed, it
               | will automatically build, cache, and run the binary for
               | whatever platform you are on.
               | 
               | For example:
               | 
               |     go run sigs.k8s.io/kind@latest create cluster
               | 
               | I wonder if something like this exists for Rust.
        
               | PaulDavisThe1st wrote:
               | Absolutely no possible exploit path there _at all_
        
           | krylon wrote:
           | coreutils-ng? It would be nice to have a bunch of these in
           | one package.
        
         | smartmic wrote:
         | It's easy to fall in love with someone who is young, hip and
         | all that. When it comes down to working on my beloved UNIX
         | systems, though, I still prefer to stay with the old-school
         | tools provided by coreutils et al. They are a quasi-standard, I
         | can rely on them and I always know what to expect. Better yet, I
         | will find them on every system and can reduce my mental load to
         | learn and internalize something new. Sure, they're not perfect,
         | but these advantages trump the disadvantages, and it's all
         | worked out pretty well for decades. Here, I don't have to chase
         | the next bride.
        
           | tambourine_man wrote:
           | >It's easy to fall in love with someone who is young, hip and
           | all that
           | 
           | >Here, I don't have to chase the next bride.
           | 
           | Who hurt you? :)
        
           | hinkley wrote:
           | And yet, bash eventually replaced the Bourne shell (by having
           | an sh-compatibility mode), and vim has replaced vi.
           | 
           | If you get anywhere in the neighborhood of a proper superset
           | of the old application, we do occasionally retire the old
           | ones.
        
         | jeppesen-io wrote:
         | You can say that again. After using rg for a few years now, I
         | can't imagine not having this tool, that I use weekly, if not
         | daily
        
       | mmastrac wrote:
       | This is an incredibly useful improvement over cut, thank you. The
       | mental distance from cut to awk/sed is often just too high and
       | having a more useful utility will drastically reduce how much I
       | reach for those tools.
        
         | omoikane wrote:
         | I would still reach for awk/sed, for the fact that they tend to
         | be preinstalled. I might fall back on perl/ruby/python if
         | awk/sed were insufficient, which also tend to be preinstalled.
        
       | gleenn wrote:
       | This is awesome. Especially the ability to compress delimiters. I
       | can't tell you how many times I want to grab a couple fields from
       | the output of another command but can't get delimiters to work
       | correctly or they're using custom spacing to align columns and it
       | blows everything up and then I'm crying in awk land.
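To make the aligned-columns problem concrete: with `cut -d ' '`, every extra padding space starts a new, empty field. A common workaround with standard tools (not tuc) is to squeeze repeated spaces with `tr -s` first, which approximates a greedy delimiter. The sample output and the helper name `squeeze_field` are made up for illustration.

```shell
# squeeze_field N: hypothetical helper. Squeeze runs of spaces into one
# with tr -s, then let cut pick field N (roughly a "greedy" delimiter).
squeeze_field() {
    tr -s ' ' | cut -d ' ' -f "$1"
}

# Made-up command output, padded with spaces so the columns line up.
sample='alice   1024  /home/alice
bob     2048  /home/bob'

# Plain cut: every extra space starts a new (empty) field, so field 2
# is empty on both lines.
printf '%s\n' "$sample" | cut -d ' ' -f 2

# After squeezing, field 2 is the number column: 1024, then 2048.
printf '%s\n' "$sample" | squeeze_field 2
```

This still breaks on right-aligned columns, where leading spaces leave an empty first field, and on tabs mixed with spaces.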
        
       | linsomniac wrote:
       | This is the "cut" I've always wanted!
        
       | blackfede wrote:
       | Thanks! I am already using it!
        
       | nathants wrote:
       | this is excellent.
        
       | emmelaich wrote:
       | There seems to be a bit of functionality overlap with
       | lam/jot/rs; have you looked at those?
       | 
       | They're from BSD originally. Included in macOS, and in Linux
       | distros as bsd-utils.
        
         | gpvos wrote:
         | In Ubuntu I can find rs and athena-jot, but not lam; bsdutils
         | and bsdmainutils contain different tools.
        
         | riquito wrote:
         | I didn't know about them, I'll check them out
        
       | alex_hirner wrote:
       | Did any limitations of `pico-args` turn into caveats for
       | tuc?
        
         | riquito wrote:
         | > Did any limitations of `pico-args` turn into caveats for
         | tuc?
         | 
         | Quite the opposite: early on I started with `clap`, then moved
         | briefly to `argh` before settling on `pico-args`. Compilation
         | time and size were the main driving factors, alongside support
         | for non-spaced values (e.g. -d' ').
         | 
         | Maybe if tuc had subcommands it would have been a different
         | story, but I didn't find enough value in more established arg
         | libraries.
        
       | matteo-biagetti wrote:
       | awesome!
        
       | seanw444 wrote:
       | Hacker News comes through again. Been looking for a tool exactly
       | like this.
        
       | Aardwolf wrote:
       | > cargo install tuc
       | 
       | Slightly off topic question about this: in Linux, are rust
       | programs always installed like this, or should these also be made
       | available in the regular package manager of your distro?
        
         | elsjaako wrote:
         | In general you want to install it with a package manager.
         | 
         | (But also it's your system. What's the point of Linux if you
         | can't do it your way)
        
         | steveklabnik wrote:
         | The intention of "cargo install" is to provide a quick and easy
         | way to distribute programs useful to other Rust programmers.
         | 
         | In general, end users should use some other method that doesn't
         | require having a Rust toolchain pre-installed, but doing that
         | can take work, and so not every program pursues it.
        
         | masklinn wrote:
         | > or should these also be made available in the regular package
         | manager of your distro?
         | 
         | Getting a package in the official repositories is quite a high
         | bar to clear.
         | 
         | Plus it's... not a great experience for early development, as
         | many distros will lock in the program entirely, leaving you
         | with a very long tail of extremely outdated installs.
         | 
         | So generally the expectation is that once a program is popular
         | or desirable enough, and is somewhat stable, it gets integrated
         | into the base repos.
        
           | Aardwolf wrote:
           | Is there an equivalent for C/C++ programs?
           | 
           | There's pip for python, npm for JS, cargo for rust
           | 
           | For C/C++ all I know are multiple different possible build
           | and make systems, but none works like a package manager, as
           | far as I know
           | 
           | Note that I don't always love when something installed with
           | pip or npm puts files all over your OS or homedir without
           | being managed by the package manager, though
        
             | ploxiln wrote:
             |     ./configure --prefix=/usr/local && make && sudo make install
             | 
             | (I consider it a feature that this doesn't automatically
             | download and install hundreds to thousands of things I
             | haven't even heard of)
        
               | mprovost wrote:
               | I mean, ignoring the fact that the configure script is
               | often a larger program than the one you're trying to
               | install, it (and make) can do anything to your system and
               | unless you read the contents of each you're just taking
               | it on faith that it isn't downloading and installing
               | hundreds of things.
        
               | ploxiln wrote:
               | ... but if I'm installing _curl_ or _jq_ or similar, I'm
               | quite familiar with the provenance of the project, and of
               | the tarball I'm running a configure script from.
               | 
               | And maybe I need to install one or two dependencies,
               | similarly they should be familiar, or small and
               | comprehensible, and only downloaded and installed with my
               | explicit actions.
               | 
               | (And yeah autoconf generated configure scripts are crazy
               | huge and baroque, and could easily be 1/10 the size for
               | the needed functionality, but compared to "npm install"
               | I'll take it.)
        
             | db65edfc7996 wrote:
             | Perhaps not everything you want, but in Python land there
             | is pipx [0]. Pipx will create a virtual environment per
             | binary program so that they are all isolated from each
             | other and put things in a consistent location
             | (~/.local/pipx). Then it is easy enough to do `pipx install
             | black`, `pipx install cookiecutter`, whatever. It also has a
             | nice upgrade option: `pipx upgrade-all`.
             | 
             | [0] https://pypa.github.io/pipx/
        
             | masklinn wrote:
             | > Is there an equivalent for C/C++ programs? [...] For
             | C/C++ all I know are multiple different possible build and
             | make systems, but none works like a package manager, as far
             | as I know
             | 
             | Downloading the source by hand, trying to wrangle what
             | dependencies it has not vendored (which may or may not be
             | available through your system package managers, in versions
             | which may or may not be recent enough), and trying to find
             | out how to build it.
             | 
             | Though do note that this issue can also hit when installing
             | python, js, or rust packages, if they ultimately have native
             | dependencies. Their respective build systems will generally
             | try to make it work out of the box, but if your
             | configuration was not specifically tested / supported it
             | can break with fun C-level compilation errors.
        
               | nickstinemates wrote:
               | The main solution to all of this complexity is another
               | complex (but awesomely powerful) package manager called
               | Portage. It's mainly used in Gentoo Linux
               | 
               | It's awesome. And complicated.
        
               | masklinn wrote:
               | That, or Nix, maybe. Also awesome. Also complicated.
        
               | speed_spread wrote:
               | Stop! I'm starting to miss Funtoo... And I now have all
               | these Ryzen cores idling, longing for a world update...
               | Must... Resist...
        
             | nickstinemates wrote:
             | Yes, it's called portage and comes with Gentoo. :)
        
       | overlordalex wrote:
       | Just want to say thank you for the unlimited delimiters - this
       | was something that always limited my usage of cut so just this
       | feature alone makes tuc worth it
        
       | donio wrote:
       | Doesn't seem to fix the #1 thing missing from cut: an easy syntax
       | for splitting on whitespace _sequences_ (rather than a single
       | space), like awk does by default.
       | 
       | (I see that it supports splitting on regex, but I was hoping for
       | it to be the default or a single-character switch)
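For reference, this is the awk default being described: with the default field separator, awk splits on runs of spaces and tabs and ignores leading and trailing blanks. A minimal sketch; `ws_field` is a hypothetical wrapper name.

```shell
# ws_field N: hypothetical helper. Print field N, splitting on runs of
# blanks the way awk does by default (leading/trailing blanks ignored).
ws_field() {
    awk -v n="$1" '{ print $n }'
}

# Mixed tabs and spaces, plus leading indentation: prints "two".
printf '  one\t two   three\n' | ws_field 2
```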
        
         | vultour wrote:
         | Isn't that what the --greedy-delimiter is for?
        
           | donio wrote:
           | Nice, missed that, and it has a single character shorthand
           | too: -g
           | 
           | Edit: or maybe not, I think I'd still have to use --regex for
           | real whitespace sequences that can be a mix of spaces and
           | tabs.
        
             | riquito wrote:
             | As you figured out, -g (--greedy) matches the same
             | delimiter multiple times (e.g. one or more spaces). If you
             | want to match different delimiters (e.g. a mix of spaces and
             | tabs) one or more times, you must use -e (--regex).
        
       | masklinn wrote:
       | Seems similar in intent to choose
       | (https://github.com/theryangeary/choose) as a cut which doesn't
       | suck. The features outlined are very close, I just don't
       | understand what "can split text into lines" is, do you mean that
       | the selected fields can be split into lines?
       | 
       | The main advantage of tuc seems to be "templated" outputs.
        
         | totalperspectiv wrote:
         | It's also very similar to my tool hck
         | (https://github.com/sstadick/hck) which is in turn similar to
         | choose, just faster, supports compression, and supports column
         | selection via matching headers.
        
         | riquito wrote:
         | > I just don't understand what "can split text into lines" is,
         | do you mean that the selected fields can be split into lines?
         | 
         | Good question, I struggle to word it properly, any help is
         | appreciated.
         | 
         | Assume a file (we will call it "input"), such as
         | 
         |     first line here
         |     followed by second line
         | 
         | You can use a delimiter and cut inside each line, e.g.
         | 
         |     $ tuc -d ' ' -f 2 < input
         |     line
         |     by
         | 
         | or you can cut it "by lines", practically considering the whole
         | file as your single "line" and using newline as the delimiter:
         | 
         |     $ tuc -l -f 2 < input
         |     followed by second line
         | 
         | If you want to remove a line, or keep something in between, it
         | can be more practical/intuitive than head/tail or sed.
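For comparison, the line-level operations described above (pick a line, drop a line, take the last one) can be sketched with `sed` and `tail`. These are the standard-tool equivalents, not tuc; the helpers `nth_line`, `drop_line` and `last_line` are hypothetical names, and the three-line input is made up.

```shell
# Hypothetical helpers built on standard tools, for comparison only.

# nth_line N: print only line N.
nth_line() {
    sed -n "${1}p"
}

# drop_line N: print every line except line N.
drop_line() {
    sed "${1}d"
}

# last_line: print the final line (counting from the end).
last_line() {
    tail -n 1
}

lines='one
two
three'

printf '%s\n' "$lines" | nth_line 2    # two
printf '%s\n' "$lines" | drop_line 2   # one, three
printf '%s\n' "$lines" | last_line     # three
```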
        
           | forty wrote:
           | This feature seems more like a replacement for head and tail
           | (and combination of both) rather than cut.
           | 
           | Maybe a good way to explain it would be to show how to
           | achieve the same thing with those well known commands
           | (comparison which should certainly be in favor of tuc ^^)
           | 
           | EDIT: sorry you just said more or less the same thing, I need
           | to read better :)
        
           | I_complete_me wrote:
           | Ref. your `$ tuc -d ' ' -f 2 < input`, how is it different
           | from `$ cut -d ' ' -f 2 input`?
        
             | riquito wrote:
             | > Ref. your `$ tuc -d ' ' -f 2 < input`, how is it
             | different from `$ cut -d ' ' -f 2 input`?
             | 
             | It's not. `tuc` is a superset of cut and in that particular
             | example there's no difference. If you wanted instead to cut
             | on multibyte delimiters, or on a variable number of spaces,
             | `tuc` would work, while `cut` would fall short.
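To illustrate the multi-character case: `cut -d` accepts only a single character, so a delimiter like `::` needs a different tool. awk's `-F` accepts it, with the caveat that a multi-character separator is read as an extended regular expression. A sketch under those assumptions; `multichar_field` is a hypothetical helper.

```shell
# Plain cut only accepts a single-character delimiter, so this fails
# (GNU cut reports "the delimiter must be a single character").
printf 'a::b::c\n' | cut -d '::' -f 2 2>/dev/null || echo "cut failed"

# multichar_field DELIM N: hypothetical helper using awk -F instead.
# awk treats a multi-character separator as an extended regular
# expression, so characters such as . | + * would need escaping.
multichar_field() {
    awk -F "$1" -v n="$2" '{ print $n }'
}

printf 'a::b::c\n' | multichar_field '::' 2   # b
```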
        
       | ChicagoBoy11 wrote:
       | I've found this seriously cool!
       | 
       | While not my primary role at my job, I often find myself dealing
       | with lots of disparate data sets, usually needing to do some sort
       | of manipulation, cleaning, searching, etc. Every now and then I
       | encounter something like this and it seems to me that there is
       | potentially a nice set of command line tools/utilities that I
       | should be adding to my belt. Anywhere I should particularly start
       | taking a look? Like, if my goal is to become much better at
       | wrangling CSV/text-delimited files, searching across folders of
       | docs for numbers, etc., where is my first entry point into trying
       | to become much more proficient at it?
        
         | pdimitar wrote:
         | Here's a recent gateway post:
         | https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...
         | 
         | There are even more. :)
        
         | niccl wrote:
         | I do a lot of that sort of thing, and my go-to tools are grep
         | (with full regular expressions), sort, uniq, head, tail, sed,
         | and, the 200 pound gorilla (not as heavy as it could be) awk.
         | Then if all else fails, Python
        
       | bradwood wrote:
       | Awk?
        
       | bitcraft wrote:
       | This is great! I always felt like cut was really handicapped by
       | lack of negative indexes.
        
         | tyingq wrote:
         | awk can do something like negative indexes if needed:
         | 
         |     $ echo "a b c" | awk '{print $(NF-1)}'
         |     b
        
         | kseistrup wrote:
         | The biggest handicap of cut for me has always been that it
         | cannot split on blanks (TABs or SPACEs), you have to choose
         | between TAB or SPACE. So I wrote an awk script that can print
         | field ranges like cut, but recognizes blanks. Now I will see if
         | I can get tuc worked into my muscle memory.
        
           | layer8 wrote:
           | I usually just pipe through sed to normalize the separators
           | before applying cut.
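A sketch of that normalize-then-cut approach, assuming a POSIX `sed` with `[[:blank:]]` support: collapse each run of spaces and tabs to one space, strip a single leading space so indented lines don't get an empty first field, then let `cut` select fields. `blank_cut` is a hypothetical name.

```shell
# blank_cut LIST: hypothetical helper. Collapse each run of spaces/tabs
# into one space, drop a single leading space, then pass LIST to cut -f.
blank_cut() {
    sed -e 's/[[:blank:]][[:blank:]]*/ /g' -e 's/^ //' | cut -d ' ' -f "$1"
}

printf 'x\t  y  z\n' | blank_cut 2-3   # y z
printf '   a\tb\n' | blank_cut 1       # a
```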
        
             | kseistrup wrote:
             | That's also a little awkward, when there could easily be an
             | option to split by all blanks.
        
               | layer8 wrote:
               | I'm not defending cut here, but using sed is also pretty
               | straightforward and fits its purpose. I'd argue that
               | using the existing general-purpose tools is better than
               | creating custom narrow-purpose tools in simple cases like
               | this one. Besides maintainability and familiarity, it
               | also exercises your proficiency in applying the standard
               | tools.
        
               | kseistrup wrote:
               | You have a point, of course.
        
         | samwhiteUK wrote:
         | When I want to use negative indices, I pipe the string through
         | rev first, then do my cut, then rev again
        
           | mprovost wrote:
           | That's the classic solution but blows up when using multibyte
           | characters since rev just reads the bytes in each line in
           | reverse.
        
         | pimlottc wrote:
         | What do you mean by negative indexes?
        
           | ratrocket wrote:
           | I believe "negative index" means array[-1] is the last
           | element in array, array[-2] is the second-to-last element,
           | etc.
           | 
           | In the context of "cut", it would mean being able to do
           | something like:
           | 
           | cut -d" " -f1--2
           | 
           | the "-f1--2" (read: fields from 1 to minus 2; it's a range)
           | means to select from the first field to the second-to-last
           | field. (that double "--" is pretty awkward, to be sure!)
           | 
           | Some programming languages (ruby is the one that I know) have
           | this feature for accessing array elements.
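Plain `cut` has no such syntax, but the same selection can be sketched in awk with `NF`, extending the `$(NF-1)` trick mentioned earlier in the thread to ranges. `field_range` is a hypothetical helper; it uses awk's default whitespace splitting rather than cut's single-character delimiter, and it is not how tuc implements negative indexes.

```shell
# field_range FROM TO: hypothetical helper. Print fields FROM..TO
# (1-based, inclusive), where a negative index counts from the end
# (-1 = last field). Uses awk's default whitespace splitting and joins
# the selected fields with single spaces.
field_range() {
    awk -v from="$1" -v to="$2" '
    {
        f = from + 0; t = to + 0
        if (f < 0) f = NF + 1 + f
        if (t < 0) t = NF + 1 + t
        sep = ""
        for (i = f; i <= t; i++) {
            if (i < 1 || i > NF) continue
            printf "%s%s", sep, $i
            sep = " "
        }
        printf "\n"
    }'
}

# Like the imagined "cut -d' ' -f1--2": first through second-to-last.
echo "a b c d" | field_range 1 -2   # a b c
```

Out-of-range indexes are silently skipped rather than reported, which is a design choice of this sketch.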
        
       | nijave wrote:
       | It'd be nice if it could split while supporting escapes and
       | quoted strings. I often run into issues with things like CSV,
       | where fields might be quoted, or quoted strings where quotes are
       | escaped.
        
       ___________________________________________________________________
       (page generated 2022-06-13 23:00 UTC)