[HN Gopher] Show HN: Tuc - When cut doesn't cut it
___________________________________________________________________
Show HN: Tuc - When cut doesn't cut it
Announcing `tuc`, a utility similar to coreutils `cut`, but more
powerful. It lets you split text or bytes into parts and
reassemble them in any order. I always found `cut` very practical
for some tasks where `sed` or `awk` were overkill or awkward to
use, but I also felt the need for more features. Some key
differences from `cut`:
- parts can be referenced by negative indexes
- delimiters can be any number of characters long, or match a regex
- can split text into lines, and reassemble them
Author : riquito
Score : 164 points
Date : 2022-06-13 14:55 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ur-whale wrote:
| Triple thumbs up for -e
|
| Never understood why that was never added to cut in the first
| place.
| shreve wrote:
| I am in love with this trend of replacing old school unix
| utilities with new rust projects that stay just as fast or
| faster, but increase the usability or feature set tenfold. rg,
| fd, exa, bat, and now tuc.
| carlhjerpe wrote:
| I use rg, exa, bat, and zellij as my "rust replaced old stuff".
| Zellij isn't yet as polished as I'd like it, but it's way more
| intuitive than tmux.
| apgwoz wrote:
| The names of these utilities are bad. I have literally no
| idea what any of them do. The same could be said for standard
| unix utilities, true, but they have 50+ years (in some cases)
| of brain bake in, and have the advantage of names that have
| _some_ relation to their function (ls : list files :: exa :
| "extract the list of files from a dirent?")
| carlhjerpe wrote:
| Does it really matter though? You can just alias them over
| the originals or something close-by.
| tomjakubowski wrote:
| Usually you have to install tmux on a remote machine anyway,
| so Zellij seems like a good one of these to try.
| krylon wrote:
| OpenBSD ships tmux in the base system. I would be very
| pleased if more systems did this.
| hultner wrote:
| Oh god, is tmux considered "old stuff" now? I barely finished
| replacing my screens.
| carlhjerpe wrote:
| I actually started with screen, but I only used it for
| daemonising some foreground processes, I work in Zellij
| daily.
| tomjakubowski wrote:
| It happens to all of us. Did you know Interstellar came out
| eight years ago?
| layer8 wrote:
| The problem I see is that I don't see those consolidating
| anytime soon into a new set of ubiquitous "core" utilities that
| can be expected to be available everywhere.
| humanwhosits wrote:
| I guess it's just up to someone to set up the package on
| debian that contains all of these
| michaelmior wrote:
| That doesn't solve the "available everywhere" problem. It
| potentially would make the tools easier to install on
| Debian-based systems, if you have root access. I'm not sure
| any new set of tools will ever be available with the
| ubiquity of coreutils in the next handful of years, if
| ever.
| jarbus wrote:
| I agree. If these could be made installable without root,
| and independent of the current version of glibc, then
| there is huge potential to replace the older tools. I'd
| love to use things like fd, but they don't work on older
| servers without root and a newer version of glibc
| tomjakubowski wrote:
| They're already installable without root. Just put them
| in ~/.local/bin. Most Rust binaries are portable and only
| dynamically link libc.
| jarbus wrote:
| That's what I initially tried for fd, but unfortunately
| glibc is required for it, and probably other utilities as
| well.
| homarp wrote:
| fd has musl version
|
| https://github.com/sharkdp/fd/releases/download/v8.4.0/fd
| -mu...
| roey2009 wrote:
| What would prevent us from adding those to a docker file?
| jarbus wrote:
| Docker not being installed on remote
| layer8 wrote:
| Local installation is still not really a practical
| solution when you work e.g. on customer machines, on
| machines of another team, or generally if you work on
| many different machines. You still need to be aware of
| the standard tools and know how to use them when needed.
|
| I mean, I get it -- I used to locally install vim on
| machines that only had vi, to make my muscle memory be
| functional when editing files. But it's not the same as a
| core tool just being available by default so you don't
| have to ever concern yourself with any alternatives.
| fragmede wrote:
| There's _moreutils_ for an older rethink of the same set
| of utilities. I don't see why an _evenmoreutils_ wouldn't
| eventually become popular enough to take hold? Probably
| not as quickly as you'd like in today's world of instant
| gratification, but we'll get there, eventually.
| throwamon wrote:
| Have you tried running them with Nix? I believe you can
| use most things from it without root, but I'm not sure.
| michaelmior wrote:
| I suppose I shouldn't have said _only_ Debian. But having
| tools easily installable is still quite different from
| being able to reasonably assume they are already
| installed.
| moondev wrote:
| This is why I like that you can "go run" golang programs
| without installing them. Anywhere golang is installed it
| will automatically build cache and run the binary for
| whatever platform you are on.
|
| For example:
|
|     go run sigs.k8s.io/kind@latest create cluster
|
| I wonder if something like this exists for rust
| PaulDavisThe1st wrote:
| Absolutely no possible exploit path there _at all_
| krylon wrote:
| coreutils-ng? It would be nice to have a bunch of these in
| one package.
| smartmic wrote:
| It's easy to fall in love with someone who is young, hip and
| all that. When it comes down to work on my loved UNIX systems
| though, I still prefer to stay with the old-school tools given
| by coreutils et al. They are a quasi-standard, I can rely on
| them and I always know what to expect. Better yet, I will find
| them on every system and can reduce my mental load to learn and
| internalize something new. Sure, they're not perfect, but these
| advantages trump the disadvantages, and it's all worked out
| pretty well for decades. Here, I don't have to chase the next
| bride.
| tambourine_man wrote:
| >It's easy to fall in love with someone who is young, hip and
| all that
|
| >Here, I don't have to chase the next bride.
|
| Who hurt you? :)
| hinkley wrote:
| And yet, bash eventually replaced bourne shell (by having an
| sh-compatibility mode), and vim has replaced vi.
|
| If you get anywhere in the neighborhood of a proper superset
| of the old application, we do occasionally retire the old
| ones.
| jeppesen-io wrote:
| You can say that again. After using rg for a few years now, I
| can't imagine not having this tool, that I use weekly, if not
| daily
| [deleted]
| mmastrac wrote:
| This is an incredibly useful improvement over cut, thank you. The
| mental distance from cut to awk/sed is often just too high and
| having a more useful utility will drastically reduce how much I
| reach for those tools.
| omoikane wrote:
| I would still reach for awk/sed, for the fact that they tend to
| be preinstalled. I might fall back on perl/ruby/python if
| awk/sed were insufficient, which also tend to be preinstalled.
| gleenn wrote:
| This is awesome. Especially the ability to compress delimiters. I
| can't tell you how many times I want to grab a couple fields from
| the output of another command but can't get delimiters to work
| correctly or they're using custom spacing to align columns and it
| blows everything up and then I'm crying in awk land.
| linsomniac wrote:
| This is the "cut" I've always wanted!
| blackfede wrote:
| Thanks! I am already using it!
| nathants wrote:
| this is excellent.
| emmelaich wrote:
| Seems a bit of functionality overlap with lam/jot/rs; have you
| looked at those?
|
| They're from BSD originally. Included in macos, and in Linux
| distros as bsd-utils.
| gpvos wrote:
| In Ubuntu I can find rs and athena-jot, but not lam; bsdutils
| and bsdmainutils contain different tools.
| riquito wrote:
| I didn't know about them, I'll check them out
| alex_hirner wrote:
| Did you find any limitations of `pico-args` turned into a caveat
| for tuc?
| riquito wrote:
| > Did you find any limitations of `pico-args` turned into a
| caveat for tuc?
|
| Quite the opposite, early on I started with `clap`, then moved
| briefly to `argh` before settling with `pico-args`. Compilation
| time and size were the main driving factors, alongside support
| for non-spaced values (e.g. -d' ').
|
| Maybe if tuc had subcommands it would have been a different
| story, but I didn't find enough value in more renowned arg
| libraries.
| matteo-biagetti wrote:
| awesome!
| [deleted]
| seanw444 wrote:
| Hacker News comes through again. Been looking for a tool exactly
| like this.
| Aardwolf wrote:
| > cargo install tuc
|
| Slightly off topic question about this: in Linux, are rust
| programs always installed like this, or should these also be made
| available in the regular package manager of your distro?
| elsjaako wrote:
| In general you want to install it with a package manager.
|
| (But also it's your system. What's the point of Linux if you
| can't do it your way)
| steveklabnik wrote:
| The intention of "cargo install" is to provide a quick and easy
| way to distribute programs useful to other Rust programmers.
|
| In general, end users should use some other method that doesn't
| require having a Rust toolchain pre-installed, but doing that
| can take work, and so not every program pursues it.
| masklinn wrote:
| > or should these also be made available in the regular package
| manager of your distro?
|
| Getting a package in the official repositories is quite a high
| bar to clear.
|
| Plus it's... not great experience for early development, as
| many distros will lock in the program entirely, leaving you
| with a very long tail of extremely outdated installs.
|
| So generally the expectation is that once a program is popular
| or desirable enough, and is somewhat stable, it gets integrated
| into the base repos.
| Aardwolf wrote:
| Is there an equivalent for C/C++ programs?
|
| There's pip for python, npm for JS, cargo for rust
|
| For C/C++ all I know are multiple different possible build
| and make systems, but none works like a package manager, as
| far as I know
|
| Note that I don't always love when something installed with
| pip or npm puts files all over your OS or homedir without
| being managed by the package manager, though
| ploxiln wrote:
|     ./configure --prefix=/usr/local && make && sudo make install
|
| (I consider it a feature that this doesn't automatically
| download and install hundreds to thousands of things I
| haven't even heard of)
| mprovost wrote:
| I mean, ignoring the fact that the configure script is
| often a larger program than the one you're trying to
| install, it (and make) can do anything to your system and
| unless you read the contents of each you're just taking
| it on faith that it isn't downloading and installing
| hundreds of things.
| ploxiln wrote:
| ... but if I'm installing _curl_ or _jq_ or similar, I'm
| quite familiar with the provenance of the project, and of
| the tarball I'm running a configure script from.
|
| And maybe I need to install one or two dependencies,
| similarly they should be familiar, or small and
| comprehensible, and only downloaded and installed with my
| explicit actions.
|
| (And yeah autoconf generated configure scripts are crazy
| huge and baroque, and could easily be 1/10 the size for
| the needed functionality, but compared to "npm install"
| I'll take it.)
| db65edfc7996 wrote:
| Perhaps not everything you want, but in Python land there
| is pipx [0]. Pipx will create a virtual environment per
| binary program so that they are all isolated from each
| other and put things in a consistent location
| (~/.local/pipx). Then it is easy enough to do `pipx install
| black`, `pipx install cookiecutter`, whatever. Also has
| nice upgrade option in `pipx upgrade-all`
|
| [0] https://pypa.github.io/pipx/
| masklinn wrote:
| > Is there an equivalent for C/C++ programs? [...] For
| C/C++ all I know are multiple different possible build and
| make systems, but none works like a package manager, as far
| as I know
|
| Downloading the source by hand, trying to wrangle what
| dependencies it has not vendored (which may or may not be
| available through your system package managers, in versions
| which may or may not be recent enough), and trying to find
| out how to build it.
|
| Though do note that this issue can also hit when installing
| python, js, or rust package, if they ultimately have native
| dependencies. Their respective build systems will generally
| try to make it work out of the box, but if your
| configuration was not specifically tested / supported it
| can break with fun C-level compilation errors.
| nickstinemates wrote:
| The main solution to all of this complexity is another
| complex (but awesomely powerful) package manager called
| Portage. It's mainly used in Gentoo Linux
|
| It's awesome. And complicated.
| masklinn wrote:
| That, or Nix, maybe. Also awesome. Also complicated.
| speed_spread wrote:
| Stop! I'm starting to miss Funtoo... And I now have all
| these Ryzen cores idling, longing for a world update...
| Must... Resist...
| nickstinemates wrote:
| Yes, it's called portage and comes with Gentoo. :)
| overlordalex wrote:
| Just want to say thank you for the unlimited delimiters - this
| was something that always limited my usage of cut so just this
| feature alone makes tuc worth it
| donio wrote:
| Doesn't seem to fix the #1 thing missing from cut: an easy syntax
| for splitting on whitespace _sequences_ (rather than a single
| space), like awk does by default.
|
| (I see that it supports splitting on regex but I was hoping for it
| to be the default or a single character switch)
| [deleted]
| vultour wrote:
| Isn't that what the --greedy-delimiter is for?
| donio wrote:
| Nice, missed that, and it has a single character shorthand
| too: -g
|
| Edit: or maybe not, I think I'd still have to use --regex for
| real whitespace sequences that can be a mix of spaces and
| tabs.
| riquito wrote:
| As you figured out, -g (--greedy) matches the same
| delimiter multiple times (e.g. one or more spaces). If you
| want to match different delimiters (e.g. a mix of spaces),
| one or more times, you must use -e (--regex).
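
The difference between repeating one delimiter and matching a mix of delimiters can be approximated with standard awk. This is only a rough sketch that does not call tuc itself; the helper names `split_greedy` and `split_regex` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of tuc) that mimic the two behaviours with awk.

# Treat one or more repetitions of the SAME delimiter (a space) as one separator.
split_greedy() {
    awk -F ' +' -v n="$1" '{ print $n }'
}

# Treat any mix of spaces and tabs as one separator.
split_regex() {
    awk -F '[ \t]+' -v n="$1" '{ print $n }'
}

# Spaces collapse in both cases, but only the regex version splits on the tab.
printf 'a   b\tc\n' | split_greedy 2   # field 2 still contains the tab
printf 'a   b\tc\n' | split_regex 3
```

With the greedy version the tab stays inside the second field, which is why a regex is needed for input that mixes spaces and tabs.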
| [deleted]
| masklinn wrote:
| Seems similar in intent to choose
| (https://github.com/theryangeary/choose) as a cut which doesn't
| suck. The features outlined are very close, I just don't
| understand what "can split text into lines" is, do you mean that
| the selected fields can be split into lines?
|
| The main advantage of tuc seems to be "templated" outputs.
| totalperspectiv wrote:
| It's also very similar to my tool hck
| (https://github.com/sstadick/hck) which is in turn similar to
| choose, just faster, supports compression, and supports column
| selection via matching headers.
| riquito wrote:
| > I just don't understand what "can split text into lines" is,
| do you mean that the selected fields can be split into lines?
|
| Good question, I struggle to word it properly, any help is
| appreciated.
|
| Assume a file (we will call it "input"), such as
|
|     first line here
|     followed by second line
|
| You can use a delimiter and cut inside each line, e.g.
|
|     $ tuc -d ' ' -f 2 < input
|     line
|     by
|
| or you can cut it "by lines", practically considering the whole
| file as your single "line" and using newline as delimiter
|
|     $ tuc -l -f 2 < input
|     followed by second line
|
| If you want to remove a line, or keep something in between, it
| can be more practical/intuitive than head/tail or sed
| forty wrote:
| This feature seems more like a replacement for head and tail
| (and combination of both) rather than cut.
|
| Maybe a good way to explain it would be to show how to
| achieve the same thing with those well known commands
| (comparison which should certainly be in favor of tuc ^^)
|
| EDIT: sorry you just said more or less the same thing, I need
| to read better :)
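
For comparison, the line-oriented example from this subthread can be reproduced with head, tail and sed. This is a minimal sketch using the same two-line sample; the function names are hypothetical and only standard tools are involved, not tuc.

```shell
#!/bin/sh
# Emit the two-line sample used in the thread.
sample() {
    printf 'first line here\nfollowed by second line\n'
}

# Keep only line 2 with sed.
second_line() {
    sed -n '2p'
}

# Keep only line 2 with head and tail.
second_line_head_tail() {
    head -n 2 | tail -n 1
}

# Drop line 1 and keep everything else.
without_first_line() {
    sed '1d'
}

sample | second_line
sample | second_line_head_tail
sample | without_first_line
```

All three calls print the second line of the sample, since dropping the first line of a two-line file leaves only the second.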
| I_complete_me wrote:
| Ref. your `tuc -d ' ' -f 2 < input`: how is it different from
| `cut -d ' ' -f 2 input`?
| riquito wrote:
| > Ref. your `tuc -d ' ' -f 2 < input`: how is it different from
| `cut -d ' ' -f 2 input`?
|
| It's not. `tuc` is a superset of cut and in that particular
| example there's no difference. If you wanted instead to cut
| on multibyte delimiters, or on a random amount of spaces,
| `tuc` would work, while `cut` would fall short
| ChicagoBoy11 wrote:
| I've found this seriously cool!
|
| While not my primary role at my job, I often find myself dealing
| with lots of disparate data sets, usually needing to do some sort
| of manipulation, cleaning, searching, etc. Every now and then I
| encounter something like this and it seems to me that there are
| potentially a nice set of command line tools/utilities that I
| should be adding to my belt. Anywhere I should particularly start
| taking a look? Like, if my goal is to become much better at
| wrangling CSV/text-delimited files, searching across folders of
| docs for numbers, etc., where is my first entry point into trying
| to become much more proficient at it?
| pdimitar wrote:
| Here's a recent gateway post:
| https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...
|
| There are even more. :)
| niccl wrote:
| I do a lot of that sort of thing, and my go-to tools are grep
| (with full regular expressions), sort, uniq, head, tail, sed,
| and, the 200 pound gorilla (not as heavy as it could be) awk.
| Then if all else fails, Python
| bradwood wrote:
| Awk?
| bitcraft wrote:
| This is great! I always felt like cut was really handicapped by
| lack of negative indexes.
| tyingq wrote:
| awk can do something like negative indexes if needed:
|     $ echo "a b c" | awk '{print $(NF-1)}'
|     b
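
The `NF` trick generalizes to "k-th field counted from the end". A small sketch, where `field_from_end` is a hypothetical helper name and lines with fewer than k fields are skipped by assumption:

```shell
#!/bin/sh
# Print the k-th field counted from the end of each line (k=1 is the last field).
# Lines with fewer than k fields produce no output.
field_from_end() {
    awk -v k="$1" 'NF >= k + 0 { print $(NF - k + 1) }'
}

echo "a b c" | field_from_end 1   # last field
echo "a b c" | field_from_end 2   # same as $(NF-1)
```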
| kseistrup wrote:
| The biggest handicap of cut for me has always been that it
| cannot split on blanks (TABs or SPACEs), you have to choose
| between TAB or SPACE. So I wrote an awk script that can print
| field ranges like cut, but recognizes blanks. Now I will see if
| I can get tuc worked into my muscle memory.
| layer8 wrote:
| I usually just pipe through sed to normalize the separators
| before applying cut.
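
A sketch of that sed-then-cut pipeline, assuming that any run of blanks should count as a single separator; `cut_blanks` is a made-up name for illustration.

```shell
#!/bin/sh
# Collapse every run of whitespace into a single space, then let cut select fields.
# The first argument is passed straight to cut -f, so ranges such as 2-3 work too.
cut_blanks() {
    sed 's/[[:space:]][[:space:]]*/ /g' | cut -d ' ' -f "$1"
}

printf 'one \t two\t\tthree\n' | cut_blanks 2
printf 'one \t two\t\tthree\n' | cut_blanks 2-3
```

Leading whitespace would still produce an empty first field, since cut sees the normalized leading space as a delimiter.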
| kseistrup wrote:
| That's also a little awkward, when there could easily be an
| option to split by all blanks.
| layer8 wrote:
| I'm not defending cut here, but using sed is also pretty
| straightforward and fits its purpose. I'd argue that
| using the existing general-purpose tools is better than
| creating custom narrow-purpose tools in simple cases like
| this one. Besides maintainability and familiarity, it
| also exercises your proficiency in applying the standard
| tools.
| kseistrup wrote:
| You have a point, of course.
| samwhiteUK wrote:
| When I want to use negative indices, I pipe the string through
| rev first, then do my cut, then rev again
| mprovost wrote:
| That's the classic solution but blows up when using multibyte
| characters since rev just reads the bytes in each line in
| reverse.
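
The rev/cut/rev idiom looks roughly like this. It is a sketch checked only against ASCII input; how rev treats multibyte characters depends on the implementation and locale, so the caveat above may or may not apply on a given system. `last_field` is a hypothetical helper name.

```shell
#!/bin/sh
# Select the last field by reversing each line, cutting field 1, and reversing back.
# The first argument is the single-character delimiter.
last_field() {
    rev | cut -d "$1" -f 1 | rev
}

echo "usr/local/bin/tuc" | last_field /
echo "a b c" | last_field ' '
```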
| pimlottc wrote:
| What do you mean by negative indexes?
| ratrocket wrote:
| I believe "negative index" means array[-1] is the last
| element in array, array[-2] is the second-to-last element,
| etc.
|
| In the context of "cut", it would mean being able to do
| something like:
|
| cut -d" " -f1--2
|
| the "-f1--2" (read: fields from 1 to minus 2; it's a range)
| means to select from the first field to the second-to-last
| field. (that double "--" is pretty awkward, to be sure!)
|
| Some programming languages (ruby is the one that I know) have
| this feature for accessing array elements.
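
Until cut grows such a range, "fields 1 to -2" can be emulated with awk. A minimal sketch, assuming whitespace-separated input and that lines with fewer than two fields should print nothing; `all_but_last` is a made-up name.

```shell
#!/bin/sh
# Print every field except the last one, i.e. the range "1 to -2".
# Fields are re-joined with a single space, so the original spacing is not preserved.
all_but_last() {
    awk 'NF >= 2 {
        out = $1
        for (i = 2; i <= NF - 1; i++) out = out " " $i
        print out
    }'
}

echo "a b c d" | all_but_last
```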
| nijave wrote:
| It'd be nice if it could split and support escapes and quoted
| strings. I often run into issues with things like csv where
| fields might be quoted or quoted strings where quotes are escaped
___________________________________________________________________
(page generated 2022-06-13 23:00 UTC)