[HN Gopher] Linux Text Manipulation
___________________________________________________________________
Linux Text Manipulation
Author : zerojames
Score : 75 points
Date : 2024-03-28 12:08 UTC (10 hours ago)
(HTM) web link (yusuf.fyi)
(TXT) w3m dump (yusuf.fyi)
| bikingbismuth wrote:
| I enjoy reading things like this. It's posts like this that have
| helped me build my command line text processing skills over the
| years.
|
| If you are early in your career, I suggest you work on these
| types of skills. It is surprising how often I have found myself
| on a random box that I needed to parse application logs "by
| hand". This happens to me even in fancy, K8-rich environments.
| cassianoleal wrote:
| I agree.
|
| Just FYI, Kubernetes is abbreviated to k8s, not k8.
|
| > K8s as an abbreviation results from counting the eight
| letters between the "K" and the "s".
|
| https://kubernetes.io/docs/concepts/overview/
| tentacleuno wrote:
| This is very similar to "a11y"[0] and "i18n"[1]. The
| abbreviation of words using this technique has become
| surprisingly common in the software industry.
|
| [0]: https://www.wordnik.com/words/a11y
| [1]: https://www.wordnik.com/words/i18n
| nrabulinski wrote:
| It's called a numeronym
| https://en.m.wikipedia.org/wiki/Numeronym
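The counting rule quoted above (first letter, number of letters in between, last letter) fits in a few lines of code. This is a minimal Python sketch; the function name `numeronym` is hypothetical, and dropping spaces and punctuation before counting is an assumption that happens to fit multi-word cases like a16z.

```python
def numeronym(phrase: str) -> str:
    """Abbreviate as first letter + count of inner letters + last letter.

    Non-letters (spaces, punctuation) are dropped before counting; this is an
    assumption for multi-word names, not a formal rule.
    """
    letters = [c for c in phrase if c.isalpha()]
    if len(letters) < 4:
        # Too short to be worth abbreviating; return the input unchanged.
        return phrase
    return f"{letters[0]}{len(letters) - 2}{letters[-1]}"


for word in ["kubernetes", "accessibility", "internationalization",
             "numeronym", "andreessen horowitz"]:
    print(word, "->", numeronym(word))
```

This prints k8s, a11y, i18n, n7m and a16z for the five inputs.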
| tentacleuno wrote:
| Thank you! I truly do learn something new every day on
| here.
| bregma wrote:
| Sometimes an n7m?
| pseingatl wrote:
| Better watch out with Arabic speakers: 7 is used to write a
| sound in Arabic that we don't have in English.
| baq wrote:
| i18n was a mystery for the longest time. a11y just seemed dumb
| to me until I learned what the numbers meant... last year,
| after what, two decades?
|
| Btw, txn (transaction) is similar, but they didn't bother with
| numbers; they just replaced the middle with an x.
| FergusArgyll wrote:
| a16z: Andreessen Horowitz
| reidjs wrote:
| Is there something like LeetCode for string-manipulation
| exercises like this?
| keybored wrote:
| > If you are early in your career, I suggest you work on these
| types of skills. It is surprising how often I have found myself
| on a random box that I needed to parse application logs "by
| hand". This happens to me even in fancy, K8-rich environments.
|
| It's surprising how many times you have to parse ad hoc because
| the tools are so poor. It's endemic.
| JohnMakin wrote:
| Some code challenge sites offer all their challenges in Bash;
| I highly recommend working through these if you want to get
| better at this type of stuff. Some problems are surprisingly
| simple, others torturously difficult.
| tetris11 wrote:
| I'm curious why not just use:
|
|     playerctl metadata artist
|     playerctl metadata title
|
| as provided by MPRIS.
|
| https://wiki.archlinux.org/title/MPRIS
|
| > MPRIS (Media Player Remote Interfacing Specification) is a
| standard D-Bus interface which aims to provide a common
| programmatic API for controlling media players.
|
| > It provides a mechanism for discovery, querying and basic
| playback control of compliant media players, as well as a track
| list interface which is used to add context to the active media
| item.
| hifromwork wrote:
| Because it's an article about linux text manipulation, not
| about solving this specific problem.
| tyingq wrote:
| I'd prefer a general approach that uses the first column as a
| key and the rest as the value, stored in a dict/hash. Then if you
| need the Album title or something else later, it's easy to alter.
|
| I'm sure awk could do that, but with Perl:
|
|     sp current | perl -nE '/(\S+)\s+(.*)/ and $d{$1}=$2;END{say "$d{Title} by $d{Artist}"}'
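For comparison, here is a rough Python rendering of the same key/value idea. It is a sketch, not the commenter's code; the sample input is reconstructed from the `sp current` output quoted further down the thread, and its exact column spacing is an assumption.

```python
import re

# Sample `sp current` output; the column spacing is assumed.
SAMPLE = """\
Album        Tea For The Tillerman (Remastered 2020)
AlbumArtist  Yusuf / Cat Stevens
Artist       Yusuf / Cat Stevens
Title        Wild World
"""

# Same pattern as the Perl one-liner: first non-space run is the key,
# everything after the separating whitespace is the value.
FIELD = re.compile(r"(\S+)\s+(.*)")


def parse_fields(text: str) -> dict[str, str]:
    """Build a {key: value} dict from "Key   Value" lines, like the Perl %d hash."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.search(line)
        if m:
            # Later duplicate keys overwrite earlier ones, as in the Perl version.
            fields[m.group(1)] = m.group(2).rstrip()
    return fields


d = parse_fields(SAMPLE)
print(f"{d['Title']} by {d['Artist']}")
```

Because every field lands in the dict, switching to the album or album artist later only means changing the final format string.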
| Zhyl wrote:
| The omission of Perl from this post was pretty striking: it's not
| even mentioned in the final thoughts (which instead consider doing
| the whole thing in awk??).
|
| Perl has fallen from grace as a general programming language
| and even as a systems administration language, but it's still
| absolutely the best and most ubiquitous tool for text
| manipulation.
| undershirt wrote:
| Can you write a Perl program to replace the content of
| anything found inside double-quotes with the result of piping
| it to the command `fmt`?
| shawn_w wrote:
| Easily, though you don't need to drag external programs
| like fmt into it: Perl comes with a standard module for
| word wrapping.
|
| Text::Balanced (for extracting the quoted text) and
| Text::Wrap would be the core of such a program.
|
| https://perldoc.perl.org/Text::Balanced
|
| https://perldoc.perl.org/Text::Wrap
| undershirt wrote:
| Could you write it for me as a one-liner? I'd like to
| learn more Perl and this would help.
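Not the requested Perl one-liner, but the same idea as a small Python sketch: the standard `textwrap` module plays the role shawn_w suggests for Text::Wrap, and a plain regex stands in for Text::Balanced. Assumptions: quoted spans contain no escaped quotes (Text::Balanced would handle those), and `width=75` only approximates GNU fmt's default width; fmt's paragraph handling differs.

```python
import re
import textwrap

# A double-quoted span with no embedded (escaped) quotes.
QUOTED = re.compile(r'"([^"]*)"')


def wrap_quoted(text: str, width: int = 75) -> str:
    """Re-wrap the contents of every "..." span to at most `width` columns.

    Width is measured inside the quotes only; text before the opening
    quote on the same line is not counted.
    """

    def rewrap(match):
        # textwrap.fill turns newlines inside the span into spaces,
        # then re-breaks the words at `width`.
        return '"' + textwrap.fill(match.group(1), width=width) + '"'

    return QUOTED.sub(rewrap, text)


print(wrap_quoted('x "aaa bbb ccc" y', width=7))
```

Text outside the quotes passes through untouched, so the function can be applied to a whole file's contents at once.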
| kazinator wrote:
|     $ txr by.txr spdata
|     Wild World by Yusuf / Cat Stevens
|     $ cat by.txr
|     @(gather)
|     Artist @artist
|     Title @title
|     @(end)
|     @(output)
|     @title by @artist
|     @(end)
|
| For one-liner outputs, I often use @(do (put-line `@title by
| @artist`)).
| jiripospisil wrote:
| I was scratching my head for a bit before I realized the final
| script in the article produces slightly different output than
| the previous diff shows (Yusuf/Cat vs Yusuf / Cat). Anyway,
| here's a Nushell version. There's likely a way to use "detect
| columns" here, but it doesn't seem to like the repeated value or
| something.
|
|     $ cat sp_out | lines | parse '{key} {value}' | str trim | transpose -rd | format pattern '{Title} by {Artist}'
|     Wild World by Yusuf / Cat Stevens
|
| https://www.nushell.sh/cookbook/parsing.html
| elesiuta wrote:
| I use Python more often than tools like awk, whose syntax I often
| forget, so I made pyxargs to quickly run Python code in the shell
| for small tasks like this:
|
|     sp current | pyxr -0 -g "(Artist)\s+(.+)\n(Title)\s+(.+)" -p "{3} by {1}"
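The pyxr invocation boils down to one regular expression matched against the whole input. Below is a rough equivalent using only Python's standard `re` module (not pyxargs itself), with an assumed sample input. Like the original pattern, it assumes the Artist line comes immediately before the Title line. The pyxr template `{3} by {1}` suggests zero-based group indices; Python's `m.group()` is one-based, so the same fields are groups 4 and 2.

```python
import re

# Assumed sample input; the column spacing is a guess.
SAMPLE = """\
Album        Tea For The Tillerman (Remastered 2020)
AlbumArtist  Yusuf / Cat Stevens
Artist       Yusuf / Cat Stevens
Title        Wild World
"""

# Same pattern as the pyxr example: an Artist line directly followed by a Title line.
# "AlbumArtist" cannot satisfy it because the line after it is not a Title line.
PATTERN = re.compile(r"(Artist)\s+(.+)\n(Title)\s+(.+)")

m = PATTERN.search(SAMPLE)
if m:
    # Group 2 is the artist value, group 4 the title value (one-based numbering).
    print(f"{m.group(4)} by {m.group(2)}")
```

The trade-off compared with the dict approach is brittleness: if the tool ever reorders its lines, the single pattern stops matching.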
| thomasahle wrote:
| I've been saving a lot of time in the terminal recently with
| shell-gpt (https://github.com/tbckr/sgpt):
|
|     $ sgpt -s "The command 'sp current' outputs
|     > Album        Tea For The Tillerman (Remastered 2020)
|     > AlbumArtist  Yusuf / Cat Stevens
|     > Artist       Yusuf / Cat Stevens
|     > Title        Wild World
|     > I want just 'Wild World by Yusuf/Cat Stevens'"
|     sp current | awk -F'  +' '/Title/{title=$2} /Artist/{artist=$2} END{print title " by " artist}'
|     [E]xecute, [D]escribe, [A]bort: A E
|     Wild World by Yusuf / Cat Stevens
|
| Looking back at it, the awk command it uses is actually pretty
| clean.
| mhuffman wrote:
| Very cool! It reminds me of Microsoft Prose[1]. I guess they
| were way ahead of their time on that one.
|
| [1]: https://www.microsoft.com/en-us/research/group/prose/
| arp242 wrote:
| It should use ^Title and ^Artist; otherwise a track or album whose
| name contains "Artist" or "Title" will give wonky results.
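To make the anchoring point concrete, here is a small Python sketch (not from the thread) that mimics the awk script's last-match-wins assignments. The track title "Portrait of the Artist" is a made-up example chosen to contain the word Artist. With substring matching, the Title line overwrites the artist; comparing the whole first column (slightly stricter than ^Artist) does not. As a simplification, the value is the rest of the line rather than awk's `$2`.

```python
# Hypothetical input: the title deliberately contains the word "Artist".
SAMPLE = """\
Album        Tea For The Tillerman (Remastered 2020)
AlbumArtist  Yusuf / Cat Stevens
Artist       Yusuf / Cat Stevens
Title        Portrait of the Artist
"""


def title_and_artist(text: str, anchored: bool) -> tuple[str, str]:
    """Mimic /Title/{title=$2} /Artist/{artist=$2}: later matches overwrite earlier ones.

    The value is taken as the rest of the line after the first column.
    """
    title = artist = ""
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        value = value.strip()
        if anchored:
            # Compare the whole first column, like ^Title / ^Artist but stricter.
            is_title, is_artist = key == "Title", key == "Artist"
        else:
            # Substring search anywhere in the line, like the unanchored awk regexes.
            is_title, is_artist = "Title" in line, "Artist" in line
        if is_title:
            title = value
        if is_artist:
            artist = value
    return title, artist


print(title_and_artist(SAMPLE, anchored=False))
print(title_and_artist(SAMPLE, anchored=True))
```

The unanchored run reports "Portrait of the Artist" as both title and artist; the anchored run keeps "Yusuf / Cat Stevens" as the artist.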
|
| More importantly, you can get the same results with just one or
| two dbus-send commands instead of using that "sp" script plus
| something to clean up its output.
|
| The sp-metadata function uses tons of processes to clean the
| output[1], and sp-current launches a few more. If you're doing
| this in a loop for your WM status display this sort of stuff
| really adds up. Even on modern systems, launching processes
| isn't free and is relatively slow, and launching >20 of them every
| few seconds is going to use a non-trivial amount of CPU and will
| needlessly drain your battery.
|
| I don't really know what that dbus command outputs, but I bet
| you might go a long way with "dbus-send ... | grep -o ..." or
| something.
|
| So in general I'd say this is a classic case of "you're not
| even using the right solution, and no amount of GPT is going to
| help".
|
| [1]: https://gist.github.com/streetturtle/fa6258f3ff7b17747ee3#fi...
___________________________________________________________________
(page generated 2024-03-28 23:01 UTC)