[HN Gopher] Linux Text Manipulation
       ___________________________________________________________________
        
       Linux Text Manipulation
        
       Author : zerojames
       Score  : 75 points
       Date   : 2024-03-28 12:08 UTC (10 hours ago)
        
 (HTM) web link (yusuf.fyi)
 (TXT) w3m dump (yusuf.fyi)
        
       | bikingbismuth wrote:
       | I enjoy reading things like this. It's posts like this that have
       | helped me build my command line text processing skills over the
       | years.
       | 
       | If you are early in your career, I suggest you work on these
       | types of skills. It is surprising how often I have found myself
       | on a random box that I needed to parse application logs "by
       | hand". This happens to me even in fancy, K8-rich environments.
        
         | cassianoleal wrote:
         | I agree.
         | 
         | Just FYI, Kubernetes is abbreviated to k8s, not k8.
         | 
         | > K8s as an abbreviation results from counting the eight
         | letters between the "K" and the "s".
         | 
         | https://kubernetes.io/docs/concepts/overview/
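         | 
         | A minimal sketch of that counting rule in POSIX shell. The
         | function name `numeronym` is made up for illustration, and it
         | assumes a single ASCII word with no spaces.

```shell
# numeronym WORD: first letter, count of the letters in between, last letter.
# Hypothetical helper, not part of any tool mentioned in the thread.
numeronym() {
  word=$1
  len=${#word}
  # Words shorter than three letters have no middle to count.
  if [ "$len" -lt 3 ]; then
    printf '%s\n' "$word"
    return
  fi
  first=${word%"${word#?}"}   # strip everything after the first character
  last=${word#"${word%?}"}    # strip everything before the last character
  printf '%s%d%s\n' "$first" "$((len - 2))" "$last"
}

numeronym kubernetes            # k8s
numeronym internationalization  # i18n
numeronym accessibility         # a11y
```

         | The parameter expansions keep this portable to plain sh; in
         | bash, substring expansion would be shorter.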
        
           | tentacleuno wrote:
           | This is very similar to "a11y"[0] and "i18n"[1]. The
           | abbreviation of words using this technique has become
           | surprisingly common in the software industry.
           | 
           | [0]: https://www.wordnik.com/words/a11y
           | [1]: https://www.wordnik.com/words/i18n
        
             | nrabulinski wrote:
             | It's called a numeronym
             | https://en.m.wikipedia.org/wiki/Numeronym
        
               | tentacleuno wrote:
               | Thank you! I truly do learn something new every day on
               | here.
        
               | bregma wrote:
               | Sometimes an n7m?
        
               | pseingatl wrote:
               | Better watch out with Arabic speakers, 7 is used for a
               | sound in Arabic we don't have in English.
        
             | baq wrote:
             | i18n was a mystery for the longest time. a11y was just dumb
             | for me until I learned what the numbers meant... last year,
             | after what, two decades?
             | 
             | Btw txn is similar, but they didn't bother with numbers,
             | they just replaced the middle with an x.
        
             | FergusArgyll wrote:
             | a16z: Andreessen Horowitz
        
         | reidjs wrote:
         | Is there something like leetcode for string manipulation
         | exercises like this?
        
         | keybored wrote:
         | > If you are early in your career, I suggest you work on these
         | types of skills. It is surprising how often I have found myself
         | on a random box that I needed to parse application logs "by
         | hand". This happens to me even in fancy, K8-rich environments.
         | 
         | It's surprising how many times you have to ad hoc parse due to
         | the tools being so poor. It's endemic.
        
         | JohnMakin wrote:
         | Some code challenge sites offer all their challenges in bash -
         | I highly recommend working through these if you want to get
         | better at this type of stuff. Some problems are surprisingly
         | simple, others torturously difficult.
        
       | tetris11 wrote:
       | I am curious as to why not just use:
       | 
       |     playerctl metadata artist
       |     playerctl metadata title
       | 
       | as provided by MPRIS
       | 
       | https://wiki.archlinux.org/title/MPRIS
       | 
       | > MPRIS (Media Player Remote Interfacing Specification) is a
       | standard D-Bus interface which aims to provide a common
       | programmatic API for controlling media players.
       | 
       | > It provides a mechanism for discovery, querying and basic
       | playback control of compliant media players, as well as a track
       | list interface which is used to add context to the active media
       | item.
        
         | hifromwork wrote:
         | Because it's an article about Linux text manipulation, not
         | about solving this specific problem.
        
       | tyingq wrote:
       | I'd prefer a general approach that used the first column as a
       | key, and the rest as the value...into a dict/hash. Then if you
       | need the Album title or something else later, it's easy to alter.
       | 
       | I'm sure awk could do that, but with Perl:
       | 
       |     sp current | perl -nE '/(\S+)\s+(.*)/ and $d{$1}=$2;END{say "$d{Title} by $d{Artist}"}'
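       | 
       | For comparison, a rough awk sketch of the same key/value idea.
       | The `sp_sample` and `now_playing` names are invented for this
       | example, and the sample text is assumed from the `sp current`
       | output quoted elsewhere in the thread, not taken from a live run.

```shell
# Stand-in for `sp current`; the column layout here is an assumption.
sp_sample() {
  cat <<'EOF'
Album        Tea For The Tillerman (Remastered 2020)
AlbumArtist  Yusuf / Cat Stevens
Artist       Yusuf / Cat Stevens
Title        Wild World
EOF
}

# Store every line as d[first column] = rest of the line, then print
# whichever fields are wanted once all input has been read.
now_playing() {
  awk '
    {
      key = $1
      sub(/^[^ \t]+[ \t]+/, "")   # drop the key and the padding after it
      d[key] = $0
    }
    END { print d["Title"] " by " d["Artist"] }
  '
}

sp_sample | now_playing   # Wild World by Yusuf / Cat Stevens
```

       | Switching to the album is then a matter of printing d["Album"]
       | in the END block.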
        
         | Zhyl wrote:
         | The omission of Perl from this post was pretty striking - not
         | even mentioned in the final thoughts (instead thought to do the
         | whole thing in awk??).
         | 
         | Perl has fallen from grace as a general programming language
         | and even as a systems administration language, but it's still
         | absolutely the best and most ubiquitous tool for text
         | manipulation.
        
           | undershirt wrote:
           | Can you write a Perl program to replace the content of
           | anything found inside double-quotes with the result of piping
           | it to the command `fmt`?
        
             | shawn_w wrote:
             | Easily, though you don't need to drag external programs
             | like fmt into it - perl comes with a standard module for
             | word wrapping.
             | 
             | Text::Balanced (for extracting the quoted text) and
             | Text::Wrap would be the core of such a program.
             | 
             | https://perldoc.perl.org/Text::Balanced
             | 
             | https://perldoc.perl.org/Text::Wrap
        
               | undershirt wrote:
               | Could you write it for me as a one-liner? I'd like to
               | learn more Perl and this would help.
        
       | kazinator wrote:
       |     $ txr by.txr spdata
       |     Wild World by Yusuf / Cat Stevens
       |     $ cat by.txr
       |     @(gather)
       |     Artist @artist
       |     Title @title
       |     @(end)
       |     @(output)
       |     @title by @artist
       |     @(end)
       | 
       | For one-liner outputs, I often use @(do (put-line `@title by
       | @artist`)).
        
       | jiripospisil wrote:
       | I was scratching my head for a bit before I realized the final
       | script in the article produces a slightly different output than
       | the previous diff shows (Yusuf/Cat vs Yusuf / Cat). Anyway,
       | here's a Nushell version. There's likely a way to use "detect
       | columns" here but it doesn't seem to like the repeated value or
       | something.
       | 
       |     $ cat sp_out | lines | parse '{key} {value}' | str trim | transpose -rd | format pattern '{Title} by {Artist}'
       |     Wild World by Yusuf / Cat Stevens
       | 
       | https://www.nushell.sh/cookbook/parsing.html
        
       | elesiuta wrote:
       | I use python more often than tools like awk, which I often forget
       | the syntax of, so I made pyxargs to quickly run python code in
       | the shell for small tasks like this
       | 
       |     sp current | pyxr -0 -g "(Artist)\s+(.+)\n(Title)\s+(.+)" -p "{3} by {1}"
        
       | thomasahle wrote:
       | I've been saving a lot of time in the terminal recently with
       | shell-gpt (https://github.com/tbckr/sgpt):
       | 
       |     $ sgpt -s "The command 'sp current' outputs
       |     > Album        Tea For The Tillerman (Remastered 2020)
       |     > AlbumArtist  Yusuf / Cat Stevens
       |     > Artist       Yusuf / Cat Stevens
       |     > Title        Wild World
       |     > I want just 'Wild World by Yusuf/Cat Stevens'"
       | 
       |     sp current | awk -F'  +' '/Title/{title=$2} /Artist/{artist=$2} END{print title " by " artist}'
       |     [E]xecute, [D]escribe, [A]bort: A
       |     E
       |     Wild World by Yusuf / Cat Stevens
       | 
       | Looking back at it, the awk command it uses is actually pretty
       | clean.
        
         | mhuffman wrote:
         | Very cool! It reminds me of Microsoft Prose[1]. I guess they
         | were way ahead of their time on that one.
         | 
         | [1]https://www.microsoft.com/en-us/research/group/prose/
        
         | arp242 wrote:
         | It should use ^Title and ^Artist, otherwise any line that merely
         | contains "Artist" or "Title" (AlbumArtist, for example) will
         | also match and give wonky results.
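         | 
         | A small sketch of the difference, assuming hypothetical track
         | data whose title happens to contain the word "Artist". The
         | function names and the sample are made up for illustration.

```shell
# Hypothetical `sp current`-style output; not from a real run.
sample() {
  cat <<'EOF'
Album        The Artist in the Ambulance
AlbumArtist  Thrice
Artist       Thrice
Title        The Artist in the Ambulance
EOF
}

# Unanchored patterns: /Artist/ also matches the Album, AlbumArtist and
# Title lines, so the last matching line overwrites the artist.
unanchored() {
  awk -F'  +' '/Title/{title=$2} /Artist/{artist=$2} END{print title " by " artist}'
}

# Anchored patterns: only lines that start with the key are considered.
anchored() {
  awk -F'  +' '/^Title/{title=$2} /^Artist/{artist=$2} END{print title " by " artist}'
}

sample | unanchored   # The Artist in the Ambulance by The Artist in the Ambulance
sample | anchored     # The Artist in the Ambulance by Thrice
```

         | Comparing the first field exactly ($1 == "Artist") would be
         | stricter still.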
         | 
         | More importantly, you can get the same results by just one or
         | two dbus-send commands instead of using that "sp" script +
         | something to clean it up.
         | 
         | The sp-metadata function uses tons of processes to clean the
         | output[1], and sp-current launches a few more. If you're doing
         | this in a loop for your WM status display this sort of stuff
         | really adds up. Even on modern systems launching processes
         | isn't free and is relatively slow, and launching >20 of them
         | every few seconds is going to use non-trivial amounts of CPU and
         | will needlessly drain your battery.
         | 
         | I don't really know what that dbus command outputs, but I bet
         | you could go a long way with "dbus-send ... | grep -o ..." or
         | something.
         | 
         | So in general I'd say this is a classic case of "you're not
         | even using the right solution, and no amount of GPT is going to
         | help".
         | 
         | [1]:
         | https://gist.github.com/streetturtle/fa6258f3ff7b17747ee3#fi...
        
       ___________________________________________________________________
       (page generated 2024-03-28 23:01 UTC)