https://github.com/bugen/pypipe

bugen / pypipe Public


Python pipe command line tool

License

Apache-2.0 license
Latest commit: eb7b045 by @bugen, Oct 23, 2023 (Updated README.md)


pypipe

$ echo "pypipe" | ppp "line[::2]"
ppp

pypipe is a Python command-line tool for pipeline processing.

Quick links

  * Installation
  * Basic usage and Examples
  * pypipe is a code generator.

Installation

pypipe is a single Python file and uses only the standard library.
You can use it by placing pypipe.py in a directory included in your
PATH (e.g., ~/.local/bin). If execute permission is not already
present, please add it.

chmod +x pypipe.py

To make it easier to type, it's recommended to create a symbolic
link.

ln -s pypipe.py ppp

Note
pypipe requires Python 3.6 or later.

Basic usage and Examples

| ppp line

Processes the input line by line. The current line is available as
line or l, and the line number as i.

$ cat staff.txt |ppp 'i, line.upper()'
1       NAME    WEIGHT  BIRTH   AGE     SPECIES CLASS
2       SIMBA   250     1994-06-15      29      LION    MAMMAL
3       DUMBO   4000    1941-10-23      81      ELEPHANT        MAMMAL
4       GEORGE  20      1939-01-01      84      MONKEY  MAMMAL
5       POOH    1       1921-08-21      102     TEDDY BEAR      ARTIFACT
6       BOB     0       1999-05-01      24      SPONGE  DEMOSPONGE
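
For readers who want to see what this amounts to in plain Python, here
is a minimal sketch modeled on the generated code shown later in this
README. The function name number_and_upper and the two-line sample
input are illustrative assumptions, not part of pypipe.

```python
def number_and_upper(lines):
    """Yield 'i<TAB>LINE' for each input line, numbering from 1."""
    for i, line in enumerate(lines, 1):
        # Strip the trailing newline, as pypipe's generated code does.
        line = line.rstrip("\r\n")
        # Join the tuple (i, line.upper()) with tabs, like the default output.
        yield "\t".join(str(v) for v in (i, line.upper()))


if __name__ == "__main__":
    sample = ["Name\tWeight\n", "Simba\t250\n"]  # hypothetical input
    for out in number_and_upper(sample):
        print(out)
```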

| ppp rec

Splits each line by TAB. The list of split strings is available as
rec or r, and the record number as i.

$ cat staff.txt |ppp rec 'r[:3]'
Name    Weight  Birth
Simba   250     1994-06-15
Dumbo   4000    1941-10-23
George  20      1939-01-01
Pooh    1       1921-08-21
Bob     0       1999-05-01

Using the -l LENGTH, --length LENGTH option allows you to get the
values of each field as f1, f2, f3, ....

$ tail -n +2 staff.txt |ppp rec -l5 'f"{f1} is {f4} years old"'
Simba is 29 years old
Dumbo is 81 years old
George is 84 years old
Pooh is 102 years old
Bob is 24 years old

With the -H, --header option, pypipe treats the first line as a
header line and skips it. The header values are available as a list
named header, and each field can be accessed by name as
dic["FIELD_NAME"].

$ cat staff.txt |ppp rec -H 'rec[0], dic["Birth"]'
Simba   1994-06-15
Dumbo   1941-10-23
George  1939-01-01
Pooh    1921-08-21
Bob     1999-05-01
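
The relationship between header, rec, and dic can be pictured with the
sketch below. It assumes that dic is built by pairing the header names
with the fields of each record; pypipe's actual implementation may
differ, and the function name records_with_header is hypothetical.

```python
def records_with_header(lines, delimiter="\t"):
    """Yield (rec, dic) pairs, treating the first line as the header."""
    rows = (line.rstrip("\r\n").split(delimiter) for line in lines)
    header = next(rows, None)  # the header line is consumed, not yielded
    if header is None:         # empty input: nothing to yield
        return
    for rec in rows:
        # Assumption: dic maps each header name to the matching field.
        yield rec, dict(zip(header, rec))


if __name__ == "__main__":
    sample = ["Name\tWeight\tBirth\n", "Simba\t250\t1994-06-15\n"]
    for rec, dic in records_with_header(sample):
        print(rec[0], dic["Birth"], sep="\t")
```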

You can change the delimiter by using the -d DELIMITER, --delimiter
DELIMITER option.

$ cat staff.csv |ppp rec -d , -l6  f1
Name
Simba
Dumbo
George
Pooh
Bob

| ppp csv

csv is similar to rec. The difference is that rec simply splits each
line with line.split(DELIMITER), whereas csv parses it with Python's
csv module. In addition, rec is tab-separated by default, whereas csv
is comma-separated.
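
The practical difference shows up when a field contains the delimiter
inside quotes. The sketch below uses only the standard library; the
sample row is a made-up illustration, not a line from staff.csv.

```python
import csv

# Hypothetical row whose second field contains a comma inside quotes.
line = 'Pooh,"Teddy bear, stuffed",Artifact'

naive = line.split(",")            # plain split, as described for rec
parsed = next(csv.reader([line]))  # csv module parsing, as described for csv

print(naive)   # the quoted field is broken into two pieces
print(parsed)  # the quoted field stays intact
```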

You can specify options to pass to csv.reader and csv.writer using
the -O NAME=VALUE, --csv-opt NAME=VALUE option.

$ cat staff.csv |ppp csv -O 'quoting=csv.QUOTE_ALL'
"Name","Weight","Birth","Age","Species","Class"
"Simba","250","1994-06-15","29","Lion","Mammal"
"Dumbo","4000","1941-10-23","81","Elephant","Mammal"
"George","20","1939-01-01","84","Monkey","Mammal"
"Pooh","1","1921-08-21","102","Teddy bear","Artifact"
"Bob","0","1999-05-01","24","Sponge","Demosponge"

| ppp text

In ppp text, the entire standard input is read as a single piece of
text. You can access the read text as text.

$ cat staff.txt | ppp text 'len(text)'
231

For example, ppp text is particularly useful when working with an
indented JSON file. The -j, --json option decodes the text as JSON.
The decoded data is available as dic.

$ cat staff.json |ppp text -j 'dic["data"][0]'
{'Name': 'Simba', 'Weight': 250, 'Birth': '1994-06-15', 'Age': 29, 'Species': 'Lion', 'Class': 'Mammal'}
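
Why whole-text reading matters for indented JSON can be seen in the
following sketch. The JSON document here is an abbreviated, made-up
stand-in for staff.json, and the variable names mirror the ones pypipe
exposes (text, dic).

```python
import json

# Hypothetical, abbreviated stand-in for staff.json.
text = """
{
    "data": [
        {"Name": "Simba", "Age": 29}
    ]
}
"""

dic = json.loads(text)  # the whole text decodes in one step
first = dic["data"][0]
print(first)

# A single line of an indented document is not valid JSON on its own.
try:
    json.loads(text.strip().splitlines()[0])
    line_ok = True
except json.JSONDecodeError:
    line_ok = False
print(line_ok)
```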

Note
You can also use the -j, --json option in line and file.

| ppp file

ppp file receives a list of file paths from standard input, one per
line. For each path, it opens the file and reads its contents into
text. The current path is available as path.

$ ls staff.txt staff.csv staff.json staff.xml |ppp file 'path, len(text)'
staff.csv       231
staff.json      1046
staff.txt       231
staff.xml       1042

For example, ppp file is especially useful when processing a large
number of JSON files.

find . -name '*.json'| ppp file --json ...
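
A rough plain-Python picture of that loop, assuming the --json option
decodes each file's text into dic, might look like this. The helper
name load_json_files and the temporary sample file are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_json_files(path_lines):
    """Yield (path, dic) for each file path listed one per line."""
    for line in path_lines:
        path = Path(line.rstrip("\r\n"))
        with open(path) as file:
            text = file.read()
        yield path, json.loads(text)  # assumption: --json decodes text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.json"  # hypothetical demo file
        sample.write_text('{"data": [1, 2, 3]}')
        for path, dic in load_json_files([f"{sample}\n"]):
            print(path.name, len(dic["data"]), sep="\t")
```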

| ppp custom -N NAME

You can easily create custom commands using pypipe. First, define the
custom commands in a definition file. By default, this file is
located at ~/.config/pypipe/pypipe_custom.py. You can change its path
with the PYPIPE_CUSTOM environment variable.

The following is an example of defining custom commands xpath and
sum.

~/.config/pypipe/pypipe_custom.py

TEMPLATE_XPATH = r"""
from lxml import etree
{imp}

def output(e):
    if isinstance(e, etree._Element):
        print(etree.tostring(e).decode().rstrip())
    else:
        _print(e)

{pre}

tree = etree.parse(sys.stdin)
for e in tree.xpath('{path}'):
{loop_head}
{loop_filter}
{main}

{post}
"""

TEMPLATE_SUM = r"""
import re
import sys
{imp}

ptn = re.compile(r'{pattern}')
s = 0

def add_or_print(*args):
    global s
    rec = args[0]
    if len(args) == 2:
        if isinstance(args[1], int):
            i = args[1]
            if len(rec) >= i:
                s += rec[i-1]
        else:
            print(args[1])
    else:
        print(*args[1:])


for line in sys.stdin:
    line = line.rstrip('\r\n')
    rec = [{type}(e) for e in ptn.findall(line)]
    if not rec:
        continue
{loop_head}
{loop_filter}
{main}

print(s)
"""

custom_command = {
    "xpath": {
        "template": TEMPLATE_XPATH,
        "code_indent": 1,
        "default_code": "e",
        "wrapper": 'output({})',
        "options": {
            "path": {"default": '/'}
        }
    },
    "sum": {
        "template": TEMPLATE_SUM,
        "code_indent": 1,
        "default_code": "1",
        "wrapper": 'add_or_print(rec, {})',
        "options": {
            "pattern": {"default": r'\d+'},
            "type": {"default": 'int'}
        }
    },
}

You can use them as follows:

$ cat staff.xml |ppp custom -N xpath -O path='./Animal/Age'
<Age>29</Age>
<Age>81</Age>
<Age>84</Age>
<Age>102</Age>
<Age>24</Age>

$ seq 10000| ppp c -Nsum -f 'rec[0] % 3 == 0'
16668333
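
To make the sum example easier to follow, here is a hand-written
reading of what the template appears to do with its defaults (pattern
\d+, type int, default code "1", which adds the first number of each
record). This is an interpretation of the template above, not code
emitted by pypipe, and the function name sum_first_number is
hypothetical.

```python
import re


def sum_first_number(lines, pattern=r"\d+", keep=lambda rec: True):
    """Sum the first integer found on each line that passes the filter."""
    ptn = re.compile(pattern)
    total = 0
    for line in lines:
        rec = [int(e) for e in ptn.findall(line.rstrip("\r\n"))]
        if not rec:        # lines without any match are skipped
            continue
        if not keep(rec):  # plays the role of the -f filter
            continue
        total += rec[0]    # default code "1" means the first field
    return total


# Equivalent of: seq 10000 | ppp c -Nsum -f 'rec[0] % 3 == 0'
total = sum_first_number(
    (f"{n}\n" for n in range(1, 10001)),
    keep=lambda rec: rec[0] % 3 == 0,
)
print(total)
```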

-c, --counter

Using the -c, --counter option allows for easy data aggregation. When
you specify the -c, --counter option, it creates an instance of
collections.Counter, which can be accessed as either counter or c.
The -c, --counter option is available for use in all commands.

The following example aggregates data by the 'Gender' and 'Hobby' fields.

$ cat people.csv |ppp csv -H --counter 'dic["Gender"], dic["Hobby"]'| head -n10
Female  Cooking 4
Male    Hiking  3
Female  Reading 3
Male    Gardening       3
Female  Traveling       3
Male    Playing Music   3
Female  Dancing 3
Female  Hiking  3
Female  Painting        2
Male    Photography     2

The following example counts female individuals by whether they are
30 years or older.

$ cat people.csv |ppp csv -H -c -f 'dic["Gender"] == "Female"' 'int(dic["Age"]) >= 30'
False   16
True    10

When using the -c, --counter option, it uses counter[{}] += 1 as the
wrapper. If you want to count in a different way, you can disable the
wrapping by using the -n, --no-wrapping option and add your own
counting code.

$ cat population.csv |ppp csv -H -c -n 'counter[dic["State"]] += int(dic["Population"])'
New York        8398748
Texas   7751480
California      7327731
Illinois        2705994
Arizona 1680992
Pennsylvania    1584138
Florida 903889
Ohio    892533
Indiana 876862
North Carolina  792862
Washington      753675
Michigan        673104
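
Both counting styles map directly onto collections.Counter. The sketch
below uses made-up rows (the Points field is hypothetical) and assumes
results are listed from most to least common, as the sample outputs
above suggest.

```python
from collections import Counter

# Hypothetical records standing in for rows of a CSV file.
rows = [
    {"Gender": "Female", "Hobby": "Cooking", "Points": "3"},
    {"Gender": "Male", "Hobby": "Hiking", "Points": "5"},
    {"Gender": "Female", "Hobby": "Cooking", "Points": "4"},
]

# Default wrapper: counter[{}] += 1, with the main code substituted for {}.
counter = Counter()
for dic in rows:
    counter[dic["Gender"], dic["Hobby"]] += 1

# With wrapping disabled, the increment is written by hand.
weighted = Counter()
for dic in rows:
    weighted[dic["Gender"]] += int(dic["Points"])

print(counter.most_common())
print(weighted.most_common())
```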

For more information, see Code wrapping.

pypipe is a code generator.

pypipe is a command-line tool for pipeline processing, but it can
also be thought of as a code generator. It generates code internally
using the given arguments and then executes the generated code using
the exec function. Therefore, instead of executing the generated
code, you have the option to print it to the standard output or save
it to a file.
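
The generate-then-exec idea can be illustrated in a few lines. This is
a simplified sketch of the general technique, not pypipe's actual
implementation: the TEMPLATE string and the generate and run helpers
are all hypothetical.

```python
import io
from contextlib import redirect_stdout

# Hypothetical template; {main} is where the user's code is placed.
TEMPLATE = """\
for i, line in enumerate(lines, 1):
    line = line.rstrip("\\r\\n")
    {main}
"""


def generate(main_code):
    """Build source code by substituting the main code into the template."""
    return TEMPLATE.format(main=main_code)


def run(code, lines):
    """Execute generated code with exec and return what it printed."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        exec(code, {"lines": lines})
    return buf.getvalue()


code = generate("print(i, line.upper())")
print(code)                               # like -p: show the code
print(run(code, ["a\n", "b\n"]), end="")  # or execute it
```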

Print generated code. -p, --print

To check the generated code, you can use the -p, --print option.

ppp file -m rb -i hashlib -b 'total = 0' -b '_p("PATH", "SIZE", "MD5")' -e 'size = len(text)' -f 'path.stem == "staff"' 'total += size' 'path, size, hashlib.md5(text).hexdigest()' -a 'print(f"Total size: {total}", file=sys.stderr)' -p

The generated code is output as follows.

# IMPORT
import sys
from functools import partial
import gzip
from pathlib import Path
import hashlib

def _open(path):
    if path.suffix == '.gz':
        return gzip.open(path, 'rb')
    else:
        return open(path, 'rb')

# PRE
_p = partial(print, sep="\t")  # ABBREV
I, S, B, L, D, SET = 0, "", False, [], {}, set()  # ABBREV

def _print(*args, delimiter='\t'):
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        print(*args[0], sep=delimiter)
    else:
        print(*args, sep=delimiter)

total = 0
_p("PATH", "SIZE", "MD5")

for i, line in enumerate(sys.stdin, 1):
    path = Path(line.rstrip('\r\n'))
    with _open(path) as file:
        text = file.read()
        # LOOP HEAD
        size = len(text)
        # LOOP FILTER
        if not (path.stem == "staff"): continue
        # MAIN
        total += size
        _print(path, size, hashlib.md5(text).hexdigest())

# POST
print(f"Total size: {total}", file=sys.stderr)

Check that there are no issues with the generated code and execute
it.

$ find . -type f |ppp file -m rb -i hashlib -b 'total = 0' -b '_p("PATH", "SIZE", "MD5")' -e 'size = len(text)' -f 'path.stem == "staff"' 'total += size' 'path, size, hashlib.md5(text).hexdigest()' -a 'print(f"Total size: {total}", file=sys.stderr)'
PATH    SIZE    MD5
my_zoo.csv      186     e091408cc9174f1da86b50ee8e2fba96
my_zoo.xml      888     9edd78d97e45eccbac2b80747bd9c70b
my_zoo.json     887     7f15b3b8a23b91b60184113a38fa3e19
my_zoo.txt      186     4581c312d81815c3662f785ba9e7bd50
Total size: 2147

Save generated code to a file. -o PATH, --output PATH

For more complex code, a good approach is to generate a template
with pypipe and then edit the generated code by hand. You can follow
this process:

 1. Create a template code with pypipe and save it to a file, for
    example:

    ppp line --output /tmp/pipe.py ...

 2. Edit the code in /tmp/pipe.py to suit your needs.
 3. Execute the modified code by piping input to it, for example:

    cat sample.txt | /tmp/pipe.py

Main codes

The main code is specified as positional arguments. You can specify
multiple main codes. The placement of the main code varies depending
on the command. In commands like line, rec, csv, and file, the main
code is added within the loop processing with proper indentation.
However, in the text command, where there is no loop processing, the
main code is added without indentation. In the custom command, the
main code is added according to the definitions provided in the
pypipe_custom.py file.

$ ppp text -pqrn "for word in text.split():"  "    print(word)"

import sys
from functools import partial

text = sys.stdin.read()
for word in text.split():  # <- HERE
    print(word)            # <- HERE

You can also write it with line breaks in the terminal as follows:

$ ppp text -pqrn '
> for word in text.split():
>     print(word)
> '

Default main code

If no main code is specified in the arguments, pypipe adds a
predefined default code. For example, the default code in Line mode
is 'line'.

ppp -pqr

import sys
from functools import partial


def _print(*args, delimiter='\t'):
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        print(*args[0], sep=delimiter)
    else:
        print(*args, sep=delimiter)


for i, line in enumerate(sys.stdin, 1):
    line = line.rstrip("\r\n")
    _print(line)  # Default code with wrapping.

Code wrapping

By default, pypipe wraps the last code specified in the arguments
with a predefined wrapper. For example, ppp line uses '_print({})' as
the wrapper. However, if the -c, --counter option is specified, it
uses 'counter[{}] += 1' as the wrapper instead.

$ ppp line 'year = int(line)' year -pqr

import sys
from functools import partial


def _print(*args, delimiter='\t'):
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        print(*args[0], sep=delimiter)
    else:
        print(*args, sep=delimiter)


for i, line in enumerate(sys.stdin, 1):
    line = line.rstrip("\r\n")
    year = int(line)
    _print(year)  # Wrapping
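
Wrapping amounts to substituting the last code into a format string.
The helper below is a hypothetical sketch of that step, assuming the
wrapper is applied with str.format; it also covers the case where
wrapping is turned off.

```python
def wrap_last(codes, wrapper="_print({})", no_wrapping=False):
    """Return the main codes with the last one passed through the wrapper."""
    codes = list(codes)
    if not codes or no_wrapping:
        return codes
    # Only the final code is wrapped; earlier codes are kept as-is.
    return codes[:-1] + [wrapper.format(codes[-1])]


print(wrap_last(["year = int(line)", "year"]))
print(wrap_last(["line[:3]"], wrapper="counter[{}] += 1"))
```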

Disable code wrapping. -n, --no-wrapping

If you want to disable the wrapping of the last code specified in the
arguments by a predefined wrapper, you can use the -n, --no-wrapping
option.

ppp line -n -b 'n = 0' 'n = max(len(line), n)' -a 'print(n)' -pqr

import sys
from functools import partial


n = 0  # PRE

for i, line in enumerate(sys.stdin, 1):
    line = line.rstrip("\r\n")
    n = max(len(line), n)  # No wrapping

print(n)

Pre and Post codes. -b CODE, --pre CODE, -a CODE, --post CODE

The code specified with -b CODE, --pre CODE will be added before the
loop processing or the main code. This can be useful for declaring
variables or performing any necessary setup before entering a loop or
executing the main code. The code specified with -a CODE, --post CODE
will be added after the loop processing or the main code. This can be
useful for displaying aggregated results or performing any additional
actions after the loop or main code execution.

$ ppp rec -pqrn -b 'TOTAL = 0' -b 'MAX = 0'  'TOTAL += int(rec[0])' 'MAX = max(MAX, int(rec[0]))'  -a 'print(f"TOTAL: {TOTAL}")' -a 'print(f"MAX: {MAX}")'

import sys
from functools import partial


TOTAL = 0   # PRE
MAX = 0     # PRE

for i, line in enumerate(sys.stdin, 1):
    line = line.rstrip("\r\n")
    rec = line.split('\t')
    TOTAL += int(rec[0])
    MAX = max(MAX, int(rec[0]))

print(f"TOTAL: {TOTAL}")  # POST
print(f"MAX: {MAX}")      # POST

Inner loop. -e CODE, --loop-head CODE, -f CODE, --filter CODE

In the loop processing of the line, rec, csv, and file commands, the
code is added in the following positions:

for ... :
    {loop_head}  # Added with the -e CODE, --loop-head CODE option.
    {filter}     # Added with the -f CODE, --filter CODE option.
    {main}       # The main code is added here.

"loop_head" is added using the -e CODE, --loop-head CODE option,
while "filter" is added using the -f CODE, --filter CODE option.
Note that the "loop_head" code is added as-is, while the "filter"
code is wrapped with if not ({}): continue.

$ ppp line -pqrn -e 'line = line.replace("foo", "bar")' -e 'line = line.upper()' -f '"BAR" in line' 'print(line)'

import sys
from functools import partial

for i, line in enumerate(sys.stdin, 1):
    line = line.rstrip("\r\n")
    line = line.replace("foo", "bar")  # LOOP_HEAD
    line = line.upper()                # LOOP_HEAD
    if not ("BAR" in line): continue   # FILTER
    print(line)                        # MAIN
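
The assembly order described above can be sketched as a small helper.
This is an illustrative reconstruction based on the generated code
shown here; the function name build_loop_body and its parameters are
hypothetical.

```python
def build_loop_body(loop_heads, filters, mains, indent="    "):
    """Assemble an indented loop body: loop heads, then filters, then main."""
    lines = list(loop_heads)  # loop-head codes are added as-is
    # Each filter expression is wrapped so that non-matching items are skipped.
    lines += [f"if not ({f}): continue" for f in filters]
    lines += list(mains)
    return "\n".join(indent + code for code in lines)


body = build_loop_body(
    ['line = line.upper()'],
    ['"BAR" in line'],
    ['print(line)'],
)
print(body)
```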

Import modules. -i MODULE, --import MODULE

The -i MODULE, --import MODULE option lets you import any module. If
the value given to --import is a complete statement, such as import
math or from math import sqrt, it is added as an import statement
as-is. If only a module name is given, such as math, pypipe turns it
into an import statement automatically, in this case import math.

ppp text -i zlib -i 'from base64 import b64encode' 'b64encode(zlib.compress(text.encode()))'

$ ppp text -pqrn -i zlib -i 'from base64 import b64encode' 'print(b64encode(zlib.compress(text.encode())))'

import sys
from functools import partial
import zlib                    # <- HERE
from base64 import b64encode   # <- HERE

text = sys.stdin.read()
print(b64encode(zlib.compress(text.encode())))

Usage example.

$ seq 5 |ppp -i math 'line, math.sqrt(int(line))'
1       1.0
2       1.4142135623730951
3       1.7320508075688772
4       2.0
5       2.23606797749979
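
The rule for turning an --import value into a statement can be
sketched as follows. How pypipe actually detects a full statement is
not shown in this README, so the prefix check here is an assumption,
and import_statement is a hypothetical name.

```python
def import_statement(spec):
    """Return an import statement for a module name or a full statement."""
    spec = spec.strip()
    # Assumption: values starting with 'import ' or 'from ' are full statements.
    if spec.startswith(("import ", "from ")):
        return spec
    return f"import {spec}"


for value in ("math", "import zlib", "from base64 import b64encode"):
    print(import_statement(value))
```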
