################################################################################
A script for removing HTML tags
- Willow Willis (2024-07-06)
###############################################################################
Just a dumb little bash script I wrote to help format a batch of articles from
my website in preparation for transferring them to gopherspace.
Features:
* Replaces header tags with various levels of hash marks
* Removes a lot of the common html special characters
* Converts all titles to uppercase
* Optionally adds extra newlines for every <p> or <br>
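As a rough illustration of the header and uppercase features, here is a
minimal sketch. It is not necessarily how the script itself does it: the
function name headers_to_hashes is made up for this example, and it assumes
each heading opens and closes on a single line.

```shell
#!/bin/bash
# Hypothetical helper (not taken from the script): convert <h1>..<h3> into
# hash-mark headings, then uppercase the heading lines.
# Assumes one heading per line, with the open and close tag on that line.
headers_to_hashes() {
  sed -E \
    -e 's/<h1[^>]*>(.*)<\/h1>/# \1/' \
    -e 's/<h2[^>]*>(.*)<\/h2>/## \1/' \
    -e 's/<h3[^>]*>(.*)<\/h3>/### \1/' |
  # Any line that now starts with hash marks is treated as a title.
  awk '/^#+ / { print toupper($0); next } { print }'
}

# Usage: headers_to_hashes < post.html > post.txt
```

awk's toupper is used for the uppercase step because it behaves the same
under both BSD and GNU tools.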
I run it on a batch of .html docs at once:
find /posts -name "*.html" -exec ./stripHTML {} \;
Of course, the output still needs a little extra hand-formatting for
consistency, but this saved me a bunch of time regardless.
NOTE: this script does *not* call fold on the output, so the resulting .txt
files will be too wide. It's easy to add that, but I wanted to keep a backup
of each .txt file for my records before chopping them to 80 columns.
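For anyone who does want the wrapping step, here is a minimal sketch of one
way to add it while still keeping the unwrapped copy. The function name
wrap_txt and the ".wide" backup suffix are made up for this example.

```shell
#!/bin/bash
# Sketch only: wrap a stripped .txt file to 80 columns and keep the
# full-width original next to it. The ".wide" suffix is a hypothetical
# naming choice, not something the script does.
wrap_txt() {
  local txt="$1"
  cp "$txt" "$txt.wide"                 # backup of the unwrapped text
  fold -s -w 80 "$txt.wide" > "$txt"    # -s breaks at spaces, not mid-word
}

# Usage: wrap_txt /posts/article.txt
```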
################################################################################
### SOURCE: ###
--------------------------------------------------------------------------------
#!/bin/bash
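# Path to the .html file to convert (first argument)
filepath="$1"
dir="$(dirname "$filepath")"
filename="$(basename "$filepath")"
noext="${filename%.*}"
# Work on a .txt copy so the original .html is left untouched
TXT="$dir/$noext.txt"
cp "$filepath" "$TXT"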
# NOTE: sed -i "" is the BSD/macOS form; GNU sed takes plain -i instead
sed -i "" 's/<p>//g' "$TXT"
sed -i "" 's/<\/p>//g' "$TXT"
sed -i "" 's/