* * * * *

Adventures in Formatting

If you are reading this via Gopher and it looks a bit different, that's because I spent the past few hours (months?) working on a new method to render HTML (HyperText Markup Language) into plain text. When I first set this up [1] I used Lynx [2] because it was easy and I didn't feel like writing the code to do so at the time. But I've never been fully satisfied with the results [Yeah, I was never a fan of that either –Editor]. So I finally took the time to tackle the issue (and it is one of the reasons I was timing LPEG (Lua Parsing Expression Grammar) expressions [3] [DELETED-the other day-DELETED] [Nope. –Editor] [DELETED- … um … the other week-DELETED] [Still nope. –Editor] [DELETED- … um … a few years ago?-DELETED] [Last month. –Editor] [Last month? –Sean] [Last month. –Editor] [XXXX this timeless time of COVID-19 –Sean] last month).

The first attempt sank in the swamp. I wrote some code to parse the next bit of HTML (it would return either a string, or a Lua table containing the tag information). And that was fine for recent posts where I bother to close all the tags (taking into account only the tags that can appear in the body of the document,
[…] tags cannot contain […] tags while preserving whitespace (it's not in other tags). And check for the proper attributes for each tag. Great! I can now parse something like this:

-----[ HTML ]-----
<p>This is my <a href="http://boston.conman.org/">blog</a>. Is this not <em>nifty?</em>

<p>Yeah, I thought so.
-----[ END OF LINE ]-----

into this:

-----[ Lua ]-----
tag =
{
  [1] =
  {
    tag        = "p",
    attributes = { },
    block      = true,
    [1]        = "This is my ",
    [2]        =
    {
      tag        = "a",
      attributes =
      {
        href = "http://boston.conman.org/",
      },
      inline     = true,
      [1]        = "blog",
    },
    [3]        = ". Is this not ",
    [4]        =
    {
      tag        = "em",
      attributes = { },
      inline     = true,
      [1]        = "nifty?",
    },
  },
  [2] =
  {
    tag        = "p",
    attributes = { },
    block      = true,
    [1]        = "Yeah, I thought so.",
  },
}
-----[ END OF LINE ]-----

I then began the process of writing the code to render the resulting data into plain text. I took the classifications that the HTML 4.01 strict DTD uses for each tag (you can see the