[HN Gopher] Code Golfing in Commodore BASIC
       ___________________________________________________________________
        
       Code Golfing in Commodore BASIC
        
       Author : Two9A
       Score  : 50 points
       Date   : 2023-10-01 08:46 UTC (1 days ago)
        
 (HTM) web link (imrannazar.com)
 (TXT) w3m dump (imrannazar.com)
        
       | bump-ladel wrote:
       | If you enjoy this, then you should definitely check out 8-Bit Show
       | And Tell's YouTube channel. The presenter, Robin, regularly does
       | deep dives into code optimisation and fixes on Commodore 64 and
       | other machines.
       | 
       | https://www.youtube.com/watch?v=jhQgHW2VI0o
        
       | lifthrasiir wrote:
       | I don't know anything about C64 or C64 BASIC, but would it be
       | possible to intentionally write a shorter binary which will break
       | the interpreter and do what we want instead? For example, jump
       | directly into the middle of a kernel ROM routine (akin to ROP
       | in the modern days), or use a bad address in the "next line"
       | offset, etc.
        
         | wkjagt wrote:
         | In Commodore BASIC there's already SYS, which lets you jump to
         | an arbitrary address anywhere in the 64k address space,
         | including ROM. You can even include raw bytes in a BASIC
         | program and have the CPU execute them as machine code.
        
           | p0w3n3d wrote:
           | However, encoding such a program in BASIC would take many
           | more commands/bytes than writing it in BASIC itself. You
           | would need a DATA statement and a POKE in a FOR loop... For
           | such a small scenario, BASIC wins.
        
             | gabrielsroka wrote:
             | There are ways around that too... store bytes in a string
             | or a REM and then execute it directly. No DATA or FOR
             | needed.
             | 
             | There were some workarounds posted on one of Robin's recent
             | videos.
        
       | afro88 wrote:
       | BASIC defaults to the tape device if you leave off the device
       | number in LOAD/SAVE commands. So you can save another byte or two
       | by saving to tape instead.
        
       | p0w3n3d wrote:
       | remember when people didn't have FDDs but cassette drives instead,
       | because FDDs were too expensive? Pepperidge Farm remembers
        
         | cbm-vic-20 wrote:
         | PRESS PLAY ON TAPE
        
           | nwellnhof wrote:
           | Username checks out.
        
       | qiqitori wrote:
       | https://gkanold.wixsite.com/homeputerium/games-list-2023 Games
       | written in ten lines of vintage BASIC. (Not related to the
       | article but its title.)
        
         | boffinAudio wrote:
         | ^^^ An excellent event, which I hope more HN'ers will check
         | out!
         | 
         | (If you're in Vienna, Austria, you can go to the Retro Gaming
         | Museum and see the winning entries on a real computer, in
         | person..)
        
       | wazoox wrote:
       | Back in the 80s when type-in program magazines were common, in
       | France we had the wonderful "Hebdogiciel" with a perpetually
       | running BASIC programming contest called "deulignes" -- which
       | means "twolines".
       | 
       | "Deulignes" programs could target any platform, but must only
       | take 2 lines of BASIC (most implementations allow only a limited
       | line length, often 255 characters).
       | 
       | Some programs were really impressive; I remember one complete
       | breakout implementation in MSX-BASIC for instance. People
       | actually made whole (small) games in 2 lines of BASIC!
       | 
       | Here's an example page : https://archive.org/details/hebdogiciel-
       | french-098/page/n15/...
        
         | qsort wrote:
         | In a roundabout way BASIC on those machines is like modern
         | high-level languages.
         | 
         | Slow and inefficient for sure, but most of the magic is
         | happening directly at the hardware level (sprites, memory-
         | mapped IO, etc.), so there's a surprisingly large amount of
         | stuff you can do with very acceptable performance.
        
           | actionfromafar wrote:
           | That's actually really insightful somehow.
        
         | wiz21c wrote:
         | +1 for mentioning the best (objectively :-) ) computer
         | magazine of all time (in French, that is).
         | 
         | Et les dessins de Carali...
        
         | wkjagt wrote:
         | Do you think you'd be able to find that 2 line breakout? I'm
         | currently doing a breakout implementation on my Commodore 64.
         | In assembly though, so definitely more than two lines ;-)
         | Nevertheless, a two line breakout in Basic would probably give
         | lots of pointers on how to make things more compact.
        
           | wazoox wrote:
           | Well I don't remember in which issue it was, but the whole
           | collection is here:
           | 
           | https://archive.org/details/hebdogiciel-french
           | 
           | I'm pretty sure it's really difficult to convert MSX-BASIC to
           | 6502 assembly though... But there are lots and lots of great
           | C64 deulignes too :)
        
           | elvis70 wrote:
           | Not the GP but you can find it on the "HEBDOGICIEL, les
           | listings" website [1]. Deulignes section, page 15 [2]
           | (Casse-briques by Laurent Auble). Published in issue 104,
           | page 12 (second box) [3]:
           | 
           |   1 SPRITEON:IFK=2THENR=-2:B=2*SGN(X-Z-4):RETURNELSEIFK=0THENCOLOR4,0,0:SCREEN2,1:DEFINTA-Z:J=186:X=140:Y=80:VPOKE14336,128:VPOKE14344,248:LINE(80,40)-(183,61),7,BF:LINE(78,7)-(184,J),2,B:LINE(Y,8)-(183,10),1,BF:R=2:B=2:Z=X:ONSPRITEGOSUB1ELSEIFL>5THENRUN
           |   2 K=2:S=STICK(0):Z=Z+2*(S=3)*(Z<174)-2*(S=7)*(Z>80):PUTSPRITE1,(Z,161):Y=Y+R:X=X+B:PUTSPRITE0,(X,Y),11:P=POINT(X,Y):IFP=0THEN2ELSEIFP=7THENR=-R:A=(X\4)*4:LINE(A,Y)-(A+3,Y+1),0,B:GOTO2ELSEIFP=1THENR=2:GOTO2ELSEIFY>180THENL=L+1:Y=80:K=1:GOTO1ELSEB=-B:GOTO2
           | 
           | Direct link to the source code [4]
           | 
           | [1] http://www.hebdogiciel.free.fr/
           | 
           | [2] http://www.hebdogiciel.free.fr/2lignes_15.htm
           | 
           | [3] https://archive.org/details/hebdogiciel-
           | french-104/page/n11/...
           | 
           | [4] http://www.hebdogiciel.free.fr/hd-
           | roms/2lignes/2lignes_MSX_n...
        
             | unwind wrote:
             | Great find, thanks!
             | 
             | It didn't seem like any of your links was to a playable
             | version, so I had a Goog' and a paste and came up with [1].
             | Pretty impressive game for that amount of code, I'd say.
             | 
             | I've never written a line of code on an actual MSX machine
             | (even though I was around in the 80s), it's kind of amazing
             | the amount of emulation Magic Power we casually throw
             | around, these days. Massive thanks to all the emulation
             | authors and (of course) retro computer archivists.
             | 
             | Edit: typo, more compliments.
             | 
             | [1]: https://msxpen.com/codes/-NflxlFRyvOOYbih_5WK
        
             | ofrzeta wrote:
             | Ok, so no Whitespace. I wonder how the parser works that
             | handles RETURNELSEIFK or ONSPRITEGOSUB1ELSEIFL.
        
               | elvis70 wrote:
               | The lines of BASIC code are tokenized according to the
               | shortest words.
        
         | p0w3n3d wrote:
         | that's a nice amount of code there.
        
       | einr wrote:
       | Using line number 1 instead of 10 seems like an easy 1 byte save.
        
         | pgeorgi wrote:
         | They're stored as a 16-bit little-endian word, so unless it's
         | used for GOTO/GOSUB (whose targets are stored in PETSCII) the
         | line number makes no difference.
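
A rough Python sketch of the layout described above, under stated assumptions: the helper name `encode_line` is made up for illustration, and the start address 0x0801 and the tokenized body bytes are taken from the memory dump quoted elsewhere in this thread. It only shows that the stored line has the same length whatever line number is used, because the number always occupies two bytes.

```python
import struct

def encode_line(address: int, line_number: int, body: bytes) -> bytes:
    """Lay out one stored line: link pointer, line number, body, 0x00 end marker.

    Hypothetical helper, not a ROM routine. Both header fields are
    16-bit little-endian words, hence the "<HH" format.
    """
    next_address = address + 2 + 2 + len(body) + 1
    return struct.pack("<HH", next_address, line_number) + body + b"\x00"

# Tokenized body of SAVE"4",8:PRINT4, copied from the dump in this thread.
BODY = bytes.fromhex("94 22 34 22 2C 38 3A 99 34")

if __name__ == "__main__":
    for number in (1, 10, 63999):
        stored = encode_line(0x0801, number, BODY)
        print(number, len(stored), stored.hex())
```

Every line number in the loop yields a 14-byte line; only the two line-number bytes differ.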
        
           | dep_b wrote:
           | Are they stored as 16 bit words before or after parsing the
           | BASIC code?
        
             | vidarh wrote:
             | The BASIC code is only in its full textual form on screen.
             | The moment you press return on it, it's tokenized, and it's
             | stored tokenized both in memory and when saved. Unlike
             | modern systems, the full textual representation of the code
             | is never stored anywhere.
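
To illustrate why a listing can still look like plain text even though only tokens are stored, here is a minimal Python sketch of what a LIST-style expansion might do. The function name `list_line` and the tiny four-entry token table are assumptions made for the example (the real table is much longer); the body bytes come from the dump quoted elsewhere in this thread.

```python
# A few Commodore BASIC V2 token values; deliberately incomplete.
TOKENS = {0x89: "GOTO", 0x8F: "REM", 0x94: "SAVE", 0x99: "PRINT"}

def list_line(line_number: int, body: bytes) -> str:
    """Expand a stored line back to text, roughly as LIST would show it."""
    out = [str(line_number), " "]
    in_quotes = False
    for byte in body:
        if byte == 0x22:  # a double quote toggles string mode
            in_quotes = not in_quotes
        if not in_quotes and byte in TOKENS:
            out.append(TOKENS[byte])  # token byte becomes the full keyword
        else:
            out.append(chr(byte))     # everything else is shown as stored
    return "".join(out)

if __name__ == "__main__":
    print(list_line(10, bytes.fromhex("94 22 34 22 2C 38 3A 99 34")))
```

Spaces and punctuation pass through untouched, which is why a listing keeps the spacing that was typed, while every keyword comes back in its full spelling.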
        
               | actionfromafar wrote:
               | It's an accident of history this didn't continue. So many
               | code style wars could have been avoided over the eons.
        
               | vidarh wrote:
               | It wasn't that it didn't continue, but that this was
               | unique to a branch of languages that largely were
               | sidelined.
               | 
               | And the tokenization didn't prevent style differences
               | anyway: as the article points out, it keeps spaces, for
               | example. It only tokenized a few things, like keywords
               | and line numbers.
               | 
               | (EDIT: in the late 90's I worked on a project written in
               | Word BASIC.... It was also tokenized and that was used as
               | an opportunity to _translate the keywords_ in the
               | localised versions of Word. But someone had managed to
               | write a bunch of code in the Danish version and somehow
               | exported it as text and imported it into the Norwegian
               | version - the languages are similar enough that it was
               | really hard to tell (no syntax highlighting), and they'd
               | edited a bunch before realising, so I had the fun job of
               | untangling it... Yay...)
        
               | BayesianDice wrote:
               | I think I read about a tool for the Acorn BBC micros
               | which you could apply to your program to remove
               | unnecessary spaces etc. in the tokenised form and shave
               | off a few bytes.
               | 
               | This had the side-effect that you could still display and
               | (presumably with a bit more mental effort, read) the
               | program listing fine, but re-entering a line as shown in
               | that listing would fail because the computer depended on
               | the spaces to do the parsing, even if they were redundant
               | after the tokenisation happened.
        
               | cvcount wrote:
               | On the ZX Spectrum, numeric values were saved as both
               | text, and in a five-byte floating point format. So making
               | lines shorter often involved using keywords to avoid
               | that: NOT PI, SGN PI, VAL "2" etc.
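
A back-of-the-envelope Python sketch of the saving described above. It assumes that each numeric literal is stored as its digits plus one hidden marker byte plus the five-byte float, and that each keyword is a single-byte token; the marker byte and the helper names are assumptions added for this example, not something stated in the comment.

```python
HIDDEN_NUMBER_BYTES = 1 + 5  # assumed: one marker byte + five-byte float

def literal_cost(digits: str) -> int:
    """Bytes used by a plain numeric literal such as 0 or 2."""
    return len(digits) + HIDDEN_NUMBER_BYTES

def keyword_cost(tokens: int, extra_chars: str = "") -> int:
    """Bytes used by single-byte keyword tokens plus any literal characters."""
    return tokens + len(extra_chars)

if __name__ == "__main__":
    print("0       :", literal_cost("0"))
    print("NOT PI  :", keyword_cost(2))
    print("1       :", literal_cost("1"))
    print("SGN PI  :", keyword_cost(2))
    print("2       :", literal_cost("2"))
    print('VAL "2" :', keyword_cost(1, '"2"'))
```

Under these assumptions a one-digit literal costs 7 bytes, against 2 bytes for NOT PI or SGN PI and 4 bytes for VAL "2".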
        
               | kbelder wrote:
               | Atari BASIC tokenized source a bit more thoroughly...
               | eliminated spaces, parsed constants into their floating
               | point representation, etc. It did enforce formatting to a
               | degree.
               | 
               | I was always surprised that it did all that, but wasn't
               | any faster than Commodore BASIC.
        
               | dep_b wrote:
               | When I SAVE a program in C64 BASIC and LOAD it again the
               | syntax doesn't change no matter what I do, add spaces or
               | not, use shorthand or not, colons, etcetera. So I get the
               | feeling that my whole program gets saved as a string and
               | then parsed, not tokenized and saved.
               | 
               | Also there is a line limit in C64 BASIC that would
               | overflow if certain shorthand would be expanded and for
               | beginners to see their fully written keywords being
               | transformed to shorthand after loading would be even more
               | confusing.
        
       | dep_b wrote:
       | In a C64 BASIC program keywords like SAVE and PRINT can be
       | abbreviated:
       | 
       | https://www.c64-wiki.com/wiki/BASIC_keyword_abbreviation
       | 
       | That would shave off some more precious bytes!
        
         | masswerk wrote:
         | This is how the program is actually stored:
         |   10SAVE"4",8:PRINT4
         | 
         |   0801  0F 08               link to next line at $080F
         |   0803  0A 00               line number (16-bit binary): 10
         |   0805  94                  token SAVE
         |   0806  22 34 22 2C 38 3A   ascii <<"4",8:>>
         |   080C  99                  token PRINT
         |   080D  34                  ascii <<4>>
         |   080E  00                  -EOL-
         |   080F  00 00               -EOP- (link = null)
         | 
         | As we may see, "SAVE" has been compressed already to a single
         | byte (0x94), as is "PRINT" (0x99). Moreover, the line number is
         | a 16-bit binary integer, meaning, the number of decimal digits
         | in the listing has no effect on the in-memory format.
         | 
         | BTW, abbreviations of BASIC keywords work because of how
         | upper-case/shifted letters are encoded in the PETSCII character
         | set: they have their sign-bit set. (So normal letters are all
         | smaller than 0x80, and shifted characters are >= 0x80. We may
         | also note that codes >= 0x80 are used exclusively for tokens in
         | the stored BASIC text, discriminating them from any other
         | text.) Now, the tokenizing routine uses a keyword table, which
         | also uses a set sign-bit: as a marker on the last character of
         | each keyword. It will compute
         | the difference of each letter in an input word to the entries
         | in that table, and, if the difference is exactly 0x80 (the
         | sign-bit), this means, (a) we arrived at the end of the word
         | stored in the table, and (b) all the letters up until here did
         | match (otherwise, we would have already exited the loop, in
         | order to test the next keyword). We have a match! The routine
         | then adds 0x80 to the table index of that keyword, and voila,
         | there is your BASIC token.
         | 
         | Notably, if we're dealing with single-byte values, for a
         | difference of 0x80 it doesn't matter which of the two bytes
         | holds the bigger value: modulo 256, the difference comes out as
         | 0x80 either way.
         | For our tokenizing routine, this means it will only "know" that
         | one character has the sign-bit set, while the other has not
         | (but is otherwise the same), but it will not "know" which of
         | the two this is. Therefore, adding the sign-bit to an input
         | character will fool the routine into assuming, it already went
         | over the entire keyword and hit the sign-bit set in the last
         | character of the table entry. And we achieve this by shifting
         | the character in the input text. And, voila, there is your
         | abbreviated BASIC keyword.
         | 
         | (We can also see how the length of the input keyword doesn't
         | contribute to the storage format, as it will be compressed to a
         | token, which is 0x80 + the table index of the keyword, anyways.
         | We may also see why "iN" matches "input#" but not "input",
         | because the longer version has to come first in the table, in
         | order to match at all, and it will be also the first to be
         | recognized by the erroneous match.)
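
The matching trick described above can be sketched in Python roughly as follows. This is a simplified model, not the ROM routine: the keyword list is only the first 26 entries of the token table (END = 0x80 up to PRINT = 0x99), the function names are invented for the example, and details such as REM and DATA handling are left out.

```python
KEYWORDS = ["END", "FOR", "NEXT", "DATA", "INPUT#", "INPUT", "DIM", "READ",
            "LET", "GOTO", "RUN", "IF", "RESTORE", "GOSUB", "RETURN", "REM",
            "STOP", "ON", "WAIT", "LOAD", "SAVE", "VERIFY", "DEF", "POKE",
            "PRINT#", "PRINT"]

def table_entry(word: str) -> bytes:
    """Keyword as stored in the table: last character has bit 7 set."""
    raw = bytearray(word.encode("ascii"))
    raw[-1] |= 0x80
    return bytes(raw)

TABLE = [table_entry(word) for word in KEYWORDS]

def match_keyword(text: bytes, pos: int):
    """Return (token, next_pos) for a keyword starting at pos, else None."""
    for index, entry in enumerate(TABLE):
        for offset, table_byte in enumerate(entry):
            if pos + offset >= len(text):
                break
            diff = (text[pos + offset] - table_byte) & 0xFF
            if diff == 0:
                continue  # same character, keep comparing
            if diff == 0x80:
                # Same letter, but exactly one side has bit 7 set: either the
                # table's end marker or a shifted (abbreviating) input letter.
                return 0x80 + index, pos + offset + 1
            break  # mismatch, try the next keyword
    return None

def tokenize(text: bytes) -> bytes:
    """Replace keywords outside quotes by their one-byte tokens."""
    out = bytearray()
    pos = 0
    in_quotes = False
    while pos < len(text):
        byte = text[pos]
        if byte == 0x22:
            in_quotes = not in_quotes
        elif not in_quotes:
            hit = match_keyword(text, pos)
            if hit is not None:
                token, pos = hit
                out.append(token)
                continue
        out.append(byte)
        pos += 1
    return bytes(out)

if __name__ == "__main__":
    print(tokenize(b'SAVE"4",8:PRINT4').hex(" "))
    print(tokenize(bytes([0x53, 0xC1])).hex())  # s + shifted A
    print(tokenize(bytes([0x49, 0xCE])).hex())  # i + shifted N
```

In this model the full word SAVE and the abbreviation "s, shifted A" both come out as 0x94, and "i, shifted N" lands on 0x84 (INPUT#) because that entry precedes INPUT in the table.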
        
         | Hackbraten wrote:
         | Doesn't BASIC tokenize those abbreviations to the exact same
         | in-memory bytes as the full keywords?
        
           | eesmith wrote:
           | Yes. Mentioned in the link as well: "As a program is typed
           | into the BASIC interpreter, it's tokenised: any keywords in
           | the line get replaced by token values before being stored in
           | memory. We can see in this line that SAVE has been replaced
           | by command token $94".
        
           | dep_b wrote:
           | Yes it does, but the byte count the author talks about
           | refers to the number of characters used for storing the
           | program. That's why he uses the : and removes the spaces.
        
             | larschdk wrote:
             | The stored version also uses token code points (single
             | bytes >127), not literal tokens.
        
       ___________________________________________________________________
       (page generated 2023-10-02 23:02 UTC)