[HN Gopher] Code Golfing in Commodore BASIC
___________________________________________________________________
Code Golfing in Commodore BASIC
Author : Two9A
Score : 50 points
Date : 2023-10-01 08:46 UTC (1 days ago)
(HTM) web link (imrannazar.com)
(TXT) w3m dump (imrannazar.com)
| bump-ladel wrote:
| If you enjoy this, then you should definitely checkout 8-Bit Show
| And Tell's YouTube channel. The presenter, Robin, regularly does
| deep dives into code optimisation and fixes on Commodore 64 and
| other machines.
|
| https://www.youtube.com/watch?v=jhQgHW2VI0o
| lifthrasiir wrote:
| I don't know anything about C64 or C64 BASIC, but would it be
| possible to intentionally write a shorter binary which will break
| the interpreter and do what we want instead? For example jump
| directly to a middle of the kernel ROM routine (akin to ROP in
| the modern days), or use a bad address in the "next line" offset
| etc.
| wkjagt wrote:
| In Commodore BASIC there's already SYS, which lets you jump to
| an arbitrary address anywhere in the 64k address space,
| including ROM. You can even include raw bytes in a BASIC
| program and have the CPU execute them as machine code.
| p0w3n3d wrote:
| however encoding such program in BASIC would take much more
| amount of commands/bytes than writing it in BASIC itself. You
| would need DATA statement and POKE FOR LOOP... In case of
| such a small scenario BASIC wins
| gabrielsroka wrote:
| There are ways around that too... store bytes in a string
| or a REM and then execute it directly. No DATA or FOR
| needed.
|
| There were some workarounds posted on of Robin's recent
| video.
| afro88 wrote:
| BASIC defaults to the tape device if you leave off the device
| number in LOAD/SAVE commands. So you can save another byte or two
| by saving to tape instead.
| p0w3n3d wrote:
| remember when people didn't have FDDs but cassete drives instead,
| because FDDs were too expensive? Pepperidge Farm remembers
| cbm-vic-20 wrote:
| PRESS PLAY ON TAPE
| nwellnhof wrote:
| Username checks out.
| qiqitori wrote:
| https://gkanold.wixsite.com/homeputerium/games-list-2023 Games
| written in ten lines of vintage BASIC. (Not related to the
| article but its title.)
| boffinAudio wrote:
| ^^^ An excellent event, which I hope more HN'ers will check
| out!
|
| (If you're in Vienna, Austria, you can go to the Retro Gaming
| Museum and see the winning entries on a real computer, in
| person..)
| wazoox wrote:
| Back in the 80s when type-in program magazines were common, in
| France we had the wonderful "Hebdogiciel" with a perpetually
| running BASIC programming contest called "deulignes" -- which
| means "twolines".
|
| "Deulignes" programs could target any platform, but must only
| take 2 lines of BASIC (most implementations allow only a limited
| line length, often 255 characters).
|
| Some programs were really impressive; I remember one complete
| breakout implementation in MSX-BASIC for instance. People
| actually made whole (small) games in 2 lines of BASIC!
|
| Here's an example page : https://archive.org/details/hebdogiciel-
| french-098/page/n15/...
| qsort wrote:
| In a roundabout way BASIC on those machines is like modern
| high-level languages.
|
| Slow and inefficient for sure, but most of the magic is
| happening directly at the hardware level (sprites, memory-
| mapped IO, etc.), so there's a surprisingly large amount of
| stuff you can do with very acceptable performance.
| actionfromafar wrote:
| That's actually really insightful somehow.
| wiz21c wrote:
| +1 for mentionning the best (objectively :-) ) computer
| magazine of all time (in french, that is).
|
| Et les dessins de Carali...
| wkjagt wrote:
| Do you think you'd be able to find that 2 line breakout? I'm
| currently doing a breakout implementation on my Commodore 64.
| In assembly though, so definitely more than two lines ;-)
| Nevertheless, a two line breakout in Basic would probably give
| lots of pointers on how to make things more compact.
| wazoox wrote:
| Well I don't remember in which issue it was, but the whole
| collection is here:
|
| https://archive.org/details/hebdogiciel-french
|
| I'm pretty sure it's really difficult to convert MSX-BASIC to
| 6502 assembly though... But there are lots and lots of great
| C64 deulignes too :)
| elvis70 wrote:
| Not the GP but you can find it on the "HEBDOGICIEL, les
| listings" Website [1]. Deulignes section, page 15 [2] (Casse-
| briques by Laurent Auble). Published in issue 104, page 12
| (second box) [3]: 1 SPRITEON:IFK=2THENR=-2:
| B=2*SGN(X-Z-4):RETURNELSEIFK=0THENCOLOR4,0,0:SCREEN2,1:DEFINT
| A-Z:J=186:X=140:Y=80:VPOKE14336,128:VPOKE14344,248:LINE(80,40
| )-(183,61),7,BF:LINE(78,7)-(184,J),2,B:LINE(Y,8)-(183,10),1,B
| F:R=2:B=2:Z=X:ONSPRITEGOSUB1ELSEIFL>5THENRUN 2 K=2:S=
| STICK(0):Z=Z+2*(S=3)*(Z<174)-2*(S=7)*(Z>80):PUTSPRITE1,(Z,161
| ):Y=Y+R:X=X+B:PUTSPRITE0,(X,Y),11:P=POINT(X,Y):IFP=0THEN2ELSE
| IFP=7THENR=-R:A=(X\4)*4:LINE(A,Y)-(A+3,Y+1),0,B:GOTO2ELSEIFP=
| 1THENR=2:GOTO2ELSEIFY>180THENL=L+1:Y=80:K=1:GOTO1ELSEB=-B:GOT
| O2
|
| Direct link to the source code [4]
|
| [1] http://www.hebdogiciel.free.fr/
|
| [2] http://www.hebdogiciel.free.fr/2lignes_15.htm
|
| [3] https://archive.org/details/hebdogiciel-
| french-104/page/n11/...
|
| [4] http://www.hebdogiciel.free.fr/hd-
| roms/2lignes/2lignes_MSX_n...
| unwind wrote:
| Great find, thanks!
|
| It didn't seem like any of your links was to a playable
| version, so I had a Goog' and a paste and came up with [1].
| Pretty impressive game for that amount of code, I'd say.
|
| I've never written a line of code on an actual MSX machine
| (even though I was around in the 80s), it's kind of amazing
| the amount of emulation Magic Power we casually throw
| around, these days. Massive thanks to all the emulation
| authors and (of course) retro computer archivists.
|
| Edit: typo, more compliments.
|
| [1]: https://msxpen.com/codes/-NflxlFRyvOOYbih_5WK
| ofrzeta wrote:
| Ok, so no Whitespace. I wonder how the parser works that
| handles RETURNELSEIFK or ONSPRITEGOSUB1ELSEIFL.
| elvis70 wrote:
| The lines of BASIC code are tokenized according to the
| shortest words.
| p0w3n3d wrote:
| that's a nice amount of code there.
| einr wrote:
| Using line number 1 instead of 10 seems like an easy 1 byte save.
| pgeorgi wrote:
| they're stored as 16 bit little-endian word, so unless it's
| used for goto/gosub (whose targets are stored in petscii) the
| line number makes no difference.
| dep_b wrote:
| Are they stored as 16 bit words before or after parsing the
| BASIC code?
| vidarh wrote:
| The BASIC code is only in it's full textual form on screen.
| The moment you press return on it, it's tokenized, and it's
| stored tokenized both in memory and when saved. Unlike
| modern systems, the full textual representation of the code
| is never stored anywhere.
| actionfromafar wrote:
| It's an accident of history this didn't continue. So many
| code style wars could have been avoided over the eons.
| vidarh wrote:
| It wasn't that it didn't continue, but that this was
| unique to a branch of languages that largely were
| sidelined.
|
| And the tokenization didn't prevent you from style
| differences anyway - as the article points out it e.g.
| keeps spaces etc. It only tokenized a few things, like
| keywords and line numbers.
|
| (EDIT: in the late 90's I worked on a project written in
| Word BASIC.... It was also tokenized and that was used as
| an opportunity to _translate the keywords_ in the
| localised versions of Word. But someone had managed to
| write a bunch of code in the Danish version and somehow
| exported it as text and imported it into the Norwegian
| version - the languages are similar enough that it was
| really hard to tell (no syntax highlighting, and they 'd
| edited a bunch before realising and I had the fun job of
| untangling it... Yay...)
| BayesianDice wrote:
| I think I read about a tool for the Acorn BBC micros
| which you could apply to your program to remove
| unnecessary spaces etc. in the tokenised form and shave
| off a few bytes.
|
| This had the side-effect that you could still display and
| (presumably with a bit more mental effort, read) the
| program listing fine, but re-entering a line as shown in
| that listing would fail because the computer depended on
| the spaces to do the parsing, even if they were redundant
| after the tokenisation happened.
| cvcount wrote:
| On the ZX Spectrum, numeric values were saved as both
| text, and in a five-byte floating point format. So making
| lines shorter often involved using keywords to avoid
| that: NOT PI, SGN PI, VAL "2" etc.
| kbelder wrote:
| Atari BASIC tokenized source a bit more thoroughly...
| eliminated spaces, parsed constants into their floating
| point representation, etc. It did enforce formatting to a
| degree.
|
| I was always surprised that it did all that, but wasn't
| any faster than Commodore BASIC.
| dep_b wrote:
| When I SAVE a program in C64 BASIC and LOAD it again the
| syntax doesn't change no matter what I do, add spaces or
| not, use shorthand or not, colons, etcetera. So I get the
| feeling that my whole program gets saved as a string and
| then parsed, not tokenized and saved.
|
| Also there is a line limit in C64 BASIC that would
| overflow if certain shorthand would be expanded and for
| beginners to see their fully written keywords being
| transformed to shorthand after loading would be even more
| confusing.
| dep_b wrote:
| In a C64 BASIC program keywords like SAVE and PRINT can be
| abbreviated:
|
| https://www.c64-wiki.com/wiki/BASIC_keyword_abbreviation
|
| That would shave off some more precious bytes!
| masswerk wrote:
| This is how the program is actually stored:
| 10SAVE"4",8:PRINT4 0801 0F 08 link to
| next line at $080F 0803 0A 00 line number
| (16-bit binary): 10 0805 94 token SAVE
| 0806 22 34 22 2C 38 3A ascii <<"4",8:>> 080C 99
| token PRINT 080D 34 ascii <<4>>
| 080E 00 -EOL- 080F 00 00
| -EOP- (link = null)
|
| As we may see, "SAVE" has been compressed already to a single
| byte (0x94), as is "PRINT" (0x99). Moreover, the line number is
| a 16-bit binary integer, meaning, the number of decimal digits
| in the listing has no effect on the in-memory format.
|
| BTW, abbreviations of BASIC keywords work, because of how
| upper-case/shifted letters are encoded in the PETSCII character
| set: they have their sign-bit set. (So normal letters are all
| smaller than 0x80, and shifted characters are >= 0x80. We may
| also note that codes > 0x80 are used exclusively for tokens in
| the stored BASIC text, discriminating them from any other
| text.) Now, the tokenizing routine uses a table, which also
| uses a set sign-bit: as a marker on the last character on each
| of the keywords, which are stored in a table. It will compute
| the difference of each letter in an input word to the entries
| in that table, and, if the difference is exactly 0x80 (the
| sign-bit), this means, (a) we arrived at the end of the word
| stored in the table, and (b) all the letters up until here did
| match (otherwise, we would have already exited the loop, in
| order to test the next keyword). We have a match! The routine
| then adds 0x80 to the table index of that keyword, and voila,
| there is your BASIC token.
|
| Notably, if we're dealing with single-byte values, for a
| difference of 0x80 it doesn't matter, which of the two bytes,
| this is the difference of, holds the bigger value. It's
| effectively unsigned and agnostic of which was the larger byte.
| For our tokenizing routine, this means it will only "know" that
| one character has the sign-bit set, while the other has not
| (but is otherwise the same), but it will not "know" which of
| the two this is. Therefore, adding the sign-bit to an input
| character will fool the routine into assuming, it already went
| over the entire keyword and hit the sign-bit set in the last
| character of the table entry. And we achieve this by shifting
| the character in the input text. And, voila, there is your
| abbreviated BASIC keyword.
|
| (We can also see how the length of the input keyword doesn't
| contribute to the storage format, as it will be compressed to a
| token, which is 0x80 + the table index of the keyword, anyways.
| We may also see why "iN" matches "input#" but not "input",
| because the longer version has to come first in the table, in
| order to match at all, and it will be also the first to be
| recognized by the erroneous match.)
| Hackbraten wrote:
| Doesn't BASIC tokenize those abbreviations to the exact same
| in-memory bytes like the full keywords?
| eesmith wrote:
| Yes. Mentioned in the link as well: "As a program is typed
| into the BASIC interpreter, it's tokenised: any keywords in
| the line get replaced by token values before being stored in
| memory. We can see in this line that SAVE has been replaced
| by command token $94".
| [deleted]
| dep_b wrote:
| Yes it does but the amount of bytes the author speaks about
| regards the amount of characters used for storing the
| program. That's why he uses the : and removes the spaces.
| larschdk wrote:
| The stored version also uses token code points (single
| bytes >127), not literal tokens.
___________________________________________________________________
(page generated 2023-10-02 23:02 UTC)