NOVALOAD – Commodore Speed Load and Protection System Hacked.

MuckyPaws October 16, 2022

I may have mentioned, back in the day I used to be a ~~hacker~~ software security engineer. Protection Systems back then were evolving at a fast and steady rate.

The home computers of old were notorious for slow loading, with some programs and games taking 30 minutes or more, especially if you hit that dreaded “Tape Loading Error”. Enterprising folks built a niche market around “Speed Loaders”, which loaded programs (and more importantly Games) in significantly less time.

I remember the first time I encountered one: I thought the tape was faulty, with flashing coloured bars on the screen, and quicker than expected I was greeted with the game running in its full Eight Bit Glory.

Speed loaders meant less magnetic tape was needed when mastering cassettes, but they were also designed to make life more challenging for the casual “Tape to Tape Playground Pirate”, with the inevitable “Protection System” included as standard to make it harder for friends of “Captain Jack Sparrow” to copy 😂

In today’s blog, I’m going to focus on one of the earliest and rather ingenious Speedloaders for the Commodore range of computers, which became prolific back in the day. All these years later, I still haven’t found any information breaking this protection system down for the archivist or casual 6502 coder. At the time I first encountered this protection system I was about 13 years old, and had to work all this out from first principles. Not bad for a kid, eh? Today, I’m revisiting this for nostalgia reasons, and testing my memory of old.

You will need a working knowledge of 6502 Assembly Language to follow through. My C16 knowledge is a tad rusty to say the least. It was my second home computer (Yet I knew how to pick lemons having forgone the Spectrum for the VIC-20, and the C64 choosing a C16 instead). I made up for that by owning probably the best 8-Bit home computer “The Amstrad CPC6128”. #ShotsFired…. #ISaidWhatISaid…. #YepPlayground8BitRivaryIsStillAThing

The Boring Stuff

This article can not be published in physical or electronic media without express permission from the author (Me!). If you’re sharing the link to the article via social media, please use the direct link as every click helps.

What you’ll need.

To follow along, I recommend the following:-

A working knowledge of 6502 Assembly Language Programming
VICE – https://vice-emu.sourceforge.io
YAPE – http://yape.homeserver.hu
CBM Program Studio – https://www.ajordison.co.uk/download.html
My GIT Repo: https://github.com/muckypaws/RetroCheats
Memory Map – https://github.com/franckverrot/EmulationResources/blob/master/consoles/commodore/C16%20Memory%20Map.txt

A copy of this book or something similar

https://www.amazon.co.uk/C16-Machine-Language-Absolute-Beginner/dp/1789827272

Novaload

The designers of this system were ahead of their time, or gifted developers, given that we didn’t have access to the wealth of information at our fingertips with today’s internet and archived manuals.

That’s a bold statement…

It certainly is, the designers understood the architecture of the machine very well, such that they created (What I believe to be) the first “Loader” that Auto-ran itself no matter how you tried to load the program.

Ordinarily, you type LOAD, the program loads allowing you to RUN or LIST to see the program. However, this little gem somehow managed to run the program without any intervention from the user. How?

The trick was elegant and stunningly simple. The loader code would overwrite the CPU Stack, thus forcing execution of the code after the first block loaded. This was the cornerstone of the “Protection System”. I’d only seen this technique repeated once, on the Amstrad CPC Z80 processor with Richard Aplin’s “Fly Spy” Protection System.

What are you talking about Jason?

The Commodore Program Stack was fixed to Page 1 ($100 – $1FF) of memory, it was immovable. The Stack Pointer starts at $1FF and works its way downwards to $100 before cycling back to $1FF. Though if that happens you’ve got bigger problems with your program code.

Although the stack lives in a fixed page, the Stack Pointer itself is just one byte; the page is implied. The Stack Pointer always holds the next free byte that data can be pushed to, so data popped off the stack always comes from the address above it.

For example, say the current Stack Pointer is $F3. If we PUSH the accumulator onto the stack with PHA, the data is written to address $1F3 and the Stack Pointer is decremented to $F2.

When we POP the accumulator off the stack with PLA, the SP is first incremented by one byte back to $F3, and then address $1F3 (the one above the old SP) is read.
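The push and pop behaviour described above can be modelled in a few lines of Python. This is only a sketch for following the arithmetic on a modern machine; the function names `pha` and `pla` simply mirror the 6502 mnemonics, and the value $42 is an arbitrary example.

```python
# Minimal model of the 6502 hardware stack (page 1, one-byte stack pointer).
memory = bytearray(0x10000)

def pha(sp, value):
    """Push: write to $0100+SP, then decrement SP (wrapping within one byte)."""
    memory[0x0100 + sp] = value
    return (sp - 1) & 0xFF

def pla(sp):
    """Pull: increment SP first, then read from $0100+SP."""
    sp = (sp + 1) & 0xFF
    return sp, memory[0x0100 + sp]

sp = 0xF3
sp = pha(sp, 0x42)            # byte lands at $01F3, SP becomes $F2
sp_after_push = sp
sp, value = pla(sp)           # SP back to $F3, value read from $01F3
print(f"after push SP=${sp_after_push:02X}, after pull SP=${sp:02X}, value=${value:02X}")
```

Note that the `& 0xFF` is what makes the pointer cycle from $00 back to $FF, which is the wrap-around mentioned earlier.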

Essentially this system overwrote the area of Stack memory, thus taking control of the machine at the first RTS (Return instruction), since the stack now contains a return address pointing at its own intercept code.

The challenge for the designers was twofold. The loader code had to arrive in just one block of data (256 bytes), since that block “corrupts” the stack when written from the cassette buffer to memory. And because part of that block lands on the stack, whose contents are unpredictable, the whole loader couldn’t reliably fit in those 256 bytes, so they had to find somewhere to squeeze in more.

How did they get around this limitation? By hiding further code in the Cassette Data Header Structure!

Confused? Hopefully this will help illustrate the example.

Initial Loading

Let’s start at the very beginning… Thankfully, using today’s tools like VICE or YAPE makes our job much easier than it was in the 80’s

Throughout this example I’m going to use the Mastertronic Release of “Mr Puniverse”, the successor to “Big Mac”, for two reasons. First, I played this game to death back in the day. Second, it features two turbo programs: the first in BASIC with a simple title page, and the second the actual game.

Attach the “Mr Puniverse” Tape image to your emulator or real C16 if you’re going hardcore, type LOAD and as soon as the “FOUND MR PUNIVERSE” message appears, activate the emulator Monitor.

**As soon as the found message is displayed, enter MONITOR mode.**

The tape buffer location is found in memory addresses $B6 and $B7. A dump of these addresses reveals the Tape Buffer is located at $0333, and according to my rather dog-eared copy of “C16 Machine Language for the Absolute Beginner” the Tape Buffer is found between $0333 – $03F2 (192 Bytes).

The first two bytes point to the Memory Address to LOAD the data, in this case $0140

**Little Endian Notation, $0333 = LSB, $0334 = MSB – Address = $0140**

The next two bytes ($0335/$0336) are the last address to store the data.

**Little Endian Notation, $0335 = LSB, $0336 = MSB – Address = $0240**
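To make the little endian decoding concrete, here is a small Python sketch using the four header bytes (40 01 40 02) as they appear in the tape buffer dump of $0333 in this article. The helper name `read_word` is just an illustrative choice.

```python
def read_word(buf, offset):
    """Read a 16-bit little-endian word: low byte first, high byte second."""
    return buf[offset] | (buf[offset + 1] << 8)

# First four bytes of the tape buffer at $0333, copied from the memory dump.
header = bytes([0x40, 0x01, 0x40, 0x02])

load_address = read_word(header, 0)   # bytes at $0333/$0334
end_address = read_word(header, 2)    # bytes at $0335/$0336
print(f"load=${load_address:04X} end=${end_address:04X} length={end_address - load_address}")
```

The two addresses are 256 bytes apart, which lines up with the single block payload described above.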

The architecture of the Commodore Range of machines means we know that the STACK is tied to PAGE1: $0100 – $01FF

**Current Content of the System Stack. The Stack Pointer is Currently Located at $F3**

Interestingly PAGE1 has additional uses: its lowest addresses are used for the Floating Point Conversion buffer (the first 16 bytes), the Colour Luminance Table and Kernel Save Information, leaving only $0124 – $01FF for the actual system stack.

Depending on when you activated the monitor, you may see different data and a different Stack Pointer. In this example, $1F3 is the next free space on the system stack, so when the current routine has completed execution, addresses $1F4 and $1F5 contain the return address-1.

To try and illustrate.

Wait, what? Return address -1? Yep, bit of a strange concept if you’re familiar with Z80/8080 or 80×86 assembly development, where the program counter of the next instruction is pushed to the system stack. In 6502 land, when a Subroutine is called, the address pushed to the stack is that of the final byte of the JSR instruction, i.e.

$1200   20 18 15   JSR $1518
$1203   EA         NOP

$1202 is where the Program Counter (PC) was prior to the Jump to Subroutine (the last byte read), therefore this address is pushed to the stack, and not the address of the next instruction ($1203). Madness I say, but that’s architecture for you. Thankfully, when an RTS instruction pops the address off the stack, the 6502 internally increments it by one to reach the next instruction. In the example of the stack dump above, the data $03 $EA will return execution to address $EA04.
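The “return address -1” rule is easy to check with a little arithmetic. The following Python sketch models only the address calculation, not the whole CPU; `jsr_pushes` and `rts_target` are hypothetical helper names.

```python
def jsr_pushes(jsr_address):
    """JSR is three bytes long; the 6502 pushes the address of its last byte."""
    return (jsr_address + 2) & 0xFFFF

def rts_target(low, high):
    """RTS pulls low byte then high byte and resumes at that address plus one."""
    return (((high << 8) | low) + 1) & 0xFFFF

pushed = jsr_pushes(0x1200)                        # $1202
resume = rts_target(pushed & 0xFF, pushed >> 8)    # $1203
print(f"pushed=${pushed:04X} resume=${resume:04X}")
print(f"stack bytes $03 $EA resume at ${rts_target(0x03, 0xEA):04X}")
```

The same function gives $0203 for the stack bytes $02 $02, which is the case that matters for this loader.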

Still with me? And that’s why systems developers earned the big bucks!

We know the System Stack is located in PAGE1, the program being loaded will overwrite the stack area too.

**Return Stack has Bytes $02 overwriting the original Address.**

During the load process the Stack Pointer has been updated to point to $F4, meaning when the load is complete, the program counter of $0202 will be popped off the stack, which means a return address of $0203.

We can set a break point at $0203 to prove this, and resume execution.

BREAK $0203

**Breakpoint at $203 shows the real loader code.**

The Loader

If you haven’t guessed already, this is how the designers ensured that their program code would be executed without user intervention, and also bypassed simple firmware “LOAD” routines like :-

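*=$1200
     LDA     #$01      ; Logical File Number 1
     LDX     #$01      ; Device 1 (Cassette)
     LDY     #$FF      ; Secondary Address $FF
     JSR     $FFBA     ; Set LFS
     LDA     #$00      ; Filename Length 0
     JSR     $FFBD     ; Set Filename
     LDX     #$40
     LDY     #$15      ; X/Y = $1540, Requested Load Address
     JSR     $FFD5     ; Load the Code
     RTS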

In this example the stack would always be “Corrupted” and the game would load as designed since program execution was forced to $0203 the moment an RTS instruction was encountered. Other protection systems we’ll cover in later articles utilised the Interrupt Vector to force their code to run, though this is a more thorough discussion in its own right.

Could the designers use a different address? Yes and No… This trick only works for the first block of data loaded into memory, and the Stack Pointer cannot be guaranteed to point to an EVEN Address, as a hacker could push a value on the stack to try and defeat it. To guarantee auto run with minimal interference, the low and high bytes of the return address need to be identical. $0303 wouldn’t work as the kernel firmware loader only loads 256 byte blocks at a time, and $0101 would run the risk of corrupting the firmware tables stored below $0122.
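Why identical bytes matter can be shown with a small experiment. In this Python sketch the upper part of page 1 is filled with $02 (the exact range $C0 to $FF is an assumption for illustration, not taken from the tape), and an RTS is modelled for every possible Stack Pointer in that range. A second fill with alternating bytes shows what goes wrong otherwise.

```python
def rts_target(stack_page, sp):
    """Model RTS: pull low byte from SP+1 and high byte from SP+2, then add one."""
    low = stack_page[(sp + 1) & 0xFF]
    high = stack_page[(sp + 2) & 0xFF]
    return (((high << 8) | low) + 1) & 0xFFFF

# Assumption for illustration: the upper part of page 1 is filled with $02.
stack_page = bytearray(256)
for offset in range(0xC0, 0x100):
    stack_page[offset] = 0x02

# Whatever the (even or odd) stack pointer is, the return lands at $0203.
targets = {rts_target(stack_page, sp) for sp in range(0xC0, 0xFE)}
print(sorted(f"${t:04X}" for t in targets))

# Contrast: alternating bytes $02,$03 give a target that depends on alignment.
mixed = bytearray(256)
for offset in range(0xC0, 0x100):
    mixed[offset] = 0x02 if offset % 2 == 0 else 0x03
mixed_targets = {rts_target(mixed, sp) for sp in range(0xC0, 0xFE)}
print(sorted(f"${t:04X}" for t in mixed_targets))
```

With the uniform fill there is exactly one possible destination, so pushing an extra byte before loading does not change where execution ends up.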

There’s still more to be done! Let’s take a look at the final code for loading the program.

(C:$0203) d
.C:0203  A9 01       LDA #$01
.C:0205  8D 3B 05    STA $053B      ; Set Active Colour To 1
.C:0208  20 84 FF    JSR $FF84      ; Initialise IO
.C:020b  A9 93       LDA #$93
.C:020d  20 D2 FF    JSR $FFD2      ; Print Screen Clear CHR$(147)
.C:0210  A9 8E       LDA #$8E       ; Switch to Upper Case CHR$(142)
.C:0212  4C 40 01    JMP $0140      ; Continue Loader Code

The initial code sets up the active colour attribute to black, you can test this by typing

POKE 1339,1

You probably won’t see the difference, poking the value 2 results in a dark red colour ink.

The program code from $140 is interesting, as it performs some initialisation, and re-assembles the actual loader code!

.C:0140  20 D2 FF    JSR $FFD2			<-- Upper Case Select
.C:0143  A9 01       LDA #$01
.C:0145  8D 15 FF    STA $FF15			<-- Set Background Colour To Black
.C:0148  A2 0F       LDX #$0F			<-- Write MR PUNIVERSE to Screen
.C:014a  BD 37 03    LDA $0337,X		
.C:014d  29 3F       AND #$3F			<-- Mask to 6 Bits (Screen Code)
.C:014f  9D 4E 0D    STA $0D4E,X		<-- Store at Screen Location
.C:0152  A9 71       LDA #$71			<-- Set Colour Attribute
.C:0154  9D 4E 09    STA $094E,X		<-- Store Colour Attribute on Screen Memory 
.C:0157  BD 15 02    LDA $0215,X		<-- NOVALOAD Message
.C:015a  9D A4 0F    STA $0FA4,X		<-- Write to Screen
.C:015d  A9 51       LDA #$51			<-- Colour Attribute
.C:015f  9D A4 0B    STA $0BA4,X		<-- Store Colour Attribute on Screen Memory 
.C:0162  CA          DEX				<-- Decrement Counter
.C:0163  10 E5       BPL $014A			<-- Loop Until X = $FF

Set a breakpoint at $165 and continue execution of the code. If you look directly at the screen memory addresses you’ll notice the message isn’t stored as ASCII but as Commodore screen codes (where A is $01, B is $02 and so on). Using the SC command will show what’s in screen memory:

(C:$0ee0) sc
Displaying 40x25 screen at $0c00:
0c00                                          
0c28                                          
0c50                                          
0c78                                          
0ca0                                          
0cc8                                          
0cf0                                          
0d18                                          
0d40                mr puniverse              
0d68                                          
0d90                                          
0db8                                          
0de0                                          
0e08                                          
0e30                                          
0e58                                          
0e80                                          
0ea8                                          
0ed0                                          
0ef8                                          
0f20                                          
0f48                                          
0f70                                          
0f98              novaload n103106            
0fc0             

>C:0f98  20 20 20 20  20 20 20 20           
>C:0fa0  20 20 20 20  0e 0f 16 01       ....
>C:0fa8  0c 0f 01 04  20 0e 31 30   .... .10
>C:0fb0  33 31 30 36  20 20 20 20   3106    
>C:0fb8  20 20 20 20  20 20 20 20           
>C:0fc0  20 20 20 20  20 20 20 20           
>C:0fc8  20 20 20 20  20 20 20 20           
>C:0fd0  20 20 20 20  20 20 20 20           
>C:0fd8  20
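The effect of the `AND #$3F` at $014D can be reproduced outside the emulator. This Python sketch applies the same mask to the filename characters; the function name `to_screen_code` is an illustrative choice, and the snippet models only the masking, not the colour attribute writes.

```python
def to_screen_code(char_code):
    """Mirror the loader's AND #$3F: keep only the low six bits."""
    return char_code & 0x3F

filename = b"MR PUNIVERSE"
screen_codes = bytes(to_screen_code(c) for c in filename)
print(screen_codes.hex(" "))
```

Upper case letters ($41 to $5A) come out as $01 to $1A, while spaces and digits pass through unchanged. Applying the same mask to the text “NOVALOAD” gives 0e 0f 16 01 0c 0f 01 04, the values visible at $0FA4 in the dump above.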

The introductory message is out of the way and displayed, though your emulator will not update the screen at this point as the raster scan hasn’t run for a full cycle yet.

.C:0165  A2 14       LDX #$14
.C:0167  BD 29 02    LDA $0229,X		<-- Some of the Loader Code
.C:016a  9D 33 03    STA $0333,X		<-- Copy Code into Tape Header
.C:016d  CA          DEX
.C:016e  10 F7       BPL $0167			<-- Relocate Code from $229 to $333 for #$15 (21) Bytes

.C:0170  A9 60       LDA #$60			<-- RTS Instruction
.C:0172  8D F2 03    STA $03F2			<- Store RTS instruction End of Tape Header
.C:0175  A9 33       LDA #$33			<-- Patch ILOAD Vector to $333
.C:0177  8D 2E 03    STA $032E
.C:017a  A9 03       LDA #$03
.C:017c  8D 2F 03    STA $032F			<-- ILOAD Vector
.C:017f  A9 25       LDA #$25
.C:0181  85 AF       STA $AF			<-- Pointer to current filename
.C:0183  A9 02       LDA #$02
.C:0185  85 B0       STA $B0			<-- Pointer to current filename $225
.C:0187  A9 04       LDA #$04
.C:0189  85 AB       STA $AB			<-- Length of Filename "NOVA"
.C:018b  20 3C 03    JSR $033C			<-- Load the Next Part

This next section is important. Remember earlier on we looked at the TAPE HEADER at address $0333

Did you notice something odd?

>C:0333  40 01 40 02  4d 52 20 50   @.@.MR P
>C:033b  55 4e 49 56  45 52 53 45   UNIVERSE
>C:0343  20 20 20 20  20 84 a2 84        ...
>C:034b  a1 20 c7 03  10 fb 20 c5   . .... .
>C:0353  03 c9 aa d0  f4 20 c5 03   ..... ..
>C:035b  c5 ab d0 ed  c4 ab f0 0a   ........
>C:0363  20 be 03 d1  af d0 d7 c8    .......
>C:036b  d0 f2 a0 fa  20 be 03 99   .... ...
>C:0373  a1 ff c8 d0  f7 20 be 03   ..... ..
>C:037b  c5 a1 d0 1f  a5 a0 8d 19   ........
>C:0383  ff f0 1a e6  9c c6 a0 d0   ........
>C:038b  06 a5 9f f0  10 85 a2 20   .......
>C:0393  be 03 91 9b  c8 c4 a2 d0   ........
>C:039b  f6 f0 da a9  30 85 90 a9   ....0...
>C:03a3  05 ca d0 fd  88 d0 fa e9   ........
>C:03ab  01 10 f6 a9  88 85 01 8d   ........
>C:03b3  fc 07 8d 3e  ff a6 9d a4   ...>....
>C:03bb  9e 58 60 18  a5 a1 65 a7   .X`...e.
>C:03c3  85 a1 a9 7f  48 98 48 ad   ....H.H.
>C:03cb  19 ff 49 7f  8d 19 ff a2   ..I.....
>C:03d3  8f a0 01 a9  10 24 01 f0   .....$..
>C:03db  fc 24 01 d0  fc ad 03 ff   .$......
>C:03e3  8e 02 ff 8c  03 ff 0a 68   .......h
>C:03eb  a8 68 6a b0  d7 85 a7 00   .hj.....

(C:$0403) m$225
>C:0225  4e 4f 56 41  a6 ae e0 01   NOVA....

Ordinarily this contains dead space after the filename (filled with $20), however the designers used the trick of hiding/burying additional code in the actual tape header! Pretty neat: since a tape block is only 256 bytes, the designers took advantage of the fact that 448 Bytes (256 Byte Data Payload + 192 Byte Cassette Header) were available. You do lose some bytes, as you need the load address and end address along with the actual Filename. The designers reserved 17 bytes of Filename Space for their system to make it easier for developers to implement.

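The code at $165 to $170 restores the missing part of the loader code: 21 bytes (offsets $00 to $14) are copied over the load address, end address and filename at the start of the header. An RTS instruction is then written to the last byte of the cassette buffer ($3F2) to complete the actual real turbo loader!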
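The byte count of a `DEX` / `BPL` loop is a classic off-by-one trap, because the body also runs when X is zero. This Python sketch walks the same sequence the loop at $0165 would; `bpl_loop_offsets` is a hypothetical helper written for this illustration.

```python
def bpl_loop_offsets(start_x):
    """Offsets visited by: LDX #start / ...body... / DEX / BPL loop."""
    offsets = []
    x = start_x
    while True:
        offsets.append(x)          # loop body runs with the current X
        x = (x - 1) & 0xFF         # DEX wraps within one byte
        if x & 0x80:               # BPL falls through once bit 7 is set
            break
    return offsets

copied = bpl_loop_offsets(0x14)
print(len(copied), hex(0x0333 + min(copied)), hex(0x0333 + max(copied)))
```

Starting from $14 the body runs 21 times, covering $0333 to $0347, which is exactly the 4 address bytes plus the 17 byte filename space. The same helper gives 16 for the `LDX #$0F` loop that prints the filename.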

Code from $175 to $17F patches the Kernel’s indirect LOAD vector at $032E/$032F to point to the turbo loader code at $0333. This ensures the Kernel calls $333 whenever a user types “LOAD” at the BASIC prompt, making this all transparent to game developers.

We’ll go through the loader code later, the “Patched” Vector contains a check to determine if Turbo Load is enabled, if not, the original Kernel routines are used to load the next program instead. Pretty neat stuff!
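The vector patch and its fallback check can be pictured as a tiny dispatch table. This Python sketch is a model only: the constant names are hypothetical, and the addresses ($032E, $0333, $033C, $AE and $F04A) are taken from the listings in this article.

```python
TURBO_FLAG = 0xAE      # zero-page flag tested by the code at $0333
ILOAD_VECTOR = 0x032E  # two-byte vector patched by the loader

memory = bytearray(0x10000)

def patch_iload(target):
    """Store the hook address little-endian, as the code at $0175 does."""
    memory[ILOAD_VECTOR] = target & 0xFF
    memory[ILOAD_VECTOR + 1] = target >> 8

def dispatch_load():
    """Follow the vector, then apply the hook's own flag test."""
    target = memory[ILOAD_VECTOR] | (memory[ILOAD_VECTOR + 1] << 8)
    if target == 0x0333:
        # LDX $AE / CPX #$01 / BEQ $033C, otherwise JMP $F04A
        return 0x033C if memory[TURBO_FLAG] == 1 else 0xF04A
    return target

patch_iload(0x0333)
memory[TURBO_FLAG] = 1
turbo_entry = dispatch_load()      # turbo loader entry point
memory[TURBO_FLAG] = 0
fallback_entry = dispatch_load()   # original Kernel LOAD routine
print(f"turbo=${turbo_entry:04X} fallback=${fallback_entry:04X}")
```

The design point is that callers never change: BASIC still issues a normal LOAD, and the hook decides at run time which routine does the work.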

Code from $17F to $189 sets the pointer to the FILENAME ($225) and Length of the FILENAME using memory addresses used by firmware. The filename is a length of 4 bytes and called “NOVA”.

Finally a call to $33C to load the program Turbo Stylee!

First Turbo Load Complete

After the call to $33C you’ll see the familiar flashing screen before control is returned to $18E

;
; On return X = LSB of Memory Address of Last Byte Loaded
;           Y = MSB of Memory Address of Last Byte Loaded
;
.C:018e  A5 90       LDA $90			<-- Load Last Kernel IO Status
.C:0190  D0 21       BNE $01B3			<-- If an Error - Destroy the loader and Quit
.C:0192  86 2D       STX $2D			<-- Store Last Byte of Data Loaded to start of BASIC Variables LSB
.C:0194  84 2E       STY $2E			<-- Store Last Byte of Data Loaded to start of BASIC Variables MSB
.C:0196  20 F2 03    JSR $03F2			<-- Seems Redundant - unless to ensure system unpatched.
.C:0199  20 E1 FF    JSR $FFE1			<-- Has the STOP Key Been Pressed?
.C:019c  F0 15       BEQ $01B3			<-- If So Destroy the code.
.C:019e  A2 05       LDX #$05
.C:01a0  86 EF       STX $EF			<-- Set Keyboard Queue to 5
.C:01a2  BD AD 01    LDA $01AD,X		<-- Load BASIC Command 
.C:01a5  9D 26 05    STA $0526,X		<-- Store to Keyboard Buffer
.C:01a8  CA          DEX				
.C:01a9  D0 F7       BNE $01A2			<-- Loop Five Times
.C:01ab  4C 0D 80    JMP $800D			<-- BASIC routine to process keyboard 


(C:$00f0) m$1ae
>C:01ae  52 55 4e 3a  0d a2 00 9d   RUN:....

Finally the code checks to see if the program loaded correctly by looking at the status in address $90, if non-zero a self destruct routine is initiated to remove the loader code from memory. Not very nice! The same routine is called if the user pressed the STOP key too, halting further load of the program/game.

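.C:01b3  A2 00       LDX #$00			<-- X = 0, Offset Within Page
.C:01b5  9D 00 02    STA $0200,X		<-- Overwrite Memory Starting at $0200 with A
.C:01b8  CA          DEX
.C:01b9  D0 FA       BNE $01B5			<-- Loop for 256 Bytes
.C:01bb  EE B7 01    INC $01B7			<-- Self Modify: Next Page for the STA Above
.C:01be  D0 F5       BNE $01B5			<-- Loop Until the Page Byte Wraps to $00
.C:01c0  6C FC FF    JMP ($FFFC)		<-- Jump Through the Reset Vector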

On return from the LOADER code at $33C, the X and Y registers are set to the last memory address the payload was stored. This is set in the BASIC Variables memory space addresses at $2D/$2E.

Finally five bytes are copied from memory to the keyboard buffer to simulate the user typing “RUN:” followed by RETURN, and the BASIC Kernel routine to process keyboard input is called, effectively RUNning the BASIC program that was loaded.
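The keyboard buffer trick is worth seeing in isolation. This Python sketch replays the loop at $019E against a blank memory image, using the five bytes shown in the dump of $01AE; it models the copy only and does not emulate BASIC picking the keys up.

```python
memory = bytearray(0x10000)
memory[0x01AE:0x01B3] = b"RUN:\r"      # bytes shown in the monitor dump at $01AE

x = 0x05                               # LDX #$05
memory[0x00EF] = x                     # STX $EF, number of keys queued
while x != 0:                          # BNE loops until X reaches zero
    memory[0x0526 + x] = memory[0x01AD + x]   # LDA $01AD,X / STA $0526,X
    x -= 1                             # DEX

queued = bytes(memory[0x0527:0x0527 + memory[0x00EF]])
print(queued)
```

Because the loop ends on `BNE` rather than `BPL`, offset 0 is never copied, which is why the table is addressed from $01AD while the text itself starts at $01AE.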

If you tried to inject a RTS instruction here you’ll end up with corruption most likely. Why? Remember earlier the System Stack was deliberately Corrupted by the loader code?

Simple Bypass

The simple bypass if you want to see the BASIC program loaded in memory is to substitute the command “RUN:” with “END” or even “LIST“

(C:$027e) > $1AE "LIST"
(C:$027e) > $53B 2
(C:$027e) m $1ae
>C:01ae  4c 49 53 54  0d a2 00 9d   LIST....

Continuing execution of the program will result in a blank screen. What gives? Ah… Remember the program set the background and ink colour to Black? That’s why…

You can either press CTRL+2 to change the INK colour or POKE 1339,2

The listing displays a simple Splash Screen and continues to LOAD the next program into memory.

**The initial BASIC Loader and Splash Screen**

But wait, it’s a BASIC LOAD, so why is it able to turbo load the game portion? Remember earlier in the code the ILOAD Indirect Vector was patched? Addresses $32E/$32F contain the address the Kernel jumps through to LOAD data, and this was updated to point to the turbo loader at address $0333. The beauty of this was that game developers and producers didn’t need to modify the original game code to account for the turbo and protection system, allowing them to focus on writing games!

**Continuing the LOAD Sequence in BASIC.**

The game code loaded as expected, however if you try to LIST the program you may encounter an out of memory error since the game code loaded to the top of memory at $3FFF on an unexpanded C16.

(C:$d90c) m $1001
>C:1001  0b 10 00 00  9e 36 39 30   .....690
>C:1009  30 00 00 00  00 00 00 00   0.......

If you type RUN the game will start as expected, though you’ll be wanting to know the actual start address of the game.

Enter monitor mode again and DUMP the memory at address $1001 – Start of BASIC

It’s obvious here that the start address is 6900 (decimal) $1AF4 but let’s break that down.

The first two bytes point to the address of the next line of the BASIC program ($100B)

The next two bytes ($00 $00) are the BASIC Line number, in this case LINE 0

The remaining bytes are the BASIC command. SYS is tokenised as $9E followed by the address 6900

Every line is terminated with a NULL byte, and end of program is terminated with double NULL bytes.
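The breakdown above can be turned into a small parser. This Python sketch decodes the twelve bytes from the dump at $1001; `parse_basic_line` is a hypothetical helper, and it only understands the single `SYS` token needed here.

```python
SYS_TOKEN = 0x9E

def parse_basic_line(memory, address):
    """Decode one tokenised BASIC line: link pointer, line number, body."""
    next_line = memory[address] | (memory[address + 1] << 8)
    line_number = memory[address + 2] | (memory[address + 3] << 8)
    end = memory.index(0x00, address + 4)      # body ends at the first NULL
    return next_line, line_number, bytes(memory[address + 4:end])

memory = bytearray(0x10000)
memory[0x1001:0x100D] = bytes.fromhex("0b 10 00 00 9e 36 39 30 30 00 00 00")

next_line, line_number, body = parse_basic_line(memory, 0x1001)
target = int(body[1:].decode("ascii")) if body[0] == SYS_TOKEN else None
print(f"next=${next_line:04X} line={line_number} SYS {target} (${target:04X})")
```

The link pointer of $100B points just past the NULL terminator at $100A, and the two zero bytes found there mark the end of the program.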

Brief Recap

The short version is the NOVALOAD system overwrites the system stack when loading the initial block of data to force execution of the loader code without user intervention and try to fool crackers of the day into giving up!

The loader payload was stored in both the DATA portion of the cassette block AND the Cassette HEADER. Pretty ingenious for something in the early 80’s

NOVALOAD Contained a few basic checks to prevent tampering, resulting in the program wiping itself from memory.

It also fulfilled the brief of significantly improving loading times for games of the day, a big improvement over the default Kernel provided routines.

How Do We Bypass this Protection System

Every ~~hacker~~ security engineer develops their own techniques and tricks. You may want to write a routine to force loading the code to a different memory address, and that will work quite well. I’m still rusty compared to my skills back in the 80s on this machine, however I found that the Kernel routines refused to relocate/load the data at the address I specified. Maybe because I’m not addressing the ROM correctly? I’m sure smarter C16 users will let me know how to achieve this.

My approach, and pretty much the same approach I used back in the 80s, was (I believe) elegant and worked without relocating any of the original code on load. How? By simply shifting the Stack Pointer to $3F, so the stack sits at $13F, just below the start of the Loader code.
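Why moving the Stack Pointer helps can be checked with a quick overlap test. In this Python sketch the load range comes from the tape header, while the push depth of 8 bytes is an arbitrary assumption standing in for whatever the Kernel pushes during the LOAD.

```python
LOAD_START, LOAD_END = 0x0140, 0x0240   # from the tape header

def push_addresses(sp, count):
    """Addresses the next `count` pushes would write to, starting at SP."""
    return [0x0100 + ((sp - i) & 0xFF) for i in range(count)]

def clobbered(sp, count=8):
    """True if any of those pushes lands inside the loaded block."""
    return any(LOAD_START <= a < LOAD_END for a in push_addresses(sp, count))

print(clobbered(0xF3))   # default stack position
print(clobbered(0x3F))   # relocated just below $0140
```

With the pointer at $3F, every return address pushed during the LOAD sits at $013F or lower, so the incoming block can no longer overwrite it.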

You can find the full code available on my Git Repository for Commodore Projects: https://github.com/muckypaws/RetroCheats

*=$3000
;
; Preserve Original stack
;
     SEI               ; Disable Interrupts
     STA     RestoreA+1
     STX     RestoreX+1
     STY     RestoreY+1; Preserve A,X,Y Registers
     TSX               ; Save SP to X
     STX     stack     ; Store Stack Pointer In Memory
     LDY     #$00      ; Set Offset Counter to 0
stackSave
     lda     $100,x    ; Get Stack Data
     sta     stackData,y ; Preserve Stack Data
     iny               ; Increment Y Offset
     inx               ; Increment X Stack Offset
     CPX     #$00      ; Has X Reached 0?
     BNE     stackSave ; Loop Until Stack Contents Saved
;
; Set Stack Below Novaloader Code
;
     LDX     #$3f      ; Set Stack Pointer to $3F
     TXS               ; Move X-> SP
;
; Now Load Code
;
     LDA     #$01
     LDX     #$01
     LDY     #$FF
     JSR     $FFBA     ; Set LFS
     LDA     #$00
     JSR     $FFBD     ; Set Filename to 0
     JSR     $FFD5     ; Load the Code
;
; Patch the Loader
;
     LDA     #$4C      ; JMP Op Code
     STA     $19E
     LDA     #<RestoreCode
     STA     $19F
     LDA     #>RestoreCode
     STA     $1a0
     JMP     $203      ; Execute the Loader
;
; Restore original stack
;
RestoreCode
     SEI               ; Ensure Interrupts Still Disabled
     LDX     stack     ; Restore the original Stack
     TXS               ; Set SP to Original
     LDY     #$00      ; Y Offset into Saved data
stackRestore
     LDA     stackData,y
     STA     $100,x    ; Copy Stack Byte for Byte
     iny
     inx               ; Increment Offsets
     cpx     #$00      ; Check if we've looped to $00
     bne     stackRestore
     LDA     #$02
     STA     $53B      ; Set INK Colour to 2
RestoreA
     LDA     #$00
RestoreX
     LDX     #$00
RestoreY
     LDY     #$00
     CLI               ; Enable Interrupts
     rts
stack byte    00       ; Byte to Store Original Stack Pointer
stackData

In its simplest form this code preserves the Stack and Registers, sets the Stack Pointer below the code load location, LOADs the NOVALOAD code in at address $140 as per the header, and patches NOVALOAD to return control to our patch. Step by step, it will:

Disable Interrupts.
Preserve the registers A, X and Y into self modifying code.
Preserves the System Stack contents from Current Stack Pointer (Usually $F6) thru to $1FF.
Set the Stack Pointer to $3F ($13F In memory).
Use the Kernel to Load the NOVALOAD Code to Memory in its original location.
Patch the Loader to return control to our Code when the loader has loaded the next program (Usually a BASIC Loader).
Starts the NOVALOAD System at $203.
Control is returned to our program.
Set the INK Colour to 2 – To show the cursor when control returned.
Disable Interrupts Again
Restore the Stack contents to its original location.
Restore the Stack Pointer
Restore the registers A, X and Y
Enable Interrupts
Return control back to BASIC or the caller of the patch routine.

10 rem novaload by pass (c) jason brooks
20 sa = 12288
30 for n = 0 to 98
40 read a% : poke sa+n,a%: next n
50 sys sa
60 end 
70 data 120,141,91,48,142,93,48,140
80 data 95,48,186,142,98,48,160,0
90 data 189,0,1,153,99,48,200,232
100 data 224,0,208,244,162,63,154,169
110 data 1,162,1,160,255,32,186,255
120 data 169,0,32,189,255,32,213,255
130 data 169,76,141,158,1,169,66,141
140 data 159,1,169,48,141,160,1,76
150 data 3,2,120,169,2,141,59,5
160 data 174,98,48,154,160,0,185,99
170 data 48,157,0,1,200,232,224,0
180 data 208,244,169,0,162,0,160,0
190 data 88,96,0

This is the BASIC type-in which implements the code above. If you want the code to work at a different address you’ll need to re-assemble the final code using CBM Program Studio.
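Before typing 99 numbers into a C16, it can be reassuring to sanity check them. This Python sketch, run on a modern machine, copies the DATA values from the listing above and confirms the length and the address patched into the loader. The variable names are hypothetical and the script is a checking aid, not part of the original bypass.

```python
BASE = 12288  # "sa" in the BASIC listing, $3000

data = [
    120, 141, 91, 48, 142, 93, 48, 140,
    95, 48, 186, 142, 98, 48, 160, 0,
    189, 0, 1, 153, 99, 48, 200, 232,
    224, 0, 208, 244, 162, 63, 154, 169,
    1, 162, 1, 160, 255, 32, 186, 255,
    169, 0, 32, 189, 255, 32, 213, 255,
    169, 76, 141, 158, 1, 169, 66, 141,
    159, 1, 169, 48, 141, 160, 1, 76,
    3, 2, 120, 169, 2, 141, 59, 5,
    174, 98, 48, 154, 160, 0, 185, 99,
    48, 157, 0, 1, 200, 232, 224, 0,
    208, 244, 169, 0, 162, 0, 160, 0,
    88, 96, 0,
]

# The two LDA # operands that build the JMP target stored at $019F/$01A0.
restore_entry = data[54] | (data[59] << 8)
last_address = BASE + len(data) - 1
print(len(data), f"${restore_entry:04X}", f"${last_address:04X}")
```

The patched address should point at the second `SEI` (byte value 120) inside the routine. If you re-assemble at a different base, this address and the other $30xx operands are the values that change.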

**Control Returned to BASIC, ignore the error message.**

**First Part Loaded and ready to load the next part with a basic command**

**Now loading the main payload under our control!**

What Now?

Hopefully the above information will give you the general idea of how the protection system works and how to bypass it for your own needs, whether it’s to learn or to patch an original cassette. It’s really for information, since emulation these days means we can halt program code and modify it relatively easily. This gives you an idea/flavour of how we tackled protection systems back in the 80s.

I have to appreciate the skill and efforts of the original developers of this system since the tools we had available were extremely basic at best, usually having to develop our own to work a problem.

In the next section we’re going to delve right into the actual Speed-loading code and how it reads data from cassette into memory. I’m impressed with how tight it is.

Again you can find the latest up to date information on my Git Repo: https://github.com/muckypaws/RetroCheats

The Loader Code at $0333

The real Speed Load code is located at $333. I’ve disassembled it and annotated the code as I’ve interpreted it; it may be subject to change, and any updates will be on my Git repo. The full code is below and we’ll work on breaking it down. It’s quite tight and optimised pretty well, and I’m unsure any further bytes could be squeezed out of this loader, though nearly 40 years later there may be a trick or two up some gifted coder’s sleeve!

$0333  A6 AE       LDX $AE			; $AE = Flag for NOVA Load Installed
$0335  E0 01       CPX #$01			; If Set to 1, all good and NOVA Load used.
$0337  F0 03       BEQ $033C		; Checks if Nova Load Vector Initialised 
$0339  4C 4A F0    JMP $F04A		; If Not, Use the original Kernel LOAD  Routine
$033c  78          SEI				; Disable Interrupts
$033d  20 1B E3    JSR $E31B		; Displays "Press Play on Tape" and Waits for PLAY button to be pressed.

$0340  D8          CLD				; Clear Decimal Flag
$0341  A0 00       LDY #$00			; Reset to Zero
$0343  84 01       STY $01			; IO Port Last Port Read/Reset
$0345  8C 3F FF    STY $FF3F		; RAM Select BANK 0
$0348  84 A2       STY $A2			; Bytes to READ = 0 / 256
$034a  84 A1       STY $A1			; Integrity Check = 0
$034c  20 C7 03    JSR $03C7		; Read Cassette Data looking for Header Start
$034f  10 FB       BPL $034C		; Whilst Bit 7 = 0 Continue looking for a 1 Bit.
$0351  20 C5 03    JSR $03C5		; Read one Byte of Data from Cassette
$0354  C9 AA       CMP #$AA			; Is it #AA - Header ID?
$0356  D0 F4       BNE $034C		; No? Keep Looping until found.
$0358  20 C5 03    JSR $03C5		; Read Full Byte of Data from Cassette
$035b  C5 AB       CMP $AB			; Compare #$04 Set in init code (Length of Filename)
$035d  D0 ED       BNE $034C		; Sequence not #AA #04 then start again.
$035f  C4 AB       CPY $AB			; Has Y Reached end of Loop? 4
$0361  F0 0A       BEQ $036D		; If yes - Then Read Program Data
$0363  20 BE 03    JSR $03BE		; Read Next Byte
$0366  D1 AF       CMP ($AF),Y		; Compare next four bytes with $225 ($AF = $25 $02) - "NOVA"
$0368  D0 D7       BNE $0341		; If not header ID then continue looking for header.
$036a  C8          INY				; Y = Y + 1
$036b  D0 F2       BNE $035F		; While Y <> 0 Continue Reading Header
$036d  A0 FA       LDY #$FA			; Only need 6 bytes for Data Info
$036f  20 BE 03    JSR $03BE
$0372  99 A1 FF    STA $FFA1,Y		; Writes to $9B ->$A0 - Header Information
$0375  C8          INY
$0376  D0 F7       BNE $036F		; Loop until Y == 0
$0378  20 BE 03    JSR $03BE		; Read Next Bits 	
$037b  C5 A1       CMP $A1			
$037d  D0 1F       BNE $039E		; Quit Routine and Cleanup
$037f  A5 A0       LDA $A0			; Block Number Currently Loading
$0381  8D 19 FF    STA $FF19		; Set Border Colour to Blocks
$0384  F0 1A       BEQ $03A0		; If Block == 00 jump 3A0 (Finished Loading)
$0386  E6 9C       INC $9C			; Increment MSB of Memory Load Location
$0388  C6 A0       DEC $A0			; Number of Blocks Remaining
$038a  D0 06       BNE $0392		; Load Block of Memory 
$038c  A5 9F       LDA $9F			; Remaining Bytes to Read 1 - 255
$038e  F0 10       BEQ $03A0		; Finished Loading
$0390  85 A2       STA $A2			; Store Final Number of Bytes to Read.
$0392  20 BE 03    JSR $03BE		; Start Block Load 
$0395  91 9B       STA ($9B),Y		; A Contains Byte to Write, Y = Offset Within Block
$0397  C8          INY				; Reached Y? Either EOF or 256 Bytes?
$0398  C4 A2       CPY $A2			; Reached the number of Bytes to Read?
$039a  D0 F6       BNE $0392		; If not continue reading Bytes
$039c  F0 DA       BEQ $0378		; Loop Back for Next Header and Continue Load Sequence 
;
; Forced Delay to allow Tape to overrun a smidge (GAP) before stopping the 
; motor and returning control to the caller.
;
$039e  A9 30       LDA #$30
$03a0  85 90       STA $90			; Kernel IO Status
$03a2  A9 05       LDA #$05			; Delay Loop before switching motor off
$03a4  CA          DEX
$03a5  D0 FD       BNE $03A4
$03a7  88          DEY
$03a8  D0 FA       BNE $03A4
$03aa  E9 01       SBC #$01
$03ac  10 F6       BPL $03A4		; Just Wasting Cycles 
;
; Switch Tape Motor Off, Returning control with X/Y = Last Byte of Data Written
;
$03ae  A9 88       LDA #$88
$03b0  85 01       STA $01			; PORT I/O 
$03b2  8D FC 07    STA $07FC		; Motor Lock
$03b5  8D 3E FF    STA $FF3E		; ROM Select
$03b8  A6 9D       LDX $9D
$03ba  A4 9E       LDY $9E			; $9D/9E = Last Byte Stored from Loader
$03bc  58          CLI				; Enable Interrupts
$03bd  60          RTS				; Return to Caller



$03be  18          CLC				; Clear Carry
$03bf  A5 A1       LDA $A1			; Integrity Check
$03c1  65 A7       ADC $A7			; Last Read Byte
$03c3  85 A1       STA $A1			; Store next result
$03c5  A9 7F       LDA #$7F			; Start of Bit Loading %01111111
$03c7  48          PHA				; Preserve A
$03c8  98          TYA
$03c9  48          PHA				; Preserve Y on stack 
$03ca  AD 19 FF    LDA $FF19		; Get Current Border Colour
$03cd  49 7F       EOR #$7F			; Invert bits 0-6
$03cf  8D 19 FF    STA $FF19		; Write Border Colour To Screen
$03d2  A2 8F       LDX #$8F			; TED Timer 2 Low Byte
$03d4  A0 01       LDY #$01			; TED Timer 2 High Byte ($018F = 399)
$03d6  A9 10       LDA #$10			; A = %0001 0000
$03d8  24 01       BIT $01			; Read Input Register
$03da  F0 FC       BEQ $03D8		; While Bit 4 = 0 Loop 
$03dc  24 01       BIT $01			; Read Input Register
$03de  D0 FC       BNE $03DC		; While Bit 4 = 1 Loop, Looking for Toggle 1->0
$03e0  AD 03 FF    LDA $FF03		; Read TED Timer 2 High Byte
$03e3  8E 02 FF    STX $FF02		; Write $8F to Timer 2 Low Byte
$03e6  8C 03 FF    STY $FF03		; Write $01 to Timer 2 High Byte
$03e9  0A          ASL A			; Shift Bit 7 of Timer High Byte into Carry (1 or 0 bit)
$03ea  68          PLA				; 
$03eb  A8          TAY				; Restore Y from the stack
$03ec  68          PLA				; Restore A from stack
$03ed  6A          ROR A			; Rotate Bits Right - Bit 0 -> Carry
$03ee  B0 D7       BCS $03C7		; If Carry Set Rinse and repeat.
$03f0  85 A7       STA $A7			; Otherwise Store A in $A7 
$03f2  60          RTS				; Return Control

Remember the ILOAD Vector was updated to point to the turbo loader at $0333?

$0333 – $0339 checks whether the Turbo Loader is enabled via a flag at $AE. If the flag isn’t set to the value 1, the loader hands control to the Kernel Load Routine at $E31B. Doing so will of course destroy the NOVALOAD system, as it resides in the Tape Buffer, which the Kernel will overwrite.

$33C is the start of the real loader code.

$33C – $34A Initialises the Loader

Disables Interrupts
Displays the “Press Play on Tape” Message and waits for PLAY to be pressed (if it’s not already).
Resets the DATA PORT at $01 (Writes 00)
Sets the RAM Bank to 0 (this is for PLUS users or machines with Memory Expansion Packs installed) to ensure the program loads into main memory only.
Resets the Last Byte Read from Cassette and the Integrity Check Byte to Zero.

$34C-$378 Checks for the Tape Header Data.

Read a Bit from the Cassette Port and search for a 1 Bit, continuing to loop whilst 0 Bits are read.
- At this point the cassette is spinning up and waiting to reach speed, so on a real device the data read on the port will be random.
- The loop is waiting for its first hit on a 1 Bit Signal, which we’ll go into in detail later.
Read One Full Byte from the Cassette Port
Check the Byte Read is $AA or %10101010
- This represents a “Pulse Train” used to help calibrate the binary 1’s and 0’s read from the datasette and to ensure the datasette is up to speed. The authors could have used any byte identifier, though if you’ve studied networking at the electrical signal level back in the 80s you’ll recognise pulse trains are very common methods for training and syncing signals.
If the Byte read wasn’t $AA we loop back to the beginning at $34C to search for another 1 Bit on the Datasette port.
Another Byte is read from the Datasette, this time looking for the Value $04 which is the length of the filename “NOVA”
- If the Sequence $AA $04 (the File Header) is not found then we loop back to the beginning at $34C and repeat the process. The idea being that errant data read while the datasette is speeding up could match one byte; matching two bytes is less likely.
The next few instructions between $35F and $36D check to see if Y = $04 (Length of the filename)
- Y is incremented if the subsequent bytes read match “NOVA”; if not, the counter is reset to 0 by branching back to $341 (the BNE $0341 in the listing, inside the initialisation code) to continue looking for the header.
- This largely guards against picking up data mid data block, as long as the program code doesn’t contain the same sequence and the tape hasn’t been fast-forwarded to that exact location.
Essentially the entire Tape Header consists of the following six bytes.
- $AA $04 “NOVA”
Next up Six bytes are read which consist of the Tape File Information.
- The Developer sets Y = $FA
- Incrementing Y results in a six times loop until Y = 0
- Data Read is stored at $FFA1,Y
  - Since the data is indexed by Y, the real location written to is $9B->$A0
During each Byte READ, a sanity check value is updated so the integrity of the data can be verified against the previous bytes, a kinda poor man’s CRC if you will. If the check fails, the program self-destructs.
The NOVALOAD Tape Info Header information is shown in the table below.

Byte	Usage
$9B/$9C	Data Location Vector (Where to write data)
$9D/$9E	Last Byte Loaded Address
$9F	Number of Bytes Remaining in the Block, 0-255 Bytes
$A0	Number of Blocks to Load

NOVA Load Tape Info Header
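To make the header layout concrete, here is a rough Python sketch of the steps above. It is a simplification rather than a port of the loader: it searches an already-decoded byte stream for the six sync bytes in one go (the real code matches byte by byte and restarts on a mismatch), and the names `SYNC`, `zero_page_target` and `parse_header` are made up for this illustration. It does show why `STA $FFA1,Y` with Y running from $FA to $FF lands in $9B to $A0.

```python
SYNC = bytes([0xAA, 0x04]) + b"NOVA"  # pilot byte, filename length, filename


def zero_page_target(y: int) -> int:
    """Effective address of STA $FFA1,Y: the 16-bit sum wraps past $FFFF."""
    return (0xFFA1 + y) & 0xFFFF


def parse_header(stream: bytes) -> dict:
    """Find the sync sequence and decode the six info bytes that follow it."""
    start = stream.find(SYNC)
    info_at = start + len(SYNC)
    if start < 0 or len(stream) < info_at + 6:
        raise ValueError("no complete NOVALOAD header found")
    info = stream[info_at:info_at + 6]
    # Y counts $FA..$FF, so the six bytes are stored at $9B..$A0 in order.
    zp = {zero_page_target(y): b for y, b in zip(range(0xFA, 0x100), info)}
    return {
        "load_vector": zp[0x9B] | (zp[0x9C] << 8),    # $9B/$9C, low byte first
        "last_address": zp[0x9D] | (zp[0x9E] << 8),   # $9D/$9E, low byte first
        "final_block_bytes": zp[0x9F],                # $9F
        "blocks": zp[0xA0],                           # $A0
    }
```

Feeding it a stream with a couple of junk bytes in front of the header returns the four fields from the table; the values you pass in are entirely up to you, none are taken from a real tape.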

$378-$39C Check if Code Finished Loading

Check the Contents of $A0 to see if we’re at Block 0
Set the Screen Border Colour to the Block Number
If Block 0 Jump to $3A0 to start the code up.
If not Block 0
- Increment the MSB of the Tape loading address at $9C
- Decrement the number of blocks left to load at $A0
- Check remaining bytes to read
  - If Zero, then start the code up at $3A0
- Otherwise store the remaining bytes in $A2 to read
A loop is performed to continually load data and store at address vector ($9B) until all block bytes are read.
Loop to $378 to continue load process.
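The block loop and its running check byte can be modelled in a few lines of Python. This is a sketch under assumptions, not a faithful emulation: the `Tape` class and `load_blocks` function are hypothetical names, the running sum is started from zero at the first check byte (on the real machine it has already accumulated earlier bytes), and $A2 is assumed to start at 0, meaning a full 256-byte page.

```python
class Tape:
    """Feeds bytes while keeping the running sum the way $03BE does."""

    def __init__(self, data):
        self.data = iter(data)
        self.checksum = 0  # models $A1
        self.last = 0      # models $A7

    def read(self) -> int:
        # CLC / LDA $A1 / ADC $A7 / STA $A1: add the PREVIOUS byte first.
        self.checksum = (self.checksum + self.last) & 0xFF
        self.last = next(self.data)
        return self.last


def load_blocks(tape, memory, vector, blocks, final_bytes) -> bool:
    """Mirror of $0378-$039C. Returns False when a check byte mismatches."""
    count = 0  # models $A2; assumed to start at 0 (= 256 bytes per block)
    while True:
        check = tape.read()                  # $0378: byte from tape
        if check != tape.checksum:           # $037B: CMP $A1
            return False                     # $037D: BNE $039E
        if blocks == 0:                      # $0384: BEQ $03A0
            return True
        vector = (vector + 0x100) & 0xFFFF   # $0386: INC $9C
        blocks -= 1                          # $0388: DEC $A0
        if blocks == 0:
            if final_bytes == 0:             # $038E: nothing left to read
                return True
            count = final_bytes              # $0390: STA $A2
        y = 0
        while True:                          # $0392-$039A
            memory[(vector + y) & 0xFFFF] = tape.read()
            y = (y + 1) & 0xFF               # INY wraps at 256
            if y == count:
                break
```

Note that the page number is incremented before the first block is stored, so in this model the first byte lands at the header vector plus $100.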

$39E-$3BE – Cleanup, Waste Cycles, Stop the Datasette Motor

This section is probably the easiest part of the Turbo Loader mechanism. It sets the Kernel IO Status at $90 to completion (or to $30 if the integrity check failed), and sets up a loop to add a delay of around 1 second before switching off the Datasette Motor.
X and Y contain the address of the last byte loaded (from $9D/$9E).
Interrupts are re-enabled
Control is returned to the caller of the tape loader, whether it was BASIC or Assembly language.

The Actual Loader.

That’s a lot of information to digest, and this is only the basics of the tape loader at a higher level, but… wait… there’s more!

The code needs to recognise which signals from the Datasette are 1’s and 0’s. We’ll break that down, though to help, NOVALOAD utilised a neat trick with the C16 TED Chip. There’s a set of timers that continually count down, located at $FF00 – $FF05. The 16-bit timer at $FF02 (low byte) and $FF03 (high byte) is available for user utilisation.

You guessed it, this timer is set and used to time the length of 1 and 0 signals. For illustration, the very crude diagram below shows a “Perfect” signal over time as a voltage read on the Datasette Port. If you had an oscilloscope you would see a slant in the rise from 0v to 3v and an overshoot (Noise) from the datasette circuitry.

The code sets the user timer to 399 ($018F), which counts down continually on each cycle: 884kHz on PAL machines and 894kHz on NTSC machines.

The developers had to take this variance into consideration. A Zero bit (the time between a rise and fall of the signal from 0 to 1 to 0) needs to take fewer than 400 cycles, while a 1 Bit takes more than 400 cycles of the counter. A 1 Bit therefore lasts long enough for the TED Timer to count down through zero and wrap around, from $018F -> $FFxx

The trick is now simple: read the MSB of the Timer and shift BIT 7 into the Carry Flag. If the signal took less time than the threshold, the Carry Flag is Reset, and if it took longer, the Carry is Set. I.e. $00 or $01 in $FF03 results in Carry Not Set, and $FF results in Carry Set.

The final trick is to rotate the carry flag into the data register for all eight bits before returning a full byte back.
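The threshold trick is easy to check with a little arithmetic. The Python below is an idealised model, assuming the timer simply loses one count per tick and ignoring the handful of cycles the 6502 spends between spotting the edge and touching the timer; the function names are invented for this sketch. It also only holds for pulses well under about 33,000 ticks, after which bit 7 of the high byte would clear again.

```python
TIMER_START = 0x018F  # 399, the value written to $FF02/$FF03


def timer_after(ticks: int) -> int:
    """16-bit timer value after counting down for `ticks` ticks (with wrap)."""
    return (TIMER_START - ticks) & 0xFFFF


def carry_from_high_byte(high: int) -> int:
    """What ASL A leaves in Carry: bit 7 of the timer's high byte."""
    return (high >> 7) & 1


def classify(ticks: int) -> int:
    """Return the data bit (0 or 1) for a pulse lasting `ticks` timer ticks."""
    return carry_from_high_byte(timer_after(ticks) >> 8)
```

A pulse of 399 ticks leaves the timer at $0000 and reads as a 0 bit, while 400 ticks wraps it to $FFFF and reads as a 1 bit, which matches the 400 cycle boundary described above.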

Make sense?

Let’s take a look at the final leg of code.

The Actual Bits READ Routine

The main loader calls different entry points of the read routine depending on whether it is looking for a single 1 Bit, reading a whole byte, or reading a whole byte and updating the integrity check, as follows :-

$3C7 – Read the next BIT of data
$3C5 – Read a byte of data
$3BE – Read A Byte of Data and update the Integrity Check

$03be  18          CLC				; Clear Carry
$03bf  A5 A1       LDA $A1			; Integrity Check
$03c1  65 A7       ADC $A7			; Last Read Byte
$03c3  85 A1       STA $A1			; Store next result
$03c5  A9 7F       LDA #$7F			; Start of Bit Loading %01111111
$03c7  48          PHA				; Preserve A
$03c8  98          TYA
$03c9  48          PHA				; Preserve Y on stack 
$03ca  AD 19 FF    LDA $FF19		; Get Current Border Colour
$03cd  49 7F       EOR #$7F			; Invert bits 0-6
$03cf  8D 19 FF    STA $FF19		; Write Border Colour To Screen
$03d2  A2 8F       LDX #$8F			; TED Timer 2 Low Byte
$03d4  A0 01       LDY #$01			; TED Timer 2 High Byte ($018F = 399)
$03d6  A9 10       LDA #$10			; A = %0001 0000
$03d8  24 01       BIT $01			; Read Input Register
$03da  F0 FC       BEQ $03D8		; While Bit 4 = 0 Loop 
$03dc  24 01       BIT $01			; Read Input Register
$03de  D0 FC       BNE $03DC		; While Bit 4 = 1 Loop, Looking for Toggle 1->0
$03e0  AD 03 FF    LDA $FF03		; Read TED Timer 2 High Byte
$03e3  8E 02 FF    STX $FF02		; Write $8F to Timer 2 Low Byte
$03e6  8C 03 FF    STY $FF03		; Write $01 to Timer 2 High Byte
$03e9  0A          ASL A			; Shift Bit 7 of Timer High Byte into Carry (1 or 0 bit)
$03ea  68          PLA				; 
$03eb  A8          TAY				; Restore Y from the stack
$03ec  68          PLA				; Restore A from stack
$03ed  6A          ROR A			; Rotate Bits Right - Bit 0 -> Carry
$03ee  B0 D7       BCS $03C7		; If Carry Set Rinse and repeat.
$03f0  85 A7       STA $A7			; Otherwise Store A in $A7 
$03f2  60          RTS				; Return Control

We’ll go through this line by line, starting at the very top with the entry point that reads a whole byte and updates the Integrity Check Byte.

Clear the Carry Flag
Get the contents of $A1 – Integrity Check Byte
ADD the last byte read from the datasette, and store it back in $A1
Load the Bit Pattern %01111111
- The way the loop is constructed this will actually load all 8 bits of data using the final bit to know the byte load is complete.
Push A and Y onto the Stack.
- A Contains both the actual byte read and the number of bits remaining to load!
- Y contains the byte offset used by the higher level loader; it’s not used here but is preserved.
Get the current Border Colour and invert the lower 7 bits (bits 0-6), writing the result back to the Screen Border Colour Register
- This gives you the flashing border you’re familiar with.
X, Y and A are initialised.
- X = $8F the LSB of the TED Timer
- Y = $01 the MSB of the TED Timer
  - The TED Timer will be set to 399 ($018F), so it takes 400 cycles to count down past zero to $FFFF
- A is set with Bit 4 high, the bit we’re interested in reading from $01 – the Datasette Signal.
The code sits and waits, checking the following
- While Bit 4 of $01 is not set, loop until it becomes a 1
- While Bit 4 of $01 is set, loop until it becomes 0
This essentially wastes cycles waiting for a complete rise and fall of the signal from the datasette.
Read the contents of the MSB of the TED Timer.
Reset the TED Timer to contain the count 399
Left Shift the Accumulator, so BIT 7 of the Timer’s MSB is transferred to the Carry Flag, which gives us the 0 or 1 Bit read from cassette.
Restore the original Accumulator and Y register.
Rotate Right the Accumulator, which holds both the bits read so far and the marker for the number of bits remaining, moving the bit just read from cassette into BIT 7.
- Carry is shifted into BIT 7 and Bit 0 is shifted into Carry
- We’ll loop seven more times to complete the load, since the original mask %01111111 will result in CARRY Not Set on the eighth and final read.
Loop back to $3C7 to continue reading bits until the byte is complete.
Store the result in $A7
Return control to the caller.
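The %01111111 trick is worth seeing in isolation, since the same register doubles as the data byte and the bit counter. Here is a small Python sketch of just that part; `ror` and `read_byte` are illustrative names, and the bits are handed in as a plain list rather than being timed from a tape signal.

```python
def ror(a: int, carry: int) -> tuple:
    """6502 ROR A: Carry goes into bit 7, bit 0 comes out as the new Carry."""
    return ((carry << 7) | (a >> 1)) & 0xFF, a & 1


def read_byte(bits) -> int:
    """Assemble one byte from an iterable of bits, first bit received first."""
    a = 0x7F                 # seven 1 bits acting as the countdown marker
    source = iter(bits)
    while True:
        a, carry = ror(a, next(source))
        if not carry:        # the single 0 bit has dropped out: 8 bits are in
            return a
```

Because every new bit enters at BIT 7 and is then pushed right, the first bit off the tape ends up as BIT 0 of the result, so the stream 0,1,0,1,0,1,0,1 assembles to $AA.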

Now that’s quite a lot to take in! Given there’s a lot going on, in a few bytes of code, I do recommend tracing this routine yourself using a modern emulator. Trying to go old-school on a real device will result in overruns as the tape will continue moving whilst you’re debugging.

Added Bonus!

If you’ve made it this far, and you too are a fan of Mr Puniverse, then I worked out a couple of cheats for you!

Infinite Air: POKE 13159,29

Or in Monitor Mode

> $3367 $1D

Infinite Lives: POKE 13327,13

or in Monitor Mode

> $33AB $0D

That’s all Folks!

There’s a lot of info here, if you spot an error, or area that needs clarification, why not drop me a line on the contact page or in the comments below. If you’d like more articles like this, then let me know.

I hope you found this useful, if you did please share the link on social media, via email and help spread the word, every little helps as they say.

Thanks for reading.

Jason x

Yeah it’s begging season, without your support maintaining this site is a hobby, but if you found this article useful, why not spot me a coffee?

Useful Memory Addresses used by NOVALOAD.

$01          Data Read on PORT
$9B/$9C      Code Load Location Vector
$9D/$9E      Last Byte Load Location
$9F          Remaining Bytes to Read
$A0          Number of Blocks to load
$A1          Integrity Check, a running sum with each byte read added in on the next read
$A2          Number of Bytes to read - 00 = 256
$A7          Last Byte Read
$AB          Length of Filename - Typically 4, Filename = "NOVA"
$AE          NOVALOAD System loaded FLAG 1 = NOVALOAD
$B6/$B7      Tape Buffer Location, Usually Contains $0333
$100->$10F   Floating Point Conversion Buffer
$110->$112   Temp Locations for Saving Registers during File Operations
$114->$122   Colour/Luminance RAM Table
$124->$1FF   System Stack
$333->$3F2   Tape Buffer
$333/$334    First Byte of Payload Address
$335/$336    Last Byte of Payload Address
$337         Filename and remaining Buffer
$53B         Current INK Colour (Cursor/PEN)
$1001        BASIC Load Address (Typically)
