Old Non-x86 Architectures

B981 3762 1D30 CA05 E2C1 CD7F 3C68 C8DB CBA783EF

http://www.visual6502.org/images/pages/Motorola_68000.html

by Javantea
Oct 2-Nov 8, 2017
Batman's Kitchen Meeting
Nov 8, 2017
https://www.altsci.com/old_non-x86/
Slides || Talk video
Software: old_non-x86-0.6.tar.xz [sig]

Note: This paper was meant to be too verbose to read in one sitting. Watch the videos, read a few bullets, and then read a section when you're ready.

Introduction

In the realm of computing, there is a clear front-runner for desktop, laptop, and server processors: the x86 architecture. In battery-powered mobile devices such as phones, and in small boards such as the Raspberry Pi, ARM is the clear front-runner because x86 is inefficient and bloated by comparison. ARM supports System-on-Chip (SoC) designs and powerful systems with limited power use. Because of this bias, you can go a long way on just one architecture. If students focus their attention only on x86 or ARM reverse engineering and exploitation, they will have a blind spot when they come up against challenges that involve other architectures. In this paper, we try to understand why. Qemu supports aarch64 alpha arm cris i386 lm32 m68k microblaze microblazeel mips mips64 mips64el mipsel moxie nios2 or1k ppc ppc64 ppcemb s390x sh4 sh4eb sparc sparc64 tricore unicore32 x86_64 xtensa xtensaeb architectures, and MAME supports many more. Some of these architectures are old, and some are so different that it would benefit a person to spend just one hour learning a bit about how these systems work. Another major benefit of learning to program and reverse engineer a different architecture is that you get to interact with hardware in a way that on x86 and ARM only the bootloader, the kernel, and the ring zero hacker are allowed to. In 10 lines of code, 1 second of compiling, and 1 second of emulation, a student can write their first kernel-level code. This will come in handy when a challenge requires them to understand the inner workings of their system. In the environment of a CTF, challenges can use your bias against you!*

* See The cLEMENCy Architecture for example.

Old Non-x86 Architectures are too hard!

A common misconception is that old non-x86 architectures are too difficult to work with. They use different tools and have significant limitations, but this should not make them significantly harder. In fact, if you know x86, you should be well on your way to becoming architecture agnostic. M68k Assembly was taught at University of Washington to Physics majors with no computer programming background in 2002. How did I pass that class before I was a hacker if M68k is too difficult to work with?*

* It may mean that I'm talented in assembly level programming, but I think it means that everyone has a chance at learning M68k assembly.

Aside #1: Don't Let Computers

Computers are an easy excuse not to socialize; don't use it! If you're having trouble getting your code to work, ask someone for help. If you can't understand something and you've tried Google, try asking someone you don't know. A person I was talking to at a local bar learned to program 6502 assembly as a teenager.

The goal in meeting someone is to like them.

What is the goal of this paper?

We want to be able to understand the fundamentals of working with old systems and how they can be used in contemporary systems. This will teach us about the nature of computers and kernel-level programming.

6502 -- NES
M68k -- Sega Genesis
Z80 -- Pacman
8051 -- Ruleta RE-900, coastermelt
6809 -- Robotron: 2084
8086 -- Boot sector

Write a Program.
Emulate it.
Debug it.
Disassemble it.
Reverse it.

While we're going to be emulating these systems, the intent is to model the behavior of actual hardware. If you want a real game or a console, I recommend Pink Gorilla in the U-district or in the Intl-district. They have knowledgeable staff and excellent selection and prices.

Please download these tools and my paper in case you're playing a CTF and need to hack a ROM.
Permalinks: Paper Software: old_non-x86-0.6.tar.xz [sig]

6507 (6502 architecture)

Pitfall was written in 6507 Assembly for the Atari 2600. The 6507 is a 6502 architecture processor. The system was very limited in graphics and compute, which made the platform far less successful than its competitors, but the amazing games made for it surprised players and set the stage for improvements. A good example of a modern game written for Atari 2600 is Ultra SCSIcide by Joe Grand. It has binaries and source code if you'd like to try reversing it. The source code will tell you how well you did.

6502

The 6502 is often hailed as being one of the easiest architectures to program in assembly. The phenomenon of NES, C64, and Apple II programming in the 21st century provides some evidence of this. I am more apt to think that this is because programming a limited system suits people better than programming a complex system like x86.

Systems that feature 6502 architecture:

NES
SNES
TurboGrafx-16
Apple II
Atari 2600
Atari 800
Tamagotchi
Commodore 64
Too many arcade systems
A lot of chips use this architecture and are not named 6502.

6502 Family Tree

                          6502
                           |
        +------+--------+--+--+-------+-------+
        |      |        |     |       |       |
      6510   deco16   6504   6509   n2a03   65c02
        |                                     |
  +-----+-----+                            r65c02
  |     |     |                               |
6510t  7501  8502                         +---+---+
                                          |       |
                                       65ce02   65sc02
                                          |
                                        4510

https://wiki.nesdev.com/w/index.php/CPU_memory_map

Before we work on how to understand assembly, let's focus on the compiler. C code is a far more human language than assembly in that it can be understood as a set of functions, statements, and variables. Those functions, statements, and variables are widely used in C, C++, C#, Java, Python, JavaScript, PHP, and other easier procedural languages.

CC65

CC65 is a C compiler and assembler targeting multiple systems that use 6502 architecture. You can use it to compile code for any 6502 system, supported or not. What is a C compiler? A simple C compiler needs to take any valid C program (see listing 1) and turn it into a valid assembly program for a certain architecture.

Listing 1: C sample

    int main()
    {
        int i;
        char buf[10];
        for(i = 0; i < 10; ++i) {
             buf[i] = i;
        }
        return 0;
    }

With compilers it became clear that without a valid C library, you'd be up a creek trying to write your own, so each compiler should either be paired with a libc or it will be difficult to work with. But you don't necessarily need libc to write a cool program as we'll see later. CC65 comes with a libc for each system. In order to deal with text on a platform that does tiles and sprites and is very limited, they wrote a library that doesn't do graphics perfectly, but good enough for debug and a little bit of showing off. You can use it, but after a while you'll want to load sprites, create a nice scrolling background and so on or write your own engine. Yes, the NES can do some pretty amazing things. And CC65 has no limitation. How does an 8-bit system handle 32-bit ints? 64-bit ints? Floats?

int i;

So that's step 1: Write a program.*

* Use volatile and learn what it does. Trust me.

How does volatile work? The way that you talk to audio or visual hardware in the system is the same way you talk to RAM: you read and write actual hardware by address. In 6502 assembly that looks like:

      LDA $4000
      STA $4000

How does the compiler know what to optimize out and what not to optimize out? In a C compiler, it can optimize anything it wants.

a = 43;
a = 42;

What is the value of a? It's 42, so why should the compiler write 43 to a? If a isn't volatile, the compiler won't (assuming it's optimizing correctly). If a is volatile, the compiler will. This makes it possible for you to talk to hardware in such a way that allows you to use the time dimension to give complicated sequences of data to a single address. We'll make that very clear as time goes forward. This is why it's much more reasonable to put a sleep into an embedded program than it is to put a sleep in a desktop program.
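To make the 43-then-42 case concrete, here is a minimal sketch in C. It assumes a stand-in variable, fake_port, in place of a real register address, and a hypothetical helper, send_sequence; neither name comes from the paper's source. The point is only that a volatile pointer forces one store per byte, in order, to a single address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a memory-mapped register (assumption). On real hardware this
   would be a fixed address such as (volatile uint8_t *)0x4000; here it is
   ordinary RAM so the sketch also runs on a desktop. */
volatile uint8_t fake_port;

/* Write every byte of data to the same address, in order. Because port is
   a pointer to volatile, the compiler may not merge or drop any of the
   stores, even though only the last value "sticks" in plain RAM. */
size_t send_sequence(volatile uint8_t *port, const uint8_t *data, size_t len)
{
    size_t i;

    assert(port != NULL && data != NULL);
    for (i = 0; i < len; ++i) {
        *port = data[i];
    }
    return len;
}
```

Calling send_sequence(&fake_port, seq, 3) with seq = {43, 42, 7} leaves 7 in fake_port on a desktop. A sound or video chip sitting at that address would have seen all three values.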

libc

_start libc_start_main
stdio.h: printf
fcntl.h: open
unistd.h: read write
stdlib.h: malloc free
string.h: memcpy memcmp strcpy strncpy strcat strncat
socket.h: socket bind htons ptoa itoa getaddrinfo
...

What would happen when you decide to turn off libc?

The first thing that would go wrong is printf. If you don't need to print or draw anything, lucky you. The second thing is file access. If you have no files, great. The third thing is user input. If you need to take in input, how do you get it? getch? Joystick? Where is the Joystick? In NES, the joystick can be read using memory-mapped IO. It was nice to have a joystick driver because I couldn't figure out how to read from the joystick in the time allotted. cc65 gives you a simple joy_read function to call. With a C compiler and an assembler, you can write your own libc. The reason that people don't is because it takes time. This becomes a theme. Given weeks of time, you could hack quite a lot of things you've never seen or heard of before, but in the time span of a CTF only the prepared will prevail.
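As a rough illustration of what reading a joystick through memory-mapped IO involves, the sketch below simulates an NES-style controller port in plain C. The fake_pad type and the pad_write, pad_read, and read_joypad helpers are hypothetical names invented for this example. It is not how cc65's joy_read is implemented. It assumes the usual NES protocol: strobe the port with 1 then 0, then read one button per read, A first and Right last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller port (assumption). */
typedef struct {
    uint8_t buttons; /* live state: bit 0 = A, 1 = B, 2 = Select, 3 = Start,
                        4 = Up, 5 = Down, 6 = Left, 7 = Right */
    uint8_t shift;   /* shift register loaded while the strobe is high */
    uint8_t strobe;  /* last value written to the port, bit 0 only */
} fake_pad;

/* Simulated write to the port: strobe high reloads the shift register. */
void pad_write(fake_pad *p, uint8_t value)
{
    p->strobe = value & 1;
    if (p->strobe) {
        p->shift = p->buttons;
    }
}

/* Simulated read from the port: returns the next button bit. */
uint8_t pad_read(fake_pad *p)
{
    uint8_t bit;

    if (p->strobe) {
        p->shift = p->buttons;
    }
    bit = p->shift & 1;
    if (!p->strobe) {
        p->shift = (uint8_t)((p->shift >> 1) | 0x80);
    }
    return bit;
}

/* The "driver": strobe, then collect eight single-bit reads into a byte. */
uint8_t read_joypad(fake_pad *p)
{
    uint8_t result = 0;
    uint8_t i;

    assert(p != NULL);
    pad_write(p, 1);
    pad_write(p, 0);
    for (i = 0; i < 8; ++i) {
        result |= (uint8_t)(pad_read(p) << i);
    }
    return result;
}
```

On a real NES the two pad_ functions would be replaced by volatile accesses to the controller register, which is why having a ready-made driver saves time during a CTF.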

Common Pitfalls

All programming has pitfalls, and in writing this paper I ran into many. Here are some:

The program you write isn't exactly the assembly the compiler generates (compiler bugs).
The assembly you write isn't exactly the machine code the assembler generates (assembler bugs).
The machine code the assembler generates isn't exactly the machine code the linker writes (linker bugs).
The machine code the emulator executes isn't the same as the machine code that you give it (emulator bugs).
The hardware the emulator is emulating isn't the same as the hardware that you are targeting (emulator bugs).
System ROM might not be correct for all systems.
Compiling someone else's code for a new architecture can be fraught!
Endianness, Von Neumann, stack size, dependencies, colors, int size, floating point, speed, ROM
Think you can compress your executable with gzip? Only if you have enough space in RAM and you can exec there!
Encryption faces the same problem unless you have hardware.
Ask me why I wrote these roms from scratch besides init.

Why didn't I just copy someone else's code? If you copy something, there is no guarantee that you understand what's going on. This paper is about understanding what's going on.

How do we know what is going on? The first option is to guess what will happen and then test. If your guess is correct, then you are either lucky or you understand the process at some level. In order to prove that you understand the process, you should be able to predict many things and have a significant number turn out to be true. This type of trial and error programming is very widely used from web development to embedded systems. In the heart of programming, there must be some point of trial and error when you break new ground that you and other people have never done before.

It's worthwhile to take a look at where our guesses do not hold up to reality.

Does MAME accurately draw its emulated games frame by frame?
No. Billy Mitchell was caught cheating at Donkey Kong using MAME because the emulator draws screens differently from the original hardware.
You can jump to anywhere in the code using the go command.
While this is true, it comes with the caveat that it will most likely break the game in odd ways. Try it for yourself.
Hacking old games is easier than hacking new games.
While there is certainly less memory to look at, finding the address of an important value is just as difficult on any system. Newer games are often more verbose because their memory constraints are looser, so there is a tradeoff.

Mednafen vs. MAME

Mednafen has a debugger and is nice to use. It appears to be designed to fill in the gap where a piece of code isn't working or you don't know how a piece of code works. The key bindings are quick and you can skip to a memory address quickly. An obvious downside of Mednafen's debugger is its lack of features. You can't set a breakpoint without visiting the address. You can't set a watchpoint. The sub-byte disassembly is clumsy.
MAME has a featureful debugger with file output and a scripting language (Lua). While its interface is not very good, it's possible that the Qt debugger could be coming to Linux, and improvements to MAME are frequent.
MAME's scripting language made significant improvements in a recent version in regards to automated debugging.
Both Mednafen and MAME give you step 2 (emulate it) and 3 (debug it).

Mednafen doesn't allow you to export disassembly, so it's a lot weaker than MAME. MAME is less easy to work with.
Some things you'll probably want to learn about a new architecture:

Where does it start?
Where are the interrupts?
When do you first see graphics?
What instruction causes graphics to change?
When you break somewhere, how long does it take to reach that break again?

Radare2

Radare2 supports a large set of non-x86 architectures. But Radare2 has bugs. It can help you reverse a lot, but only if it works. There are bugs in many old versions, so I heartily recommend compiling from git master or the most recent release.

A list of very useful commands:

  # Start radare disassembling a 6502 architecture file. 
  r2 -e asm.arch=6502 file.nes
  # Start radare disassembling an M68k architecture file.
  r2 -e asm.arch=m68k file.md
  # Start radare disassembling an 8051 architecture file that is encoded in Intel hex format.
  r2 -e asm.arch=8051 ihex://harvard1.ihx
  # Start radare disassembling a z80 architecture file.
  r2 -e asm.arch=z80 file.bin

  # Analyze all functions (it sometimes helps to run this twice)
  aaa
  # Print disassembly of the current function.
  pdf

  # write disassembly to a file
  pd >harvard1.dis
  # Visual mode
  V
  # Switch to the next visual mode
  p
  # Graph mode (press V again while in visual mode, or type VV at the prompt)
  V
  # If it complains, define a function
  df
  # go to the top of anything
  g
  # go to the bottom of anything
  G
  # exit
  q

You might notice that it's trying to be vim. That means if you're comfortable moving around using hjkl, you can use that.

6502

Examples of 6502 in CTF:
Pwn Adventure Z from CSAW
Compromising a Linux desktop using... 6502 processor opcodes on the NES?!
Hacking Time from CSAW CTF 2015 [.kr]
Juniors CTF 2016 - Joy500 Oldschool NES Rom Write Up

If you look at the source code included with this paper, you'll find a C program compilable with CC65. It uses libraries specific to CC65 but should be portable to other compilers and libc implementations because CC65 is not too far from the C specification. In nes/src/nsf7l.c you see a simple main function where we call init() and then we play music by setting values in the APU in a while loop. Also in the while loop is a call to cprintf which prints characters to the screen. Also found in the nes/src/ is hello.s which is an assembly program that has a similar structure to nsf7l.c. music.s is just a stub, so hello.s will not actually play any music if you get it to compile and run.

Before writing nsf7l.c I wrote a few programs that worked at the 6502 machine code level. By concatenating bytes (using assembly and guides) I was able to create a working nsf file, which is an NES rom that executes 6502 instructions but only plays music. By writing the bytes I was able to bypass both assembler and compiler, giving myself a true bootstrap experience, though instead of using a keypad to enter hex into memory or something like that, I used a fully functioning desktop computer with Python and gigabytes of RAM -- most of which was unnecessary except for convenience.

nsf7l.c plays music in 105 lines of C with a tiny libc. Can you do that in Linux, Windows, or OSX? By making x86 and ARM systems more powerful, we have also made it more difficult to write a proof of concept.

The bootstrap experience:
Once you write an assembler in machine code, the next step would be to write a compiler in assembly. You can then use your machine code assembler to assemble assembly programs into machine code. You might also consider porting your machine code assembler to assembly, since there might be bugs in it that you can't see because it's just bytes in memory. Once you had a C compiler that could compile even the simplest subset of C, you could rewrite your compiler and assembler in C and gain the benefits from there on. Then, of course, you would endeavor to make your C compiler more complete, since at this point you would still have no operating system or robust file system.

If you look at the function init() in nsf7l.c you can see

APU.status = 0xf;

What does this do? In nes.h, there is a line that describes APU as a struct at address 0x4000.

#define APU             (*(struct __apu*)0x4000)

This means that the write to APU.status will write the value 0xf to address 0x4015, because status sits 0x15 bytes into the struct that starts at 0x4000. At address 0x4015 is the APU status register, which controls which audio channels are enabled. By writing to that address, you are controlling a chip. While it looks like you are writing to memory, don't expect that every memory address is memory.
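The arithmetic behind that address can be checked on a desktop with offsetof. The struct below is a hypothetical mirror of the register block from 0x4000 to 0x4015, with field names made up for this sketch. It is not copied from cc65's nes.h, whose actual field layout may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the APU register block (assumption: names are
   illustrative only). Every field is a byte, so there is no padding. */
struct apu_regs {
    uint8_t pulse1[4];   /* 0x4000-0x4003 */
    uint8_t pulse2[4];   /* 0x4004-0x4007 */
    uint8_t triangle[4]; /* 0x4008-0x400B */
    uint8_t noise[4];    /* 0x400C-0x400F */
    uint8_t dmc[4];      /* 0x4010-0x4013 */
    uint8_t sprite_dma;  /* 0x4014 */
    uint8_t status;      /* 0x4015 */
};

#define APU_BASE 0x4000u

/* Address the CPU would write when C code assigns to the status field. */
unsigned apu_status_address(void)
{
    assert(sizeof(struct apu_regs) == 0x16);
    return APU_BASE + (unsigned)offsetof(struct apu_regs, status);
}
```

The compiler does this same base-plus-offset sum when it turns APU.status = 0xf into a single store instruction.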

Here are a handful of answers to the questions we asked at the beginning of this section.

Where does it start?
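The CPU starts at whatever address is stored in the reset vector at $FFFC. For our ROM that is the start function, in program ROM that begins at 8000.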

Where are the interrupts?
In nes.cfg you find hardware vectors which are at the end of the 2nd 8K ROM.

    # Hardware Vectors at End of 2nd 8K ROM
    ROMV:   file = %O, start = $FFFA, size = $0006, fill = yes;
...
    
    VECTORS:  load = ROMV,            type = rw;

In reset.s, you can see that these values are set to functions in reset.s:

.segment "VECTORS"

    .word nmi	;$fffa vblank nmi
    .word start	;$fffc reset
    .word irq	;$fffe irq / brk
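When you are handed an unknown 6502 ROM, those three words are the first thing to look up. The sketch below assumes you already have a full 64 KiB image of the CPU address space in a buffer (for example, a memory dump from an emulator). The names vectors6502, read_le16, and read_vectors are hypothetical helpers for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t nmi;   /* stored at $FFFA */
    uint16_t reset; /* stored at $FFFC */
    uint16_t irq;   /* stored at $FFFE */
} vectors6502;

/* The 6502 is little-endian: low byte first, then high byte. */
uint16_t read_le16(const uint8_t *mem, uint16_t addr)
{
    uint16_t next = (uint16_t)(addr + 1);
    return (uint16_t)(mem[addr] | (mem[next] << 8));
}

/* mem must point at a 65536-byte image of the CPU address space. */
vectors6502 read_vectors(const uint8_t *mem)
{
    vectors6502 v;

    assert(mem != NULL);
    v.nmi = read_le16(mem, 0xFFFA);
    v.reset = read_le16(mem, 0xFFFC);
    v.irq = read_le16(mem, 0xFFFE);
    return v;
}
```

The reset field is where disassembly should begin; the nmi field is usually where per-frame work happens on the NES.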

When do you first see graphics?
You see graphics after you write to addresses around 0x2000.
What instruction causes graphics to change?
A write to address 0x2007.
When you break somewhere, how long does it take to reach that break again?
For most points in our main function, one frame.

M68k aka M68000

Motorola's 68k or 68000 is one of the best CPUs in my opinion. The Genesis alone was enough to convince me to work on this paper. Since it was my favorite, I spent a lot of time on it.
The M68k has a ton of history. I won't get into it but if you take a look at the systems that feature it below and you search the web for current uses of the architecture, you'll see that it plays an important part in the historical computing market.

Systems that feature M68k architecture:

Sega Genesis
Atari Jaguar
Apple Macintosh
Neo Geo
Amiga
Atari ST
TI-89
SUN workstation
Too many arcade systems

As of the writing of this paper, I found no examples of M68k in CTF.

In src/md/loop2mustestcm.c you find my finest creation of this talk: an almost functional music system with 16 steps that allows you to change the registers of the YM2612. It also draws to the screen with a custom puts function that I had to write in assembly. It has functions like play_note and sleep. If you look closely, these building blocks are what you need to turn C without a good libc into a musical graphical demo! But how?! Let's take a very close look at our code and try to understand the patterns in use.

Features:

puts(const char *data) and sleep(void) in 68k assembly (loop2mustestc.s)
music_init(void), play_note(uint16_t freq), stop_music(void) in C
clearScreen(void), WaitVBlankStart(void), WaitVBlankEnd(void), btohex(char *dst, uint8_t src) in C

volatile uint32_t * const VDP_DATA = (volatile uint32_t *)0x00C00000;
volatile uint32_t * const VDP_CTRL = (volatile uint32_t *)0x00C00004;
volatile uint16_t * const VDP_STATUS = (volatile uint16_t *)0x00C00004;

0xc00000

0xc00004

	// from https://bigevilcorporation.co.uk/2012/03/23/sega-megadrive-4-hello-world/
	//mov.l #0x40000003, VDP_CTRL
	*VDP_CTRL = 0x40000003;
	//mov.w #0x8F02, VDP_CTRL   // Set autoincrement to 2 bytes
	*VDP_STATUS = 0x8F02;
	//mov.l #0xC0000003, VDP_CTRL // Set up VDP to write to CRAM address 0x0000
	*VDP_CTRL = 0xC0000003;
	//lea Palette.l, %a0          // Load address of Palette into a0
	//mov.l #0x07, %d0         // 32 bytes of data (8 longwords, minus 1 for counter) in palette
	for(i = 0; i < 16; ++i)
	{
		//VDPLoop:
		// mov.l (%a0)+, VDP_DATA // Move data to VDP data port, and increment source address
		*VDP_DATA = Palette32[i];
		// dbra %d0, VDPLoop
	}

	//mov.w #0x8705, VDP_CTRL  // Set background colour to palette 0, colour 5
	*VDP_STATUS = 0x8705;

big evil corporation

*VDP_CTRL = 0x40000003;

mov.l #0x40000003, VDP_CTRL

mov, dbra
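The magic numbers written to the control port are less arbitrary than they look. The sketch below is based on my understanding of the Genesis VDP command format, not on anything in the paper's source: a 16-bit register write is 0x8000 plus the register number and value, and a 32-bit address command scatters a 16-bit address around a base code. The helper names vdp_reg_cmd and vdp_addr_cmd and the two base constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Base codes for 32-bit address commands (assumed layout). */
#define VDP_VRAM_WRITE 0x40000000u
#define VDP_CRAM_WRITE 0xC0000000u

/* 16-bit register write: binary 100R RRRR DDDD DDDD. */
uint16_t vdp_reg_cmd(uint8_t reg, uint8_t value)
{
    assert(reg < 0x20);
    return (uint16_t)(0x8000u | ((unsigned)reg << 8) | value);
}

/* 32-bit address command: address bits 13..0 go to bits 29..16 and
   address bits 15..14 go to bits 1..0. */
uint32_t vdp_addr_cmd(uint32_t base, uint16_t addr)
{
    return base
         | ((uint32_t)(addr & 0x3FFF) << 16)
         | ((uint32_t)(addr >> 14) & 3u);
}
```

Under these assumptions, vdp_reg_cmd(0x0F, 0x02) gives 0x8F02 (auto-increment of 2), and vdp_addr_cmd(VDP_VRAM_WRITE, 0xC000) gives 0x40000003. The listing's 0xC0000003 would then be a CRAM write whose two high address bits are set, which presumably does not matter because CRAM is far smaller than 64 KiB.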

At this point, we've figured out most of what's going on, but let's cement the understanding with a practical example of M68k.

      puts("loop2 by Javantea");
      newline();
      puts("Sept 23 - Oct 28, 2017");
      newline();

As you can see, this code consists of 4 function calls. A C compiler is allowed to inline as much as it wants, so you might not see this in assembly. So let's look at the disassembly. Objdump works on this, but let's go for Radare2 today.

r2 -e asm.arch=m68k loop2mustestcm.bin
Copyright: SEGA MEGA DRIVE (C)---- 2017.SEP
DomesticName: loop2
OverseasName: l00p2
ProductCode: GM 31337B17-01
Checksum: 0xb481
Peripherials: JD
SramCode: JUE
ModemCode: JUE
CountryCode: JUE
[0x00000200]> aaa

At this point you'll probably go into visual mode with V and then switch through the different modes with p, looking for assembly.

|           0x0000073a      487900001903   pea.l 0x1903.l
|           0x00000740      45f9000005d0   lea.l fcn.000005d0, a2                                                                                               
|           0x00000746      4e92           jsr (a2)                    ;[2]                                                                                     
|           0x00000748      47f90000047e   lea.l 0x47e.l, a3                                                                                                    
|           0x0000074e      4e93           jsr (a3)                    ;[3]                                                                                     
|           0x00000750      487900001915   pea.l 0x1915.l                                                                                                       
|           0x00000756      4e92           jsr (a2)                    ;[2]                                                                                     
|           0x00000758      4e93           jsr (a3)                    ;[3]

The first question is how did I know that this is the actual code? I noticed the pattern of jsr after I found a few constants nearby. The first constant was 0x40020003 and the second constant was 0x100. With these two constants, it was clear that this is the correct code. That said, we should decompile this code to make certain. 0x1903 is "loop2 by Javantea" and 0x1915 is "Sept 23", so this is correct. Note that Radare2 couldn't automatically detect the strings.
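Since the disassembler did not label the strings, it helps to be able to pull the pointer out of the instruction bytes yourself. The sketch below decodes only one instruction form, pea with a 32-bit absolute operand, whose opcode word is 0x4879 as seen in the listing above. M68k is big-endian, so the operand is read high byte first. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* M68k stores multi-byte values most significant byte first. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* If code begins with "pea.l <abs32>" (opcode word 0x4879), store the
   pushed address in *target and return 1. Otherwise return 0. */
int decode_pea_abs_long(const uint8_t *code, size_t len, uint32_t *target)
{
    assert(code != NULL && target != NULL);
    if (len < 6 || read_be16(code) != 0x4879) {
        return 0;
    }
    *target = read_be32(code + 2);
    return 1;
}
```

Feeding it the six bytes 48 79 00 00 19 03 from address 0x73a yields 0x1903, the offset in the ROM where the first string lives.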

8051 aka Intel MCS-51

Epilepsy warning! This video contains flashing at 30 Hz.

8051 (aka Intel MCS-51) is the most different architecture in this list. Like the others it's still in use in certain applications. One of the most interesting is a USB Bluray drive which was reversed and hacked by Scanlime with full video discussion. 8051 is an 8-bit Harvard architecture microcontroller. Harvard is different from Von Neumann (which is in use in all other processors in this list as well as x86 and ARM) in a very interesting way - there are two address spaces, one for program code and another for data. Because the address spaces are separate, you can't jump to data like you can on Von Neumann architecture CPUs*. To create an 8051 program, I took a look at MAME's supported systems, focusing on systems that have just one CPU which is in the 8051 family. I found re900 which is a roulette wheel (like in a gambling casino) with a video screen. I found the documents and the machine code in MAME to be very instructive. It took a while to actually understand how to initialize the video system, but once I had, it was showing all kinds of colors.

* Does this mean it's impossible to exploit a buffer overflow on Harvard architecture? Of course not. If you understand modern x86 and ARM memory protection mitigations against exploitation, you know that you can set program memory to be read only and that is still exploitable using ROP and other techniques where data is used to control program flow. This is true of Harvard architecture CPU programs as well. Many software design patterns result in function pointers being popped off the writable stack and then used as the instruction pointer. The difficulty of exploitation is significantly higher on Harvard architectures, but if you have trouble exploiting, you might understand why if you know that it uses two address spaces.

Popular in arcade and embedded applications
8-bit
Harvard

Systems that feature 8051 architecture:

re900
barata
Samsung SE-506CB external Blu-Ray burner
Many more

While it is popular in systems, it is not easy to get ahold of an 8051 that isn't attached to another system. Why are 8051s used at all? If you look at Digikey's list of microcontrollers with 8051 core you can see that it is cheap, fast, competitive, and partially compatible with code written in 1980. It's not an ARM, but it is a viable architecture. How do you program an 8051? You'll find that information in the documentation.

volatile __xdata __at(0xe000) unsigned char vreg;
volatile __xdata __at(0xe001) unsigned char vram;
volatile __xdata __at(0xe002) unsigned char watchdog;

The above shows how simple the interface to the I/O is. Note that we're using a different C compiler -- SDCC to convert our C code to a binary.

You can see how our game loop works below.

	// Game loop. Write random stuff to video memory?
	while(1)
	{
		vreg = 0;
		vreg = 0x40;
		for(i = 0; i < 0x1000; ++i)
		{
			vram = fair_roll[i % 500];
		}
		/*while(1)
		{
		}*/
	}

It's pretty simple. In this case we aren't using the *v = x; mechanism, but instead we're using v = x; because of the way the compiler handles memory. The mechanism is the same. Why is this rom so large if the code is so small? This question remains unanswered. Can you answer it?

Zilog Z80

Zilog's Z80 was very popular because it was compatible with Intel's famed 8080 and was pretty well designed for the time. It was used on many arcade systems. Just grepping MAME for z80 will give you some insight.


Systems that feature Z80 architecture:

Sega Genesis (sound coprocessor*)
Sega Master System
Sega SG-1000
Pacman, Donkey Kong
Too many arcade systems

Examples of Z80 in CTF:
Let's Disassemble from SECCON CTF 2014


My experiment: Writing a Pacman ROM in 10 hours.
What I spent that time learning:

Video RAM picks tiles. (1024 bytes)
Color RAM picks two palettes per tile. (1024 bytes)
Tile ROM is a weird packing of bits. (4096 bytes == 256 tiles)
Palette ROM is a list of colors. (32 bytes)
Program ROM is your code. (4096 bytes * 4 == 16kB)
Sound ROM is where you put your music and sfx. (?)
ASCII tiles are inefficient, but useful.

In this part I want to introduce a few new concepts. The first is a watchdog. A watchdog is a mechanism on the computer which will count down (when activated) toward resetting the computer. You have to poke it or it will reset. Once you poke it, you're fine until the next interval. The way this helps is if you have a bug in your code that causes an infinite loop, you don't want to have to unplug the computer and plug it back in. Instead of executing undefined behavior, it prefers to reboot. A good reason that important computers should have watchdogs (or similar mechanisms which can cause a computer to exit a while loop if it goes on too long) is the occurrence of bit flipping in hardware. Whether it's caused by overheating, bad hardware, or cosmic rays, your code could do things that you have proven it not to do.

You can see a watchdog in use in the code snippet below:

  watchdog = 0;
  // Writing msg to the entire contents of video memory starting at 0.
  video_mem[k & 0x3ff] = msg_trans[(((unsigned int)k) & 0x7fff) % sizeof(msg)];
  //watchdog = 0;
  if((k & 0x3ff) == 0)

Yes, all you have to do is set watchdog = 0 every once in a while.
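The countdown-and-poke behavior is easy to model on a desktop. The sketch below is a simulation under assumed rules (a fixed timeout in ticks, one tick per loop iteration); real watchdog hardware differs per board, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t timeout;   /* ticks allowed between kicks */
    uint32_t remaining; /* ticks left before the next reset */
    uint32_t resets;    /* how many times the machine was reset */
} watchdog_sim;

void watchdog_init(watchdog_sim *w, uint32_t timeout)
{
    assert(w != NULL);
    w->timeout = timeout;
    w->remaining = timeout;
    w->resets = 0;
}

/* Equivalent of "watchdog = 0;" in the ROM: restart the countdown. */
void watchdog_kick(watchdog_sim *w)
{
    w->remaining = w->timeout;
}

/* Advance one tick. Returns 1 if the countdown expired and reset us. */
int watchdog_tick(watchdog_sim *w)
{
    if (w->remaining > 0) {
        --w->remaining;
    }
    if (w->remaining == 0) {
        ++w->resets;
        w->remaining = w->timeout;
        return 1;
    }
    return 0;
}

/* Run a loop for the given number of ticks, kicking the watchdog every
   kick_every ticks (0 means never, like a hung program). */
uint32_t run_loop(uint32_t timeout, uint32_t ticks, uint32_t kick_every)
{
    watchdog_sim w;
    uint32_t t;

    watchdog_init(&w, timeout);
    for (t = 1; t <= ticks; ++t) {
        if (kick_every != 0 && t % kick_every == 0) {
            watchdog_kick(&w);
        }
        watchdog_tick(&w);
    }
    return w.resets;
}
```

With a timeout of 8 ticks, a loop that kicks every 4 ticks is never reset, while a loop that never kicks is reset every 8 ticks.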

The second part that I want to draw your attention to is the fact that I'm using unsigned ints on z80. How big is an unsigned int on z80? I'm compiling with sdcc. Unsigned ints are guaranteed to be at least 16 bits even if your processor is 8 bit. See how C compilers are useful? You can use 16-bit ints on an 8-bit cpu.
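What the compiler does for you here can be written out by hand. The sketch below adds two 16-bit values using only 8-bit pieces and a carry, which is roughly the shape of code an 8-bit CPU ends up running. It is an illustration of the idea, not SDCC's actual output, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit value held as two bytes, the way an 8-bit CPU sees it. */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} u16_pair;

u16_pair from_u16(uint16_t v)
{
    u16_pair p;
    p.lo = (uint8_t)(v & 0xFF);
    p.hi = (uint8_t)(v >> 8);
    return p;
}

uint16_t to_u16(u16_pair p)
{
    return (uint16_t)(p.lo | (p.hi << 8));
}

/* Add the low bytes, keep the carry, then add the high bytes plus carry. */
u16_pair add16_bytewise(u16_pair a, u16_pair b)
{
    u16_pair r;
    unsigned sum_lo = (unsigned)a.lo + b.lo;
    unsigned carry = sum_lo >> 8;

    r.lo = (uint8_t)sum_lo;
    r.hi = (uint8_t)(a.hi + b.hi + carry);
    return r;
}

/* Convenience wrapper that also checks against native 16-bit addition. */
uint16_t add16_checked(uint16_t a, uint16_t b)
{
    uint16_t r = to_u16(add16_bytewise(from_u16(a), from_u16(b)));
    assert(r == (uint16_t)(a + b));
    return r;
}
```

The same pattern, repeated with more bytes, is how 32-bit and 64-bit integers are handled on these CPUs.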

The third part that I want to draw your attention to is the fact that I'm using if((k & 0x3ff) == 0) to delay some branch of the code. Simply put, a person writing a rom could delay printing the flag until the player has beaten the kill screen. A person writing a rom could prevent the player from accessing the switch that turns on god mode by using a sequence of button presses. This is the concept of how a game master gets to choose how the players interact with the world. But as a hacker, you don't have to play the game. You can modify the game so that instead of branches occurring when the game master wants them to, they occur when you want them to. How do you do this? Learn assembly, overwrite a branch. If a game master is really smart, they can make it so that if you try to overwrite a branch, the system will fail. In my Pacman ROM, there is a simple anti-cheat mechanism. See if you can find it and defeat it. It's not obvious from the ROM, so take a look at the source if you don't see it in the ROM. If you can modify the ROM substantially (say hook the start so that it says your name across the top) and there is no change in the gameplay, you win.
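Overwriting a branch can be as small as changing one byte. The sketch below works on a Z80 ROM image in memory and handles only relative jumps: the conditional forms JR NZ, JR Z, JR NC, and JR C (opcodes 0x20, 0x28, 0x30, 0x38) and the unconditional JR (0x18). The function names are hypothetical, and the offset you patch is something you have to find yourself in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Is this byte one of the four conditional JR opcodes? */
static int is_conditional_jr(uint8_t opcode)
{
    return opcode == 0x20 || opcode == 0x28 ||
           opcode == 0x30 || opcode == 0x38;
}

/* Make the branch at offset always taken by turning it into JR e (0x18).
   The displacement byte that follows is left untouched.
   Returns 1 on success, 0 if there is no conditional JR at offset. */
int force_relative_jump(uint8_t *rom, size_t len, size_t offset)
{
    assert(rom != NULL);
    if (len < 2 || offset > len - 2 || !is_conditional_jr(rom[offset])) {
        return 0;
    }
    rom[offset] = 0x18;
    return 1;
}

/* Make the branch at offset never taken by replacing both bytes with
   NOP (0x00). Returns 1 on success, 0 otherwise. */
int never_take_relative_jump(uint8_t *rom, size_t len, size_t offset)
{
    assert(rom != NULL);
    if (len < 2 || offset > len - 2 || !is_conditional_jr(rom[offset])) {
        return 0;
    }
    rom[offset] = 0x00;
    rom[offset + 1] = 0x00;
    return 1;
}
```

A patch like this changes the bytes of the program ROM, so any code that inspects its own ROM contents could notice it.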

I didn't have much trouble with Z80 because it is pretty similar to other hardware and x86. I hope you also enjoy Z80.

M6809

Robotron

Systems that feature 6809 architecture:

Williams Arcade
Robotron

Examples of Robotron in CTF:
Church of Robotron at Toorcamp!

How do we go about actually using a fully reversed ROM? In this section, we'll take a look at the excellent reverse engineering effort of Robotron by Scott Tunstall. If you search for KILL in the reverse engineered source, you can see that there's a function called KILL_PLAYER at position 30EF. That function is called by PLAYER_COLLISION_DETECTION. But notice how the code is organized. It doesn't return (RTS) and the function that calls it jumps to it (BNE) instead of calling it (BSR). So that's not a function, so much as a labeled branch of the code.

In 6809, you have two 8-bit accumulators, A and B (the 6502 has only A), and you read and write using LDA, LDB, STA, STB, and so on, much as you would on 6502.

So now that we're clear on what's going on, let's try to use this information to subvert the activity of this ROM. First, we can break on the kill branch. Once we touch something, the debugger will stop execution instead of letting the kill branch run.

The first thing we can do in MAME and many other debuggers is memdump. For Robotron, you will get RAM, which is valuable in hacking processes. Not only can you understand the usage of memory, but you can take a look at what you can control. As you write your exploit, you will probably want to check as time goes on, breaking closer and closer to your final goal. In this case (getting killed), you might not actually gain anything from a memdump.

Another incredibly useful command for hacking and reverse engineering is statesave which will make it possible for you to recreate sometimes incredibly nuanced behavior in ROMs. In this case, you'd end up back at the kill state, which is probably not useful unless you want to save where you have N lives and try again and again until you have made it to the level where the flag is printed. That is a valuable hack, but there are better ways to hack Robotron.

An incredibly useful command that you will almost immediately need is source. Because the Qt debug gui is in progress, it doesn't work on all systems. Since that is true, you'll possibly be working with the imgui interface which doesn't have cut, copy, and paste. A way to fix this is to use a text editor of your choice to create a set of commands you want to run, then use source to execute them. This allows you to run incredibly complex commands once you've got a hang of the debugger.

A very cool way to fix our death state is to save data (saved), modify it with a hex editor, and then load it (loadd). While this won't stop you from dying, you can give yourself as many lives as you need to complete the game, which considerably reduces the main problem with the game (difficulty). That isn't a good way to hack this if you simply aren't coordinated enough to finish the game with many lives (see Pacman), but it should be sufficient if you have someone who just needs 20 more lives to get to the flag.
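The hex editor step can also be scripted. The following is a minimal Python sketch, assuming the dump was written to a file by saved. The function name patch_byte, the file name, and the offset of the lives counter are all hypothetical; this paper does not give the real offset, so you would need to find it in the reversed source.

```python
from pathlib import Path


def patch_byte(dump_path, offset, value):
    """Overwrite one byte of a memory dump in place and return the old byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    data = bytearray(Path(dump_path).read_bytes())
    if not 0 <= offset < len(data):
        raise IndexError("offset is outside the dump")
    old = data[offset]
    data[offset] = value
    Path(dump_path).write_bytes(bytes(data))
    return old


# Hypothetical usage: "ram.bin" and LIVES_OFFSET are placeholders, not real values.
# LIVES_OFFSET = ...
# patch_byte("ram.bin", LIVES_OFFSET, 20)
```

After patching, load the file back with loadd at the same address you saved it from. Check the debugger help for the exact arguments of saved and loadd.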

To exploit this situation, what you want is the do command. The command do is very special because it modifies the emulated system -- down to any value you wish to change. What kind of power can you wield with this command? Let's put a stopper in death.

bpset 30ef,1,do pc=30ec

That should do it. But when we do that, it crashes and reboots. No fair. Let's find out what triggers that reboot. My guess is a watchdog. Yes, if you quickly press F5, you can get into a state where you can't be killed and can still continue. Let's try to beat the game.

So the watchdog is at address CBFF according to williams.cpp. Let's watch that address. But more important, of course, is to be able to control this system with more precision. Can we get it to continue after it has modified pc?

bpset 30ef,1,{do pc=30ec;g}

As you might tell, this command changes the pc and then continues as fast as it can. But this just gives us an unwinnable state. We want a winnable state.
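Since the imgui interface has no paste, a breakpoint like the one above is easier to keep in a text file and run with source. Here is a small Python sketch that writes such a file. The file name robotron.cmd and the helper write_debug_script are hypothetical; the command itself is copied from above.

```python
from pathlib import Path

# Debugger commands to replay, one per line, exactly as typed in the debugger.
COMMANDS = [
    "bpset 30ef,1,{do pc=30ec;g}",
]


def write_debug_script(path, commands):
    """Write one debugger command per line and return the text written."""
    text = "\n".join(commands) + "\n"
    Path(path).write_text(text, encoding="ascii")
    return text


# Hypothetical usage:
# write_debug_script("robotron.cmd", COMMANDS)
```

In the debugger you would then type source followed by the file name. Keeping the commands in a file also makes it easy to add a line each time you learn something new about the ROM.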

Back to the source. Let's look for someone calling this function. We search for 30B3, the start of PLAYER_COLLISION_DETECTION, and we find nothing. So we assume that control flows from above. Above PLAYER_COLLISION_DETECTION we see CHECK_IF_ANOTHER_OBJECT_PRESENT at 3085. Searching for 3085 returns two results. One is a comment on a JSR and the other is a JMP. You can see the JSR goes to 26C6, which is the JMP. So why would they JSR to a JMP? This convention might be necessary for short jumps, but that's just my guess. Remember that the 6809 is an 8-bit CPU like the 6502, so it has parts that are 8-bit. Looking at the bytes of the JSR instruction, we see that it could indeed JSR to 3085 instead of 26C6. What other reason could explain this behavior? Since it won't win the game for us, we can push this onto our learning stack.
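To see why the JSR could reach 3085 directly, it helps to look at how the instruction is encoded. The Python sketch below assumes the extended (16-bit absolute) addressing form, where the opcode byte (BD for JSR, 7E for JMP) is followed by a big-endian address. The example bytes are built from the addresses mentioned in the text, not copied from the listing, and the helper names are hypothetical.

```python
# 6809 extended-addressing opcodes: one opcode byte, then a 16-bit big-endian address.
EXTENDED_OPCODES = {0xBD: "JSR", 0x7E: "JMP"}
MNEMONIC_TO_OPCODE = {name: op for op, name in EXTENDED_OPCODES.items()}


def decode_extended(code):
    """Decode a 3-byte extended JSR/JMP into (mnemonic, target address)."""
    opcode, hi, lo = code[0], code[1], code[2]
    return EXTENDED_OPCODES[opcode], (hi << 8) | lo


def encode_extended(mnemonic, target):
    """Encode JSR/JMP to any 16-bit address as 3 bytes."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("target must be a 16-bit address")
    return bytes([MNEMONIC_TO_OPCODE[mnemonic], target >> 8, target & 0xFF])


name, target = decode_extended(bytes([0xBD, 0x26, 0xC6]))
print(name, format(target, "04X"))
print(encode_extended("JSR", 0x3085).hex())
```

Because the operand is a full 16-bit address, a JSR in this form is the same size whether it targets 26C6 or 3085, so instruction size alone does not explain the extra JMP.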

Now we're at 028B. We see that this call is occurring in a function called ANIMATE_FAMILY_MEMBER, which is actually not called (so it isn't a function either). This might become a theme. If you understand assembly and the common themes of programming in 1982, you might understand that procedural programming was not in the state it is today. If we look upward, we see that GET_FAMILY_MEMBER_FROM_LIST is called, so that is the function we're in. The naming of this is not great, but let's see how it works.

When you're just starting out without any reverse engineering information, take a look at the command hotspot.

To get information about these commands without wasting CPU and electricity, open up src/emu/debug/debughlp.cpp in a text editor. You're welcome.

To hack Robotron, we look at cheats that have been produced for it. We see in robotron.xml there is a straightforward invincibility cheat. It might look difficult, so let's just take a quick look at what is actually going on.

    <script state="run">
      <action>maincpu.mb@130C2=00</action>
      <action>maincpu.mb@130CE=00</action>
      <action>maincpu.mb@130DA=00</action>
    </script>

The first action writes the value 00 to memory address 130C2. If we look at the assembly, we can quickly understand what this does:

30C1: 26 2C       BNE   $30EF                ; if collision, goto $30EF, KILL_PLAYER

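The byte 2C at 30C2 is the branch offset: 30C3 + 2C = 30EF. Assuming the cheat's 130C2 corresponds to 30C2 in this listing, writing 00 there makes the BNE target the very next instruction at 30C3. So instead of branching to $30EF, which kills the player, it continues on. There are many ways to do this, but their solution is elegant.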
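The branch arithmetic is easy to check. This is a minimal Python sketch of how a 2-byte short branch such as BNE computes its target: the offset byte is a signed 8-bit value added to the address of the following instruction. The function name branch_target is made up for illustration.

```python
def branch_target(address, offset_byte):
    """Target of a 2-byte short branch (opcode + signed 8-bit offset) at `address`."""
    # Interpret the offset byte as a signed 8-bit value.
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    # The offset is relative to the instruction after the branch (address + 2).
    return (address + 2 + offset) & 0xFFFF


print(format(branch_target(0x30C1, 0x2C), "04X"))  # original offset byte
print(format(branch_target(0x30C1, 0x00), "04X"))  # offset byte after the cheat
```

With the original offset the branch lands on 30EF. With the offset zeroed, taken and not-taken both end up at 30C3, so the collision check can no longer reach the kill branch.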

Robotron Cheat

Aside #2: Computers are Wonderful

Computers are able to do things you didn't expect.
At this point, you might be thinking: computers are wonderful! They are. Let's dig into this for a quick moment. Computers can be as simple as a puts("Hello world") with no RAM; computers can be mainframes, tablets, phones, or even refrigerators. But that doesn't mean much. A computer is often used as a replacement for transistors, resistors, and such. So when you replace a transistor with a chip, you are not just creating a glorified transistor, you're enabling complex functionality that can be unlocked with a sequence of actions. https://exploitee.rs/

So you want to implement a new architecture in MAME? http://arcadehacker.blogspot.com.au/2014/11/capcom-kabuki-cpu-intro.html

8086

Is 8086 an x86 architecture? Technically, yes. But it's antiquated and pre-32-bit x86!

x86 boots into real-mode.
Real mode is 8086 16-bit compatibility mode with 32-bit extensions available.
On the 286, you can't go back to real mode after you switch to protected-mode without a reset; the 386 and later can switch back.
You can emulate 8086 with Qemu.
Demoscene
Boot sector (MBR)
BIOS
DOS
PC98

When you run Qemu, what is actually occurring? Let's take the case of our 8086 code in the boot sector. Qemu emulates an x86 that starts in real-mode. It executes the BIOS, SeaBIOS, which is an open source BIOS for the PC that can correctly interface with Qemu's virtual peripherals. What type of peripherals exist in an IBM PC clone? Keyboard, mouse, VGA, APIC, IDE, ISA, PCI, and in newer systems: USB, SATA, and other special interfaces. To print characters to the screen, the BIOS needs to be able to completely interact with the VGA on the system. That might be as easy as writing to video RAM, or it might be as complicated as what we found in other systems. That is why a BIOS is assumed on every IBM PC. If you want to write a BIOS for the PC, you have to go one level deeper. But today, we're going to assume a BIOS is supplied.

In our boot sector we can start executing 8086 instructions. The very first that you want to execute is mov ax, 0x7c0. This is Intel syntax, so the destination is first. This instruction sets the register ax to 0x7c0, which sets up the second instruction, mov ds, ax. That sets the data segment equal to ax (0x7c0). The data segment register tells the system what segment we should use for data, and that determines how many of the following instructions behave. Data is the most commonly used segment in x86, so be very careful to understand the data segment before you try to hack a boot sector.

To get the memory address where a segment starts, multiply the segment value by 16. If you are smart, you know that in hex multiplying by 16 is just a left shift of 4 bits, so the memory address associated with a data segment equal to 0x7c0 is 0x7c00, which is where the BIOS loads the boot sector. Make sense?
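The segment arithmetic can be sketched in a few lines of Python. The function name linear_address is made up for illustration, and the 20-bit mask models the 8086's 1 MiB address space.

```python
def linear_address(segment, offset):
    """Real-mode address: segment shifted left 4 bits plus offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# ds = 0x7c0 with offset 0 is the start of the boot sector.
print(hex(linear_address(0x7C0, 0x0000)))
# The same byte can be named by more than one segment:offset pair.
print(linear_address(0x7C0, 0x0010) == linear_address(0x0000, 0x7C10))
```

Setting ds to 0x7c0 means a label at offset 0 of the boot sector is addressed as offset 0, which is why the code loads 0x7c0 rather than 0x7c00.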

Now let's discuss the real important part of this boot sector, the music!

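mov si, music      ; music is a label that is not shown in this excerpt
mov cx, 0x1a9      ; first PIT divisor to play
beep:
   ; Choose a frequency for the PIT
   ; cx is the period (the PIT divisor, so a larger cx gives a lower pitch)
   mov al, 0xb6    ; command: channel 2, low byte then high byte, square wave
   out 0x43, al    ; 0x43 is the PIT command port
   mov al, cl
   out 0x42, al    ; low byte of the divisor to channel 2
   mov al, ch
   out 0x42, al    ; high byte of the divisor to channel 2
   
   ; connect the pit to the pc speaker
   in al, 0x61
   or al, 3        ; set bits 0 and 1: timer gate and speaker enable
   out 0x61, al

   ; TODO: stored duration of note
   mov ax, 0xfff
   call sleep      ; sleep is not shown in this excerpt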

You can see in the above "function" beep that we use mov and an instruction called out. You might wonder what out is. The out instruction is how we communicate with the PIT (Programmable Interval Timer). It is Port I/O, which is the other way to do I/O on systems besides Memory Mapped I/O (MMIO). Intel created 8086 with Port I/O in the expectation that it would be worthwhile. It turns out that Memory Mapped I/O is the way to go and Port I/O is actually just a vestige of the IBM PC.
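To connect the numbers in beep to an audible pitch, here is a small Python sketch. It assumes the standard PC PIT input clock of 1193182 Hz; the helper names are made up for illustration. It decodes the command byte 0xb6, splits the divisor into the two bytes written to port 0x42, and computes the resulting frequency.

```python
PIT_HZ = 1193182  # standard PC PIT input clock, in Hz


def decode_pit_command(cmd):
    """Split a PIT command byte into its bit fields."""
    return {
        "channel": (cmd >> 6) & 3,  # 2 = the channel wired to the PC speaker
        "access": (cmd >> 4) & 3,   # 3 = low byte, then high byte
        "mode": (cmd >> 1) & 7,     # 3 = square wave generator
        "bcd": cmd & 1,             # 0 = binary counting
    }


def split_divisor(divisor):
    """Return (low byte, high byte), the order they are written to port 0x42."""
    return divisor & 0xFF, (divisor >> 8) & 0xFF


def pit_frequency(divisor):
    """Output frequency in Hz for a given divisor."""
    return PIT_HZ / divisor


print(decode_pit_command(0xB6))
print([hex(b) for b in split_divisor(0x1A9)])
print(round(pit_frequency(0x1A9)))
```

The divisor 0x1a9 therefore plays a tone of roughly 2.8 kHz. Going the other way, a note's divisor is the clock divided by the desired frequency.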

Check out osdev.org wiki for a new insight into your computer's boot process.
Qemu Advent Calendar
DOS viruses
Malware
Boot sector viruses
SECT CTF 2017 PWN300 The gibson [b64]
ForbiddenBITS CTF 2013 – Old 50
DEF CON CTF Qualifier 2014 - dosfun4u
DEF CON CTF Qualifier 2014 - dosfun4u round 2
Honorable mentions:
Ghost in the Shellcode 2014 - DOS attack

Questions?

My paper with downloads and links
https://www.altsci.com/old_non-x86/
https://sono.us/mame

Radare2 is open source, free, and supported.
You might have missed Portland Retro Gaming Expo, but remember it for next year.
JRSFuzz is open source, free, and supported.

jvoss@altsci.com

Small Wide World

JavRE is open source and free.
JavRE

The State of the World

We have the opportunity to do things that I originally thought were fantasy. This will repeat, let us be clear, more times than you will wish. Don't let that be your excuse to not write code. Find something that benefits you or someone else and take a look from the perspective of: this is possible by means of effort.
