B981 3762 1D30 CA05 E2C1 CD7F 3C68 C8DB CBA783EF

Old non-x86 Architectures

Motorola 68000 die

http://www.visual6502.org/images/pages/Motorola_68000.html

by Javantea
Oct 2-Nov 8, 2017
Batman's Kitchen Meeting
Nov 8, 2017
https://www.altsci.com/old_non-x86/
Slides || Talk video
Software: old_non-x86-0.6.tar.xz [sig]

Note: This paper was meant to be too verbose to read in one sitting. Watch the videos, read a few bullets, and then read a section when you're ready.

Introduction

In the realm of computing, there is a clear front-runner for desktop, laptop, and server processors: the x86 architecture. In battery-powered mobile devices such as phones, the Raspberry Pi, and the like, ARM is the clear front-runner because x86 is inefficient and bloated. ARM supports System-on-Chip (SoC) designs and powerful systems with limited power use. Because of this bias, you can go a long way on just one architecture. If students focus their attention on x86 or ARM reverse engineering and exploitation, they will have a blind spot when coming up against challenges that involve other architectures. In this paper, we try to understand why.

Qemu supports the aarch64 alpha arm cris i386 lm32 m68k microblaze microblazeel mips mips64 mips64el mipsel moxie nios2 or1k ppc ppc64 ppcemb s390x sh4 sh4eb sparc sparc64 tricore unicore32 x86_64 xtensa xtensaeb architectures, and MAME supports many more. Some of these architectures are old, and some are so different that it would benefit a person to spend just one hour learning a bit about how these systems work.

Another major benefit of learning to program and reverse engineer a different architecture is the chance to interact with hardware in a way that on x86 and ARM only the bootloader, the kernel, and ring zero hackers are allowed to. In 10 lines of code, 1 second of compiling, and 1 second of emulation, a student can write their first kernel-level code. This will come in handy when a challenge requires them to understand the inner workings of their system. In the environment of a CTF, challenges can use your bias against you!*

* See The cLEMENCy Architecture for example.

Old Non-x86 Architectures are too hard!

Atari 2600 Pitfall

A common misconception is that old non-x86 architectures are too difficult to work with. They use different tools, they have significant limitations, but this should not make them significantly harder. In fact, if you know x86, you should be well on your way to becoming architecture agnostic. M68k Assembly was taught at University of Washington to Physics majors with no computer programming background in 2002. How did I pass that class before I was a hacker if M68k is too difficult to work with?*

* It may mean that I'm talented in assembly level programming, but I think it means that everyone has a chance at learning M68k assembly.

Aside #1: Don't Let Computers

Computers are an easy excuse not to socialize. Don't use it! If you're having trouble getting your code to work, ask someone for help. If you can't understand something and you've tried Google, try someone you don't know. A person I was talking to at a local bar learned to program 6502 assembly as a teenager.

The goal in meeting someone is to like them.

What is the goal of this paper?

We want to be able to understand the fundamentals of working with old systems and how they can be used in contemporary systems. This will teach us about the nature of computers and kernel-level programming.

  1. Write a Program.
  2. Emulate it.
  3. Debug it.
  4. Disassemble it.
  5. Reverse it.

While we're going to be emulating these systems, we intend to model the behavior of an actual system that serves some purpose. If you want a real game or a console, I recommend Pink Gorilla in the U-district or in the Intl-district. They have knowledgeable staff and excellent selection and prices.

Please download these tools and my paper in case you're playing a CTF and need to hack a ROM.
Permalinks: Paper Software: old_non-x86-0.6.tar.xz [sig]

6507 (6502 architecture)

Pitfall was written in 6507 assembly for the Atari 2600. The 6507 is a 6502 architecture processor. The system was very limited in graphics and computing power, which made the platform far less successful than its competitors, but the amazing games made for it surprised players and set the stage for improvements. A good example of a modern game written for the Atari 2600 is Ultra SCSIcide by Joe Grand. It has binaries and source code if you'd like to try reversing it. The source code will tell you how well you did.

6502

The 6502 is often hailed as being one of the easiest architectures to program in assembly. The phenomenon of NES, C64, and Apple II programming in the 21st century provides some evidence for this. I am more apt to think that this is because programming a limited system suits people better than programming a complex system like x86.

Systems that feature 6502 architecture:

6502 Family Tree
                          6502
                           |
        +------+--------+--+--+-------+-------+
        |      |        |     |       |       |
      6510   deco16   6504   6509   n2a03   65c02
        |                                     |
  +-----+-----+                            r65c02
  |     |     |                               |
6510t  7501  8502                         +---+---+
                                          |       |
                                       65ce02   65sc02
                                          |
                                        4510
https://wiki.nesdev.com/w/index.php/CPU_memory_map

Before we work on how to understand assembly, let's focus on the compiler. C code is a far more human language than assembly in that it can be understood as a set of functions, statements, and variables. Those functions, statements, and variables are widely used in C, C++, C#, Java, Python, JavaScript, PHP, and other easier procedural languages.

CC65

CC65 is a C compiler and assembler targeting multiple systems that use 6502 architecture. You can use it to compile code for any 6502 system, supported or not. What is a C compiler? A simple C compiler needs to take any valid C program (see listing 1) and turn it into a valid assembly program for a certain architecture.

Listing 1: C sample

    int main()
    {
        int i;
        char buf[10];
        for(i = 0; i < 10; ++i) {
             buf[i] = i;
        }
        return 0;
    }

With compilers it became clear that without a valid C library, you'd be up a creek trying to write your own, so each compiler should be paired with a libc or it will be difficult to work with. But you don't necessarily need libc to write a cool program, as we'll see later. CC65 comes with a libc for each system. In order to deal with text on a platform that does tiles and sprites and is very limited, they wrote a library that doesn't do graphics perfectly, but does them well enough for debugging and a little bit of showing off. You can use it, but after a while you'll want to load sprites, create a nice scrolling background and so on, or write your own engine. Yes, the NES can do some pretty amazing things. And CC65 has no limitation. How does an 8-bit system handle 32-bit ints? 64-bit ints? Floats?

int i;
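One way to picture the answer for wide integers: the value is stored as several bytes, and arithmetic walks those bytes one at a time while passing a carry along, much as a chain of 6502 ADC instructions would. The sketch below models that idea in portable C. It is not CC65's generated code, and the function names (add32_bytewise, to_bytes, from_bytes) are made up for illustration.

```c
#include <stdint.h>

/* Add two 32-bit values stored as four little-endian bytes each,
   one byte at a time, passing the carry from byte to byte. */
void add32_bytewise(const uint8_t a[4], const uint8_t b[4], uint8_t sum[4])
{
    unsigned carry = 0;                /* plays the role of the carry flag */
    int i;
    for (i = 0; i < 4; ++i) {
        unsigned t = (unsigned)a[i] + b[i] + carry;
        sum[i] = (uint8_t)(t & 0xff);  /* low 8 bits stay in this byte */
        carry = t >> 8;                /* overflow moves on to the next byte */
    }
}

/* Helpers so the result can be checked on a desktop machine. */
void to_bytes(uint32_t v, uint8_t out[4])
{
    int i;
    for (i = 0; i < 4; ++i) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

uint32_t from_bytes(const uint8_t in[4])
{
    uint32_t v = 0;
    int i;
    for (i = 0; i < 4; ++i) {
        v |= (uint32_t)in[i] << (8 * i);
    }
    return v;
}
```

The same loop with eight bytes instead of four covers 64-bit values, which is why wide integers cost several times the instructions on an 8-bit CPU.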

So that's step 1: Write a program.*

* Use volatile and learn what it does. Trust me.

How does volatile work? The way that you talk to audio or visual hardware in the system is the same way you talk to RAM. That is, you can read and write to actual hardware by address. In 6502 assembly that looks like:

      LDA $4000
      STA $4000

How does the compiler know what to optimize out and what not to optimize out? A C compiler is free to optimize away any write whose effect it believes nobody can observe.

    a = 43;
    a = 42;

What is the value of a? It's 42, so why should the compiler write 43 to a? If a isn't volatile, the compiler won't (assuming it's optimizing correctly). If a is volatile, the compiler will. This makes it possible for you to talk to hardware in such a way that allows you to use the time dimension to give complicated sequences of data to a single address. We'll make that very clear as time goes forward. This is why it's much more reasonable to put a sleep into an embedded program than it is to put a sleep in a desktop program.
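A minimal sketch of the 43-then-42 case, assuming a desktop compiler rather than CC65. There is no sound chip to poke on a desktop, so an ordinary variable (fake_register, a made-up name) stands in for the hardware register. On a real system the pointer would come from a fixed address instead.

```c
#include <stdint.h>

/* Stand-in for a hardware register. On real hardware the pointer would be
   built from an address, for example (volatile uint8_t *)0x4000. */
static uint8_t fake_register;

/* Hypothetical helper: send two values in a row to a single register. */
void send_two_values(volatile uint8_t *reg)
{
    *reg = 43;   /* must be emitted because the access is volatile */
    *reg = 42;   /* a device at this address would see 43, then 42 */
}

/* Run the sequence against the stand-in and report what is left in it. */
uint8_t demo_volatile(void)
{
    send_two_values(&fake_register);
    return fake_register;
}
```

If the volatile qualifier is dropped from the parameter, an optimizing compiler is allowed to remove the first store. Comparing the generated assembly with and without it is a quick way to see the difference.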

libc

What would happen when you decide to turn off libc?

The first thing that would go wrong is printf. If you don't need to print or draw anything, lucky you. The second thing is file access. If you have no files, great. The third thing is user input. If you need to take in input, how do you get it? getch? Joystick? Where is the joystick? On the NES, the joystick can be read using memory-mapped IO. It was nice to have a joystick driver because I couldn't figure out how to read from the joystick in the time allotted. cc65 gives you a simple joy_read function to call. With a C compiler and an assembler, you can write your own libc. The reason people don't is that it takes time. This becomes a theme. Given weeks of time, you could hack quite a lot of things you've never seen or heard of before, but in the time span of a CTF only the prepared will prevail.
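For the curious, here is roughly what reading the first NES controller involves: write 1 and then 0 to the port at $4016 to latch the buttons, then read the same address eight times, taking one button from bit 0 each time. The sketch below simulates the port in plain C so it runs on a desktop. It is not cc65's joy_read, and struct fake_pad, pad_write, pad_read, and read_pad are made-up names.

```c
#include <stdint.h>

/* Simulated controller port. On the NES these accesses would go to $4016. */
struct fake_pad {
    uint8_t buttons; /* bit 0 = A, 1 = B, 2 = Select, 3 = Start,
                        4 = Up, 5 = Down, 6 = Left, 7 = Right */
    uint8_t shift;   /* shift register inside the controller */
    uint8_t strobe;  /* last value written to bit 0 of the port */
};

/* A write to the port: while the strobe bit is 1, the buttons are latched. */
void pad_write(struct fake_pad *p, uint8_t v)
{
    p->strobe = (uint8_t)(v & 1);
    if (p->strobe) {
        p->shift = p->buttons;
    }
}

/* A read from the port: returns one button and shifts to the next one. */
uint8_t pad_read(struct fake_pad *p)
{
    uint8_t bit = (uint8_t)(p->shift & 1);
    if (!p->strobe) {
        p->shift = (uint8_t)((p->shift >> 1) | 0x80);
    }
    return bit;
}

/* Strobe, then collect eight reads from the single port into one byte. */
uint8_t read_pad(struct fake_pad *p)
{
    uint8_t result = 0;
    int i;
    pad_write(p, 1);
    pad_write(p, 0);
    for (i = 0; i < 8; ++i) {
        result |= (uint8_t)(pad_read(p) << i);
    }
    return result;
}
```

Eight reads of one address return eight different answers, which is the time dimension at work and the reason such an address has to be treated as volatile.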

Common Pitfalls

All programming has pitfalls, and in writing this paper I ran into many. Here are some:

Why didn't I just copy someone else's code? If you copy something, there is no guarantee that you understand what's going on. This paper is about understanding what's going on.

How do we know what is going on? The first option is to guess what will happen and then test. If your guess is correct, then you are either lucky or you understand the process at some level. In order to prove that you understand the process, you should be able to predict many things and have a significant number turn out to be true. This type of trial and error programming is very widely used from web development to embedded systems. In the heart of programming, there must be some point of trial and error when you break new ground that you and other people have never done before.

It's worthwhile to take a look at where our guesses do not hold up to reality.

Mednafen vs. MAME

Mednafen has a debugger and is nice to use. It appears to be designed to fill in the gap where a piece of code isn't working or you don't know how a piece of code works. The key bindings are quick and you can skip to a memory address quickly. An obvious downside of Mednafen's debugger is its lack of features. You can't set a breakpoint without visiting the address. You can't set a watchpoint. The sub-byte disassembly is clumsy.
MAME has a featureful debugger with file output and a scripting language (Lua). While its interface is not very good, it's possible that the Qt debugger could be coming to Linux, and improvements to MAME are frequent.
MAME's scripting language made significant improvements in a recent version with regard to automated debugging.
Both Mednafen and MAME give you step 2 (emulate it) and 3 (debug it).

Mednafen doesn't allow you to export disassembly, so it's a lot weaker than MAME. MAME is less easy to work with.
Some things you'll probably want to learn about a new architecture:

Radare2

Radare2 supports a large set of non-x86 architectures. But Radare2 has bugs. It can help you reverse a lot, but only if it works. There are bugs in many old versions, so I heartily recommend compiling from git master or the most recent release.

A list of very useful commands:

  # Start radare disassembling a 6502 architecture file.
  r2 -e asm.arch=6502 file.nes
  # Start radare disassembling a M68k architecture file.
  r2 -e asm.arch=m68k file.md
  # Start radare disassembling a 8051 architecture file that is encoded in Intel hex format.
  r2 -e asm.arch=8051 ihex://harvard1.ihx
  # Start radare disassembling a z80 architecture file.
  r2 -e asm.arch=z80 file.bin

  # Analyze all functions (it sometimes helps to run this twice)
  aaa
  # Print disassembly of the current function.
  pdf

  # write disassembly to a file
  pd >harvard1.dis
  # Visual mode
  V
  # Switch to the next visual mode
  p
  # Graph mode
  V
  # If it complains, define a function
  df
  # go to the top of anything
  g
  # go to the bottom of anything
  G
  # exit
  q
  

You might notice that it's trying to be vim. That means if you're comfortable moving around using hjkl, you can use that.

6502

Examples of 6502 in CTF:
Pwn Adventure Z from CSAW
Compromising a Linux desktop using... 6502 processor opcodes on the NES?!
Hacking Time from CSAW CTF 2015 [.kr]
Juniors CTF 2016 - Joy500 Oldschool NES Rom Write Up

If you look at the source code included with this paper, you'll find a C program compilable with CC65. It uses libraries specific to CC65 but should be portable to other compilers and libc implementations because CC65 is not too far from the C specification. In nes/src/nsf7l.c you see a simple main function where we call init() and then play music by setting values in the APU in a while loop. Also in the while loop is a call to cprintf, which prints characters to the screen. Also found in nes/src/ is hello.s, an assembly program that has a similar structure to nsf7l.c. music.s is just a stub, so hello.s will not actually play any music if you get it to compile and run.

Before writing nsf7l.c I wrote a few programs that worked at the 6502 machine code level. By concatenating bytes (using assembly and guides) I was able to create a working nsf file, which is an NES rom that executes 6502 instructions but only plays music. By writing the bytes I was able to bypass both assembler and compiler, providing myself a truly bootstrap experience, though instead of using a keypad to enter hex into memory or something like that, I used a fully functioning desktop computer with Python and gigabytes of RAM -- most of which was unnecessary except for convenience.
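To give a feel for what writing the bytes means, here is a small sketch that emits the machine code for three 6502 instructions by hand: LDA #value, STA addr, RTS. The original experiments used Python. This version is in C to match the rest of the paper, the function name emit_store is made up, and the output is only a code fragment, not a complete nsf file (there is no header).

```c
#include <stddef.h>
#include <stdint.h>

/* Write the machine code for:  LDA #value ; STA addr ; RTS
   into out and return the number of bytes written (always 6). */
size_t emit_store(uint8_t *out, uint8_t value, uint16_t addr)
{
    size_t n = 0;
    out[n++] = 0xA9;                    /* opcode: LDA immediate */
    out[n++] = value;                   /* the immediate operand */
    out[n++] = 0x8D;                    /* opcode: STA absolute */
    out[n++] = (uint8_t)(addr & 0xff);  /* address low byte comes first */
    out[n++] = (uint8_t)(addr >> 8);    /* then the high byte */
    out[n++] = 0x60;                    /* opcode: RTS */
    return n;
}
```

Calling emit_store(buf, 0x0f, 0x4015) yields the bytes A9 0F 8D 15 40 60, which you can check against what an assembler or Radare2 says about the same instructions.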

nsf7l.c plays music in 105 lines of C with a tiny libc. Can you do that in Linux, Windows, or OSX? By making x86 and ARM systems more powerful, we have also made it more difficult to write a proof of concept.

The bootstrap experience:
Once you write an assembler in machine code, the next step would be to write a compiler in assembly. You can then use your machine code assembler to turn assembly programs into machine code. You might also consider porting your machine code assembler to assembly, since there might be bugs in your machine code assembler that you can't see because it's bytes in memory. Once you had a C compiler that could compile even the simplest subset of C, you could write your compiler and assembler in C and gain the benefits from there on. Then of course, you would endeavor to make your C compiler more complete, as it's pretty obvious that you wouldn't have an operating system or robust file system once you had a compiler.

If you look at the function init() in nsf7l.c you can see

APU.status = 0xf;

What does this do? In nes.h, there is a line that describes APU as a struct at address 0x4000.

#define APU             (*(struct __apu*)0x4000)

This means that the write to APU.status will write the value 0xf to address 0x4015. Address 0x4015 is the status register of the APU, the chip that controls the audio. Writing 0xf there turns on the two pulse channels, the triangle channel, and the noise channel. By writing to that address, you are controlling a chip. While it may look like you are writing to memory, don't expect that every memory address is memory.
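The arithmetic behind 0x4015 can be checked on a desktop with offsetof. The struct below is a simplified stand-in laid out to match the NES register map. It is not the real struct __apu from the CC65 headers, and fake_apu, FAKE_APU_BASE, and apu_status_address are made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the register block that starts at 0x4000. */
struct fake_apu {
    uint8_t channel_regs[0x14]; /* 0x4000-0x4013: pulse, triangle, noise, DMC */
    uint8_t oam_dma;            /* 0x4014: sprite DMA, not an audio register */
    uint8_t status;             /* 0x4015: channel enable bits */
};

#define FAKE_APU_BASE 0x4000u

/* Base address plus the field's offset gives the address the store hits. */
unsigned apu_status_address(void)
{
    return FAKE_APU_BASE + (unsigned)offsetof(struct fake_apu, status);
}
```

The cast in the APU macro tells the compiler to pretend a struct lives at 0x4000, so each field name becomes a fixed hardware address and a plain assignment becomes a store to the chip.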

Here are a handful of answers to the questions we asked at the beginning of this section.

Questions?

My paper with downloads and links
https://www.altsci.com/old_non-x86/
https://sono.us/mame

Radare2 is open source, free, and supported.
You might have missed Portland Retro Gaming Expo, but remember it for next year.
JRSFuzz is open source, free, and supported.

jvoss@altsci.com

Small Wide World

JavRE is open source and free.
JavRE

The State of the World

We have the opportunity to do things that I originally thought were fantasy. This will repeat, let us be clear, more times than you will wish. Don't let that be your excuse to not write code. Find something that benefits you or someone else and take a look from the perspective of: this is possible by means of effort.