Advanced Code Relationship Mapping Toorcon 11 San Diego 2009

Who: Joel R. Voss, a.k.a. Javantea, AltSci Concepts
What: Code is beautiful, read it. Find vulnerabilities, soothe worries.
Where: Toorcon 11 San Diego 2009
When: Now!
Why: Code analysis changes the software security analysis game.
How: Make a list, filter out the stuff that's done correctly.

Participate: you'll get time at the end for questions, comments, and arguments. Rather than holding your arguments until the end, though, briefly give hand signals so that I know what I might be doing wrong.


We're going to be talking about relationships and how they foster bugs. I'm going to discuss the vulnerabilities that I am looking for one by one. For your benefit and mine, I'll be releasing two bugs that ought to get your hearts racing as fast as mine. Alas, I can never prove that my software is secure, or can I?

Where to Start?

bluez-4.? - 4.56 dund.c
char buf[10];
...
sprintf(buf, "%d", channel);
Reproduce:
./compat/dund -s -n -P -1294967196
# Connect to the channel remotely.
[Figure: bluez dund map]
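
To see why the sprintf call above overflows, it helps to count bytes. The sketch below assumes the value given to -P ends up in channel unchanged; bytes_needed and format_channel are hypothetical helper names, not bluez functions. It only measures the formatted length and shows a bounded call that reports truncation.

```c
#include <stdio.h>

/* Hypothetical helper: bytes (including the terminating NUL) that
 * formatting channel with "%d" requires. */
static int bytes_needed(int channel)
{
	/* C99: a NULL buffer with size 0 makes snprintf report the length only. */
	return snprintf(NULL, 0, "%d", channel) + 1;
}

/* Bounded variant of the call: returns 0 on success, -1 if the text
 * would not fit in buf. */
static int format_channel(char *buf, size_t size, int channel)
{
	int n = snprintf(buf, size, "%d", channel);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For -1294967196 the text is 11 characters plus the NUL, so it needs 12 bytes, two more than char buf[10] provides.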

bluez is the Linux Bluetooth stack. Users need it on any system that talks to Bluetooth devices. There is a kernel part and a userland part. dund is one of the userland tools. It's legacy code, but certain mobile systems still use it.

I recommend writing some Bluetooth code. There are some incredible developers working on this project. You can write code in Python, C, bash, and many other languages.

How do we define a Vulnerability?

We all know the three big ones:

These aren't all of them, but I hope to expand my code to cover the less likely ones too.

What are the specs of an array buffer overflow?

char buf[10];
buf[i] = 23;

A buffer can be allocated, modified, read, and deallocated.
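
For the two lines above, the spec comes down to one condition: the store is safe only when 0 <= i < 10. A minimal sketch of that condition as code follows; BUF_LEN and checked_store are names made up for this illustration.

```c
#define BUF_LEN 10

/* Store value at buf[i] only when i is inside the allocation.
 * Returns 0 on success, -1 when the index is out of range. */
static int checked_store(char buf[BUF_LEN], long i, char value)
{
	if (i < 0 || i >= BUF_LEN)
		return -1;
	buf[i] = value;
	return 0;
}
```

Vetting a store means showing that this condition always holds, either with a runtime check like this one or by relating i to the allocation ahead of time.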

Stack

What about this one?

char *test(int i)
{
	char buf[10];
	buf[i] = 23;
	char *tmp = buf;
	return tmp;
}

int main()
{
	char *a = test(2);
	a[1] = 42;
	return 0;
}
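
In the example above, test() returns the address of a local array whose lifetime ends when the function returns, so a[1] = 42 in main writes to dead stack storage. One possible repair, sketched here under the assumption that the caller can own the buffer, is to pass the storage and its length in. test_fixed is a hypothetical name.

```c
#include <stddef.h>

/* Variant of test(): the caller owns the storage, so the returned
 * pointer stays valid after the call. Returns NULL if buf is NULL
 * or i is out of range. */
static char *test_fixed(char *buf, size_t len, size_t i)
{
	if (buf == NULL || i >= len)
		return NULL;
	buf[i] = 23;
	return buf;
}
```

The caller would declare char buf[10] itself and call test_fixed(buf, sizeof buf, 2), which keeps the allocation and the index in the same scope.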

Heap

Heap gives us a good place to put our stuff. It's more difficult because you actually have to free() properly. Any strangeness in the code path can cause a double free, a memory leak, or a null dereference.
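
A small sketch of the discipline this implies: check every allocation for NULL, and clear the pointer after freeing it so that a second release is harmless. dup_string and release are hypothetical helpers written for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Heap copy of s, or NULL if s is NULL or the allocation fails. */
static char *dup_string(const char *s)
{
	size_t n;
	char *p;

	if (s == NULL)
		return NULL;
	n = strlen(s) + 1;
	p = malloc(n);
	if (p == NULL)
		return NULL;
	memcpy(p, s, n);
	return p;
}

/* Free *pp and clear it, so calling release twice is a no-op. */
static void release(char **pp)
{
	if (pp == NULL)
		return;
	free(*pp);	/* free(NULL) is defined to do nothing */
	*pp = NULL;
}
```

This does not remove the need to follow each code path, but it turns a double free into a no-op and makes a use after release show up as a null pointer.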

Your Task: Relate

As a software security researcher your task is two-fold:

  1. Follow every buffer from its index to its allocation.
  2. Follow every index of the buffer to its definite ranges and values.
Remember that strdup is an alloc.
Simply relate each buffer to its allocation.

The two simplest relationships are the for loop with the array index and the integer array index with the integer array allocation.

integer buffer relationship
char a[24];
a[23] = 0;

for buffer relationship
char a[24];
for (i = 2; i < 24; i++) {
	a[i] = x;
}
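
Both relationships above reduce to comparing an index range against an allocation size. The predicate below is a rough sketch of that comparison and not AltSci's parser; range_is_vetted is a hypothetical name. A single index such as a[23] is treated as the range [23, 24).

```c
#include <stddef.h>

/* Does every index in [first, limit) fall inside an allocation of
 * alloc_len elements? Returns 1 if so, 0 otherwise. */
static int range_is_vetted(size_t alloc_len, long first, long limit)
{
	if (limit <= first)
		return 1;	/* empty range: the body never runs */
	if (first < 0)
		return 0;	/* negative index */
	return (size_t)limit <= alloc_len;
}
```

With alloc_len 24, the index 23 and the loop from 2 up to 24 both pass, while a loop bound of 25 does not.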

You can find these and count them as vetted, and so can AltSci's parser. The next simplest, and most common, relationship is a single layer of passing arrays to a function that has a for loop or integer index:
function buffer relationship
int write_acc(char *a)
{
	char r = 0;
	int i;	/* loop index into a */
	for (i = 2; i < 24; i++) {
		int x = (a[i] ^ 0xAA) + 7;
		a[i] = x;
		r += x;
	}
	return r;
}

int main()
{
	char a[24] = {0};	/* 24 elements, matching the loop bound in write_acc */
	write_acc(a);
	return 0;
}

The reason I'm showing you these relationships is to see how many things you actually have to write down to ensure that your code is deterministic. The more complex, the more relationships, but the number of things to vet only increases with actual relationships.
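
One way to cut down what has to be written down for the function case is to pass the allocation size along with the pointer, so the loop bound and the buffer are related inside the function. This is a sketch of that idea, not a change the talk proposes; write_acc_n is a hypothetical variant of write_acc.

```c
#include <stddef.h>

/* Variant of write_acc() that receives the allocation size, so the
 * loop never indexes past len. */
static int write_acc_n(char *a, size_t len)
{
	char r = 0;
	size_t i;

	for (i = 2; i < len; i++) {
		int x = (a[i] ^ 0xAA) + 7;
		a[i] = (char)x;
		r += x;
	}
	return r;
}
```

A caller would write write_acc_n(a, sizeof a), and a buffer shorter than three elements is simply left untouched.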

But What About Real Code?

Relationships make software security assessment a difficult job.
Project      Lines   Derefs  Arrays  Fors  Buffer Calls  AltSci
bluez        104k    33491868564159k
TiMidity++   183k    3586457476807159k
ntp          164k    2895168104823112k
mDNS         78k     ?       ?       ?     ?             ?

[Figure: an all too common sight, 1000 relationships]
The truth is that the most common case is passing variables two levels deep, so any human can vet it. Any machine can vet it. Recursive function calls and strange code require careful consideration, but vetting them is quite possible.

In fact, as a rule I consider any code that can't be easily vetted to be bad code. That means each time you get stuck, you get unstuck.

Passing Pointers Causes Edge Cases

It's not that you can't have an edge case without passing pointers. It's that you have a new edge case that you didn't predict when you wrote the function.

TiMidity++-2.13.2
/*
 * convert 16bit PCM to JACK float [-1,1]
 */
static void convert_stream_16(struct tm_jack *ctx, int c, int size, short *buf)
{
        int i;
        jack_default_audio_sample_t *inbuf = ringbuf_get_writebuf(&ctx->rbuf, c);
        for (i = 0; i < size; i++, buf += ctx->channels, inbuf++) {
                /* well, we can use ftol() in C99 but here let's leave
                 * the optimization for the compiler...
                 */
                jack_default_audio_sample_t val;
                val = (jack_default_audio_sample_t)*buf / 32768.0;
                *inbuf = val;
        }
}
Reproduce:
timidity -Oj -iA &
museseq
# Play music, exit from muse.

[Figure: inbuf relationship map]

TiMidity++ is the Linux MIDI synthesizer. MIDI is essential to music production, and there are some pretty awesome programs that use TiMidity++ as their backend. TiMidity++ is mature software and hasn't been edited for 5 years. This bug shows that it could use some love, or perhaps a new feature you want to work on. MIDI is an enjoyable programming topic.

Jinkies, this map isn't actually useful! It shows a perfect relationship. What's missing is the free: the function that calls the free is never called. Because TiMidity++ uses a plugin system for its output, an object is created for the output function methods. write_jack() gets called after destroy_jack(), which is rather difficult to find via static analysis. What is easy to find, however, is a free function whose caller is never called. Lucky for us, this type of bug is not nearly as common as you might expect.
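
The lifetime problem described above can be sketched with a toy output plugin. Everything here is hypothetical: struct out_plugin, plugin_open, plugin_write, and plugin_close are invented for illustration and are not TiMidity++ or JACK names. The point is only that clearing the pointer on close makes a write after destroy detectable instead of silent.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical output plugin with an open/write/close lifecycle. */
struct out_plugin {
	float *ring;		/* heap buffer owned by the plugin */
	size_t capacity;	/* number of floats in ring */
};

static int plugin_open(struct out_plugin *p, size_t capacity)
{
	p->ring = calloc(capacity, sizeof *p->ring);
	p->capacity = p->ring ? capacity : 0;
	return p->ring ? 0 : -1;
}

static void plugin_close(struct out_plugin *p)
{
	free(p->ring);
	p->ring = NULL;		/* a later write can now be detected */
	p->capacity = 0;
}

/* Convert 16-bit samples to floats. Refuses to run after close or
 * when n exceeds the buffer. Returns samples written, or -1. */
static long plugin_write(struct out_plugin *p, const short *buf, size_t n)
{
	size_t i;

	if (p->ring == NULL || n > p->capacity)
		return -1;
	for (i = 0; i < n; i++)
		p->ring[i] = (float)buf[i] / 32768.0f;
	return (long)n;
}
```

In this sketch the order of calls is still a runtime property, which fits the observation that write after destroy is hard to find statically.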

How can you solve all null pointer dereferences?

It's not easy when the relationship tree looks like this:
[Figure: deref relationship map]

Start here:

Library Calls, Allocs, Derefs, and Edges, oh my!


	[d] sdp_data_t * sdpdata = NULL ;
	[ds] sdpdata = pData ;
	[ds] sdpdata = sdpdata -> next ;
		[r] pData (passed)
			sdp_data_printf() :
			[r] sdpdata (passed)
				print_tree_attr_func() :
				sdp_data_t *sdpdata = NULL;
				sdpdata = (sdp_data_t *)value;
				[r] value (passed)
					[n]
				[r] possible null
		[r] possible null

In this map, we are actually able to solve the derefs to a strange function.
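
The map treats every passed pointer as possibly null until its source is traced. A minimal sketch of a list walk that is safe under that assumption follows; struct node and count_nodes are hypothetical stand-ins and not the bluez sdp_data_t code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a linked data element. */
struct node {
	int value;
	struct node *next;
};

/* Count the elements of a list. head is a passed pointer, so it is
 * treated as possibly null, and each next link is checked before it
 * is dereferenced. */
static size_t count_nodes(const struct node *head)
{
	size_t n = 0;
	const struct node *cur;

	for (cur = head; cur != NULL; cur = cur->next)
		n++;
	return n;
}
```

Here the only dereference, cur->next, sits behind the cur != NULL test, so the relationship can be vetted without looking at any caller.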

Why if (var) doesn't solve null derefs

if (var) {
	var->x = y;
}
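
The slide leaves the reasoning implicit. One plausible reading, which is an assumption here, is that the check only skips the store: the caller never learns that nothing happened, and a later unguarded use of the same pointer can still crash. A sketch that reports the null case instead of hiding it follows; struct item and set_x are hypothetical.

```c
#include <stddef.h>

struct item {
	int x;
};

/* Returns 0 when the store happened and -1 when var was null, so the
 * caller has to decide what a missing object means. */
static int set_x(struct item *var, int y)
{
	if (var == NULL)
		return -1;
	var->x = y;
	return 0;
}
```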

Can we really vet a whole project?

Complexity is inherent in software, but the solution to buffer overflows and similar vulnerabilities is full vetting. I know that other people have tried; some have failed, and some have decent systems. My friends have often reminded me that logic can fail without a buffer overflow or null dereference. Proper coding at the logic level is a lot easier than at the lower level (pointer math). By proper vetting and code quality at all levels, we can improve our C code to make code execution go away.

The less time we spend vetting pointer math, the more time we can spend vetting logic and testing our protocols against DoS. The more people we can encourage to write proper code the more likely we will have valid logic.

I am currently working toward a logical proof of every issue that we can automatically vet. Everything we cannot vet automatically, we can detect as being not deterministic so that it can be manually vetted. Manual vetting is not perfect, but this simplifies the problem to a simple understanding of the issues involved.

Questions? Comments? Disagreements? Flamewar?

I have planned to leave half the time for questions. Ask away.

Find this talk online at https://www.altsci.com/concepts/

Thanks to:
Morgan, meee, m33p, guerrilla, neg9, bluez, TiMidity++, dataworm, Hikari, the viewers, and all those who have questioned my assumptions. This wouldn't have been possible without you.

11 one louder

AltSci Concepts is working on vetting the entire open source stack one project at a time. If you'd like to support or join development of AltSci Concepts' flagship product you may contact jvoss at altsci com.

If there aren't any questions, I have 3 questions for you:

  1. Who here thinks that static analysis can solve buffer overflows?
  2. Who thinks that they can find vulnerabilities in code I have vetted?
  3. Who fits in a third category?