Advanced Code Relationship Mapping

Who: Joel R. Voss, a.k.a. Javantea, AltSci Concepts
What: Code is beautiful, read it. Find vulnerabilities, soothe worries.
Where: Toorcon 11 San Diego 2009
When: Now!
Why: Code analysis changes the software security analysis game.
How: Make a list, filter out the stuff that's done correctly.

Participate: You'll get time at the end for questions, comments, and arguments. Still, rather than holding your arguments until the end, give me brief hand signals so that I know what I might be doing wrong.

We're going to be talking about relationships and how they foster bugs. I'm going to discuss the vulnerabilities that I am looking for one by one. For your benefit and mine, I'll be releasing two bugs that ought to get your hearts racing as fast as mine. Alas, I can never prove that my software is secure, or can I?

Where to Start?

sprintf()

Everyone knows about buffer functions.
Until it's their code.
It's very cheap to look for.
The relationship here is when a function calls sprintf with a passed variable.
A lot of people use sprintf(a, "%s %d", input, i), which is difficult to vet.
Even worse, using a deref adds another relationship.
But check it whether or not it is difficult.
If you're lazy, just make it a snprintf().

bluez-4.? - 4.56 dund.c

char buf[10];
...
sprintf(buf, "%d", channel);

Reproduce:

./compat/dund -s -n -P -1294967196
# Connect to the channel remotely.
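
A minimal sketch of why that channel value overflows buf, assuming a 32-bit int: "-1294967196" is 11 characters, so sprintf needs 12 bytes including the terminator, and buf has 10. The helper name format_channel is hypothetical and not taken from dund.c; it only shows how the return value of snprintf exposes the truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from dund.c.
 * Formats channel into buf without writing past cap bytes.
 * Returns 0 on success, -1 if the text did not fit. */
int format_channel(char *buf, size_t cap, int channel)
{
	assert(buf != NULL && cap > 0);
	/* snprintf returns the length it wanted to write, excluding the NUL */
	int n = snprintf(buf, cap, "%d", channel);
	if (n < 0 || (size_t)n >= cap)
		return -1;	/* truncated: buf holds a shortened, terminated string */
	return 0;
}
```

Called as format_channel(buf, sizeof(buf), channel), the value from the reproduce step is reported as an error instead of writing past the end of buf.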

bluez is the Linux Bluetooth stack. It is a necessary part of the system for users. There is a kernel part and a userland part. dund is part of the userland tools. It's legacy, but certain mobile systems use it.

I recommend writing some Bluetooth code. There are some incredible developers working on this project. You can write code in Python, C, bash, and many other languages.

How do we define a Vulnerability?

We all know the three big ones:

array buffer overflow
```
buf[i] = 23;
```
null dereference
```
if (a) {
    b = &x;
}
b->y = 23;
```
buffer functions
```
memcpy(buf, input, sizeof(input));
```

These aren't all of them, but I hope to expand my code to cover the less common cases too.

What are the specs of array buffer overflow?

char buf[10];
buf[i] = 23;

A buffer can be allocated, modified, read, and deallocated.
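
One way to state the spec: a write buf[i] is only vetted when i is known to lie inside the allocation, that is 0 <= i < 10 here. A minimal sketch with a hypothetical helper, checked_store, that is not part of any tool mentioned in this talk:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store value at buf[i] only if i is inside
 * the allocation of size cap. Returns 0 on success, -1 if rejected. */
int checked_store(char *buf, size_t cap, long i, char value)
{
	assert(buf != NULL);
	if (i < 0 || (size_t)i >= cap)
		return -1;	/* index is outside [0, cap) */
	buf[i] = value;
	return 0;
}
```

The point is less the helper itself than the pair it makes explicit: every index travels with the size of the allocation it is checked against.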

Stack

What about this one?

char *test(int i)
{
	char buf[10];
	buf[i] = 23;
	char *tmp = buf;
	return tmp;
}

int main()
{
	char *a = test(2);
	a[1] = 42;
	return 0;
}
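
In the example above, test() returns a pointer to buf, whose lifetime ends when test() returns, so a[1] = 42 in main() writes to dead stack memory. The index i is also unchecked. A sketch of one possible repair, in which the caller owns the buffer (the name test_fill is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rewrite of test(): the caller supplies the buffer and
 * its size, so nothing points into a dead stack frame. */
char *test_fill(char *buf, size_t cap, size_t i)
{
	assert(buf != NULL);
	if (i >= cap)
		return NULL;	/* reject an out-of-range index */
	buf[i] = 23;
	return buf;
}
```

With char buf[10] declared in main() and passed as test_fill(buf, sizeof(buf), 2), the returned pointer stays valid for as long as main() runs.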

Heap

Heap gives us a good place to put our stuff. It's more difficult because you actually have to free() properly. Any strangeness in the code path can cause a double free, a memory leak, or a null dereference.
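
A minimal sketch of two habits that remove some of that strangeness: check the result of malloc before using it, and clear the pointer when freeing so a second free is harmless. The names alloc_filled and free_and_clear are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n bytes filled with fill; returns NULL if malloc fails,
 * so the caller has to check before dereferencing. */
char *alloc_filled(size_t n, char fill)
{
	char *p = malloc(n);
	if (p == NULL)
		return NULL;
	memset(p, fill, n);
	return p;
}

/* Free *pp and clear it. A repeated call becomes free(NULL),
 * which is defined to do nothing. */
void free_and_clear(char **pp)
{
	assert(pp != NULL);
	free(*pp);
	*pp = NULL;
}
```

This does not address memory leaks; those still require following every code path from the allocation to a free.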

Your Task: Relate

As a software security researcher your task is two-fold:

Follow every buffer from its index to its allocation.
Follow every index of the buffer to its definite ranges and values.

Remember that strdup is an alloc.
Simply relate each buffer to its allocation.
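
strdup hides an allocation of strlen(s) + 1 bytes. A sketch that makes that size explicit, using a hypothetical stand-in named dup_with_size rather than strdup itself, so that every later index into the copy can be related to a known allocation size:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup() that also reports the size
 * of the allocation. Returns NULL if malloc fails. */
char *dup_with_size(const char *s, size_t *size_out)
{
	assert(s != NULL && size_out != NULL);
	size_t size = strlen(s) + 1;	/* text plus terminating NUL */
	char *copy = malloc(size);
	if (copy == NULL)
		return NULL;
	memcpy(copy, s, size);
	*size_out = size;
	return copy;
}
```

The valid indexes into the copy are 0 through size - 1, and the caller is responsible for the matching free().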

The two simplest relationships are the for loop with the array index and the integer array index with the integer array allocation.

char a[24];
a[23] = 0;

char a[24];
for (i = 2; i < 24; i++) {
	a[i] = x;
}

You can find these and count them as vetted, and so can AltSci's parser. The next simplest, and most common, relationship is a single layer of passing arrays to a function that has a for loop or integer index:
[Figure: function buffer relationship]

/* a must point to at least 24 bytes */
int write_acc(char *a)
{
	int i;
	char r = 0;
	for (i = 2; i < 24; i++) {
		int x = (a[i] ^ 0xAA) + 7;
		a[i] = x;
		r += x;
	}
	return r;
}

int main()
{
	char a[24];
	write_acc(a);
	return 0;
}
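
write_acc() assumes its argument has at least 24 bytes, and that assumption is the relationship to vet at every call site. A sketch of one way to make it explicit by passing the size along; the name write_acc_n and its signature are assumptions, not the original code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of write_acc(): the loop bound comes from the
 * caller's allocation size instead of a constant. */
int write_acc_n(char *a, size_t len)
{
	assert(a != NULL);
	char r = 0;
	for (size_t i = 2; i < len; i++) {
		int x = (a[i] ^ 0xAA) + 7;
		a[i] = (char)x;
		r += x;
	}
	return r;
}
```

A call such as write_acc_n(a, sizeof(a)) can be vetted locally, because the bound and the allocation appear on the same line.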

I'm showing you these relationships so you can see how many things you actually have to write down to ensure that your code is deterministic. More complex code has more relationships, but the number of things to vet grows only with the actual relationships.

But What About Real Code?

Relationships make software security assessment a difficult job.

Project	Lines	Derefs	Arrays	Fors	Buffer Calls	AltSci
bluez	104k	3349	186	85	641	59k
TiMidity++	183k	3586	457	476	807	159k
ntp	164k	2895	168	104	823	112k
mDNS	78k	?	?	?	?	?

[Figure: an all too common sight, 1000 relationships]
The truth is that the most common stuff is passing variables two levels deep, so any human can do it. Any machine can do it. Recursive function calls and strange code require careful consideration, but vetting them is quite possible.

In fact, as a rule I consider any code that can't be easily vetted to be bad code. That means each time you get stuck, you get unstuck.

Passing Pointers Causes Edge Cases

It's not that you can't have an edge case without passing pointers. It's that you have a new edge case that you didn't predict when you wrote the function.

TiMidity++-2.13.2

/*
 * convert 16bit PCM to JACK float [-1,1]
 */
static void convert_stream_16(struct tm_jack *ctx, int c, int size, short *buf)
{
        int i;
        jack_default_audio_sample_t *inbuf = ringbuf_get_writebuf(&ctx->rbuf, c);
        for (i = 0; i < size; i++, buf += ctx->channels, inbuf++) {
                /* well, we can use ftol() in C99 but here let's leave
                 * the optimization for the compiler...
                 */
                jack_default_audio_sample_t val;
                val = (jack_default_audio_sample_t)*buf / 32768.0;
                *inbuf = val;
        }
}

Reproduce:

timidity -Oj -iA &
museseq
# Play music, exit from muse.

[Figure: inbuf relationship map]

TiMidity++ is the Linux MIDI synthesizer. MIDI is essential to music production, and there are some pretty awesome programs that use TiMidity++ as their backend. TiMidity++ is mature software and hasn't been edited for 5 years. This bug shows that it could use some love, or perhaps a new feature you want to work on. MIDI is an enjoyable programming topic.

Jinkies, this map isn't actually useful! This map shows a perfect relationship. What's missing is the free. The function that calls the free is never called. Because TiMidity++ uses a plugin system for its output, an object is created to hold the output function methods. write_jack() gets called after destroy_jack(), which is rather difficult to find via static analysis. What is easy to find, however, is a free that is only reachable from a function that is never called. Lucky for us, this type of bug is not nearly as common as you might expect.
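
The ordering problem, a write method called after destroy, can be sketched independently of TiMidity++. Everything below (struct out_ctx, out_open, out_write, out_destroy) is hypothetical and is not the TiMidity++ or JACK API. It only shows a destroy that clears the pointer, so that a later write fails cleanly instead of touching freed memory.

```c
#include <assert.h>
#include <stdlib.h>

struct out_ctx {
	float *ring;	/* heap buffer, NULL once destroyed */
	size_t cap;	/* number of samples ring can hold */
};

int out_open(struct out_ctx *ctx, size_t cap)
{
	assert(ctx != NULL);
	ctx->ring = calloc(cap, sizeof(*ctx->ring));
	ctx->cap = ctx->ring ? cap : 0;
	return ctx->ring ? 0 : -1;
}

void out_destroy(struct out_ctx *ctx)
{
	assert(ctx != NULL);
	free(ctx->ring);
	ctx->ring = NULL;	/* later writes see the destroyed state */
	ctx->cap = 0;
}

/* Convert 16-bit PCM to float in [-1, 1).
 * Returns the number of samples written, or -1 if rejected. */
int out_write(struct out_ctx *ctx, const short *buf, size_t size)
{
	assert(ctx != NULL && buf != NULL);
	if (ctx->ring == NULL || size > ctx->cap)
		return -1;	/* destroyed, or more samples than ring holds */
	for (size_t i = 0; i < size; i++)
		ctx->ring[i] = (float)buf[i] / 32768.0f;
	return (int)size;
}
```

The guard turns the bad ordering into a detectable error at run time. It does not replace finding the ordering in the first place, which is the part the relationship map missed.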

Library Calls, Allocs, Derefs, and Edges, oh my!


	[d] sdp_data_t * sdpdata = NULL ;
	[ds] sdpdata = pData ;
	[ds] sdpdata = sdpdata -> next ;
		[r] pData (passed)
			sdp_data_printf() :
			[r] sdpdata (passed)
				print_tree_attr_func() :
				sdp_data_t *sdpdata = NULL;
				sdpdata = (sdp_data_t *)value;
				[r] value (passed)
					[n]
				[r] possible null
		[r] possible null

In this map, we are actually able to solve the derefs to a strange function.

Why `if (var)` doesn't solve null derefs

if (var) {
	var->x = y;
}

overflows
You can prevent null derefs this way, but not overflows and such.
pointer math
Also, any math that adds to a null pointer without dereferencing will skip over if (var).
disrupts logic
Thirdly, it's often bad technique to simply put an if around a deref, since other parts of the code may rely on it. In fact, Java, C#, and even Python have this issue. It's what we call a language-independent bug. There are multiple classes of bugs devoted to this mistake. Most of them are related to business logic, which is bad news.
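
The pointer math case can be sketched as follows. If the base pointer is NULL, base + 3 is typically not NULL (and computing it is undefined behavior in C), so a later if (p) check passes. The check has to be made on the base pointer before any arithmetic. The struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct item {
	int x;
};

/* Return a pointer to items[idx], or NULL if items is NULL or idx is
 * out of range. The NULL test is on the base pointer, before any math. */
struct item *item_at(struct item *items, size_t count, size_t idx)
{
	if (items == NULL || idx >= count)
		return NULL;
	return items + idx;
}

/* Store y in items[idx].x. Returns 0 on success, -1 if rejected. */
int item_set_x(struct item *items, size_t count, size_t idx, int y)
{
	struct item *p = item_at(items, count, idx);
	if (p == NULL)
		return -1;
	assert(p >= items && p < items + count);	/* p is inside the array */
	p->x = y;
	return 0;
}
```

Note that item_set_x reports the failure to its caller rather than silently skipping the write, which keeps the surrounding logic from continuing on a false assumption.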

Can we really vet a whole project?

Complexity is inherent in software, but the solution to buffer overflows and similar vulnerabilities is full vetting. I know that other people have tried; some have failed, and some have decent systems. My friends have often reminded me that logic can fail without a buffer overflow or null dereference. Proper coding at the logic level is a lot easier than at the lower level (pointer math). With proper vetting and code quality at all levels, we can improve our C code to make code execution go away.

The less time we spend vetting pointer math, the more time we can spend vetting logic and testing our protocols against DoS. The more people we can encourage to write proper code the more likely we will have valid logic.

I am currently working toward a logical proof of every issue that we can automatically vet. Everything we cannot vet automatically, we can detect as being not deterministic so that it can be manually vetted. Manual vetting is not perfect, but this simplifies the problem to a simple understanding of the issues involved.

Questions? Comments? Disagreements? Flamewar?

I have planned to leave half the time for questions. Ask away.

Find this talk online at https://www.altsci.com/concepts/

Thanks to:
Morgan, meee, m33p, guerrilla, neg9, bluez, TiMidity++, dataworm, Hikari, the viewers, and all those who have questioned my assumptions. This wouldn't have been possible without you.

[Figure: 11, one louder]

AltSci Concepts is working on vetting the entire open source stack one project at a time. If you'd like to support or join development of AltSci Concepts' flagship product you may contact jvoss at altsci com.

If there aren't any questions, I have 3 questions for you:

Who here thinks that static analysis can solve buffer overflows?
Who thinks that they can find vulnerabilities in code I have vetted?
Who fits in a third category?
