Vulnerabilities in Code Easily Mapped

by Javantea
Nov 14, 2008

INTRODUCTION

On Feb 24, 2008 I wrote specs and a simple parser for my own programming language. The first level of parsing returned a list of types. The second level (currently nearly finished) would sort the types into statements that could be executed line by line by an interpreter or translated into assembly (see ASLang2). The language was designed to compile in automatic bounds checking and to refuse to compile non-deterministic code. This is not an outrageous goal; in fact, most interpreted languages do the same, although funny examples of executable non-deterministic code exist in Python, PHP, and Perl. To a developer, though, a magic bullet would be nice. In fact, it would be nice if the compiler could print out a list of possible boundary breaks. Before I finished my language, I decided that it would be possible and easy to write the same checks for C/C++: get a list of variables, find all pointers in the code, and work out whether it's possible to overwrite anything.

Easy, huh? In fact, it has been a pretty easy task. I'm not finished by a long shot and my first example isn't finished, but I can easily explain how it works and why I believe mine is the first static code analysis tool to do this correctly without much of a budget. (If someone has done this, I haven't heard of it being sold or given away for any price I'm willing to pay.)

The goal of this project is to turn this example code into a security report:

int main(int argc, char **argv)
{
	if(argc < 2) return 1;
	char buf[24];
	sprintf(buf, "%s\n", argv[1]);
	return 0;
}

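The example code is a trivial stack overflow. sprintf() is quite often a source of buffer overflows because it does no bounds checking on the destination. To exploit this, pass an argument long enough to write past the end of the buffer and overwrite the saved return address, which lets you point execution at the code you wish to run.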
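
To make the arithmetic concrete, here is a small Python sketch. It is not part of the analyzer, and the names sprintf_bytes_written and overflows are made up for illustration. It counts the bytes that sprintf(buf, "%s\n", argv[1]) stores, including the terminating NUL, and compares the total with the 24 bytes allocated.

```python
BUF_SIZE = 24  # from: char buf[24];


def sprintf_bytes_written(arg):
    # Bytes stored by sprintf(buf, "%s\n", arg):
    # the argument itself, one newline, and the terminating NUL.
    return len(arg.encode()) + 1 + 1


def overflows(arg, buf_size=BUF_SIZE):
    # True when the write does not fit in the buffer.
    return sprintf_bytes_written(arg) > buf_size


for arg in ("hello", "A" * 22, "A" * 23):
    print(len(arg), sprintf_bytes_written(arg), overflows(arg))
```

Under these assumptions a 22-byte argument exactly fills the buffer, and anything of 23 bytes or more writes past its end.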

METHOD

How do we know for sure whether the above code overwrites the buffer? We must write code that follows these rules:
1) char buf[24]; allocates 24 bytes to buf.
2) sprintf(a, b, c) writes a number of bytes to a depending on the values of b and c.
3) If the number of bytes written is greater than the number of bytes allocated, print a warning.
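
As a rough illustration of rule 1, the sketch below pulls fixed-size char arrays out of C source with a regular expression and records each one as an ('a', name, size, line_no) tuple. The regular expression is only a stand-in for the real parser described in this article, and both find_allocs and the tuple layout for allocations are assumptions modeled on the ('w', a, size, line_no) tuples used later.

```python
import re

# Matches simple declarations such as "char buf[24];".
# Assumptions: one-byte elements and a literal decimal size.
ALLOC_RE = re.compile(r'\bchar\s+(\w+)\s*\[\s*(\d+)\s*\]\s*;')


def find_allocs(source):
    # Return one ('a', name, size_in_bytes, line_no) record per declaration.
    allocs = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for m in ALLOC_RE.finditer(line):
            allocs.append(('a', m.group(1), int(m.group(2)), line_no))
    return allocs


EXAMPLE = '''int main(int argc, char **argv)
{
    if(argc < 2) return 1;
    char buf[24];
    sprintf(buf, "%s\\n", argv[1]);
    return 0;
}
'''

print(find_allocs(EXAMPLE))
```

A regular expression cannot handle typedefs, macros, or computed sizes, which is why a real tool needs a proper parser for this step.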

My code finds all allocations and understands their sizes, so the first rule is met. It also finds all reads and writes of variables, which satisfies the first half of the second rule. Below I give the specific code for sprintf(a, b, c), then the code that checks whether a write is greater than the alloc, along with the warning as my code will print it when it is completed.

Function that checks how much writing sprintf is doing:

def checkFunc(f, line_no):
	# sprintf(a, b, c): a is the destination, b the format, c the remaining arguments
	if f.name == 'sprintf':
		a = f.values[0]
		b = f.values[1]
		c = f.values[2:]
		if isinstance(b, CString):
			# literal format string: start from its length
			size = len(b.value)
		else:
			# use precalculated value if possible
			size = b.size
		#end if
		# Approximate replace of %s (2 characters) with length of the string.
		# The size code is almost smart enough to know whether
		# it's zero terminated depending on other attributes.
		for varg in c:
			size += varg.size - 2
		#next varg
		print('write', a, 'size', size)
		return ('w', a, size, line_no)
	#end if
	return None
#end def checkFunc(f, line_no)

The printed output of this function is:
write buf size 4096000000
The size is this large because argv[1] is well known to be of any size; 4,096,000,000 bytes (about 4 GB) is a bit less than 2^32 and is easily larger than most buffers.
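
The "size - 2" approximation treats every argument as a %s. A slightly more careful estimate can walk the format string directive by directive. The sketch below is one hypothetical way to do that, not the analyzer's code: estimate_sprintf_size is an invented name, only %s, %d, %c and %% are handled, and flags, widths and precisions are ignored.

```python
import re

# A directive is '%' followed by one conversion letter or another '%'.
# Assumption: no flags, widths, or precisions appear in the format.
DIRECTIVE_RE = re.compile(r'%([sdc%])')


def estimate_sprintf_size(fmt, arg_sizes):
    # Upper bound on the characters sprintf produces, excluding the NUL.
    # arg_sizes holds one maximum size per argument, in order.
    sizes = iter(arg_sizes)
    total = 0
    pos = 0
    for m in DIRECTIVE_RE.finditer(fmt):
        total += m.start() - pos      # literal text before the directive
        kind = m.group(1)
        if kind == '%':
            total += 1                # "%%" prints a single '%'
        elif kind == 'c':
            next(sizes)               # consume the argument
            total += 1
        elif kind == 'd':
            next(sizes)
            total += 11               # "-2147483648", assuming a 32-bit int
        else:
            total += next(sizes)      # %s: the argument's maximum length
        pos = m.end()
    total += len(fmt) - pos           # literal text after the last directive
    return total


print(estimate_sprintf_size("%s\n", [10]))
print(estimate_sprintf_size("id=%d name=%s 100%%\n", [4, 16]))
```

The caller still has to add one byte for the terminating NUL before comparing the result with the allocation.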

Function that checks whether the write is larger than the alloc:

def checkBounds(var, uses):
	alloc_use = None
	max_use = None
	for use1 in uses:
		if use1[0] == 'a':
			# It's an alloc
			alloc_use = use1
		elif use1[0] == 'r' or use1[0] == 'w':
			# It's a read or write; remember the largest one
			if (max_use is None) or (use1[2] > max_use[2]):
				max_use = use1
			#end if
		#end if
	#next use1
	# Only compare when both an alloc and a read/write were found
	if alloc_use is not None and max_use is not None and alloc_use[2] < max_use[2]:
		print("Warning: check your bounds:", max_use)
	#end if
#end def checkBounds(var, uses)

The output of this function is admittedly pretty lame: it only tells you that a bound failure was found and on which line, but it gets the point across.

The printed output of this function is:
Warning: check your bounds: ('w', buf, 4096000000, 5)
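
A fuller report could group the uses by variable and show the allocation next to the offending access. The following is a sketch of that idea under the same tuple layout, with variable names as plain strings; find_overflows and the second buffer called name are hypothetical and only there to show a variable that passes.

```python
from collections import defaultdict


def find_overflows(uses):
    # uses: iterable of (kind, var, size, line_no); kind is 'a', 'r' or 'w'.
    by_var = defaultdict(list)
    for use in uses:
        by_var[use[1]].append(use)

    warnings = []
    for var, var_uses in by_var.items():
        allocs = [u for u in var_uses if u[0] == 'a']
        accesses = [u for u in var_uses if u[0] in ('r', 'w')]
        if not allocs or not accesses:
            continue  # nothing to compare for this variable
        alloc = allocs[-1]                         # most recent allocation
        worst = max(accesses, key=lambda u: u[2])  # largest read or write
        if worst[2] > alloc[2]:
            warnings.append((var, alloc, worst))
    return warnings


uses = [
    ('a', 'buf', 24, 4),
    ('w', 'buf', 4096000000, 5),
    ('a', 'name', 64, 10),   # hypothetical second buffer that is fine
    ('w', 'name', 32, 11),
]

for var, alloc, worst in find_overflows(uses):
    kind = 'write' if worst[0] == 'w' else 'read'
    print("Warning: %s allocated %d bytes on line %d, %s of up to %d bytes on line %d"
          % (var, alloc[2], alloc[3], kind, worst[2], worst[3]))
```

Only buf is reported here, since the 32-byte write to name fits inside its 64-byte allocation.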

ANALYSIS

The simplest example of a buffer overflow has been figured out. What happens when I put all the open source code I can possibly find through this wringer? Open source code that is in common use has almost certainly been checked for sprintf misuse, but lesser-known programs such as the ones you find on this website are ripe pickings for these trivial vulnerabilities. With my nearly complete C/C++ parser, incredibly simple logic, and a lot of work describing reads and writes for every function, I expect to be able to find buffer overflows in dozens or hundreds of projects.

The first thing that people want to harp on when they see static code analysis is that it involves a lot of false positives and/or a lot of false negatives. In my view false positives are fine whereas false negatives are bad news. My system currently lists all allocations, reads, writes, and deallocations in a very easy to see manner. This gets around the obvious problem of user error (not seeing the forest for the trees). Secondly, there can be no flaw without an alloc, read, write, or dealloc, so as long as every one of them is listed, false negatives are actually not a problem. Thus we are left with a very simple job: narrowing down as many false positives as possible. The way to verify whether something is or is not a buffer overflow is to show deterministically that no item is ever written beyond its bounds. In the above example, I was able to figure out that a write was completely unbounded.

So how do we check bounds? Using if statements. Simply comparing the length of the input variables (plus any extra characters in the b variable, the format string) with the size of the buffer (24) would solve this issue. If the code has that check added, we need another function, one that does the math of reads and writes of buffers through if statements, for loops, while loops, and basic arithmetic. The first version of this code will only work against static buffers and simple non-math-related dynamic buffers, but the final version will handle all deterministic code including functions, structs, and preprocessor includes (which could be parsed using GNU cpp if I wished).
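
As a minimal sketch of that math, suppose the analyzer has already worked out from an if statement that strlen(argv[1]) can be at most some value when sprintf is reached. Checking the guard is then simple arithmetic. The function names below are hypothetical, and the overhead of 1 is the newline in the "%s\n" format from the example.

```python
BUF_SIZE = 24  # from: char buf[24];


def max_write(max_arg_len, fmt_overhead=1):
    # Largest number of bytes sprintf can store once the guard has capped
    # the argument length: argument + extra format characters + NUL.
    return max_arg_len + fmt_overhead + 1


def guard_is_sufficient(max_arg_len, buf_size=BUF_SIZE, fmt_overhead=1):
    # True when the worst-case write still fits in the buffer.
    return max_write(max_arg_len, fmt_overhead) <= buf_size


# if (strlen(argv[1]) > 22) return 1;   at most 22 bytes get through
print(guard_is_sufficient(22))

# if (strlen(argv[1]) > 23) return 1;   off by one: 23 bytes get through
print(guard_is_sufficient(23))
```

The second guard looks reasonable at a glance but forgets the terminating NUL, which is exactly the kind of off-by-one error this check is meant to flag.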

CONCLUSION

The obvious end result of my software being run on all open source software, as well as on all software whose owners have purchased my analysis service, is the elimination of code execution vulnerabilities as we know them. Off-by-one errors, concatenation, truncation, and basic security mistakes will be caught programmatically instead of being found by security professionals. Developers will be able to rein in their code and release knowing that their number one security issue is solved. Programming languages such as Python, PHP, Perl, and Java that are interpreted by code written in C will be safe from these same issues for the foreseeable future. Security professionals will need to find the simpler processing bugs that are currently plaguing interpreted languages. This is already occurring, since buffer overflows have been solved for most software companies. Companies that are using bounds-safe languages still hire security professionals, but at a much lower rate.

If you are interested in static code analysis or advanced programming languages, feel free to contact me.
