AI Coder for Fuzzing Scripting Languages

by Javantea aka. Joel R. Voss
Nov 9-10, 2006
AI Coder 1 version 0.1 Source [sig]

Introduction

Scripting languages have become an important part of programming. Scripts often run in a sandbox with a restricted region of memory, and all code is interpreted by a program written to expose only a subset of the computer's functionality. On the web, this lets client-side code run in the browser for quick response times and specific features. Most of the substantive work (data retrieval, calculation, and storage) still needs to run on the server, which leaves client code almost entirely responsible for real-time display. In fact, most webpages need no client-side code at all to be fully functional. However, more and more sites rely on increasingly complex scripting libraries, including AJAX, math, and data-handling libraries. Computing a SHA1 hash on the client side may be useful for many purposes, but these systems raise many problems.

Browsers must handle a large amount of useful script as well as a large amount of invalid script without degrading the user experience. Compliance with standards is also an important factor in writing a browser. With these factors in mind, obvious security questions arise. Browser developers have addressed many of them, but many remain open. JavaScript appears headed to become the leading cause of browser denial of service (DoS), if it is not already. It has also become a major threat to user privacy through cross-site scripting (XSS) and phishing attacks.

When developing client-side applications, developers often find browser bugs, inconsistencies, and security holes. Usually, a developer sets these aside to finish the project s/he is being paid for. However, it makes sense for developers to at least report them as bugs, to aid security researchers and others working to fix bugs in the browser.

Testing browsers for bugs is a difficult security challenge. However, fuzzing has proven to be an excellent way to test browsers quickly. By using server-side scripting to generate random HTML pages and scripts, a tester can automatically exercise a very large cross-section of browser components. Browsers' tolerance of bad code helps users but hampers testing: as a general rule, a browser keeps running until it reaches a fatal error, so errors short of a fatal one can go unnoticed. A few examples below demonstrate the system.

Method

The simplest AI Coder is a random code generator that knows only that statement A requires B before it, allows an optional C between B and A, requires E after it, and allows an optional D between A and E. This is shown in Figure 1.

[B [C]] A [[D] E]

Figure 1: Design of a block of code.
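Figure 1 amounts to a small assembly rule. The JavaScript below is a minimal sketch under stated assumptions, not the AI Coder source: each statement is a hypothetical object holding its own text as `A` plus arrays of candidate lines for B, C, D, and E; several candidates in one position are treated as alternatives; and each optional line is included with probability 0.5. The random source is a parameter so that output can be made repeatable.

```javascript
// Pick one candidate line; rand() returns a number in [0, 1).
function pick(list, rand) {
  return list[Math.floor(rand() * list.length)];
}

// Assemble [B [C]] A [[D] E] for one statement. B and E are emitted when
// the statement has candidates for them; C and D are optional extras that
// can only appear alongside B and E respectively, as the brackets nest.
function assembleBlock(stmt, rand = Math.random, pOptional = 0.5) {
  const lines = [];
  if (stmt.B.length > 0) {
    lines.push(pick(stmt.B, rand));
    if (stmt.C.length > 0 && rand() < pOptional) lines.push(pick(stmt.C, rand));
  }
  lines.push(stmt.A);
  if (stmt.E.length > 0) {
    if (stmt.D.length > 0 && rand() < pOptional) lines.push(pick(stmt.D, rand));
    lines.push(pick(stmt.E, rand));
  }
  return lines.join("\n");
}

// The if statement from the example below, with a hypothetical setter as B.
const ifStmt = { A: "if(a == 1) {", B: ["a = 1;"], C: [], D: [], E: ["}"] };
console.log(assembleBlock(ifStmt));
// a = 1;
// if(a == 1) {
// }
```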

For example, if A is:

if(a == 1) {

then it would require a closing curly bracket (}) after it (position E).

A more useful system allows child statements. Any statement can become a child statement of another if it is placed between that statement's B and C, or between its D and E. The design then becomes the one shown in Figure 2.

[B [s1] [C]] A [[D] [s2] E]

Figure 2: Design of a block of code with subcode.

This system can be described by a database designed like this:

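create table code (
	id int NOT NULL auto_increment,
	data text NOT NULL,	-- one line of source code (MySQL TEXT columns take no literal DEFAULT)
	unique key data (data(32)),	-- uniqueness is checked on the first 32 characters only
	primary key (id));

create table reqs (
	code int NOT NULL DEFAULT -1,	-- id of statement A
	code_req int NOT NULL DEFAULT -1,	-- id of the line that A requires or allows
	req_type enum('B', 'C', 'D', 'E') NOT NULL);	-- position of code_req relative to A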
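To show how the two tables map onto the figures, here is a minimal JavaScript sketch that mirrors them as arrays of rows and gathers one statement's requirements by position. The rows are invented for illustration and `requirementsFor` is a hypothetical helper, not part of AI Coder; a live version would read the same rows with a query such as `SELECT code_req, req_type FROM reqs WHERE code = 1`. Several rows of the same type are read here as alternatives, matching the "either b = 1; or b = 2;" example further on.

```javascript
// In-memory stand-ins for the `code` and `reqs` tables (illustrative rows only).
const code = [
  { id: 1, data: "if(b == 1) {" },
  { id: 2, data: "}" },
  { id: 3, data: "b = 1;" },
  { id: 4, data: "b = 2;" },
];
const reqs = [
  { code: 1, code_req: 2, req_type: "E" }, // "}" must follow the if
  { code: 1, code_req: 3, req_type: "B" }, // b must be set before the if...
  { code: 1, code_req: 4, req_type: "B" }, // ...by either assignment
];

// Group the requirement rows of one statement by position,
// resolving each code_req id to its line of source text.
function requirementsFor(id, codeRows, reqRows) {
  const text = new Map(codeRows.map((row) => [row.id, row.data]));
  const out = { B: [], C: [], D: [], E: [] };
  for (const r of reqRows) {
    if (r.code === id) out[r.req_type].push(text.get(r.code_req));
  }
  return out;
}

console.log(requirementsFor(1, code, reqs));
// { B: [ 'b = 1;', 'b = 2;' ], C: [], D: [], E: [ '}' ] }
```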

I built two interfaces to this database: output and input.

The first interface simply outputs code using the method described in Figure 2. Length and depth variables bound the recursion so that the recursive function is not called forever.
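The recursion behind the output interface can be sketched in JavaScript. This is an illustration under assumptions, not the AI Coder source: statements come from a hypothetical `pool` array whose entries hold their own text as `A` plus candidate B, C, D, and E lines; `maxDepth` caps nesting, and `maxLines` is a line budget shared by all recursive calls, standing in for the depth and length variables. A child slot (s1 or s2 in Figure 2) is filled, with probability 0.5, only while both limits allow it.

```javascript
// Pick one candidate; rand() returns a number in [0, 1).
function pick(list, rand) {
  return list[Math.floor(rand() * list.length)];
}

// One block laid out as [B [s1] [C]] A [[D] [s2] E]. `budget.left` is
// shared by every recursive call so all output counts toward one length limit.
function generate(pool, depth, budget, rand) {
  const s = pick(pool, rand);
  const lines = [];
  const emit = (line) => {
    lines.push(line);
    budget.left -= 1;
  };
  // Fill a child slot with a nested block while depth and length allow it.
  const child = () => {
    if (depth > 0 && budget.left > 0 && rand() < 0.5) {
      lines.push(...generate(pool, depth - 1, budget, rand));
    }
  };
  if (s.B.length > 0) {
    emit(pick(s.B, rand));
    child(); // s1
    if (s.C.length > 0 && rand() < 0.5) emit(pick(s.C, rand));
  }
  emit(s.A);
  if (s.E.length > 0) {
    if (s.D.length > 0 && rand() < 0.5) emit(pick(s.D, rand));
    child(); // s2
    emit(pick(s.E, rand));
  }
  return lines;
}

// Emit top-level blocks until the line budget is used up.
function generateProgram(pool, maxDepth, maxLines, rand = Math.random) {
  const budget = { left: maxLines };
  const out = [];
  while (budget.left > 0) out.push(...generate(pool, maxDepth, budget, rand));
  return out.join("\n");
}

const pool = [
  { A: "if(a == 1) {", B: ["a = 1;", "a = 2;"], C: [], D: [], E: ["}"] },
  { A: "a = a + 1;", B: [], C: [], D: [], E: [] },
];
console.log(generateProgram(pool, 3, 20));
```

Required lines (B, A, E) are still emitted after the budget runs out, so `maxLines` is a soft limit that can be overshot slightly. The depth cap is what bounds each recursive call, and every block consumes at least one line, so the outer loop always terminates.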

A very simple test page running the output code can be found at this site. Since you can't see the output and the page runs arbitrary code, it may be more useful to look at the source instead. Both pages reload every second.
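A self-reloading test page of this kind can be sketched in Node.js. This is an assumption-laden illustration, not the original setup: `generateCode` is a hypothetical stand-in for the output interface, port 8080 and the `/source` path are arbitrary choices, and the one-second reload uses a standard `<meta http-equiv="refresh" content="1">` tag. The source view escapes the code so it is displayed instead of executed.

```javascript
const http = require("http");

// Hypothetical stand-in for the output interface: a few random lines from a
// tiny fixed pool (the real generator would draw on the code/reqs tables).
function generateCode(rand = Math.random, count = 5) {
  const pool = ["var a = 1;", "a = a + 1;", "document.title = a;"];
  const out = [];
  for (let i = 0; i < count; i++) {
    out.push(pool[Math.floor(rand() * pool.length)]);
  }
  return out.join("\n");
}

// Escape text so it is displayed rather than parsed as markup.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap code in a page that reloads itself every second. With showSource,
// the code is shown in a <pre> block instead of being executed.
function buildPage(script, showSource = false) {
  const body = showSource
    ? "<pre>" + escapeHtml(script) + "</pre>"
    : "<script>\n" + script + "\n</script>";
  return (
    '<html><head><meta http-equiv="refresh" content="1"></head><body>\n' +
    body +
    "\n</body></html>"
  );
}

// Every request gets freshly generated code; the hypothetical "/source"
// path returns the viewable version instead of the runnable one.
function startServer(port = 8080) {
  return http
    .createServer((req, res) => {
      res.writeHead(200, { "Content-Type": "text/html" });
      res.end(buildPage(generateCode(), req.url === "/source"));
    })
    .listen(port);
}

// startServer(); // then open http://localhost:8080/ in the browser under test
```

Pointed at such a server, the browser under test receives a fresh random script every second, so a crash or hang would show up as the reloads stopping.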

The second interface is split into two parts. The first part simply inputs lines of JavaScript code typed into a textarea control. A minor improvement checks for "{" at the end of a line (as in the first example) in order to add the obvious requirement of a closing curly bracket "}". The second part allows the user to link lines as the A, B, C, D, and E of a full block of code.
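The textarea input and its bracket check can be sketched in JavaScript as follows. This is a hypothetical illustration rather than AI Coder's own code: the `store` object stands in for the `code` and `reqs` tables, `addLine` de-duplicates on the whole line (the schema's unique key compares only the first 32 characters), and a `}` row is created on demand as the E requirement of any line ending in `{`.

```javascript
// Hypothetical in-memory stand-ins for the code and reqs tables.
function makeStore() {
  return { code: [], reqs: [] };
}

// Return the id of a line, inserting it first if it is new. This compares
// whole lines, unlike the schema's key on the first 32 characters.
function addLine(store, text) {
  const existing = store.code.find((row) => row.data === text);
  if (existing) return existing.id;
  const id = store.code.length + 1;
  store.code.push({ id, data: text });
  return id;
}

// Input every non-blank line of the textarea. A line ending in "{" gets
// the closing "}" recorded as its E requirement (only once).
function inputLines(store, textarea) {
  for (const raw of textarea.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const id = addLine(store, line);
    if (line.endsWith("{")) {
      const close = addLine(store, "}");
      const known = store.reqs.some(
        (r) => r.code === id && r.code_req === close && r.req_type === "E"
      );
      if (!known) store.reqs.push({ code: id, code_req: close, req_type: "E" });
    }
  }
  return store;
}

const store = inputLines(makeStore(), "if(a == 1) {\na = 2;");
console.log(store.reqs); // [ { code: 1, code_req: 2, req_type: 'E' } ]
```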

Curly-bracket matching is the first use of this requirement system, but other uses are immediately obvious. JavaScript is very strict about variables being used before they are set, so any code that uses a variable should require that the variable be set first.

For example, if A is:

if(b == 1) {

then it would require either

b = 1;

or

b = 2;

before it (position B).
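The variable rule could be automated along these lines. The helpers below are hypothetical JavaScript, not AI Coder functions: `conditionVars` uses a crude regular expression to collect the identifiers read in an `if(...)` condition, and `variableReqs` returns every stored line that assigns one of them as a candidate B. The regexes ignore strings, property access, and keywords such as `true`, which a real implementation would need to handle.

```javascript
// Identifiers read inside an if(...) condition. Crude: string contents,
// property names, and keywords are not filtered out.
function conditionVars(line) {
  const m = line.match(/^\s*if\s*\((.*)\)\s*\{?\s*$/);
  if (!m) return [];
  const ids = m[1].match(/[A-Za-z_$][\w$]*/g) || [];
  return [...new Set(ids)];
}

// True if the line assigns the variable, e.g. "b = 1;" or "var b = 2;".
// The (?!=) lookahead keeps comparisons such as "b == 1" from matching.
function setsVar(line, name) {
  const safe = name.split("$").join("\\$"); // "$" is the only regex-special identifier char
  return new RegExp("^\\s*(?:var\\s+)?" + safe + "\\s*=(?!=)").test(line);
}

// Candidate B lines for `line`: every stored line that sets a variable it reads.
function variableReqs(line, stored) {
  const vars = conditionVars(line);
  return stored.filter((l) => vars.some((v) => setsVar(l, v)));
}

const stored = ["b = 1;", "b = 2;", "c = 3;"];
console.log(variableReqs("if(b == 1) {", stored)); // [ 'b = 1;', 'b = 2;' ]
```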

Data

Sample Code generated by AI Coder 0.1

    if (rsIE > 0) {
	rsMSIE = true;		// 67 92
	this.screenWidth = screen.width;	// 142 168

	for (var i = 0; i < p.length; i++) {	// 92 117
	}			// 27

	var rsMSIE = false;
	var rsIE6 = false;
	var rsOSXP = false;
	var rsIE6XP = false;	// 51 76

	{			// 36
	}			// 5 27

	var sTmp = oImage.src;	// 162 188

	this.flash = (parseInt(sPlugin.slice(16)));	// 136 162

	sellist.style.display = 'none';	// 4 17

	window.open(fullurl, 'sendafriend', 'width=450,height=550,resizable=yes,location=no,menubar=no,scrollbars=no,personalbar=no,status=yes');	// 34 59

	this.flash = -1;	// 138 164

	sSelImg = sSelectedImg["x"];	// 159 185

    }				// 27

Browser DoS Code found with AI Coder

while(1) {
    self.close();
}

Firefox 2.0 Crash code from http://lcamtuf.coredump.cx/ffoxdie_orig.html
<body onload="javascript:foo()"> <script>  </script> <iframe id=foo> </iframe>

Analysis

When run, the function self.close() pops up a dialog box, which halts testing. Further analysis shows that calling it inside an infinite while loop makes the browser hard to use. Both Firefox 2.0 and Konqueror 3.5.4 respond with a message saying that the script is running slowly. Firefox 2.0 stops the script successfully, while Konqueror 3.5.4 exits when the script is stopped.

The Firefox 2.0 crash code requires an IFRAME, but one can easily be added to any proper test system. The setTimeout function is a very common command in JavaScript, and AI Coder can produce variations like those seen in this code. The key element of the crash code, however, is an XML file, which AI Coder cannot generate. This means AI Coder would not be able to detect this more advanced security flaw.

Conclusion

AI Coder 0.1 is able to find security flaws by writing many variations of code that developers do not expect and that testers would be unlikely to write by hand. By generating highly random code from a rather large base of source lines, it can test a large surface area of the browser.

JavaScript will continue to have vulnerabilities well into the future. Finding bugs before someone else does is a constant arms race for security researchers. Many interesting developments are likely to come from this work, both in security and in artificial intelligence. Further development of this code could become a very interesting project for AI researchers: being able to test large amounts of code without user intervention makes this an ideal setting for AI coders.

There are many automatic and manual functions that could be added to make AI Coder 0.1 more useful. However, it is clear that line-by-line script creation is not a good approach for an AI Coder. The second version of AI Coder will use a context-free grammar to read and design code from the ground up. Using finite state machines, data-based code, eval functions, and languages better suited to AI would speed up this system. This way, much more interesting code can be written and tested.
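To give a feel for the grammar-based direction, here is a toy JavaScript generator that expands a context-free grammar at random. It is an illustration only, not the design of AI Coder 2: the grammar is invented for the example and `expand` is a hypothetical helper. Each nonterminal maps to a list of alternatives, and once `depth` runs out the first alternative, which is never recursive in this grammar, is always chosen, so expansion terminates in much the same way the depth variable bounds version 0.1.

```javascript
// Toy grammar: each nonterminal maps to alternatives (sequences of symbols).
// Symbols that are not keys of the grammar are emitted literally.
const grammar = {
  Block: [["Stmt"], ["Stmt", "\n", "Block"]],
  Stmt: [["Assign"], ["if (", "Cond", ") {\n", "Block", "\n}"]],
  Assign: [["Var", " = ", "Num", ";"]],
  Cond: [["Var", " == ", "Num"]],
  Var: [["a"], ["b"]],
  Num: [["0"], ["1"]],
};

// Expand a symbol at random. At depth <= 0, always take the first
// (non-recursive) alternative so the expansion is guaranteed to stop.
function expand(symbol, depth, rand = Math.random) {
  if (!Object.prototype.hasOwnProperty.call(grammar, symbol)) return symbol;
  const alts = grammar[symbol];
  const alt = depth <= 0 ? alts[0] : alts[Math.floor(rand() * alts.length)];
  return alt.map((s) => expand(s, depth - 1, rand)).join("");
}

console.log(expand("Block", 4));
```

Unlike line-by-line assembly, every output of this grammar is syntactically valid JavaScript, because each closing brace is produced by the same rule that opens it.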


Computer Journal