from Wikipedia
Formally, the problem of supervised pattern recognition can be stated as follows: given an unknown function $g : \mathcal{X} \to \mathcal{Y}$ (the ground truth) that maps input instances $x \in \mathcal{X}$ to output labels $y \in \mathcal{Y}$, along with training data $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$ assumed to represent accurate examples of the mapping, produce a function $h : \mathcal{X} \to \mathcal{Y}$ that approximates the correct mapping $g$ as closely as possible.
(For example, if the problem is filtering spam, then $x_i$ is some representation of an email and $y_i$ is either "spam" or "non-spam".)
For this to be a well-defined problem, "approximates as closely as possible" needs to be defined rigorously.
In decision theory, this is done by specifying a loss function that assigns a specific value to the "loss" resulting from producing an incorrect label.
The goal then is to minimize the expected loss, with the expectation taken over the probability distribution of the input instances.
In practice, neither the distribution of the inputs nor the ground truth function is known exactly; both can only be estimated empirically, by collecting a large number of input samples and hand-labeling each with its correct output label (a time-consuming process, which is typically the limiting factor in the amount of data of this sort that can be collected).
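The two quantities just described can be written out as follows. This is a sketch in assumed notation: $h$ denotes the learned function, $g$ the ground truth, $L$ the loss function, and $(x_i, y_i)$ for $i = 1, \dots, n$ the hand-labeled samples; the symbols $R$ and $\hat{R}$ are introduced here for illustration only.

```latex
% Expected loss of h, with the expectation over the distribution of inputs x
R(h) = \mathbb{E}_{x}\bigl[\, L\bigl(h(x),\, g(x)\bigr) \,\bigr]

% Empirical estimate from n hand-labeled samples (x_i, y_i), where y_i = g(x_i)
\hat{R}(h) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(h(x_i),\, y_i\bigr)
```

Since $R(h)$ cannot be evaluated without knowing the input distribution and $g$, the empirical average $\hat{R}(h)$ serves as its stand-in.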
The particular loss function depends on the type of label being predicted.
For example, in the case of classification, the simple zero-one loss function is often sufficient.
This corresponds simply to assigning a loss of 1 to any incorrect labeling and 0 to any correct one. Averaged over a set of test data, it gives the error rate, so minimizing it is equivalent to maximizing the accuracy of the classification procedure (i.e. the fraction of instances that the learned function labels correctly).
The goal of the learning procedure is to maximize this test accuracy on a "typical" test set.
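As a minimal sketch of the relationship between zero-one loss and accuracy, the Python below averages the loss over a small labeled test set. All names here (`zero_one_loss`, `empirical_risk`, `accuracy`, `toy_spam_filter`) and the four example emails are hypothetical, made up for illustration; the keyword rule stands in for a learned function and is not a realistic spam filter.

```python
from typing import Callable, Hashable, Sequence, Tuple


def zero_one_loss(predicted: Hashable, actual: Hashable) -> int:
    """Return 1 for an incorrect label and 0 for a correct one."""
    return 0 if predicted == actual else 1


def empirical_risk(
    h: Callable[[str], str],
    samples: Sequence[Tuple[str, str]],
    loss: Callable[[Hashable, Hashable], int] = zero_one_loss,
) -> float:
    """Average loss of h over labeled (input, label) pairs."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(loss(h(x), y) for x, y in samples) / len(samples)


def accuracy(h: Callable[[str], str], samples: Sequence[Tuple[str, str]]) -> float:
    """Fraction of samples labeled correctly: 1 minus the zero-one error rate."""
    return 1.0 - empirical_risk(h, samples, zero_one_loss)


def toy_spam_filter(email: str) -> str:
    """Hypothetical stand-in for a learned function: a single keyword rule."""
    return "spam" if "free money" in email.lower() else "non-spam"


# Hypothetical hand-labeled test set of (email text, correct label) pairs.
test_set = [
    ("Claim your FREE MONEY now", "spam"),
    ("Meeting moved to 3pm", "non-spam"),
    ("You won a prize, click here", "spam"),
    ("Lunch tomorrow?", "non-spam"),
]

if __name__ == "__main__":
    print("error rate:", empirical_risk(toy_spam_filter, test_set))
    print("accuracy:  ", accuracy(toy_spam_filter, test_set))
```

The keyword rule misses the third email, so one of four labels is wrong: the average zero-one loss is 0.25 and the accuracy is 0.75.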
