Page "Pattern recognition" Paragraph 11
from
Wikipedia
Formally, the problem of supervised pattern recognition can be stated as follows: Given an unknown function g : X → Y (the ground truth) that maps input instances x ∈ X to output labels y ∈ Y, along with training data D = {(x_1, y_1), ..., (x_n, y_n)} assumed to represent accurate examples of the mapping, produce a function h : X → Y that approximates as closely as possible the correct mapping g.
(For example, if the problem is filtering spam, then x_i is some representation of an email and y is either "spam" or "non-spam".)
In order for this to be a well-defined problem, "approximates as closely as possible" needs to be defined rigorously.
In decision theory, this is defined by specifying a loss function that assigns a specific value to the "loss" resulting from producing an incorrect label.
The goal then is to minimize the expected loss, with the expectation taken over the probability distribution of the input instances.
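The objective can be written compactly as follows. The notation is assumed here for illustration: h is the learned function, g the ground truth function, L the loss function, and P the probability distribution of the input instances x.

```latex
R(h) = \mathbb{E}_{x \sim P}\bigl[\, L\bigl(h(x),\, g(x)\bigr) \,\bigr],
\qquad
h^{*} = \operatorname*{arg\,min}_{h} \; R(h)
```

Here R(h) is the expected loss of h, and h* is any function that attains the minimum over the candidate functions under consideration.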
In practice, neither the distribution of the input instances nor the ground truth function is known exactly; both can only be estimated empirically by collecting a large number of input samples and hand-labeling them with the correct output value (a time-consuming process, which is typically the limiting factor in the amount of data of this sort that can be collected).
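One common way to express this empirical estimate is as an average of the loss over the n hand-labeled samples. The symbols are assumptions made for illustration: x_i is the i-th collected input, y_i its hand-assigned label, h the learned function, and L the loss function.

```latex
\hat{R}(h) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(h(x_i),\, y_i\bigr)
```

This average is a reasonable stand-in for the expected loss only under the assumption that the samples are drawn independently from the same distribution as the inputs of interest and were not themselves used to fit h.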
For classification, a common choice is the zero-one loss. This corresponds simply to assigning a loss of 1 to any incorrect labeling and 0 to any correct one, so minimizing the average loss is equivalent to maximizing the accuracy of the classification procedure over the set of test data (i.e., the fraction of instances that the learned function labels correctly).
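The relationship between a loss of 1 per incorrect label and accuracy can be checked with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the keyword-based `toy_spam_filter`, the four hand-labeled examples in `TEST_DATA`, and the helper names `zero_one_loss`, `empirical_risk`, and `accuracy`.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input instance, hand-assigned label)


def zero_one_loss(predicted: str, actual: str) -> int:
    """Return 1 for an incorrect label and 0 for a correct one."""
    return 0 if predicted == actual else 1


def empirical_risk(h: Callable[[str], str], data: Sequence[Example]) -> float:
    """Average zero-one loss of h over the labeled data."""
    if not data:
        raise ValueError("data must contain at least one labeled example")
    return sum(zero_one_loss(h(x), y) for x, y in data) / len(data)


def accuracy(h: Callable[[str], str], data: Sequence[Example]) -> float:
    """Fraction of instances that h labels correctly."""
    if not data:
        raise ValueError("data must contain at least one labeled example")
    return sum(1 for x, y in data if h(x) == y) / len(data)


def toy_spam_filter(email: str) -> str:
    """Hypothetical classifier: flags any email mentioning 'prize'."""
    return "spam" if "prize" in email.lower() else "non-spam"


# Hypothetical hand-labeled test data, invented for this example.
TEST_DATA = [
    ("You won a PRIZE, click here", "spam"),
    ("Meeting moved to 3pm", "non-spam"),
    ("Cheap meds online", "spam"),
    ("Claim your prize today", "spam"),
]

risk = empirical_risk(toy_spam_filter, TEST_DATA)
acc = accuracy(toy_spam_filter, TEST_DATA)
print(f"average zero-one loss: {risk}, accuracy: {acc}")
```

The toy filter mislabels one of the four emails, so its average zero-one loss and its accuracy sum to 1, which is the equivalence described above.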