The dataset you provided is not linearly separable in the space spanned by a1 and a2, so a single-layer perceptron will not work. You need a multilayer perceptron (MLP). In practice, MLPs are used more often because they can represent everything a single-layer perceptron can, and more (look up the universal approximation theorem). A radial basis function (RBF) network would also do the job. Noli hinted at something interesting but more involved: by Cover's theorem, a dataset becomes linearly separable with high probability when projected into a sufficiently high-dimensional space. This is the motivation behind support vector machines.
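To make this concrete, here is a small illustrative sketch (not from your data; XOR is just the classic non-linearly-separable example) showing a single-layer perceptron failing where an MLP succeeds. The layer sizes and solver choice are arbitrary assumptions:

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

# XOR: the textbook non-linearly-separable dataset in 2D.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# A linear classifier can label at most 3 of the 4 XOR points correctly.
perc = Perceptron(max_iter=1000).fit(X, y)

# A small hidden layer is enough to carve out the non-linear boundary.
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", random_state=0).fit(X, y)

print("perceptron accuracy:", perc.score(X, y))  # at most 0.75
print("MLP accuracy:", mlp.score(X, y))          # typically 1.0
```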
So there is no single natural choice; it is entirely problem-specific. Experiment. As my lecturer used to say, "cross-validation is your friend."
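In that spirit, a minimal sketch of comparing candidate models by cross-validation (the synthetic `make_moons` data and the particular hyperparameters are assumptions, stand-ins for your own dataset and candidates):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in non-linearly-separable data; replace with your own X, y.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

candidates = {
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0),
    "RBF-kernel SVM": SVC(kernel="rbf"),
}

# 5-fold cross-validated accuracy for each candidate.
results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in candidates.items()}

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

Whichever model cross-validates best on your data is the one to pick; there is no way to know in advance.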
mbatchkarov