Extended Features in Logistic Regression
========================================

A logistic regression classifier in its basic form finds a (hyper)plane in
feature space that best separates the two classes. Consider the data shown in
the figure below.

.. ipython:: python
   :okwarning:

   from matplotlib.pylab import *

   m = 100
   # class 0: a Gaussian blob around (9, 9); class 1: a blob around (1, 1)
   X0 = 4*randn(m//2, 2) + (9, 9)
   y0 = zeros(m//2)
   X1 = 4*randn(m//2, 2) + (1, 1)
   y1 = ones(m//2)
   X = vstack((X0, X1))
   y = hstack((y0, y1))
   print(X0.shape, X1.shape, X.shape)

   scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=cm.Paired);
   @savefig scatterlogregr.png
   show()

We use the logistic regression classifier from sklearn, setting C=10000 to
effectively switch off regularization.

.. ipython:: python
   :okwarning:

   from sklearn.linear_model import LogisticRegression

   logregr = LogisticRegression(C=10000, fit_intercept=True)
   logregr.fit(X, y)

   # evaluate the classifier on a regular grid to visualize the decision regions
   xmin = X[:, 0].min() - 0.5
   xmax = X[:, 0].max() + 0.5
   ymin = X[:, 1].min() - 0.5
   ymax = X[:, 1].max() + 0.5
   mx, my = meshgrid(arange(xmin, xmax, 0.1), arange(ymin, ymax, 0.1))
   Z = logregr.predict(c_[mx.ravel(), my.ravel()])
   Z = Z.reshape(mx.shape)

   pcolormesh(mx, my, Z, cmap=cm.Paired);
   scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=cm.Paired);
   @savefig logregrdecision.png
   show()

The figure above shows a well-known toy problem for logistic regression: the
two classes are nicely linearly separable (with the exception of a few
'outliers'). But now consider the classical XOR problem (well, we have added
some noise to it...).

.. ipython:: python
   :okwarning:

   m = 100
   # class 0: clusters around (9, 9) and (1, 1)
   X0 = 1.5*randn(m//2, 2) + (9, 9)
   y0 = zeros(m//2)
   X1 = 1.5*randn(m//2, 2) + (1, 1)
   y1 = zeros(m//2)
   X = vstack((X0, X1))
   y = hstack((y0, y1))
   # class 1: clusters around (1, 9) and (9, 1)
   X0 = 1.5*randn(m//2, 2) + (1, 9)
   y0 = ones(m//2)
   X1 = 1.5*randn(m//2, 2) + (9, 1)
   y1 = ones(m//2)
   X = vstack((X, X0, X1))
   y = hstack((y, y0, y1))

   scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=cm.Paired);
   @savefig scatterlogregrXOR.png
   show()

Evidently this is not a linearly separable classification dataset. We can,
however, add nonlinear (monomial) features to our dataset. Instead of only
$x_1$ and $x_2$ we also use $x_1^2$, $x_2^2$ and $x_1 x_2$ (note that the bias
is added automagically by sklearn).

.. ipython:: python
   :okwarning:

   # append the columns x1^2, x2^2 and x1*x2 to the data matrix
   Xe = append(X, (X[:, 0]**2)[:, newaxis], axis=1)
   Xe = append(Xe, (X[:, 1]**2)[:, newaxis], axis=1)
   Xe = append(Xe, (X[:, 0]*X[:, 1])[:, newaxis], axis=1)
   logregr.fit(Xe, y)

   # reuse the grid from above (it also covers the XOR data) and extend the
   # grid points with the same monomial features before predicting
   mxr = mx.ravel()
   myr = my.ravel()
   Z = logregr.predict(c_[mxr, myr, mxr**2, myr**2, mxr*myr])
   Z = Z.reshape(mx.shape)

   pcolormesh(mx, my, Z, cmap=cm.Paired);
   scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=cm.Paired);
   @savefig logregrdecisionXOR.png
   show()
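
Why does this work? In the extended feature space the classifier is still
linear, but projected back onto the original $(x_1, x_2)$ plane its decision
boundary $w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_2^2 + w_5 x_1 x_2 + b = 0$ is a
conic section, which is exactly the curved boundary in the figure above. As a
quick illustration you can look at the learned weights (the columns of Xe are
$x_1$, $x_2$, $x_1^2$, $x_2^2$ and $x_1 x_2$, in that order):

.. ipython:: python
   :okwarning:

   # the weights w_1 ... w_5 for the columns x1, x2, x1^2, x2^2, x1*x2 ...
   logregr.coef_
   # ... and the bias (intercept) b
   logregr.intercept_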
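
Finally, a side note: instead of appending the monomial columns by hand you can
let sklearn generate them. The sketch below is just one possible way to
reproduce the construction above, using PolynomialFeatures in a pipeline, with
the degree and the include_bias setting chosen to mirror our manual feature
matrix.

.. ipython:: python
   :okwarning:

   from sklearn.preprocessing import PolynomialFeatures
   from sklearn.pipeline import make_pipeline
   from sklearn.linear_model import LogisticRegression

   # degree-2 monomials x1, x2, x1^2, x1*x2, x2^2; the bias is still handled
   # by LogisticRegression itself, hence include_bias=False
   model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                         LogisticRegression(C=10000))
   model.fit(X, y)
   model.score(X, y)

The pipeline applies the same degree-2 feature expansion before fitting, so it
should recover essentially the same quadratic decision boundary as the manual
construction.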