6.3.5. Extended Features in Logistic Regression

In its basic form, a logistic regression classifier finds a (hyper)plane in feature space that best separates the two classes.
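In symbols, the model estimates \(P(y=1 \mid x) = \sigma(w^\top x + b)\), where \(\sigma(z) = 1/(1+e^{-z})\) is the logistic (sigmoid) function; the predicted class changes where \(w^\top x + b = 0\), which is a hyperplane (a straight line in two dimensions).

Consider the data shown in the figure below.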

In [1]: from matplotlib.pylab import *

In [2]: m = 100

In [3]: X0 = 4*randn(m//2, 2) + (9, 9)

In [4]: y0 = zeros((m//2))

In [5]: X1 = 4*randn(m//2, 2) + (1, 1)

In [6]: y1 = ones((m//2))

In [7]: X = vstack((X0, X1))

In [8]: y = hstack((y0, y1))

In [9]: print(X0.shape, X1.shape, X.shape)
(50, 2) (50, 2) (100, 2)

In [10]: scatter(X[:,0], X[:,1], c=y, edgecolors='k', cmap=cm.Paired);

In [11]: show()
[Figure: scatterlogregr.png — scatter plot of the two classes]

We will use a logistic regression classifier from sklearn. We set C=10000 to effectively switch off regularization.
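In sklearn, C is the inverse of the regularization strength: for the default \(\ell_2\) penalty the solver roughly minimizes \(\tfrac{1}{2}\|w\|^2 + C \sum_i \log\bigl(1 + e^{-\tilde{y}_i (w^\top x_i + b)}\bigr)\) with labels \(\tilde{y}_i \in \{-1, +1\}\), so a very large C makes the penalty term negligible compared to the data term.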

In [12]: from sklearn.linear_model import LogisticRegression

In [13]: logregr = LogisticRegression(C=10000, fit_intercept=True)

In [14]: logregr.fit(X,y)
Out[14]: LogisticRegression(C=10000)

In [15]: xmin = X[:,0].min() - 0.5

In [16]: xmax = X[:,0].max() + 0.5

In [17]: ymin = X[:,1].min() - 0.5

In [18]: ymax = X[:,1].max() + 0.5

In [19]: mx, my = meshgrid( arange(xmin, xmax, 0.1),
   ....:                    arange(ymin, ymax, 0.1) )
   ....: 

In [20]: Z = logregr.predict(c_[mx.ravel(), my.ravel()])

In [21]: Z = Z.reshape(mx.shape)

In [22]: pcolormesh(mx, my, Z, cmap=cm.Paired);

In [23]: scatter(X[:,0], X[:,1], c=y, edgecolors='k', cmap=cm.Paired);

In [24]: show()
[Figure: logregrdecision.png — decision regions of the linear logistic regression classifier with the data overlaid]

The figure above shows a well-known toy problem for logistic regression: the two classes are nicely linearly separable (with the exception of a few ‘outliers’). But now consider the classical XOR problem (well, a version of it with some noise added…).

In [25]: from matplotlib.pylab import *

In [26]: m = 100

In [27]: X0 = 1.5*randn(m//2, 2) + (9, 9)

In [28]: y0 = zeros((m//2))

In [29]: X1 = 1.5*randn(m//2, 2) + (1, 1)

In [30]: y1 = zeros((m//2))

In [31]: X = vstack((X0, X1))

In [32]: y = hstack((y0, y1))

In [33]: X0 = 1.5*randn(m//2, 2) + (1, 9)

In [34]: y0 = ones((m//2))

In [35]: X1 = 1.5*randn(m//2, 2) + (9, 1)

In [36]: y1 = ones((m//2))

In [37]: X = vstack((X,X0,X1))

In [38]: y = hstack((y,y0,y1))

In [39]: scatter(X[:,0], X[:,1], c=y, edgecolors='k', cmap=cm.Paired);

In [40]: show()
[Figure: scatterlogregrXOR.png — scatter plot of the noisy XOR data]

Evidently this is not a linearly separable dataset. We can, however, add non-linear (monomial) features to our dataset: in addition to \(x_1\) and \(x_2\) we add \(x_1^2\), \(x_2^2\) and \(x_1 x_2\) (note that the bias is added automagically by sklearn). Below we build these features by hand; a more concise alternative using sklearn's PolynomialFeatures is sketched at the end of this section.

In [41]: Xe = append(X,(X[:,0]**2)[:,newaxis],axis=1)

In [42]: Xe = append(Xe,(X[:,1]**2)[:,newaxis],axis=1)

In [43]: Xe = append(Xe,(X[:,0]*X[:,1])[:,newaxis],axis=1)
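As a quick check (not part of the original session), the extended data matrix should now have five columns: the two original coordinates plus the three monomial features.

print(Xe.shape)   # expected: (200, 5) for the 200-point XOR dataset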

In [44]: logregr.fit(Xe,y)
Out[44]: LogisticRegression(C=10000)

In [45]: mxr = mx.ravel()

In [46]: myr = my.ravel()

In [47]: Z = logregr.predict(c_[mxr, myr, mxr**2, myr**2, mxr*myr])

In [48]: Z = Z.reshape(mx.shape)

In [49]: pcolormesh(mx, my, Z, cmap=cm.Paired);

In [50]: scatter(X[:,0], X[:,1], c=y, edgecolors='k', cmap=cm.Paired);

In [51]: show()
[Figure: logregrdecisionXOR.png — decision regions of the logistic regression classifier with quadratic features]
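For completeness: the manual feature construction above can also be done with sklearn's PolynomialFeatures. The following is a minimal sketch (not part of the session above), assuming X and y still hold the XOR data; it produces the same degree-2 features, up to column order, and uses include_bias=False because LogisticRegression fits its own intercept.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Degree-2 monomials: x1, x2, x1^2, x1*x2, x2^2 (no bias column).
poly = PolynomialFeatures(degree=2, include_bias=False)
Xe2 = poly.fit_transform(X)

# Same classifier as before, with regularization effectively switched off.
logregr2 = LogisticRegression(C=10000)
logregr2.fit(Xe2, y)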