BLOG - SAVAN VISALPARA

[ Musings on Machine Learning ]
Posted on 18 June, 2017

Practical Machine Learning With Python - Part 3

In part-1 and part-2, I explained various supervised learning algorithms. In this part, we will see few unsupervised learning algorithms and the popular supervised learning algorithm called Neural Networks. Check out Github repository of this series for more.

Index

  • K-nearest neighbors

  • K-means Clustering

  • Principal Component Analysis

  • Neural Networks

K-nearest neighbors

K-nearest neighbors(KNN for short) is one of the simplest Machine Learning algorithm. KNN is a supervised learning algorithm which can be used for both classification and regression. This is slightly different from the algorithms that we have seen so far. Let me explain this algorithm with an example of classification problem.

First step in KNN is to plot training data in a feature space. For simplicity, let us plot a sample training data in two dimensional space as shown below. Please note, in this example we have two possible classes-green and red.

title

There is no explicit training phase in KNN. Our model simply memorizes these points. KNN is a lazy learner, that is, it does not use training data to do any generalization. Given any data point in a feature space we classify that point by taking into account the class of k nearest data points. For example, if we want to classify blue point as shown in following figure- we consider k nearest data points and we assign a class by majority role.

title

Let us take k = 3 as shown below.

title

As we can see, there are two data points with green class and one data point with red class . Hence, we assign green class to new point(blue).

Just to make it concrete let us take an another example. Let us change the position of new point(blue point) as shown below.

title

If we take k = 5 then we get four neighbors with red class and one neighbor with green class. Hence, new point will be classified as red point.

title

In case of regression(when output/dependent variable is any real value), output will be an average of K nearest neighbors.

But wait, How do we decide the value for K? and How to measure the distance between data points?</i>. A small value of k means that noise will have a higher influence on the result and large value makes it computationally expensive. A rule of thumb is to set k equals to the square root of total number of data points. I will recommend to test your model for various K values. You can use cross-validation to find out best k value. For more on choosing best value of k, refer this stackoverflow thread.

There are various options available for distance metric such as euclidian or manhattan distance. Generally used metric is Euclidian distance.

title

Minkowski is the generalization of Euclidian and Manhattan distance. Now, let us implement KNN algorithm using scikit-learn.

Load the dataset

In [1]:
from sklearn.datasets import load_iris
dataset = load_iris()
X = dataset.data
y = dataset.target

Perform preprocessing

In [2]:
#standardize the data to make sure each feature contributes equally to the distance
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_processed = ss.fit_transform(X)
In [3]:
#split the dataset into train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.3, random_state=42)

Fit the dataset

In [4]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors = 5, metric="minkowski", p=2) #p=2 for euclidian distance
model.fit(X_train, y_train)
Out[4]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
In [5]:
model.score(X_test, y_test)
Out[5]:
1.0

Before moving on to the next topic, let me briefly explain parametric and non-parametric models. In parametric model, we continously update parameters(finite) to learn a function which can classify new data point without requiring the training data(i.e Logistic Regression). In non-parametric model, number of parameters grows with the number of training data(i.e KNN).

K-means Clustering

K-means is one of the simplest unsupervised learning algorithm used for clustering problem. Clustering is a process that finds groups of similar objects. So, in clustering our goal is to group objects based on their features similarity. K-means clutering is very easy to understand, very easy to implement and computationally efficient clutering algorithm. Now, let us see how it works.

Basic idea behind K-means is, we define k centroids, that is, one for each cluster. Here, k is the hyperparameter and we should be very careful about it. Usually, you should try range of values to determine best value of k. Where do we place them initially? Common choice is to place them as fas ar possible. Now, assign each data point to the nearest centroid. Once each data point has been assigned to one of the centroids, our next step is to recalculate k new centroids. How do we do that? We do it by moving centroid(old) to the center of the data samples that were assigned to it. And how do we do find center? We find it by taking the mean of data points in a particular cluster.

K-means clutering aims to find positions $ \mu_i $, i=1,2,..,k of the clusters the minimize the distance from the data points to the cluster. Mathematically, we can write this as:

title

If you are curious and want to know more about K-means, check this out.

In [6]:
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

digits = load_digits()
dataset = digits.data

#standardize
ss = StandardScaler()
dataset = ss.fit_transform(dataset)

model = KMeans(n_clusters= 10, init="k-means++", n_init=10)
model.fit(dataset)
Out[6]:
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=10, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)
In [7]:
model.labels_   #assigned label(cluster) to each data point
Out[7]:
array([0, 9, 9, ..., 9, 3, 3])
In [8]:
model.inertia_  #sum of distances of samples to their closest centroid
Out[8]:
69422.941195144551
In [9]:
model.cluster_centers_
Out[9]:
array([[  0.00000000e+00,  -3.10238752e-01,  -2.06177764e-01,
          3.03168840e-01,  -1.40191740e-01,  -5.03966912e-01,
         -3.99585759e-01,  -1.25022923e-01,  -5.90775571e-02,
         -3.46211086e-01,   4.17672645e-01,   3.49905412e-01,
          2.38843406e-01,   5.24264003e-01,  -2.45478180e-01,
         -1.30433381e-01,  -4.46250733e-02,   3.20429312e-01,
          7.71457454e-01,  -2.98969956e-01,  -8.06601146e-01,
          6.93198664e-01,   5.32051362e-01,  -1.14221844e-01,
         -3.33797263e-02,   9.07972853e-01,   5.85143100e-01,
         -1.15219864e+00,  -1.57661191e+00,   2.57254863e-01,
          1.12870850e+00,  -4.72323823e-02,   0.00000000e+00,
          1.02784152e+00,   6.15056355e-01,  -1.30867433e+00,
         -1.73098189e+00,   5.32680477e-03,   1.20443628e+00,
          0.00000000e+00,  -6.13436689e-02,   6.54131511e-01,
          9.81663232e-01,  -8.72074591e-01,  -9.95308509e-01,
          5.43552357e-01,   5.58944660e-01,  -8.87416172e-02,
         -3.54332626e-02,   5.98517672e-02,   9.90363911e-01,
          7.42005066e-02,   1.70542961e-01,   7.50982414e-01,
         -2.66237735e-01,  -2.09785127e-01,  -2.35964589e-02,
         -2.93066651e-01,  -2.70076231e-01,   3.39317333e-01,
          3.10906680e-01,  -2.21901899e-01,  -4.38354242e-01,
         -1.96007519e-01],
       [  0.00000000e+00,  -2.48778896e-01,  -3.13284386e-02,
          3.39832350e-01,   5.99812732e-01,   9.35938069e-01,
          8.99193990e-01,   1.38929014e-01,  -5.90775571e-02,
         -3.30279093e-01,   1.00567490e-01,  -1.71842521e-01,
          2.56998894e-02,   7.56591107e-01,   7.90766582e-01,
         -6.29367542e-02,  -4.46250733e-02,  -4.44824847e-01,
         -8.49075514e-01,  -8.62592381e-01,   3.66182329e-02,
          6.91304073e-01,   2.39737629e-01,  -1.01480897e-01,
         -3.33797263e-02,  -4.95622131e-01,  -7.40331236e-01,
         -3.50533492e-01,   4.84185824e-01,   7.63431257e-01,
          6.39878057e-01,  -4.72323823e-02,   0.00000000e+00,
         -2.53306066e-01,   1.79860699e-01,   7.13318185e-01,
          7.80760752e-01,   3.31083390e-01,   3.35288112e-01,
          0.00000000e+00,  -6.13436689e-02,  -1.67713548e-01,
         -2.69907308e-01,   6.90699724e-01,   5.46958552e-01,
         -7.81320934e-01,  -6.58921821e-01,  -8.87416172e-02,
         -3.54332626e-02,  -3.74772625e-01,  -8.15521711e-01,
          5.81033571e-01,  -6.16048738e-01,  -1.37385503e+00,
         -7.56299870e-01,  -2.09785127e-01,  -2.35964589e-02,
         -2.39270403e-01,   1.67753635e-01,   4.60652992e-02,
         -1.92748243e+00,  -1.10687162e+00,  -5.02937578e-01,
         -1.96007519e-01],
       [  0.00000000e+00,  -3.35016487e-01,  -8.46210715e-01,
         -1.59056185e-01,  -5.22688625e-01,  -7.37822792e-01,
         -3.89783668e-01,  -1.25022923e-01,  -5.90775571e-02,
         -6.06718516e-01,  -5.92447335e-01,   6.42912808e-01,
         -8.74210689e-01,  -1.19770976e+00,  -4.99581795e-01,
         -1.30433381e-01,  -4.46250733e-02,  -5.14397141e-01,
          4.54022264e-01,   4.11727360e-01,  -1.00915576e+00,
         -1.22966332e+00,  -5.47110181e-01,  -1.14221844e-01,
         -3.33797263e-02,  -3.16327985e-02,   7.24247900e-01,
         -1.41536524e-01,  -9.79865866e-01,  -9.44647267e-01,
         -6.00412809e-01,  -4.72323823e-02,   0.00000000e+00,
          3.31165374e-01,   1.09505245e+00,   6.06567113e-01,
          3.30092354e-01,   2.94959312e-01,  -3.21559847e-02,
          0.00000000e+00,  -6.13436689e-02,   9.51539446e-02,
          1.16095274e+00,   5.20523051e-01,  -3.74936250e-01,
          3.51437353e-01,   1.35310502e+00,   6.66437638e-01,
         -3.54332626e-02,  -2.95968375e-01,   4.79004872e-01,
          5.68186611e-01,  -7.91471944e-01,   4.08411624e-01,
          1.47024181e+00,   3.23542269e-01,  -2.35964589e-02,
         -2.99081347e-01,  -8.01313959e-01,  -3.07119724e-01,
          6.61459243e-01,   1.07772785e+00,   6.22429399e-01,
         -1.45500719e-01],
       [  0.00000000e+00,  -1.24305799e-02,   4.96215806e-01,
          3.53110675e-01,   2.25110973e-01,   1.45186407e-01,
         -1.86854539e-01,  -1.20398307e-01,   1.72982487e-02,
          3.95933280e-01,   6.05534479e-01,  -7.50472654e-01,
         -9.00234554e-02,   3.84274417e-01,  -1.27052264e-01,
         -1.21741369e-01,  -6.16403890e-03,   2.41565240e-01,
         -6.89695804e-02,  -4.91924477e-01,   2.35053016e-01,
          3.32778512e-01,  -1.43356518e-01,  -1.14221844e-01,
         -3.33797263e-02,  -2.47650723e-01,  -2.89404192e-01,
          3.47192052e-01,   5.98664065e-01,   3.64367477e-01,
         -1.98137481e-01,  -4.72323823e-02,   0.00000000e+00,
         -5.71745867e-01,  -7.98446331e-01,  -5.56382786e-01,
         -3.04325677e-01,   5.38022381e-01,   2.05355506e-01,
          0.00000000e+00,  -6.13436689e-02,  -4.21073208e-01,
         -7.57850219e-01,  -9.31789899e-01,  -8.72707166e-01,
          6.00317179e-01,   6.83862309e-01,  -5.75237469e-02,
         -3.54332626e-02,   7.44843465e-02,   4.40267471e-02,
         -7.47974689e-01,  -5.86595988e-01,   6.69258586e-01,
          4.13314571e-01,  -1.51302880e-01,  -2.35964589e-02,
         -5.26086224e-02,   4.53369124e-01,   4.59424900e-01,
          4.75279385e-01,   3.91235600e-01,  -9.28352914e-02,
         -1.80532731e-01],
       [  0.00000000e+00,  -3.35016487e-01,  -1.02349003e+00,
         -1.08368694e+00,  -2.70686387e-03,  -6.73153100e-01,
         -3.77566606e-01,  -1.25022923e-01,  -5.90775571e-02,
         -6.22040945e-01,  -1.30752320e+00,   4.27477083e-01,
         -3.59648672e-01,  -1.08398573e+00,  -3.69394370e-01,
         -1.07637350e-01,  -4.46250733e-02,  -5.71086697e-01,
          1.15407860e-01,   7.82932031e-01,  -3.97866477e-01,
         -4.28594764e-01,   4.52786577e-01,   2.15680534e-01,
         -3.33797263e-02,   6.86425522e-01,   8.89489684e-01,
         -5.12869232e-01,  -4.98418812e-01,   4.78121868e-01,
          9.85484363e-01,   8.62213771e-02,   0.00000000e+00,
          1.90343228e+00,   1.12191439e+00,   2.06604193e-02,
          4.16729297e-01,   9.40555466e-01,   7.61949372e-01,
          0.00000000e+00,   6.31955533e-01,   1.78739457e+00,
          8.30262341e-01,   8.71889915e-01,   1.16596208e+00,
          4.48888962e-01,  -3.94454641e-01,  -8.87416172e-02,
          2.72614986e-01,   2.95370360e-01,  -7.50765880e-01,
         -2.77711921e-01,   8.65951660e-01,  -7.79816114e-01,
         -7.52320510e-01,  -2.09785127e-01,  -2.35964589e-02,
         -2.72147614e-01,  -1.00678474e+00,  -9.40714876e-01,
          8.20484658e-02,  -8.47054070e-01,  -5.05669803e-01,
         -1.96007519e-01],
       [  0.00000000e+00,  -3.35016487e-01,  -8.62803144e-01,
         -8.13748649e-01,   1.07849667e-01,   7.20269184e-01,
          2.71201430e+00,   5.49412561e+00,  -5.90775571e-02,
         -6.24009262e-01,  -8.14913254e-01,   6.38217592e-01,
         -2.31512181e-01,  -1.60123608e-01,   3.34172841e+00,
          6.82707232e+00,  -4.46250733e-02,  -6.11909866e-01,
          1.09585655e-02,   1.37964360e-01,  -9.54168284e-01,
         -1.32684507e-02,   3.53542215e+00,   4.84024910e+00,
         -3.33797263e-02,  -6.16118082e-02,   3.19483322e-01,
         -3.56593140e-01,  -4.53495445e-01,   1.15711870e+00,
          1.52310111e+00,   2.14785187e+00,   0.00000000e+00,
          7.09179688e-02,   1.56248714e-01,  -6.09785663e-02,
          2.74690644e-01,   8.30918431e-01,   1.13409990e-01,
          0.00000000e+00,  -6.13436689e-02,  -3.69338169e-01,
         -6.62431970e-01,  -3.35299947e-01,   7.13624800e-01,
         -1.50545088e-01,  -6.94738263e-01,  -8.87416172e-02,
         -3.54332626e-02,  -4.03574986e-01,  -1.26311183e+00,
         -4.26541030e-01,   5.06963863e-01,  -1.05799840e+00,
         -7.57435810e-01,  -2.09785127e-01,  -2.35964589e-02,
         -2.99081347e-01,  -8.39292479e-01,  -3.19970182e-01,
         -8.00202306e-01,  -1.04727319e+00,  -5.05669803e-01,
         -1.96007519e-01],
       [  0.00000000e+00,   2.59745481e+00,   1.76223182e+00,
          6.64774795e-01,  -4.17933058e-01,  -6.16494880e-02,
          7.87533419e-02,  -1.11345865e-01,   9.15073651e-02,
          1.83751850e+00,   7.56458635e-01,  -1.23238110e-01,
          1.31437143e-01,  -5.23650986e-01,  -2.08382876e-01,
         -1.30433381e-01,  -4.46250733e-02,   9.54499054e-01,
         -2.48508472e-01,  -2.88502541e-01,   3.85039272e-01,
         -7.64320498e-01,  -4.59580706e-01,  -1.14221844e-01,
         -3.33797263e-02,  -3.00366196e-01,  -5.92172043e-01,
          1.06343772e-01,   1.17940916e-01,  -8.77939124e-01,
         -5.38449632e-01,  -4.72323823e-02,   0.00000000e+00,
         -5.35803675e-01,  -5.56433248e-01,   4.06155802e-01,
         -1.32148445e-01,  -1.14183825e+00,  -7.72555939e-01,
          0.00000000e+00,  -6.13436689e-02,  -3.64743958e-01,
         -2.35772902e-01,   4.99822651e-01,  -9.38233839e-02,
         -8.67348714e-01,  -5.78777939e-01,  -6.56603194e-02,
         -3.54332626e-02,   7.86809120e-01,   6.34094933e-01,
          4.26127321e-01,   2.06466167e-01,  -9.52137793e-02,
          1.78473542e-01,  -1.01686293e-01,  -2.35964589e-02,
          2.45718511e+00,   1.77858141e+00,   4.14240682e-01,
         -3.15009378e-01,   7.12573899e-02,   4.23906552e-01,
         -1.27358727e-01],
       [  0.00000000e+00,  -3.35016487e-01,  -1.06142639e+00,
         -2.22808729e+00,  -1.19405897e-01,   1.15234938e+00,
          6.89533970e-01,  -6.52917010e-02,  -5.90775571e-02,
         -5.99083045e-01,  -1.39635030e+00,  -6.62472010e-01,
          5.83818835e-01,   7.24090881e-01,   8.42558957e-01,
         -6.62817197e-02,  -4.46250733e-02,  -3.06864184e-01,
         -2.55193837e-01,   7.91461457e-01,   5.91879737e-01,
          7.68292004e-01,   5.42806384e-01,  -5.36741585e-02,
         -3.33797263e-02,   2.50175252e-01,   3.19779097e-01,
          4.50182422e-01,   4.52144990e-01,   8.30074835e-01,
         -1.10230658e-01,  -4.72323823e-02,   0.00000000e+00,
         -1.45885982e-01,  -1.12522397e-01,  -1.30077403e-01,
          3.45991942e-01,   4.22084725e-01,  -4.07280882e-01,
          0.00000000e+00,  -6.13436689e-02,  -1.48325993e-01,
         -5.16672955e-01,  -4.00979306e-01,   6.06732962e-01,
          5.22640429e-01,  -5.50960638e-01,  -8.87416172e-02,
         -3.54332626e-02,  -3.12325028e-01,  -1.09666027e+00,
         -9.93992178e-01,   4.87447430e-01,   4.27532474e-01,
         -4.17347610e-01,  -2.09785127e-01,  -2.35964589e-02,
         -2.99081347e-01,  -1.07724044e+00,  -2.16725014e+00,
         -5.81851197e-02,   6.16068959e-01,  -2.52577194e-02,
         -1.86489840e-01],
       [  0.00000000e+00,  -2.81878694e-01,   4.15680198e-01,
          4.83967410e-01,  -9.87734433e-01,  -8.14353670e-01,
         -4.09723921e-01,  -1.25022923e-01,  -5.90775571e-02,
          5.68637408e-03,   2.22891039e-01,   2.35450182e-01,
         -8.36241725e-02,  -6.56230127e-01,  -4.84747673e-01,
         -1.30433381e-01,   1.48606871e-01,   3.39365112e-02,
         -1.29104164e-01,   5.73233568e-02,   3.29540951e-01,
         -3.94618638e-01,  -4.41594170e-01,  -1.14221844e-01,
         -3.33797263e-02,  -3.86774428e-01,  -7.48496986e-01,
         -4.51038805e-01,   1.39183545e-01,  -3.85342932e-01,
         -5.43898201e-01,  -4.72323823e-02,   0.00000000e+00,
         -6.27356946e-01,  -9.97287657e-01,  -5.67083821e-01,
          3.69592313e-01,  -6.95409357e-01,  -7.81810432e-01,
          0.00000000e+00,  -6.13436689e-02,  -5.31284098e-01,
         -8.99838950e-01,  -9.71721622e-02,   7.60894662e-01,
         -6.69978513e-01,  -6.61922115e-01,  -1.03208225e-02,
         -3.54332626e-02,  -3.20753739e-01,  -7.27584494e-02,
          7.47594719e-01,   7.57934213e-01,   3.11812806e-01,
          1.10930956e+00,   2.67944925e+00,  -2.35964589e-02,
         -2.86182360e-01,   2.26056049e-01,   4.67246773e-01,
          3.68444463e-01,   1.19600763e+00,   2.90602252e+00,
          3.73020714e+00],
       [  0.00000000e+00,  -2.41103400e-01,   4.18638061e-02,
          2.54019654e-01,   2.20912153e-01,   8.63986708e-02,
         -4.72872902e-02,  -1.12855951e-01,   1.08370976e-01,
         -2.96766877e-02,   8.37016137e-02,   4.77201595e-01,
          4.25842719e-01,   1.61738708e-01,  -5.74654332e-02,
         -1.11376762e-01,   1.07156485e-01,   6.55619377e-02,
          5.41702554e-02,   4.92836779e-01,   5.17562235e-01,
         -1.29638727e-01,  -3.57147637e-01,  -1.14221844e-01,
          1.55842255e-01,  -7.91095463e-02,  -7.33467140e-02,
          6.69193759e-01,   4.25255948e-01,  -6.65643656e-01,
         -6.04073094e-01,  -4.72323823e-02,   0.00000000e+00,
         -3.77713577e-01,  -5.01453528e-02,   7.27019421e-01,
          3.47474403e-01,  -1.07437673e+00,  -8.16450127e-01,
          0.00000000e+00,  -6.13436689e-02,  -2.81541434e-01,
          1.04043911e-01,   5.67602212e-01,   5.88241670e-01,
         -6.90851159e-01,  -7.25414069e-01,  -8.87416172e-02,
          1.09197388e-02,  -1.68581116e-02,   1.93030815e-01,
          4.37109328e-01,   5.30892528e-01,  -4.63255290e-01,
         -5.81042693e-01,  -2.06579673e-01,   1.10166433e-01,
         -2.07893242e-01,   3.35405862e-02,   3.21536155e-01,
          7.35914827e-03,  -4.24178630e-01,  -3.43675618e-01,
         -1.96007519e-01]])
In [10]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(digits.images[1], cmap='gray')
Out[10]:
<matplotlib.image.AxesImage at 0x26a1d1031d0>
In [11]:
model.predict(dataset[1].reshape(1,-1)) #should be one
Out[11]:
array([9])
In [12]:
#lets try again
plt.imshow(digits.images[333], cmap='gray')
Out[12]:
<matplotlib.image.AxesImage at 0x26a1d1b21d0>
In [13]:
model.predict(dataset[333].reshape(1,-1))  #should be 2
Out[13]:
array([6])

Principal Component Analysis

Principal Component Analysis is a linear dimensionality reduction algorithm. Before moving forward, it's important to mention the curse of dimensionality. Curse of dimensionality refers to how certain algorithms perform poorly on high-dimensional data. High dimensionality makes clustering hard, because having lots of dimensions means that everything is far away from each other. It's hard to know what true distance means when you have so many dimensions.

Reducing the dimensions is same as reducing the features of a dataset. We can do this by feature selection or feature extraction. PCA is a feature extraction technique, that is, we project original dataset(with high-dimensionality) onto a new feature space(with low-dimensionality). We can also think of it as a data compression technique while maintaining the most relevant information. PCA aims to find the directions of maximum variance in high-dimensional data and project it onto a new subspace with fewer dimensions.

Given the input data X(d-dimensional), we want to find a projection matrix (P) such that X.W = N (n-dimensional) and please note n <= d . We construct project matrix P using eigenvectors. Eigenvectors can be calculated by decomposing a covariance matrix or we can perform singular value decomposition. One important thing to note here is that PCA is sensitive to data scale, hence we should standardize data before going through PCA. We can summarize this process as follows:

  • Standardize the dataset.
  • Find eigenvector and eigenvalues by decomposing a covariance matrix or use singular value decomposition.
  • Sort eignevalues in descending order and select n eigenvectors corresponding to n largest eigenvalues.
  • Construct the projection matrix P from eigenvectors.
  • Project dataset X onto low-dimensional space by multiplying X and W
In [14]:
from sklearn.decomposition import PCA
import numpy as np

#lets create features
x1 = np.random.normal(size=200) 
x2 = np.random.normal(size=200)
x3 = x1 + x2  #not useful since its highly correlated with other features.
X = np.c_[x1,x2,x3]

pca = PCA()
pca.fit(X)
Out[14]:
PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
In [15]:
pca.explained_variance_   #third feature is clearly useless
Out[15]:
array([  2.96131853e+00,   1.06196635e+00,   3.34168017e-32])
In [16]:
pca.n_components_  #still 3, because we have not specify no of components to keep in PCA() method
Out[16]:
3
In [17]:
pca2 = PCA(n_components=2)
pca2.fit(X)
Out[17]:
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
In [18]:
pca2.n_components_
Out[18]:
2
In [19]:
X_processed = pca2.fit_transform(X)
In [20]:
X.shape
Out[20]:
(200, 3)
In [21]:
X_processed.shape
Out[21]:
(200, 2)

PCA is not the only Dimensionality Reduction algorithm. Check out Linear Discriminant Analysis.

Neural Networks

Neural Networks are very powerful supervised learning algorithms which can be used for Classification as well as Regression problems. In most cases, we have classification problem. It won't be wrong to say Neural Networks are the reason behind the hype of Machine Learning. Neural Networks or Artificial Neural Networks are also known as Universal Function Approximator. Read this chapter of neural networks and deep learning book. In this part, we will see what is neural networks, and how they work. Also, we will implement Neural Network using TensorFlow.

Neural Network is an information processing system, that is, we pass some input to the Neural Network, some processing happens and we get some output. Neural Networks are inspired from biological connection of neurons and how information processing happens in the brain. For more on biological neural networks see here and here. Very simple architecture of a neural network is shown below.

title

Neural Network is made up of many neurons. We can think of each neuron as a single processing unit which accepts some input and produces some output. As we can see in above figure, there are three layers-input layer, hidden layer and output layer. You might have one question-why it is called hidden? The answer is- it's called hidden because it's not visible(preceded by input layer and followed by output layer). Each edge you see in figure has some weight associated with it. We can think of it as a importance of a particular feature.

We pass input vector/matrix to the input layer then it will be multiplied by a weight matrix. Then it will be passed through an activation function which is used to introduce non-linearity into the network. If we use sigmoid activation function then input value will be squashed in between [0,1] and in case of tanh - [-1,1]. Common choices for activation functions are :

![title](./images/nn3.png)

Commonly used activation function is Relu since it has come nice property such as fast to converge and it also helps in regularization. Also, SELU (Scaled Exponential Linear Unit), which recently came into the play, has started getting a lot of buzz. Check out this Github repo for implementation and comparison of SELU with other activation functions.

Now, let us dive deeper into neural networks. Suppose, we have an input vector $ {x_1,x_2,x_3} $ and let us denote hidden units with $ {h_1,h_2,h_3,h_4} $ and output units with ${o_1,o_2} $ then \begin{align} h_1 = f(W_{11} x_1 + W_{21} x_2 + W_{31} x_3) \newline h_2 = f(W_{12} x_1 + W_{22} x_2 + W_{32} x_3) \newline h_3 = f(W_{13} x_1 + W_{23} x_2 + W_{33} x_3) \newline h_4 = f(W_{14} x_1 + W_{24} x_2 + W_{34} x_3) \newline \end{align}

Here, f is the activation function. We will use RELU. These equations are pretty straight forward. Now, we have activations for hidden layer neurons, we need to find the output. In case of classification, we usually have C number of output neurons. Where C is the number of classes and each output neuron gives the probability of input belonging to a particular class. In this example, we have two classes and hence two neurons in output layer. Same way we can find activations for output layer. Now, how do we map output layer activations to the probabilities? For this, we use softmax function. Softmax function squashes K-dimensional vector of real values to K-dimensional vector of real values in range (0,1] that add up to 1. Softmax function can be written as :

title

Here, denominator acts as a normalizer such that output vector add up to 1. So, now we have our output. Please note, there are two pass in neural network algorithm- forward and backward. In forward pass, we calculate the output and in backward pass- we calculate the gradient of cost function with respect to the parameters of the neural network. But, how does learning work in Neural Network? We use gradient descent for learning in Neural Network and popular backpropagation algorithm to find gradients. Before that we need to define our cost function. Here, we will use cross-entropy as the cost function. We define cross-entropy cost function as:

![title](./images/nn5.png)

If you remember, we update our parameters by substracting gradient of the cost function w.r.t a particular parameter multiplied by a learning rate. Here, I am not going into the details of gradient calculation but its easy to understand if you have basic knowledge of derivation and chain rule. Check this out if you are curious. Essentially, what we do is- calculate gradient of cost function with respect to each parameter. Note that, for optimization purpose we do not deal with individual weight/parameter rather we use vectors or matrices. For i.e, we represent weights of input to hidden layer as a vector. Also, I omitted bias term in above equations for simplicity purpose. Now, we will implement a vanilla neural network using TensorFlow. If you haven't used TensorFlow before than check out Github repo

In [1]:
#read the dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
In [2]:
#create placeholders to store input and output data
import tensorflow as tf
X = tf.placeholder(tf.float32, shape=[None, 784])  #28* 28 = 784
y = tf.placeholder(tf.float32, shape=[None, 10])  #10 classes
In [3]:
#create weights and bias
w1 = tf.Variable(tf.truncated_normal([784, 50], stddev=0.5))
b1 = tf.Variable(tf.ones([50]))
In [4]:
#for hidden to output layer
w2= tf.Variable(tf.truncated_normal([50,10], stddev=0.5))
b2= tf.Variable(tf.ones([10]))
In [5]:
h = tf.nn.relu(tf.matmul(X,w1)+b1)
o = tf.nn.relu(tf.matmul(h, w2)+b2)
In [19]:
#cost function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = o))
step = tf.train.GradientDescentOptimizer(0.2).minimize(cost)
In [20]:
#find accuracy
correct_prediction = tf.equal(tf.argmax(o,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
In [21]:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for i in range(30000): #increase the number of iterations
    train_data = mnist.train.next_batch(128)
    _, t_loss = sess.run([step, cost], feed_dict={X:train_data[0], y:train_data[1]})
    if i%500 == 0:
        acc = sess.run([accuracy], feed_dict={X:mnist.test.images, y:mnist.test.labels})
        print ("Step = {}, Accuracy = {}".format(i,acc))
Step = 0, Accuracy = [0.1367]
Step = 500, Accuracy = [0.63599998]
Step = 1000, Accuracy = [0.65100002]
Step = 1500, Accuracy = [0.66369998]
Step = 2000, Accuracy = [0.82440001]
Step = 2500, Accuracy = [0.83740002]
Step = 3000, Accuracy = [0.84259999]
Step = 3500, Accuracy = [0.8488]
Step = 4000, Accuracy = [0.85290003]
Step = 4500, Accuracy = [0.85439998]
Step = 5000, Accuracy = [0.85579997]
Step = 5500, Accuracy = [0.85879999]
Step = 6000, Accuracy = [0.8599]
Step = 6500, Accuracy = [0.86080003]
Step = 7000, Accuracy = [0.86189997]
Step = 7500, Accuracy = [0.86330003]
Step = 8000, Accuracy = [0.86400002]
Step = 8500, Accuracy = [0.86330003]
Step = 9000, Accuracy = [0.8664]
Step = 9500, Accuracy = [0.86650002]
Step = 10000, Accuracy = [0.86739999]
Step = 10500, Accuracy = [0.8664]
Step = 11000, Accuracy = [0.86549997]
Step = 11500, Accuracy = [0.86750001]
Step = 12000, Accuracy = [0.86659998]
Step = 12500, Accuracy = [0.86860001]
Step = 13000, Accuracy = [0.86940002]
Step = 13500, Accuracy = [0.86799997]
Step = 14000, Accuracy = [0.86839998]
Step = 14500, Accuracy = [0.86930001]
Step = 15000, Accuracy = [0.86659998]
Step = 15500, Accuracy = [0.86949998]
Step = 16000, Accuracy = [0.86949998]
Step = 16500, Accuracy = [0.8689]
Step = 17000, Accuracy = [0.87099999]
Step = 17500, Accuracy = [0.86930001]
Step = 18000, Accuracy = [0.87010002]
Step = 18500, Accuracy = [0.8696]
Step = 19000, Accuracy = [0.87019998]
Step = 19500, Accuracy = [0.86849999]
Step = 20000, Accuracy = [0.87129998]
Step = 20500, Accuracy = [0.87029999]
Step = 21000, Accuracy = [0.87]
Step = 21500, Accuracy = [0.87040001]
Step = 22000, Accuracy = [0.86979997]
Step = 22500, Accuracy = [0.86970001]
Step = 23000, Accuracy = [0.86979997]
Step = 23500, Accuracy = [0.87110001]
Step = 24000, Accuracy = [0.8696]
Step = 24500, Accuracy = [0.86970001]
Step = 25000, Accuracy = [0.87040001]
Step = 25500, Accuracy = [0.87010002]
Step = 26000, Accuracy = [0.87110001]
Step = 26500, Accuracy = [0.86970001]
Step = 27000, Accuracy = [0.87]
Step = 27500, Accuracy = [0.87]
Step = 28000, Accuracy = [0.87080002]
Step = 28500, Accuracy = [0.87080002]
Step = 29000, Accuracy = [0.86940002]
Step = 29500, Accuracy = [0.8696]

This was very simple neural network. Please note, we can make few changes in our implementation to get accuracy more than 95%(maybe 99%). Few tweaks to get higher accuracy are- use different optimizer(i.e Adam), use dropout(to prevent overfitting), learning rate decay(see this for more), or use convolutional neural network( CNN Tutorial ).

In [ ]:
 

Share this post on social media :