Sunday, 24 May 2020

Begin Machine Learning as a software engineer

In this post, I describe how a software engineer can start applying machine learning (ML) in software programs.
The picture below shows that in ML, we build a model from data and results. This stands in contrast to traditional programming, where we write the program ourselves and use it to compute results from data.
After the model is built, we use it to make predictions.
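As a rough illustration of this contrast, the sketch below first writes a rule by hand (traditional programming) and then recovers the same rule from data and results with a least-squares fit. The temperature conversion example and the names celsius_to_fahrenheit, learned_slope and learned_intercept are assumptions made for illustration; they are not part of the bank example used later.

```python
import numpy as np

# Traditional programming: we write the rule ourselves.
def celsius_to_fahrenheit(celsius):
    return 1.8 * celsius + 32.0

# Machine learning: we only have data (inputs) and results (outputs) ...
data = np.array([-10.0, 0.0, 15.0, 25.0, 40.0])
results = celsius_to_fahrenheit(data)  # stands in for measured results

# ... and let a fitting procedure find the rule (a degree-1 polynomial).
# np.polyfit returns the coefficients from the highest degree down.
learned_slope, learned_intercept = np.polyfit(data, results, deg=1)

# The learned model can now predict results for new data.
prediction = learned_slope * 100.0 + learned_intercept
print(learned_slope, learned_intercept, prediction)
```

The fitted coefficients should come out very close to 1.8 and 32, the rule we wrote by hand.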
We can always create our own ML model, but that requires a good understanding of ML algorithms. If we are unable or unwilling to do so (for example, when we simply want to apply ML to a practical problem), we can reuse existing models from ML libraries.
In ML programming, we choose an existing model, build the model architecture (such as a classifier), feed training data to the model, and use the trained model to make decisions on newly arrived data.
The difference between an algorithm and a model is as follows.
This is the algorithm of linear regression with one variable: y = w0 + w1·x
This is the model after applying data and results: y = (5) + (-2)·x
The purpose of training a model is to adjust the model parameters (here w0 and w1) so that the model fits the user-supplied data well.
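To make "adjusting the parameters" concrete, here is a minimal sketch that starts from w0 = 0 and w1 = 0 and uses plain gradient descent on the mean squared error. The toy data, the learning rate and the number of steps are assumptions chosen for illustration; real libraries use more refined procedures.

```python
import numpy as np

# Toy data generated from the model in the text: y = 5 + (-2) * x
x = np.linspace(-1.0, 1.0, 21)
y = 5.0 - 2.0 * x

w0, w1 = 0.0, 0.0      # untrained parameters
learning_rate = 0.1    # assumed step size

for _ in range(2000):
    error = (w0 + w1 * x) - y           # prediction error on every sample
    grad_w0 = 2.0 * error.mean()        # d(mean squared error) / d(w0)
    grad_w1 = 2.0 * (error * x).mean()  # d(mean squared error) / d(w1)
    w0 -= learning_rate * grad_w0
    w1 -= learning_rate * grad_w1

print(round(w0, 3), round(w1, 3))
```

After training, w0 and w1 should be very close to 5 and -2, the values used to generate the data.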
We choose the Keras library as a starting point. It provides a Sequential model. For this article, we look at bank customer data and decide whether a customer will stop using the bank's services. This is a classification problem, and we will use the Sequential model to solve it.
In part one, we do data preprocessing. Firstly, we import the numpy and pandas libraries. Pandas is a data manipulation library.
import numpy as np
import pandas as pd
Secondly, we import the dataset using pandas.
dataset = pd.read_csv('Churn_Modelling.csv')
# Columns 3 to 12 hold the input features; column 13 holds the label to predict.
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
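The positional slicing can be tried on a tiny table first. The sketch below uses a hypothetical four-column frame (the column names and values are made up and do not come from Churn_Modelling.csv) to show that the end of an iloc slice is exclusive.

```python
import pandas as pd

# A hypothetical 4-column table standing in for the real CSV file.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [42, 35, 58],
    "balance": [0.0, 1200.5, 310.0],
    "exited": [1, 0, 0],
})

# iloc selects by position; the end of a slice is exclusive,
# so 1:3 picks columns 1 and 2 and leaves out column 3.
toy_X = toy.iloc[:, 1:3].values   # features as a NumPy array
toy_y = toy.iloc[:, 3].values     # label as a NumPy array

print(toy_X.shape, toy_y.shape)
```

In the same way, 3:13 in the article's code selects columns 3 to 12, and 13 selects the label column.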
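Then, we encode the categorical data: one column is label-encoded to integers and another is one-hot encoded. ColumnTransformer selects the column to one-hot encode (the older categorical_features argument of OneHotEncoder was removed in scikit-learn 0.22).
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Label-encode the categorical column at index 2 into integers.
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# One-hot encode the categorical column at index 1; the encoded columns come first in the output.
# drop='first' removes the first dummy column to avoid the dummy variable trap.
ct = ColumnTransformer([('onehot', OneHotEncoder(drop='first'), [1])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X).astype(float)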
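What the two encodings do can be sketched with NumPy alone. The colour values below are hypothetical, and this is only an illustration of the idea, not how scikit-learn implements it.

```python
import numpy as np

# Hypothetical categorical column with three distinct values.
colors = np.array(["red", "green", "blue", "green"])

# Label encoding: map each category to an integer (categories are sorted).
categories, labels = np.unique(colors, return_inverse=True)

# One-hot encoding: one column per category, with a single 1 per row.
one_hot = np.eye(len(categories))[labels]

# Dropping the first column loses no information: a row of all zeros
# can only mean the first category.
one_hot_reduced = one_hot[:, 1:]

print(categories)
print(labels)
print(one_hot_reduced)
```

With three categories, two columns remain after dropping the first one, which is why a single categorical column can add more than one feature to X.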
We split the dataset into the training set and test set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
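The idea behind the split can be sketched in a few lines of NumPy: shuffle the row indices with a fixed seed and cut off 20% for testing. The arrays below are toy data, and this is a simplified sketch rather than scikit-learn's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, similar in spirit to random_state
features = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
targets = np.arange(10)

# Shuffle the row indices, then take the first 20% as the test set.
indices = rng.permutation(len(features))
n_test = int(len(features) * 0.2)
test_idx, train_idx = indices[:n_test], indices[n_test:]

features_train, features_test = features[train_idx], features[test_idx]
targets_train, targets_test = targets[train_idx], targets[test_idx]

print(features_train.shape, features_test.shape)
```

Every sample ends up in exactly one of the two sets, so the model is later evaluated on data it has never seen during training.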
After that, we apply feature scaling so that every feature has zero mean and unit variance. The scaler is fitted on the training set only and then applied unchanged to the test set.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
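The scaling step amounts to the following arithmetic, shown here as a NumPy sketch on made-up numbers. It illustrates why the mean and standard deviation are learned from the training set and then reused for the test set.

```python
import numpy as np

train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# "fit": learn the per-column mean and standard deviation from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply the same shift and scale to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled training columns have mean 0 and standard deviation 1, while the test values are expressed on the same scale without influencing it.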
In part two, we build an artificial neural network. Firstly, we import the Keras library.
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
Secondly, we use the Sequential model to build an ML classifier. A Sequential model is a linear stack of layers: an input layer, hidden layers, and an output layer.
classifier = Sequential()
We add the input layer and the first hidden layer.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
We add the second hidden layer.
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
Lastly, we add the output layer.
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
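To see what these three layers compute, here is a NumPy sketch of the forward pass through an 11 -> 6 -> 6 -> 1 network. The weight range of -0.05 to 0.05, the zero biases and the random input batch are assumptions for illustration; this is not Keras code and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# (inputs, units) for each Dense layer: 11 -> 6 -> 6 -> 1
layer_shapes = [(11, 6), (6, 6), (6, 1)]
weights = [rng.uniform(-0.05, 0.05, size=shape) for shape in layer_shapes]
biases = [np.zeros(shape[1]) for shape in layer_shapes]

def forward(batch):
    hidden1 = relu(batch @ weights[0] + biases[0])
    hidden2 = relu(hidden1 @ weights[1] + biases[1])
    return sigmoid(hidden2 @ weights[2] + biases[2])  # one value in (0, 1) per row

n_parameters = sum(w.size + b.size for w, b in zip(weights, biases))
output = forward(rng.normal(size=(4, 11)))   # 4 hypothetical customers
print(n_parameters, output.shape)
```

The network has (11*6 + 6) + (6*6 + 6) + (6*1 + 1) = 121 trainable parameters, and the sigmoid output can be read as the probability that a customer leaves.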
Now, we compile the classifier. The optimizer decides how the network weights are updated. Adam stands for adaptive moment estimation; the N-th moment of a random variable is the expected value of the variable raised to the power N. Adam keeps running estimates of the first and second moments of the gradient.
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
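The Adam update rule can be sketched for a single weight as follows. The function adam_minimize is a hypothetical helper written for this illustration, and the learning rate of 0.1 is chosen for the toy problem rather than taken from Keras.

```python
import math

def adam_minimize(gradient, w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = 0.0  # running estimate of the first moment (mean) of the gradient
    v = 0.0  # running estimate of the second moment (mean of the squared gradient)
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias correction for starting at zero
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```

The weight should end up close to 3, the minimum of the toy function. In a neural network the same update is applied to every weight, using the gradient of the loss.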
For binary classification, the binary_crossentropy loss function is a suitable choice.
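A minimal NumPy sketch of this loss is shown below. The function name and the clipping constant are assumptions for illustration, and the labels and probabilities are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip the probabilities so that log(0) never occurs.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

labels = np.array([1.0, 0.0, 1.0, 0.0])
confident_and_right = np.array([0.9, 0.1, 0.8, 0.2])
confident_and_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_good = binary_crossentropy(labels, confident_and_right)
loss_bad = binary_crossentropy(labels, confident_and_wrong)
print(loss_good, loss_bad)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is the behaviour we want the optimizer to minimise.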
We fit the classifier to the training set.
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
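The effect of batch_size and epochs on the amount of training can be worked out with simple arithmetic. The sketch assumes 8000 training rows, which follows from the 2000 test samples reported in the evaluation and test_size = 0.2.

```python
import math

n_test = 2000                         # test samples reported in the evaluation
test_size = 0.2
n_total = round(n_test / test_size)   # rows in the whole dataset
n_train = n_total - n_test            # rows left for training

batch_size = 10
epochs = 100

# The weights are updated once per batch, and each epoch visits every batch once.
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)
```

Under these assumptions there are 800 weight updates per epoch and 80000 in total.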
In part three, we use the classifier to make predictions and evaluate the model. Firstly, we feed the test data to the classifier and convert the predicted probabilities to True or False using a threshold of 0.5.
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
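The thresholding step works element-wise, as this small sketch with hypothetical probabilities shows.

```python
import numpy as np

# Hypothetical sigmoid outputs for five customers, one row per customer.
probabilities = np.array([[0.08], [0.51], [0.50], [0.93], [0.27]])

# The comparison is applied to every element and yields booleans.
decisions = probabilities > 0.5
print(decisions.ravel())
```

Note that a probability of exactly 0.5 is classified as False, because the comparison is strict.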
Secondly, we use the confusion matrix to evaluate the model.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
When I examine the value of cm, it is shown as:
It means that out of 2000 test samples, the model makes 1595 correct predictions, an accuracy of 79.75%.
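How the number of correct predictions is read from a confusion matrix can be sketched on toy labels. The labels below are made up for illustration and are not the results of the bank model.

```python
import numpy as np

# Toy labels, not the results from the article.
actual    = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
predicted = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Rows are actual classes, columns are predicted classes
# (the layout that sklearn's confusion_matrix uses).
toy_cm = np.zeros((2, 2), dtype=int)
for a, p in zip(actual, predicted):
    toy_cm[a, p] += 1

correct = int(np.trace(toy_cm))    # the diagonal holds the correct predictions
accuracy = correct / toy_cm.sum()
print(toy_cm)
print(correct, accuracy)
```

The diagonal entries count the correct predictions for each class, so their sum divided by the total number of samples gives the accuracy.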
The Python and CSV files can be found at: