We run our ninth and final ML model, a neural network. We test different architectures: numbers of layers, loss functions, metrics, optimisers, regularization, dropout rates, batch normalization, activation functions and numbers of epochs. We also visualise various accuracy scores, the confusion matrix and the ROC curve. We end by dumping our best model for further comparison.

Neural Network

We will be running a neural network with various architectures to attempt to optimise its performance.

%run /Users/thomasadler/Desktop/futuristic-platipus/capstone/notebooks/ta_01_packages_functions.py

Preparing data

modelling_df=pd.read_csv(data_filepath + 'master_modelling_df.csv', index_col=0)

#check
modelling_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 107184 entries, 0 to 108905
Data columns (total 32 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   lat_deg                   107184 non-null  float64
 1   lon_deg                   107184 non-null  float64
 2   is_functioning            107184 non-null  int64  
 3   distance_to_primary       107184 non-null  float64
 4   distance_to_secondary     107184 non-null  float64
 5   distance_to_tertiary      107184 non-null  float64
 6   distance_to_city          107184 non-null  float64
 7   distance_to_town          107184 non-null  float64
 8   usage_cap                 107184 non-null  float64
 9   is_complex_tech           107184 non-null  int64  
 10  is_installed_after_2006   107184 non-null  int64  
 11  is_public_management      107184 non-null  int64  
 12  crucialness               107184 non-null  float64
 13  perc_hh_head_male         107184 non-null  float64
 14  perc_pop1318_secondary    107184 non-null  float64
 15  perc_pop017_certificate   107184 non-null  float64
 16  perc_pop017_both_parents  107184 non-null  float64
 17  perc_pop2p_disability     107184 non-null  float64
 18  perc_pop1017_married      107184 non-null  float64
 19  perc_pop1217_birth        107184 non-null  float64
 20  perc_pop1464_working      107184 non-null  float64
 21  perc_hh_temp_dwelling     107184 non-null  float64
 22  perc_hh_mosquito_net      107184 non-null  float64
 23  perc_hh_toilet            107184 non-null  float64
 24  perc_hh_own_house         107184 non-null  float64
 25  perc_hh_bank_acc          107184 non-null  float64
 26  perc_hh_electricity       107184 non-null  float64
 27  total_events_adm4         107184 non-null  float64
 28  perc_local_served         107184 non-null  float64
 29  is_central                107184 non-null  int64  
 30  is_eastern                107184 non-null  int64  
 31  is_western                107184 non-null  int64  
dtypes: float64(25), int64(7)
memory usage: 27.0 MB
Image(dictionary_filepath+"5-Modelling-Data-Dictionary.png")
X = modelling_df.loc[:, modelling_df.columns != 'is_functioning']
y = modelling_df['is_functioning']

#check
print(X.shape)
print(y.shape)
(107184, 31)
(107184,)

Our independent variables (X) should have the same number of rows (107,184) as our dependent variable (y). y should only have one column as it is the outcome variable.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=rand_seed)
sm = SMOTE(random_state=rand_seed)
X_train_res, y_train_res = sm.fit_resample(X_train, y_train)

#compare resampled dataset
print(f"Test set has {round(y_test.value_counts(normalize=True)[0]*100,1)}% non-functioning water points and {round(y_test.value_counts(normalize=True)[1]*100,1)}% functioning")
print(f"Original train set has {round(y_train.value_counts(normalize=True)[0]*100,1)}% non-functioning water points and {round(y_train.value_counts(normalize=True)[1]*100,1)}% functioning")
print(f"Resampled train set has {round(y_train_res.value_counts(normalize=True)[0]*100,1)}% non-functioning water points and {round(y_train_res.value_counts(normalize=True)[1]*100,1)}% functioning")
Test set has 19.2% non-functioning water points and 80.8% functioning
Original train set has 19.7% non-functioning water points and 80.3% functioning
Resampled train set has 50.0% non-functioning water points and 50.0% functioning

We over-sample the minority class, non-functioning water points, to get an equal class distribution in our outcome variable. Note that resampling should only ever be applied to the train set, never the test set: the test set must keep its original distribution so that evaluation reflects real-world conditions.
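
For intuition, here is a simplified illustration (ours, not the imblearn implementation) of how SMOTE creates one synthetic minority sample: it takes a minority-class point, one of its k nearest minority-class neighbours, and interpolates at a random position between them.

#simplified illustration of one SMOTE synthetic sample (toy numbers)
import numpy as np

rng = np.random.default_rng(0)
x_i = np.array([1.0, 2.0])              #an existing minority-class point
x_nn = np.array([1.4, 2.6])             #one of its k nearest minority-class neighbours
gap = rng.random()                      #random position along the segment between them
x_synthetic = x_i + gap * (x_nn - x_i)  #new, synthetic non-functioning water point

print(x_synthetic)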

X_train_res_scaled, X_test_scaled = scaling(StandardScaler(), X_train_res, X_test)

We also need to scale the data: features on a common scale help gradient-based optimisers converge, which should improve the accuracy of our neural network.
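
The scaling helper comes from our packages file and is not shown here; below is a minimal sketch of what we assume it does (the name scaling_sketch is ours): fit the scaler on the resampled train set only and transform both sets, so no test-set statistics leak into training.

#assumed behaviour of the custom scaling helper (sketch)
def scaling_sketch(scaler, X_train, X_test):
    X_train_scaled = scaler.fit_transform(X_train)  #learn mean/std from train only
    X_test_scaled = scaler.transform(X_test)        #reuse train statistics on test
    return X_train_scaled, X_test_scaled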

We will test various neural network configurations and try to infer which parameters respond best, judged by the accuracy score on the test set.

1. Baseline model: choosing loss function and metric

We will try running a neural network on our classification problem. Neural networks are good at identifying non-linear and complex relationships between features and the outcome. Let's see if they add anything here.

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7586790323257446
Accuracy score on Test set: 0.7305592894554138

We try a first model with common baseline parameters: the Adam optimiser, binary cross-entropy as the loss function and binary accuracy as our metric. The model has no hidden layer, just one input and one output layer. Notably, the difference between the train and test accuracy is very small, so the model is barely overfitting.

2. Choosing layers

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layer
NN.add(layers.Dense(16, activation="relu"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7769299745559692
Accuracy score on Test set: 0.7187572717666626

We add a hidden layer of 16 nodes; the train accuracy improves by around two percentage points while the test accuracy drops slightly, a hint of mild overfitting.

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layers
NN.add(layers.Dense(16, activation="relu"))
NN.add(layers.Dense(8, activation="relu"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7799791097640991
Accuracy score on Test set: 0.7273872494697571

We add a second hidden layer of 8 nodes, but the test accuracy still does not beat the baseline model's.

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layer
NN.add(layers.Dense(8, activation="relu"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7725886702537537
Accuracy score on Test set: 0.7399822473526001

We also try a single hidden layer of 8 nodes instead of 16; the test accuracy is marginally higher on this run, but the differences across these runs are small, so we keep the 16-node hidden layer going forward.

3. Choosing optimiser

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layer
NN.add(layers.Dense(16, activation="relu"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.SGD(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7678698301315308
Accuracy score on Test set: 0.7461864948272705

Using a different optimiser, SGD, does not meaningfully improve our model; we will stick with the Adam optimiser.
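
For context, the two optimisers differ in how they turn a gradient into a parameter update. Below is a single-step sketch (our illustration, toy values; beta and epsilon constants match Keras's Adam defaults): SGD steps in proportion to the raw gradient, while Adam rescales the step by running estimates of the gradient's first and second moments.

#one parameter update under SGD vs Adam (single-step sketch, t=1)
import numpy as np

w, grad, lr = 0.5, 0.2, 0.01

#SGD: step proportional to the raw gradient
w_sgd = w - lr * grad

#Adam: step scaled by bias-corrected moment estimates
beta1, beta2, eps = 0.9, 0.999, 1e-7
m = (1 - beta1) * grad                        #first moment estimate
v = (1 - beta2) * grad ** 2                   #second moment estimate
m_hat, v_hat = m / (1 - beta1), v / (1 - beta2)
w_adam = w - lr * m_hat / (np.sqrt(v_hat) + eps)

print(w_sgd, w_adam)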

4. Choosing regularization

#sequential model
NN = keras.Sequential()

#set regularization
regularizer = keras.regularizers.l2(0.01)

#input layer
NN.add(layers.Dense(32, activation="relu", kernel_regularizer=regularizer))

#hidden layer
NN.add(layers.Dense(16, activation="relu", kernel_regularizer=regularizer))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7316510081291199
Accuracy score on Test set: 0.6628259420394897
#sequential model
NN = keras.Sequential()

#set regularization
regularizer = keras.regularizers.l1(0.01)

#input layer
NN.add(layers.Dense(32, activation="relu", kernel_regularizer=regularizer))

#hidden layer
NN.add(layers.Dense(16, activation="relu", kernel_regularizer=regularizer))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.646124005317688
Accuracy score on Test set: 0.6440266966819763

Adding either kind of regulariser (L1/Lasso or L2/Ridge) hurts our neural network heavily. Regularization attempts to prevent overfitting; since our model was barely overfitting to begin with, the extra penalty only degrades performance here.
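
For reference, this is the penalty that kernel_regularizer adds to the loss at every training step; the toy weights below are our own illustration.

#penalty terms added to the binary cross-entropy (toy weights)
import numpy as np

weights = np.array([0.5, -1.2, 0.3])           #example kernel weights of one layer
l2_penalty = 0.01 * np.sum(weights ** 2)       #what keras.regularizers.l2(0.01) adds
l1_penalty = 0.01 * np.sum(np.abs(weights))    #what keras.regularizers.l1(0.01) adds

print(l2_penalty, l1_penalty)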

5. Choosing dropout rate

#sequential model
NN = keras.Sequential()


#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layer
NN.add(layers.Dense(16, activation="relu"))
NN.add(layers.Dropout(0.2))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7600075602531433
Accuracy score on Test set: 0.7217894196510315

Adding a dropout rate of 20% to our hidden layer means that, on each training update, a random 20% of that layer's nodes are ignored and only the remaining 80% contribute. This technique does not improve our accuracy scores here.
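
To make the mechanics concrete, here is a simplified numpy version (ours) of the inverted dropout that Keras applies to a layer's activations during training; at inference time dropout is simply disabled.

#simplified inverted dropout on a vector of activations
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(10)
keep_mask = rng.random(10) >= 0.2               #each unit dropped with probability 0.2
dropped = activations * keep_mask / (1 - 0.2)   #survivors scaled so the expected sum is unchanged

print(dropped)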

6. Choosing batch normalization

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layer
NN.add(layers.Dense(16, activation="relu"))
NN.add(layers.BatchNormalization())

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7657862901687622
Accuracy score on Test set: 0.7311190962791443

Applying batch normalization to our hidden layer also does not improve our scores.
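
As a reminder of what the layer computes, a simplified version (ours) of batch normalization at training time: each unit's activations are standardised across the mini-batch, then scaled and shifted by learned parameters.

#simplified batch normalization over one mini-batch
import numpy as np

batch = np.array([[1.0], [2.0], [6.0]])             #activations of one unit across a batch
mean, var = batch.mean(axis=0), batch.var(axis=0)
normalised = (batch - mean) / np.sqrt(var + 1e-3)   #epsilon matches the Keras default

print(normalised.ravel())                           #then scaled/shifted by learned gamma, beta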

7. Choosing activation function

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layer
NN.add(layers.Dense(16, activation="sigmoid"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7798120975494385
Accuracy score on Test set: 0.7426412105560303
#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="sigmoid"))

#hidden layer
NN.add(layers.Dense(16, activation="sigmoid"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7718336582183838
Accuracy score on Test set: 0.7479591369628906
#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="relu"))

#hidden layer
NN.add(layers.Dense(16, activation="relu"))

#output layer
NN.add(layers.Dense(1, activation="relu"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7488929033279419
Accuracy score on Test set: 0.6999580264091492
#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="sigmoid"))

#hidden layer
NN.add(layers.Dense(16, activation="relu"))

#output layer
NN.add(layers.Dense(1, activation="relu"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7412629127502441
Accuracy score on Test set: 0.7148854732513428
#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="sigmoid"))

#hidden layer
NN.add(layers.Dense(16, activation="sigmoid"))

#output layer
NN.add(layers.Dense(1, activation="relu"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.5
Accuracy score on Test set: 0.19172458350658417

We test different combinations of activation functions for our input, hidden and output layers. The best combination uses a sigmoid function in all three layers. Note that putting ReLU on the output layer collapses the model: its output is not confined to [0, 1], so it cannot represent a probability for binary cross-entropy, and a unit stuck at zero stops learning entirely. The all-sigmoid model's accuracy score is not terrific, but it is better than anything else we've gotten so far.
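
To see why the ReLU output fails, consider what binary cross-entropy does with an unbounded, non-probabilistic output (toy numbers, our illustration; Keras clips predictions to avoid log(0), which only masks the problem).

#binary cross-entropy with ReLU "probabilities" (toy illustration)
import numpy as np

y_true = np.array([1.0, 0.0])
relu_out = np.array([0.0, 3.2])      #ReLU can output exactly 0 or values above 1

eps = 1e-7
p = np.clip(relu_out, eps, 1 - eps)  #clipping needed just to make log() finite
bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(bce)                           #enormous loss; gradients give the model little to learn from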

8. Trying more epochs

#sequential model
NN = keras.Sequential()

#input layer
NN.add(layers.Dense(32, activation="sigmoid"))

#hidden layer
NN.add(layers.Dense(16, activation="sigmoid"))

#output layer
NN.add(layers.Dense(1, activation="sigmoid"))

#compile
NN.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#fit on training set
results = NN.fit(X_train_res_scaled, y_train_res, epochs=200, verbose=0)

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Accuracy score on Train set: 0.7868831157684326
Accuracy score on Test set: 0.7378830909729004

Training for 200 epochs instead of 50 slightly improves the train accuracy but lowers the test accuracy, a sign that longer training starts to overfit, so we keep 50 epochs.

Optimal Model

NN_opt = keras.Sequential()

#input layer
NN_opt.add(layers.Dense(32, activation="sigmoid"))

#hidden layer
NN_opt.add(layers.Dense(16, activation="sigmoid"))

#output layer
NN_opt.add(layers.Dense(1, activation="sigmoid"))

#compile
NN_opt.compile(optimizer=keras.optimizers.Adam(),  
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()])

#time process
start=time.time()

#fit on training set
results = NN_opt.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0, validation_data=(X_test_scaled, y_test))

end=time.time()

time_fit_opt=end-start

#get scores from neural network
train_accuracy = results.history["binary_accuracy"][-1]
result = NN_opt.evaluate(X_test_scaled, y_test, verbose=0)

print(f"Time to fit the model on the training set is {round(time_fit_opt,3)} seconds")
print(f"Accuracy score on Train set: {train_accuracy}")
print(f"Accuracy score on Test set: {result[1]}")
Time to fit the model on the training set is 201.178 seconds
Accuracy score on Train set: 0.7708027958869934
Accuracy score on Test set: 0.7360638380050659

We re-run our optimal model; it does not achieve exactly the same accuracy as before because the network starts from different random weights on every run.
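
If exact repeatability mattered, we could fix the relevant seeds before building the model; a sketch, assuming our usual rand_seed and TensorFlow 2:

#optional: make neural network runs repeatable
import random
import numpy as np
import tensorflow as tf

random.seed(rand_seed)           #python stdlib randomness
np.random.seed(rand_seed)        #numpy randomness
tf.random.set_seed(rand_seed)    #weight initialisation and shuffling in tensorflow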

Analysis

plt.figure()
plt.plot(results.epoch, results.history['binary_accuracy'])
plt.plot(results.epoch, results.history['val_binary_accuracy'])
plt.title('Binary Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'])
plt.grid()
plt.show()

The highest accuracy is achieved at around the 12th epoch. It seems our dataset and its relationships are not complex enough to require many epochs to reach peak accuracy.
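
Rather than hand-picking the epoch count, we could let Keras stop training once the validation loss stops improving; a sketch using the standard EarlyStopping callback (the patience value is our choice):

#stop training automatically near the ~12th-epoch peak
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',            #watch validation loss
    patience=5,                    #tolerate 5 epochs without improvement
    restore_best_weights=True)     #roll back to the best epoch's weights

#results = NN_opt.fit(X_train_res_scaled, y_train_res, epochs=50, verbose=0,
#                     validation_data=(X_test_scaled, y_test), callbacks=[early_stop])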

plt.figure()
plt.plot(results.epoch, results.history['loss'])
plt.plot(results.epoch, results.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'])
plt.grid()
plt.show()

Similarly, the loss is minimised at around the 12th epoch.

predictions_train_proba=NN_opt.predict(X_train_res_scaled)

#convert to class
predictions_train=np.where(predictions_train_proba>0.5, 1, 0)
4305/4305 [==============================] - 4s 749us/step
fpr_train_opt, tpr_train_opt, thresholds_roc_train_opt = roc_curve(y_train_res, predictions_train_proba)

#getting precision/recall scores
precision_train_opt_plot, recall_train_opt_plot, thresholds_pr_train_opt = precision_recall_curve(y_train_res, predictions_train_proba)

# storing values
roc_auc_train_opt = auc(fpr_train_opt, tpr_train_opt)
pr_auc_train_opt = auc(recall_train_opt_plot, precision_train_opt_plot)

# seeing model results
print(f'ROC AUC: {roc_auc_train_opt}')
print(f'PR AUC: {pr_auc_train_opt}')

print(classification_report(y_train_res, predictions_train))

#print confusion matrix
cf_matrix=confusion_matrix(y_train_res, predictions_train)

group_names = ['True Neg','False Pos','False Neg','True Pos']

group_counts = ["{0:0.0f}".format(value) for value in
            cf_matrix.flatten()]

group_percentages = ["{0:.2%}".format(value) for value in
                    cf_matrix.flatten()/np.sum(cf_matrix)]

labels = [f"{v1}\n{v2}\n{v3}" for v1, v2, v3 in
        zip(group_names,group_counts,group_percentages)]

labels = np.asarray(labels).reshape(2,2)

ax = sns.heatmap(cf_matrix, annot=labels, fmt='', cmap='Greens')

ax.set_xlabel('\nPredicted Values')
ax.set_ylabel('Actual Values ');

ax.xaxis.set_ticklabels(['Not functioning','Functioning'])
ax.yaxis.set_ticklabels(['Not functioning','Functioning'])

plt.show()
ROC AUC: 0.8581960705709735
PR AUC: 0.8654983269292414
              precision    recall  f1-score   support

           0       0.77      0.77      0.77     68873
           1       0.77      0.77      0.77     68873

    accuracy                           0.77    137746
   macro avg       0.77      0.77      0.77    137746
weighted avg       0.77      0.77      0.77    137746

All of our classification metrics (precision, recall and F1) on the train set sit just under the 80% mark.

start=time.time()

# prediction of our model on test set
predictions_test_proba=NN_opt.predict(X_test_scaled)

#convert to class
predictions_test=np.where(predictions_test_proba>0.5, 1, 0)

end=time.time()

time_predict_opt=end-start

print(f"Time to predict the model on the test set is {round(time_predict_opt,3)} seconds")
670/670 [==============================] - 1s 1ms/step
Time to predict the model on the test set is 0.936 seconds

We can see that the time it takes to predict is relatively long compared to our other models. We also need an extra step: Keras's predict returns probabilities rather than classes, so we threshold at 0.5 to obtain class labels.

fpr_test_opt, tpr_test_opt, thresholds_roc_test_opt = roc_curve(y_test, predictions_test_proba)

#getting precision/recall scores
precision_test_opt_plot, recall_test_opt_plot, thresholds_pr_test_opt = precision_recall_curve(y_test, predictions_test_proba)

# storing values
roc_auc_test_opt = auc(fpr_test_opt, tpr_test_opt)
pr_auc_test_opt = auc(recall_test_opt_plot, precision_test_opt_plot)

# seeing model results
print(f'ROC AUC: {roc_auc_test_opt}')
print(f'PR AUC: {pr_auc_test_opt}')

print(classification_report(y_test, predictions_test))

#print confusion matrix
cf_matrix=confusion_matrix(y_test, predictions_test)

group_names = ['True Neg','False Pos','False Neg','True Pos']

group_counts = ["{0:0.0f}".format(value) for value in
            cf_matrix.flatten()]

group_percentages = ["{0:.2%}".format(value) for value in
                    cf_matrix.flatten()/np.sum(cf_matrix)]

labels = [f"{v1}\n{v2}\n{v3}" for v1, v2, v3 in
        zip(group_names,group_counts,group_percentages)]

labels = np.asarray(labels).reshape(2,2)

ax = sns.heatmap(cf_matrix, annot=labels, fmt='', cmap='Greens')

ax.set_xlabel('\nPredicted Values')
ax.set_ylabel('Actual Values ');

ax.xaxis.set_ticklabels(['Not functioning','Functioning'])
ax.yaxis.set_ticklabels(['Not functioning','Functioning'])

plt.show()
ROC AUC: 0.7834369436221573
PR AUC: 0.9328626720545525
              precision    recall  f1-score   support

           0       0.39      0.66      0.49      4110
           1       0.90      0.76      0.82     17327

    accuracy                           0.74     21437
   macro avg       0.65      0.71      0.66     21437
weighted avg       0.80      0.74      0.76     21437

The model performs relatively well on recall for non-functioning water points: it is not "missing" as many of them as other models. As expected, the model performs very well on functioning water points, with 90% of its "functioning" predictions being correct.

Comparing results

plt.figure(figsize=(10,15))
plt.plot([0,1], [0,1], color='black', linestyle='--')
plt.title('Receiver Operating Characteristic (ROC) Curve - NN')
plt.plot(fpr_train_opt, tpr_train_opt, color='blueviolet', lw=2,
    label='Train AUC = %0.2f' % roc_auc_train_opt)
plt.plot(fpr_test_opt, tpr_test_opt, color='crimson', lw=2,
    label='Test AUC = %0.2f' % roc_auc_test_opt)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="best")
plt.tight_layout()
plt.grid()

As expected, the test AUC is smaller than the train AUC, since the test set is unseen data.

As with our KNN model, we do not visualise the feature importance of our optimal neural network. The reason is that we would need SHAP, which is computationally very expensive here (one run took nearly 10 minutes, and 50-100 runs are recommended). We will consider using SHAP later on, when comparing models, if the neural network or KNN is the best performing one; in that case it might be worth the cost.
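
For completeness, a sketch (not run here, for the cost reasons above) of how SHAP could be applied to the network; the sample sizes are our guesses to keep the computation manageable.

#sketch of SHAP feature importance for the neural network
import shap

background = shap.sample(X_train_res_scaled, 100)         #small background set limits cost
explainer = shap.KernelExplainer(NN_opt.predict, background)
shap_values = explainer.shap_values(X_test_scaled[:50])   #explain only a small sample
shap.summary_plot(shap_values, X_test_scaled[:50], feature_names=X.columns)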

Image(dictionary_filepath+"6-Hypotheses.png")

Exporting

joblib.dump(NN_opt, model_filepath+'neural_network_model.sav')
INFO:tensorflow:Assets written to: ram://45c52fa5-5c0f-4151-8e9d-0ee96db1ecc4/assets
['/Users/thomasadler/Desktop/futuristic-platipus/models/neural_network_model.sav']
#store results for model comparison (per-class test scores transcribed from the classification report above)
d = {'Model': ['Neural Network'],
     'Parameters': ['Hidden layer=16 nodes, Activations=sigmoid, Optimizer=Adam, Loss function=BinaryCrossentropy, Metric=BinaryAccuracy'],
     'Accuracy Train': None, 'Precision Train': None, 'Recall Train': None, 'F1 Train': None,
     'ROC AUC Train': [roc_auc_train_opt],
     'Accuracy Test': None, 'Precision Test': None, 'Recall Test': None, 'F1 Test': None,
     'ROC AUC Test': [roc_auc_test_opt],
     'Time Fit': time_fit_opt, 'Time Predict': time_predict_opt,
     'Precision Non-functioning Test': 0.39, 'Recall Non-functioning Test': 0.66,
     'F1 Non-functioning Test': 0.49, 'Precision Functioning Test': 0.90,
     'Recall Functioning Test': 0.76, 'F1 Functioning Test': 0.82}

#to dataframe
best_model_result_df=pd.DataFrame(data=d)

#check
best_model_result_df
Model Parameters Accuracy Train Precision Train Recall Train F1 Train ROC AUC Train Accuracy Test Precision Test Recall Test F1 Test ROC AUC Test Time Fit Time Predict Precision Non-functioning Test Recall Non-functioning Test F1 Non-functioning Test Precision Functioning Test Recall Functioning Test F1 Functioning Test
0 Neural Network Hidden layer=16 nodes, Activations=sigmoid, Op... None None None None 0.858196 None None None None 0.783437 201.177973 0.935681 0.39 0.66 0.49 0.9 0.76 0.82
best_model_result_df.to_csv(model_filepath + 'neural_network_model.csv')
metrics=[fpr_train_opt, tpr_train_opt, fpr_test_opt, tpr_test_opt]
metrics_name=['fpr_train_opt', 'tpr_train_opt', 'fpr_test_opt', 'tpr_test_opt']

#save numpy arrays for model comparison
for metric, metric_name in zip(metrics, metrics_name):
    np.save(model_filepath+f'neural_network_{metric_name}', metric)