Overview
In this post we will train an autoencoder to detect credit card fraud. We will also demonstrate how to train Keras models in the cloud using CloudML.
The basis of our model will be the Kaggle Credit Card Fraud Detection dataset, which was collected during a research collaboration of Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.
The dataset contains credit card transactions by European cardholders made over a two day period in September 2013. There are 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.
Reading the data
After downloading the data from Kaggle, you can read it into R with read_csv():
The input variables consist of only numerical values which are the result of a PCA transformation. In order to preserve confidentiality, no more information about the original features was provided. The features V1, …, V28 were obtained with PCA. There are however 2 features (Time and Amount) that were not transformed.
Time is the seconds elapsed between each transaction and the first transaction in the dataset. Amount is the transaction amount and could be used for cost-sensitive learning. The Class variable takes value 1 in case of fraud and 0 otherwise.
Autoencoders
Since only 0.172% of the observations are frauds, we have a highly unbalanced classification problem. With this kind of problem, traditional classification approaches usually don’t work very well because we have only a very small sample of the rarer class.
An autoencoder is a neural network that is used to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. For this problem we will train an autoencoder to encode non-fraud observations from our training set. Since frauds are supposed to have a different distribution than normal transactions, we expect that our autoencoder will have higher reconstruction errors on frauds than on normal transactions. This means that we can use the reconstruction error as a quantity that indicates if a transaction is fraudulent or not.
If you want to learn more about autoencoders, a good starting point is this video from Larochelle on YouTube and Chapter 14 from the Deep Learning book by Goodfellow et al.
Visualization
For an autoencoder to work well we have a strong initial assumption: that the distribution of variables for normal transactions is different from the distribution for fraudulent ones. Let’s make some plots to verify this. Variables were transformed to a [0,1] interval for plotting.
We can see that distributions of variables for fraudulent transactions are very different from those of normal ones, except for the Time variable, which seems to have the exact same distribution.
Preprocessing
Before the modeling steps we need to do some preprocessing. We will split the dataset into train and test sets and then we will min-max normalize our data (this is done because neural networks work much better with small input values). We will also remove the Time variable as it has the exact same distribution for normal and fraudulent transactions.
Based on the Time variable we will use the first 200,000 observations for training and the rest for testing. This is good practice because when using the model we want to predict future frauds based on transactions that happened before.
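The time-based split described above can be sketched in base R. This is only an illustration on a toy data frame: the names `df_toy`, `df_train_toy`, `df_test_toy` and `n_train` are assumptions made for the example, and the post itself may have used a different approach (for instance dplyr).

```r
# Toy stand-in for the transactions table (hypothetical values).
df_toy <- data.frame(
  Time   = c(30, 0, 10, 50, 20, 40),
  Amount = c(5, 12, 7, 300, 1, 20),
  Class  = c(0, 0, 0, 1, 0, 0)
)

n_train <- 4  # the post uses the first 200,000 observations

# Order by Time so that every training row precedes every test row.
df_toy <- df_toy[order(df_toy$Time), ]
df_train_toy <- df_toy[seq_len(n_train), ]
df_test_toy  <- df_toy[-seq_len(n_train), ]
```

Splitting on time rather than at random keeps later transactions out of the training set, which mirrors how the model would be used in practice.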
Now let’s work on normalization of inputs. We created 2 functions to help us. The first one gets descriptive statistics about the dataset that are used for scaling. Then we have a function to perform the min-max scaling. It’s important to note that we applied the same normalization constants for training and test sets.
library(purrr)

#' Gets descriptive statistics for every variable in the dataset.
get_desc <- function(x) {
  map(x, ~list(
    min = min(.x),
    max = max(.x),
    mean = mean(.x),
    sd = sd(.x)
  ))
}

#' Given a dataset and normalization constants it will create a min-max normalized
#' version of the dataset.
normalization_minmax <- function(x, desc) {
  map2_dfc(x, desc, ~(.x - .y$min)/(.y$max - .y$min))
}
Now let’s create normalized versions of our datasets. We also transformed our data frames to matrices since this is the format expected by Keras.
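As a rough illustration of this step, here is a base R sketch on toy data. The helper `scale_minmax()` and all of the data are hypothetical stand-ins rather than the post's own code. The sketch assumes that the scaling constants are estimated on the training set and then reused unchanged for the test set.

```r
# Toy stand-ins for the training and test predictors (hypothetical values).
train_toy <- data.frame(V1 = c(-2, 0, 2), Amount = c(10, 50, 110))
test_toy  <- data.frame(V1 = c(1, 3),     Amount = c(60, 10))

# Constants come from the training set only.
mins <- sapply(train_toy, min)
maxs <- sapply(train_toy, max)

# Hypothetical helper: min-max scale each column with the supplied constants
# and return a numeric matrix, the format Keras expects.
scale_minmax <- function(df, mins, maxs) {
  m <- as.matrix(df)
  m <- sweep(m, 2, mins, "-")
  sweep(m, 2, maxs - mins, "/")
}

x_train_toy <- scale_minmax(train_toy, mins, maxs)
x_test_toy  <- scale_minmax(test_toy, mins, maxs)
```

Because the test set is scaled with training constants, its values can fall slightly outside [0,1], as the second V1 value does here.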
We will now define our model in Keras, a symmetric autoencoder with 4 dense layers.
library(keras)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 15, activation = "tanh", input_shape = ncol(x_train)) %>%
  layer_dense(units = 10, activation = "tanh") %>%
  layer_dense(units = 15, activation = "tanh") %>%
  layer_dense(units = ncol(x_train))

summary(model)
___________________________________________________________________________________
Layer (type)                         Output Shape                      Param #
===================================================================================
dense_1 (Dense)                      (None, 15)                        450
___________________________________________________________________________________
dense_2 (Dense)                      (None, 10)                        160
___________________________________________________________________________________
dense_3 (Dense)                      (None, 15)                        165
___________________________________________________________________________________
dense_4 (Dense)                      (None, 29)                        464
===================================================================================
Total params: 1,239
Trainable params: 1,239
Non-trainable params: 0
___________________________________________________________________________________
We will then compile our model, using the mean squared error loss and the Adam optimizer for training.
model %>% compile(
  loss = "mean_squared_error",
  optimizer = "adam"
)
Training the model
We can now train our model using the fit() function. Training the model is reasonably fast (~ 14s per epoch on my laptop). We will only feed to our model the observations of normal (non-fraudulent) transactions.
We will use callback_model_checkpoint() in order to save our model after each epoch. By passing the argument save_best_only = TRUE we will keep on disk only the epoch with the smallest loss value on the test set.
We will also use callback_early_stopping() to stop training if the validation loss stops decreasing for 5 epochs.
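To make the early stopping rule concrete, here is a simplified sketch in plain R. It is not Keras's implementation and it ignores options such as a minimum improvement. The helper `early_stop_epoch()` and the loss values are hypothetical, and the sketch assumes that training stops once the validation loss has failed to improve on its best value for `patience` consecutive epochs.

```r
# Hypothetical helper: returns the epoch at which training would stop,
# or NA if the patience limit is never reached.
early_stop_epoch <- function(val_loss, patience = 5) {
  best <- Inf
  wait <- 0
  for (epoch in seq_along(val_loss)) {
    if (val_loss[epoch] < best) {
      best <- val_loss[epoch]  # new best value: reset the counter
      wait <- 0
    } else {
      wait <- wait + 1         # one more epoch without improvement
      if (wait >= patience) return(epoch)
    }
  }
  NA_integer_
}

# Made-up validation losses: the best value is reached at epoch 3.
val_loss_toy <- c(5, 4, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 1)
stop_at <- early_stop_epoch(val_loss_toy, patience = 5)
```

In this toy run the loop stops after epoch 8, so the lower loss at epoch 9 is never seen. That is the trade-off of a small patience value, and it is why saving the best checkpoint alongside early stopping is useful.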
checkpoint <- callback_model_checkpoint(
  filepath = "model.hdf5",
  save_best_only = TRUE,
  period = 1,
  verbose = 1
)

early_stopping <- callback_early_stopping(patience = 5)
model %>% fit(
  x = x_train[y_train == 0,],
  y = x_train[y_train == 0,],
  epochs = 100,
  batch_size = 32,
  validation_data = list(x_test[y_test == 0,], x_test[y_test == 0,]),
  callbacks = list(checkpoint, early_stopping)
)
Train on 199615 samples, validate on 84700 samples
Epoch 1/100
199615/199615 [==============================] - 17s 83us/step - loss: 0.0036 - val_loss: 6.8522e-04d from inf to 0.00069, saving model to model.hdf5
Epoch 2/100
199615/199615 [==============================] - 17s 86us/step - loss: 4.7817e-04 - val_loss: 4.7266e-04d from 0.00069 to 0.00047, saving model to model.hdf5
Epoch 3/100
199615/199615 [==============================] - 19s 94us/step - loss: 3.7753e-04 - val_loss: 4.2430e-04d from 0.00047 to 0.00042, saving model to model.hdf5
Epoch 4/100
199615/199615 [==============================] - 19s 94us/step - loss: 3.3937e-04 - val_loss: 4.0299e-04d from 0.00042 to 0.00040, saving model to model.hdf5
Epoch 5/100
199615/199615 [==============================] - 19s 94us/step - loss: 3.2259e-04 - val_loss: 4.0852e-04 improve
Epoch 6/100
199615/199615 [==============================] - 18s 91us/step - loss: 3.1668e-04 - val_loss: 4.0746e-04 improve
...
After training we can get the final loss for the test set by using the evaluate() function.
loss <- evaluate(model, x = x_test[y_test == 0,], y = x_test[y_test == 0,])
loss
loss
0.0003534254
Tuning with CloudML
We may be able to get better results by tuning our model hyperparameters. We can tune, for example, the normalization function, the learning rate, the activation functions and the size of hidden layers. CloudML uses Bayesian optimization to tune hyperparameters of models as described in this blog post.
We can use the cloudml package to tune our model, but first we need to prepare our project by creating a training flag for each hyperparameter and a tuning.yml file that will tell CloudML what parameters we want to tune and how.
The full script used for training on CloudML can be found at https://github.com/dfalbel/fraud-autoencoder-example. The most important modifications to the code were adding the training flags:
FLAGS <- flags(
  flag_string("normalization", "minmax", "One of minmax, zscore"),
  flag_string("activation", "relu", "One of relu, selu, tanh, sigmoid"),
  flag_numeric("learning_rate", 0.001, "Optimizer Learning Rate"),
  flag_integer("hidden_size", 15, "The hidden layer size")
)
We then used the FLAGS variable inside the script to drive the hyperparameters of the model, for example:
model %>% compile(
  optimizer = optimizer_adam(lr = FLAGS$learning_rate),
  loss = 'mean_squared_error'
)
We also created a tuning.yml file describing how hyperparameters should be varied during training, as well as what metric we wanted to optimize (in this case it was the validation loss: val_loss).
tuning.yml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  hyperparameters:
    goal: MINIMIZE
    hyperparameterMetricTag: val_loss
    maxTrials: 10
    maxParallelTrials: 5
    params:
      - parameterName: normalization
        type: CATEGORICAL
        categoricalValues: [zscore, minmax]
      - parameterName: activation
        type: CATEGORICAL
        categoricalValues: [relu, selu, tanh, sigmoid]
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.000001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
      - parameterName: hidden_size
        type: INTEGER
        minValue: 5
        maxValue: 50
        scaleType: UNIT_LINEAR_SCALE
We describe the type of machine we want to use (in this case a standard_gpu instance), the metric we want to minimize while tuning, and the maximum number of trials (i.e. number of combinations of hyperparameters we want to test). We then specify how we want to vary each hyperparameter during tuning.
You can learn more about the tuning.yml file in the TensorFlow for R documentation and in Google’s official documentation on CloudML.
Now we are ready to send the job to Google CloudML. We can do this by running:
library(cloudml)
cloudml_train("train.R", config = "tuning.yml")
The cloudml package takes care of uploading the dataset and installing any R package dependencies required to run the script on CloudML. If you are using RStudio v1.1 or higher, it will also allow you to monitor your job in a background terminal. You can also monitor your job using the Google Cloud Console.
After the job is finished we can collect the job results with:
This will copy the files from the job with the best val_loss performance on CloudML to your local system and open a report summarizing the training run.
Since we used a callback to save model checkpoints during training, the model file was also copied from Google CloudML. Files created during training are copied to the “runs” subdirectory of the working directory from which cloudml_train() is called. You can determine this directory for the most recent run with:
[1] runs/cloudml_2018_01_23_221244595-03
You can also list all previous runs and their validation losses with:
ls_runs(order = metric_val_loss, decreasing = FALSE)
run_dir metric_loss metric_val_loss
1 runs/2017-12-09T21-01-11Z 0.2577 0.1482
2 runs/2017-12-09T21-00-11Z 0.2655 0.1505
3 runs/2017-12-09T19-59-44Z 0.2597 0.1402
4 runs/2017-12-09T19-56-48Z 0.2610 0.1459
Use View(ls_runs()) to view all columns
In our case the job downloaded from CloudML was saved to runs/cloudml_2018_01_23_221244595-03/, so the saved model file is available at runs/cloudml_2018_01_23_221244595-03/model.hdf5. We can now use our tuned model to make predictions.
Making predictions
Now that we have trained and tuned our model we are ready to generate predictions with our autoencoder. We are interested in the MSE for each observation and we expect that observations of fraudulent transactions will have higher MSEs.
First, let’s load our model.
model <- load_model_hdf5("runs/cloudml_2018_01_23_221244595-03/model.hdf5",
                         compile = FALSE)
Now let’s calculate the MSE for the training and test set observations.
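A per-observation reconstruction error can be sketched as follows. The helper `mse_by_row()` and the toy matrices are hypothetical. In the post's setting the reconstructions would presumably come from calling predict() on the loaded model, and whether the original code averaged or summed the squared errors over columns is not shown here, so this sketch simply assumes an average.

```r
# Hypothetical helper: mean squared reconstruction error for each row.
mse_by_row <- function(x, x_hat) {
  rowMeans((x - x_hat)^2)
}

# Toy matrices standing in for the inputs and their reconstructions.
x_toy     <- matrix(c(0.1, 0.2,
                      0.9, 0.8), nrow = 2, byrow = TRUE)
x_hat_toy <- matrix(c(0.1, 0.4,
                      0.5, 0.2), nrow = 2, byrow = TRUE)

mse_toy <- mse_by_row(x_toy, x_hat_toy)
```

The second toy row is reconstructed much worse than the first, so it gets the larger error. This is the signal the post relies on to flag frauds.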
A good measure of model performance in highly unbalanced datasets is the Area Under the ROC Curve (AUC). AUC has a nice interpretation for this problem: it is the probability that a fraudulent transaction will have a higher MSE than a normal one. We can calculate this using the Metrics package, which implements a wide variety of common machine learning model performance metrics.
[1] 0.9546814
[1] 0.9403554
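The probability interpretation of AUC can be illustrated with a small base R sketch that uses the rank-sum (Mann-Whitney) formulation. The helper `auc_rank()` and the toy labels and scores are hypothetical. The post itself used the Metrics package, so this is a stand-in meant to show the idea rather than the code that produced the figures above.

```r
# Hypothetical helper: AUC computed from the ranks of the scores.
auc_rank <- function(actual, score) {
  r <- rank(score)              # average ranks are used for ties
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy labels (1 = fraud) and reconstruction errors.
actual_toy <- c(0, 0, 0, 1, 1)
score_toy  <- c(0.01, 0.02, 0.30, 0.25, 0.40)

auc_toy <- auc_rank(actual_toy, score_toy)
```

Here there are 2 x 3 = 6 fraud/normal pairs and the fraud has the higher score in 5 of them, so the AUC is 5/6.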
To use the model in practice for making predictions we need to find a threshold (k) for the MSE: if MSE > k we consider that transaction a fraud (otherwise we consider it normal). To define this value it is useful to look at precision and recall while varying the threshold k.
library(ggplot2)

possible_k <- seq(0, 0.5, length.out = 100)

# Precision: share of flagged transactions that are actually frauds.
precision <- sapply(possible_k, function(k) {
  predicted_class <- as.numeric(mse_test > k)
  sum(predicted_class == 1 & y_test == 1)/sum(predicted_class)
})
qplot(possible_k, precision, geom = "line") +
  labs(x = "Threshold", y = "Precision")

# Recall: share of actual frauds that are flagged.
recall <- sapply(possible_k, function(k) {
  predicted_class <- as.numeric(mse_test > k)
  sum(predicted_class == 1 & y_test == 1)/sum(y_test)
})
qplot(possible_k, recall, geom = "line") +
  labs(x = "Threshold", y = "Recall")
A good starting point would be to choose the threshold with maximum precision, but we could also base our decision on how much money we might lose from fraudulent transactions.
Suppose each manual verification of fraud costs us $1, but if we don’t verify a transaction and it is a fraud we will lose this transaction amount. Let’s find for each threshold value how much money we would lose.
cost_per_verification <- 1

lost_money <- sapply(possible_k, function(k) {
  predicted_class <- as.numeric(mse_test > k)
  sum(cost_per_verification * predicted_class + (predicted_class == 0) * y_test * df_test$Amount)
})

qplot(possible_k, lost_money, geom = "line") + labs(x = "Threshold", y = "Lost Money")
We can find the best threshold in this case with:
[1] 0.005050505
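One way to pick the cost-minimizing threshold is to index the grid of thresholds with which.min(). The sketch below uses made-up toy vectors (`possible_k_toy`, `lost_money_toy`) and is an assumption about how such a lookup could be written, not necessarily the call used in the post.

```r
# Toy thresholds and the corresponding losses (hypothetical values).
possible_k_toy <- c(0, 0.005, 0.010, 0.015)
lost_money_toy <- c(13000, 2500, 2700, 3100)

# Threshold with the smallest loss; ties resolve to the first minimum.
best_k_toy <- possible_k_toy[which.min(lost_money_toy)]
```

For reference, the value printed above matches the second point of a grid built with seq(0, 0.5, length.out = 100), which is 0.5 / 99.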
If we needed to manually verify all frauds, it would cost us ~$13,000. Using our model we can reduce this to ~$2,500.