Use cases and code to explore the new class that helps tune decision thresholds in scikit-learn
The 1.5 release of scikit-learn includes a new class, TunedThresholdClassifierCV, which makes it easier to optimize decision thresholds for scikit-learn classifiers. A decision threshold is a cut-off point that converts the predicted probabilities output by a machine learning model into discrete classes. The default decision threshold of the .predict()
method from scikit-learn classifiers in a binary classification setting is 0.5. Although this is a sensible default, it is rarely the best choice for classification tasks.
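As a quick illustration of that default (a minimal sketch on synthetic data, not part of the original walkthrough), thresholding the positive-class probabilities at 0.5 by hand reproduces what .predict() returns for a logistic regression:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(random_state=0)
clf = LogisticRegression().fit(X_demo, y_demo)
# Probability of the positive class for each sample
proba = clf.predict_proba(X_demo)[:, 1]
# Applying the default 0.5 cut-off by hand matches .predict()
assert ((proba > 0.5).astype(int) == clf.predict(X_demo)).all()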
This post introduces the TunedThresholdClassifierCV class and demonstrates how it can optimize decision thresholds for various binary classification tasks. This new class will help bridge the gap between data scientists who build models and business stakeholders who make decisions based on the model's output. By fine-tuning decision thresholds, data scientists can enhance model performance and better align with business objectives.
This post will cover the following situations where tuning decision thresholds is beneficial:
- Maximizing a metric: Use this when choosing a threshold that maximizes a scoring metric, like the F1 score.
- Cost-sensitive learning: Adjust the threshold when the cost of misclassifying a false positive is not equal to the cost of misclassifying a false negative, and you have an estimate of the costs.
- Tuning under constraints: Optimize the operating point on the ROC or precision-recall curve to meet specific performance constraints.
The code used in this post and links to the datasets are available on GitHub.
Let's get started! First, import the necessary libraries, read the data, and split the training and test data.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
RocCurveDisplay,
f1_score,
make_scorer,
recall_score,
roc_curve,
confusion_matrix,
)
from sklearn.model_selection import TunedThresholdClassifierCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

RANDOM_STATE = 26120
Maximizing a metric
Before starting the model-building process in any machine learning project, it is crucial to work with stakeholders to determine which metric(s) to optimize. Making this decision early ensures that the project aligns with its intended goals.
Using an accuracy metric in fraud detection use cases to evaluate model performance is not ideal because the data is often imbalanced, with most transactions being non-fraudulent. The F1 score is the harmonic mean of precision and recall and is a better metric for imbalanced datasets like fraud detection. Let's use the TunedThresholdClassifierCV class to optimize the decision threshold of a logistic regression model to maximize the F1 score.
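As a quick refresher (illustrative numbers, not from the dataset), the harmonic mean pulls the score toward the weaker of precision and recall:
# F1 is the harmonic mean of precision and recall:
#   f1 = 2 * (precision * recall) / (precision + recall)
precision, recall = 0.9, 0.5
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))  # 0.643, well below the arithmetic mean of 0.7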
We'll use the Kaggle Credit Card Fraud Detection dataset to introduce the first situation where we need to tune a decision threshold. First, split the data into train and test sets, then create a scikit-learn pipeline to scale the data and train a logistic regression model. Fit the pipeline on the training data so we can compare the original model's performance with the tuned model's performance.
creditcard = pd.read_csv("data/creditcard.csv")
y = creditcard["Class"]
X = creditcard.drop(columns=["Class"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE, stratify=y
)
# Only Time and Amount need to be scaled
original_fraud_model = make_pipeline(
    ColumnTransformer(
        [("scaler", StandardScaler(), ["Time", "Amount"])],
        remainder="passthrough",
        force_int_remainder_cols=False,
    ),
    LogisticRegression(),
)
original_fraud_model.fit(X_train, y_train)
No tuning has happened yet, but it's coming in the next code block. The arguments for TunedThresholdClassifierCV are similar to other CV classes in scikit-learn, such as GridSearchCV. At a minimum, the user only needs to pass the original estimator, and TunedThresholdClassifierCV will store the decision threshold that maximizes balanced accuracy (the default) using 5-fold stratified K-fold cross-validation (also the default). It also uses this threshold when calling .predict(). However, any scikit-learn metric (or callable) can be used as the scoring metric. Additionally, the user can pass the familiar cv argument to customize the cross-validation strategy.
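For example, here is a minimal sketch of those defaults and options (the variable names are illustrative, reusing the pipeline defined above):
# With no extra arguments: balanced accuracy maximized via 5-fold
# stratified cross-validation (both defaults)
tuned_default = TunedThresholdClassifierCV(original_fraud_model)
# Any metric name or callable works for scoring, and cv accepts the usual
# options (an integer, a splitter object, an iterable of splits, ...)
tuned_custom = TunedThresholdClassifierCV(
    original_fraud_model, scoring="recall", cv=10
)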
Create the TunedThresholdClassifierCV instance and fit the model on the training data. Pass the original model and set the scoring to "f1". We'll also want to set store_cv_results=True to access the thresholds evaluated during cross-validation for visualization.
tuned_fraud_model = TunedThresholdClassifierCV(
    original_fraud_model,
    scoring="f1",
    store_cv_results=True,
)

tuned_fraud_model.fit(X_train, y_train)
# Average F1 across folds
avg_f1_train = tuned_fraud_model.best_score_
# Compare F1 on the test set for the tuned model and the original model
f1_test = f1_score(y_test, tuned_fraud_model.predict(X_test))
f1_test_original = f1_score(y_test, original_fraud_model.predict(X_test))
print(f"Average F1 on the training set: {avg_f1_train:.3f}")
print(f"F1 on the test set: {f1_test:.3f}")
print(f"F1 on the test set (original model): {f1_test_original:.3f}")
print(f"Threshold: {tuned_fraud_model.best_threshold_:.3f}")
Average F1 on the training set: 0.784
F1 on the test set: 0.796
F1 on the test set (original model): 0.733
Threshold: 0.071
Now that we've found the threshold that maximizes the F1 score, check tuned_fraud_model.best_score_ to find out the best average F1 score across folds in cross-validation. We can also see which threshold generated those results using tuned_fraud_model.best_threshold_. You can visualize the metric scores across the decision thresholds evaluated during cross-validation using the "thresholds" and "scores" entries of the cv_results_ attribute:
fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(
    tuned_fraud_model.cv_results_["thresholds"],
    tuned_fraud_model.cv_results_["scores"],
    marker="o",
    linewidth=1e-3,
    markersize=4,
    color="#c0c0c0",
)
ax.plot(
    tuned_fraud_model.best_threshold_,
    tuned_fraud_model.best_score_,
    "^",
    markersize=10,
    color="#ff6700",
    label=f"Optimal cut-off point = {tuned_fraud_model.best_threshold_:.2f}",
)
ax.plot(
    0.5,
    f1_test_original,
    label="Default threshold: 0.5",
    color="#004e98",
    linestyle="--",
    marker="X",
    markersize=10,
)
ax.legend(fontsize=8, loc="lower center")
ax.set_xlabel("Decision threshold", fontsize=10)
ax.set_ylabel("F1 score", fontsize=10)
ax.set_title("F1 score vs. Decision threshold -- Cross-validation", fontsize=12)
# Check that the coefficients from the original model and the tuned model are the same
assert (tuned_fraud_model.estimator_[-1].coef_ ==
        original_fraud_model[-1].coef_).all()
We've used the same underlying logistic regression model to evaluate two different decision thresholds. The underlying models are identical, as evidenced by the coefficient equality in the assert statement above. Optimization in TunedThresholdClassifierCV is achieved using post-processing techniques, which are applied directly to the predicted probabilities output by the model. However, it is important to note that TunedThresholdClassifierCV uses cross-validation by default to find the decision threshold, to avoid overfitting to the training data.
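To make the post-processing step concrete, here is a minimal sketch (my own check, using the fitted tuned_fraud_model from above) showing that the tuned .predict() is just a cut-off applied to the refit model's probabilities:
# The refit underlying pipeline lives in .estimator_; .predict() compares
# its positive-class probabilities against .best_threshold_
proba_test = tuned_fraud_model.estimator_.predict_proba(X_test)[:, 1]
manual_preds = (proba_test >= tuned_fraud_model.best_threshold_).astype(int)
assert (manual_preds == tuned_fraud_model.predict(X_test)).all()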
Cost-sensitive learning
Cost-sensitive learning is a type of machine learning that assigns a cost to each type of misclassification. This translates model performance into units that stakeholders understand, like dollars saved.
We'll use the TELCO customer churn dataset, a binary classification dataset, to demonstrate the value of cost-sensitive learning. The goal is to predict whether a customer will churn, given features about the customer's demographics, contract details, and other technical information about the customer's account. The motivation to use this dataset (and some of the code) comes from Dan Becker's course on decision threshold optimization.
data = pd.read_excel("data/Telco_customer_churn.xlsx")
drop_cols = [
    "Count", "Country", "State", "Lat Long", "Latitude", "Longitude",
    "Zip Code", "Churn Value", "Churn Score", "CLTV", "Churn Reason"
]
data.drop(columns=drop_cols, inplace=True)

# Preprocess the data
data["Churn Label"] = data["Churn Label"].map({"Yes": 1, "No": 0})
data.drop(columns=["Total Charges"], inplace=True)

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns=["Churn Label"]),
    data["Churn Label"],
    test_size=0.2,
    random_state=RANDOM_STATE,
    stratify=data["Churn Label"],
)
Set up a basic pipeline for processing the data and generating predicted probabilities with a random forest model. This will serve as a baseline to compare against the TunedThresholdClassifierCV.
preprocessor = ColumnTransformer(
    transformers=[("one_hot", OneHotEncoder(),
                   selector(dtype_include="object"))],
    remainder="passthrough",
)

original_churn_model = make_pipeline(
    preprocessor, RandomForestClassifier(random_state=RANDOM_STATE)
)
original_churn_model.fit(X_train.drop(columns=["CustomerID"]), y_train);
The choice of preprocessing and model type is not important for this tutorial. The company wants to offer discounts to customers who are predicted to churn. During collaboration with stakeholders, you learn that giving a discount to a customer who will not churn (a false positive) would cost $80. You also learn that it is worth $200 to offer a discount to a customer who would have churned. You can represent this relationship in a cost matrix:
def cost_function(y, y_pred, neg_label, pos_label):
    # Rows of cm are true labels, columns are predictions: [[TN, FP], [FN, TP]]
    cm = confusion_matrix(y, y_pred, labels=[neg_label, pos_label])
    # A false positive costs $80; a true positive is worth $200
    cost_matrix = np.array([[0, -80], [0, 200]])
    return np.sum(cm * cost_matrix)

cost_scorer = make_scorer(cost_function, neg_label=0, pos_label=1)
We also wrapped the cost function in a scikit-learn custom scorer. This scorer will be used as the scoring argument in the TunedThresholdClassifierCV and to evaluate profit on the test set.
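As a quick sanity check of the cost function (toy labels, not from the dataset), one false positive and two true positives should net 2 × $200 − $80 = $320:
y_true_toy = np.array([0, 0, 1, 1])
y_pred_toy = np.array([0, 1, 1, 1])  # one FP, two TP
print(cost_function(y_true_toy, y_pred_toy, neg_label=0, pos_label=1))  # 320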
tuned_churn_model = TunedThresholdClassifierCV(
    original_churn_model,
    scoring=cost_scorer,
    store_cv_results=True,
)

tuned_churn_model.fit(X_train.drop(columns=["CustomerID"]), y_train)
# Calculate the profit on the test set
original_model_profit = cost_scorer(
    original_churn_model, X_test.drop(columns=["CustomerID"]), y_test
)
tuned_model_profit = cost_scorer(
    tuned_churn_model, X_test.drop(columns=["CustomerID"]), y_test
)
print(f"Original model profit: {original_model_profit}")
print(f"Tuned model profit: {tuned_model_profit}")
Original model profit: 29640
Tuned model profit: 35600
The profit is higher for the tuned model than for the original. Again, we can plot the objective metric against the decision thresholds to visualize the decision threshold selection on the training data during cross-validation:
fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(
    tuned_churn_model.cv_results_["thresholds"],
    tuned_churn_model.cv_results_["scores"],
    marker="o",
    markersize=3,
    linewidth=1e-3,
    color="#c0c0c0",
    label="Objective score (using cost-matrix)",
)
ax.plot(
    tuned_churn_model.best_threshold_,
    tuned_churn_model.best_score_,
    "^",
    markersize=10,
    color="#ff6700",
    label="Optimal cut-off point for the business metric",
)
ax.legend()
ax.set_xlabel("Decision threshold (probability)")
ax.set_ylabel("Objective score (using cost-matrix)")
ax.set_title("Objective score as a function of the decision threshold")
In reality, assigning a static cost to all instances that are misclassified in the same way is not realistic from a business perspective. There are more advanced methods that tune the threshold by assigning a weight to each instance in the dataset, as sketched below; this is covered in scikit-learn's cost-sensitive learning example.
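Here is a hedged sketch of that idea (an illustration under assumed economics, not scikit-learn's example): let the gain from retaining a churner and the cost of a wasted discount both scale with a hypothetical per-customer monthly_charges array instead of flat $200/$80 amounts.
def variable_profit(y_true, y_pred, monthly_charges):
    # Boolean masks for wasted discounts (FP) and retained churners (TP)
    fp = (y_pred == 1) & (y_true == 0)
    tp = (y_pred == 1) & (y_true == 1)
    # Assume a retained churner is worth ~12 months of their charges and
    # a wasted discount costs ~1 month (illustrative assumptions only)
    return 12 * monthly_charges[tp].sum() - monthly_charges[fp].sum()

charges = np.array([50.0, 120.0, 80.0, 30.0])
print(variable_profit(np.array([0, 1, 1, 0]),
                      np.array([1, 1, 0, 0]), charges))  # 12*120 - 50 = 1390
To use a per-instance objective like this inside TunedThresholdClassifierCV, the extra column has to be routed to the scorer; scikit-learn's cost-sensitive learning example shows the supported way to do this with metadata routing.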
Tuning under constraints
This method is not currently covered in the scikit-learn documentation, but it is a common business case for binary classification use cases. The tuning under constraint method finds a decision threshold by identifying a point on either the ROC or precision-recall curve: the maximum value of one axis while constraining the other axis. For this walkthrough, we'll be using the Pima Indians diabetes dataset. This is a binary classification task to predict if an individual has diabetes.
Imagine that your model will be used as a screening test for an average-risk population, applied to millions of people. There are an estimated 38 million people with diabetes in the US, roughly 11.6% of the population, so the model's specificity should be high so it doesn't misdiagnose millions of people as having diabetes and refer them for unnecessary confirmatory testing. Suppose your imaginary CEO has communicated that they will not tolerate more than a 2% false positive rate. Let's build a model that achieves this using TunedThresholdClassifierCV.
For this part of the tutorial, we'll define a constraint function that will be used to find the maximum true positive rate at a 2% false positive rate.
def max_tpr_at_tnr_constraint_score(y_true, y_pred, max_tnr=0.5):
    fpr, tpr, thresholds = roc_curve(y_true, y_pred, drop_intermediate=False)
    tnr = 1 - fpr
    # Highest TPR among the operating points that satisfy the TNR constraint
    tpr_at_tnr_constraint = tpr[tnr >= max_tnr].max()
    return tpr_at_tnr_constraint

max_tpr_at_tnr_scorer = make_scorer(
    max_tpr_at_tnr_constraint_score, max_tnr=0.98)
data = pd.read_csv("data/diabetes.csv")
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns=["Outcome"]),
    data["Outcome"],
    stratify=data["Outcome"],
    test_size=0.2,
    random_state=RANDOM_STATE,
)
Build two models: a logistic regression to serve as a baseline, and a TunedThresholdClassifierCV that wraps the baseline logistic regression model to achieve the goal outlined by the CEO. In the tuned model, set scoring=max_tpr_at_tnr_scorer. Again, the choice of model and preprocessing is not important for this tutorial.
# A baseline model
original_model = make_pipeline(
    StandardScaler(), LogisticRegression(random_state=RANDOM_STATE)
)
original_model.fit(X_train, y_train)

# A tuned model
tuned_model = TunedThresholdClassifierCV(
    original_model,
    thresholds=np.linspace(0, 1, 150),
    scoring=max_tpr_at_tnr_scorer,
    store_cv_results=True,
    cv=8,
    random_state=RANDOM_STATE,
)
tuned_model.fit(X_train, y_train)
Compare the default decision threshold from scikit-learn estimators, 0.5, with the one found using the tuning under constraint approach on the ROC curve.
# Get the fpr and tpr of the original model
original_model_proba = original_model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, original_model_proba)
closest_threshold_to_05 = (np.abs(thresholds - 0.5)).argmin()
fpr_orig = fpr[closest_threshold_to_05]
tpr_orig = tpr[closest_threshold_to_05]

# Get the tnr and tpr of the tuned model
max_tpr = tuned_model.best_score_
constrained_tnr = 0.98
# Plot the ROC curve and compare the default threshold to the tuned threshold
fig, ax = plt.subplots(figsize=(5, 5))
# Note that this will be the same for both models
disp = RocCurveDisplay.from_estimator(
    original_model,
    X_test,
    y_test,
    name="Logistic Regression",
    color="#c0c0c0",
    linewidth=2,
    ax=ax,
)
disp.ax_.plot(
    1 - constrained_tnr,
    max_tpr,
    label=f"Tuned threshold: {tuned_model.best_threshold_:.2f}",
    color="#ff6700",
    linestyle="--",
    marker="o",
    markersize=11,
)
disp.ax_.plot(
    fpr_orig,
    tpr_orig,
    label="Default threshold: 0.5",
    color="#004e98",
    linestyle="--",
    marker="X",
    markersize=11,
)
disp.ax_.set_ylabel("True Positive Rate", fontsize=8)
disp.ax_.set_xlabel("False Positive Rate", fontsize=8)
disp.ax_.tick_params(labelsize=8)
disp.ax_.legend(fontsize=7)
The tuning under constraint method found a threshold of 0.80, which resulted in an average sensitivity of 19.2% during cross-validation on the training data. Compare the sensitivity and specificity to see how the threshold holds up on the test set. Did the model meet the CEO's specificity requirement on the test set?
# Average sensitivity and specificity on the training set
avg_sensitivity_train = tuned_model.best_score_

# Call predict from tuned_model to calculate sensitivity and specificity on the test set
specificity_test = recall_score(
    y_test, tuned_model.predict(X_test), pos_label=0)
sensitivity_test = recall_score(y_test, tuned_model.predict(X_test))
print(f"Average sensitivity on the training set: {avg_sensitivity_train:.3f}")
print(f"Sensitivity on the test set: {sensitivity_test:.3f}")
print(f"Specificity on the test set: {specificity_test:.3f}")
Average sensitivity on the training set: 0.192
Sensitivity on the test set: 0.148
Specificity on the test set: 0.990
Conclusion
The new TunedThresholdClassifierCV class is a powerful tool that can help you become a better data scientist by sharing with business leaders how you arrived at a decision threshold. You learned how to use the new scikit-learn TunedThresholdClassifierCV class to maximize a metric, perform cost-sensitive learning, and tune a metric under constraint. This tutorial was not meant to be comprehensive or advanced. I wanted to introduce the new feature and highlight its power and flexibility in solving binary classification problems. Please check out the scikit-learn documentation, user guide, and examples for thorough usage examples.
A huge shoutout to Guillaume Lemaitre for his work on this feature.
Thanks for reading. Happy tuning.
Data Licenses:
Credit card fraud: DbCL
Pima Indians diabetes: CC0
TELCO churn: commercial use OK