Using Double Machine Learning and Linear Programming to optimise treatment strategies | by Ryan O'Sullivan | Apr, 2024


Causal AI, exploring the integration of causal reasoning into machine learning

Photo by Jordan McDonald on Unsplash

Welcome to my series on Causal AI, where we will explore the integration of causal reasoning into machine learning models. Expect to explore a number of practical applications across different business contexts.

In the last article we explored de-biasing treatment effects with Double Machine Learning. This time we will delve further into the potential of DML, covering the use of Double Machine Learning and Linear Programming to optimise treatment strategies.

If you missed the last article on Double Machine Learning, check it out here:

This article will showcase how Double Machine Learning and Linear Programming can be used to optimise treatment strategies:

Expect to gain a broad understanding of:

  • Why businesses want to optimise treatment strategies.
  • How conditional average treatment effects (CATE) can help personalise treatment strategies (also known as Uplift Modelling).
  • How Linear Programming can be used to optimise treatment assignment given budget constraints.
  • A worked case study in Python illustrating how we can use Double Machine Learning to estimate CATE and Linear Programming to optimise treatment strategies.

The full notebook can be found here:

There is a common question which arises in most businesses: "What is the optimal treatment for a customer in order to maximise future sales whilst minimising cost?".

Let's break this idea down with a simple example.

Your business sells socks online. You don't sell an essential product, so you need to encourage existing customers to repeat purchase. Your main lever for this is sending out discounts. So the treatment strategy in this case is sending out discounts:

  • 10% discount
  • 20% discount
  • 50% discount

Each discount has a different return on investment. If you think back to the last article on average treatment effects, you can probably see how we can calculate the ATE for each of these discounts and then select the one with the highest return.

However, what if we have heterogeneous treatment effects — where the treatment effect varies across different subgroups of the population?

This is when we need to start thinking about conditional average treatment effects (CATE)!

CATE

CATE is the average impact of a treatment or intervention on different subgroups of a population. ATE was very much about "does this treatment work?" whereas CATE allows us to change the question to "who should we treat?".

We "condition" on our control features to allow treatment effects to vary depending on customer characteristics.

Think back to the example where we're sending out discounts. If customers with a higher number of previous orders respond better to discounts, we can condition on this customer characteristic.
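To make that concrete, here is a small sketch on simulated data (the `prev_orders` feature, the segment cut points and the response model are all made up for illustration), where the treated-vs-control gap in order value grows with the number of previous orders:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical experiment: discounts sent at random, but customers with more
# previous orders respond more strongly (a heterogeneous treatment effect).
prev_orders = rng.integers(0, 10, n)
treated = rng.binomial(1, 0.5, n)
order_value = (20 + 2 * prev_orders
               + treated * (1 + 0.5 * prev_orders)  # effect grows with prev_orders
               + rng.normal(0, 1, n))

df = pd.DataFrame({"prev_orders": prev_orders, "treated": treated,
                   "order_value": order_value})
df["segment"] = pd.cut(df["prev_orders"], bins=[-1, 2, 5, 9],
                       labels=["low", "mid", "high"])

# Conditioning on the segment: difference in mean outcome, treated vs control.
means = df.groupby(["segment", "treated"], observed=True)["order_value"].mean().unstack()
means["cate"] = means[1] - means[0]
print(means["cate"])  # increases from the "low" to the "high" segment
```

The per-segment difference in means is only valid here because treatment was randomised; with observational data we would need the DML machinery covered below.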

It's worth mentioning that in Marketing, estimating CATE is often referred to as Uplift Modelling.

Estimating CATE with Double Machine Learning

We covered DML in the last article, but just in case you need a bit of a refresher:

"First stage:

  • Treatment model (de-biasing): Machine learning model used to estimate the probability of treatment assignment (often referred to as the propensity score). The treatment model residuals are then calculated.
  • Outcome model (de-noising): Machine learning model used to estimate the outcome using just the control features. The outcome model residuals are then calculated.

Second stage:

  • The treatment model residuals are used to predict the outcome model residuals."
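As a minimal illustration of those two stages (a sketch only: the simulated data, the gradient-boosting nuisance models and the constant true effect of 2.0 are all assumptions, not the EconML setup used later in the article), the residual-on-residual regression can be written as:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(size=(n, 3))
b = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 2 * X[:, 2]  # nuisance signal
T = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 2] - 0.5))))   # confounded treatment
y = b + 2.0 * T + rng.normal(0, 0.5, n)                   # true effect = 2.0

# First stage: out-of-fold (cross-fitted) nuisance predictions.
t_hat = cross_val_predict(GradientBoostingClassifier(), X, T, cv=3,
                          method="predict_proba")[:, 1]   # propensity scores
y_hat = cross_val_predict(GradientBoostingRegressor(), X, y, cv=3)

# Second stage: regress outcome residuals on treatment residuals.
stage2 = LinearRegression(fit_intercept=False).fit(
    (T - t_hat).reshape(-1, 1), y - y_hat)
print(stage2.coef_[0])  # lands close to the true effect of 2.0
```

The cross-fitting step matters: fitting the nuisance models in-sample lets them absorb part of the treatment effect and biases the second stage.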

We can use Double Machine Learning to estimate CATE by interacting our control features (X) with the treatment effect in the second stage model.

User generated image

This can be really powerful as we are now able to get customer-level treatment effects!

What is it?

Linear programming is an optimisation method which can be used to find the optimal solution of a linear function given some constraints. It is often used to solve transportation, scheduling and resource allocation problems. A more generic term which you might see used is Operations Research.

Let's break linear programming down with a simple example:

  • Decision variables: These are the unknown quantities which we want to estimate optimal values for — The marketing spend on Social Media, TV and Paid Search.
  • Objective function: The linear equation we are trying to minimise or maximise — The marketing Return on Investment (ROI).
  • Constraints: Some restrictions on the decision variables, usually represented by linear inequalities — Total marketing spend between £100,000 and £500,000.

The intersection of all constraints forms a feasible region, which is the set of all possible solutions that satisfy the given constraints. The goal of linear programming is to find the point within the feasible region that optimises the objective function.
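Sticking with the marketing spend example, here is a small sketch using `scipy.optimize.linprog` (the per-channel ROI figures and the £250,000 per-channel cap are invented for illustration; `linprog` minimises, so we negate the objective):

```python
from scipy.optimize import linprog

# Hypothetical return per £1 spent on Social Media, TV and Paid Search.
roi = [1.8, 1.2, 1.5]

# Objective: maximise roi . spend, i.e. minimise the negative.
c = [-r for r in roi]

# Total spend between £100,000 and £500,000, written as two <= inequalities.
A_ub = [[1, 1, 1], [-1, -1, -1]]
b_ub = [500_000, -100_000]

# Cap each channel at £250,000 so the optimum isn't all-in on one channel.
bounds = [(0, 250_000)] * 3

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)  # optimal spend per channel
```

The solver puts the full £250,000 cap into the two highest-ROI channels (Social Media and Paid Search) and nothing into TV: a corner of the feasible region, which is where linear programmes attain their optima.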

Assignment problems

Assignment problems are a specific type of linear programming problem where the goal is to assign a set of "tasks" to a set of "agents". Let's use an example to bring it to life:

You run an experiment where you send different discounts out to 4 random groups of existing customers (the 4th of which actually doesn't receive any discount). You build 2 CATE models — (1) Estimating how the offer value affects the order value and (2) Estimating how the offer value affects the cost.

  • Agents: Your existing customer base
  • Tasks: Whether you send them a 10%, 20% or 50% discount
  • Decision variables: Binary decision variables
  • Objective function: The total order value minus costs
  • Constraint 1: Each agent is assigned to at most 1 task
  • Constraint 2: The cost ≥ £10,000
  • Constraint 3: The cost ≤ £100,000

User generated image

We basically want to find out the optimal treatment for each customer given some overall cost constraints. And linear programming can help us do this!

It's worth noting that this problem is "NP hard", a classification of problems which are at least as hard as the hardest problems in NP (nondeterministic polynomial time).

Linear programming is a really difficult but rewarding topic. I've tried to introduce the idea to get us started — If you want to learn more I recommend this resource:

OR Tools

OR Tools is an open source package developed by Google which can solve a range of linear programming problems, including assignment problems. We will demonstrate it in action later in the article.

Background

We are going to continue with the assignment problem example and illustrate how we can solve this in Python.

Data generating process

We set up a data generating process with the following characteristics:

  • Difficult nuisance parameters (b)
  • Treatment effect heterogeneity (tau)

The X features are customer characteristics taken before the treatment:

User generated image

T is a binary flag indicating whether the customer received the offer. We create three different treatment interactions to allow us to simulate different treatment effects.

User generated image
import numpy as np
from scipy.special import expit

def data_generator(tau_weight, interaction_num):

    # Set number of observations
    n = 10000

    # Set number of features
    p = 10

    # Create features
    X = np.random.uniform(size=n * p).reshape((n, -1))

    # Nuisance parameters
    b = (
        np.sin(np.pi * X[:, 0] * X[:, 1])
        + 2 * (X[:, 2] - 0.5) ** 2
        + X[:, 3]
        + 0.5 * X[:, 4]
        + X[:, 5] * X[:, 6]
        + X[:, 7] ** 3
        + np.sin(np.pi * X[:, 8] * X[:, 9])
    )

    # Create binary treatment
    T = np.random.binomial(1, expit(b))

    # Treatment interactions
    interaction_1 = X[:, 0] * X[:, 1] + X[:, 2]
    interaction_2 = X[:, 3] * X[:, 4] + X[:, 5]
    interaction_3 = X[:, 6] * X[:, 7] + X[:, 9]

    # Set treatment effect
    if interaction_num == 1:
        tau = tau_weight * interaction_1
    elif interaction_num == 2:
        tau = tau_weight * interaction_2
    elif interaction_num == 3:
        tau = tau_weight * interaction_3

    # Calculate outcome
    y = b + T * tau + np.random.normal(size=n)

    return X, T, tau, y

We can use the data generator to simulate three treatments, each with a different treatment effect.

User generated image

np.random.seed(123)

# Generate samples for 3 different treatments
X1, T1, tau1, y1 = data_generator(0.75, 1)
X2, T2, tau2, y2 = data_generator(0.50, 2)
X3, T3, tau3, y3 = data_generator(0.90, 3)

As in the last article, the data generating process Python code is based on the synthetic data creator from Uber's Causal ML package:

Estimating CATE with DML

We then train three DML models using LightGBM as flexible first stage models. This should allow us to capture the difficult nuisance parameters whilst correctly calculating the treatment effect.

Pay attention to how we pass the X features in through X rather than W (unlike in the last article where we passed the X features through W). Features passed through X will be used in both the first and second stage models — In the second stage model the features are used to create interaction terms with the treatment residual.

from econml.dml import LinearDML
from lightgbm import LGBMRegressor, LGBMClassifier

np.random.seed(123)

# Train DML model using flexible stage 1 models
dml1 = LinearDML(model_y=LGBMRegressor(), model_t=LGBMClassifier(), discrete_treatment=True)
dml1.fit(y1, T=T1, X=X1, W=None)

# Train DML model using flexible stage 1 models
dml2 = LinearDML(model_y=LGBMRegressor(), model_t=LGBMClassifier(), discrete_treatment=True)
dml2.fit(y2, T=T2, X=X2, W=None)

# Train DML model using flexible stage 1 models
dml3 = LinearDML(model_y=LGBMRegressor(), model_t=LGBMClassifier(), discrete_treatment=True)
dml3.fit(y3, T=T3, X=X3, W=None)

When we plot the actual vs estimated CATE, we see that the model does a reasonable job.

import matplotlib.pyplot as plt
import seaborn as sns

# Create a figure and subplots
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot scatter plots on each subplot
sns.scatterplot(x=dml1.effect(X1), y=tau1, ax=axes[0])
axes[0].set_title('Treatment 1')
axes[0].set_xlabel('Estimated CATE')
axes[0].set_ylabel('Actual CATE')

sns.scatterplot(x=dml2.effect(X2), y=tau2, ax=axes[1])
axes[1].set_title('Treatment 2')
axes[1].set_xlabel('Estimated CATE')
axes[1].set_ylabel('Actual CATE')

sns.scatterplot(x=dml3.effect(X3), y=tau3, ax=axes[2])
axes[2].set_title('Treatment 3')
axes[2].set_xlabel('Estimated CATE')
axes[2].set_ylabel('Actual CATE')

# Add a title to the entire figure
fig.suptitle('Actual vs Estimated')

# Show plots
plt.show()

User generated image

Naive optimisation

We will start by exploring this as an optimisation problem. We have 3 treatments which a customer could receive. Below we create a mapping for the cost of each treatment, and set an overall cost constraint.

# Create mapping for cost of each treatment
cost_dict = {'T1': 0.1, 'T2': 0.2, 'T3': 0.3}

# Set constraints
max_cost = 3000

We can then estimate the CATE for each customer and initially select each customer's best treatment. However, selecting the best treatment for everyone doesn't keep us within the maximum cost constraint. Therefore we select the customers with the highest CATE until we reach our max cost constraint.

import pandas as pd

# Concatenate features
X = np.concatenate((X1, X2, X3), axis=0)

# Estimate CATE for each treatment using DML models
Treatment_1 = dml1.effect(X)
Treatment_2 = dml2.effect(X)
Treatment_3 = dml3.effect(X)
cate = pd.DataFrame({"T1": Treatment_1, "T2": Treatment_2, "T3": Treatment_3})

# Select the best treatment for each customer
best_treatment = cate.idxmax(axis=1)
best_value = cate.max(axis=1)

# Map cost for each treatment
best_cost = pd.Series([cost_dict[value] for value in best_treatment])

# Create dataframe with each customer's best treatment and associated cost
best_df = pd.concat([best_value, best_cost], axis=1)
best_df.columns = ["value", "cost"]
best_df = best_df.sort_values(by=['value'], ascending=False).reset_index(drop=True)

# Naive optimisation
best_df_cum = best_df.cumsum()
opt_index = best_df_cum['cost'].searchsorted(max_cost)
naive_order_value = round(best_df_cum.iloc[opt_index]['value'], 0)
naive_cost_check = round(best_df_cum.iloc[opt_index]['cost'], 0)

print(f'The total order value from the naive treatment strategy is {naive_order_value} with a cost of {naive_cost_check}')

User generated image

Optimising treatment strategies with Linear Programming

We start by creating a dataframe with the cost of each treatment for each customer.

# Cost mapping for all treatments (3 datasets of 10,000 customers each)
cost_mapping = {'T1': [cost_dict["T1"]] * 30000,
                'T2': [cost_dict["T2"]] * 30000,
                'T3': [cost_dict["T3"]] * 30000}

# Create DataFrame
df_costs = pd.DataFrame(cost_mapping)

Now it's time to use the OR Tools package to solve this assignment problem! The code takes the following inputs:

  • Cost constraints
  • Array containing the cost of each treatment for each customer
  • Array containing the estimated order value for each treatment for each customer

The code outputs a dataframe with each customer's potential treatments, and a column indicating which one is the optimal assignment.

from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver('SCIP')

# Set constraints
max_cost = 3000
min_cost = 3000

# Create input arrays
costs = df_costs.to_numpy()
order_value = cate.to_numpy()

num_custs = len(costs)
num_treatments = len(costs[0])

# x[i, j] is an array of 0-1 variables, which will be 1 if customer i is assigned to treatment j.
x = {}
for i in range(num_custs):
    for j in range(num_treatments):
        x[i, j] = solver.IntVar(0, 1, '')

# Each customer is assigned to at most 1 treatment.
for i in range(num_custs):
    solver.Add(solver.Sum([x[i, j] for j in range(num_treatments)]) <= 1)

# Cost constraints
solver.Add(sum([costs[i][j] * x[i, j] for j in range(num_treatments) for i in range(num_custs)]) <= max_cost)
solver.Add(sum([costs[i][j] * x[i, j] for j in range(num_treatments) for i in range(num_custs)]) >= min_cost)

# Objective
objective_terms = []
for i in range(num_custs):
    for j in range(num_treatments):
        objective_terms.append((order_value[i][j] * x[i, j] - costs[i][j] * x[i, j]))
solver.Maximize(solver.Sum(objective_terms))

# Solve
status = solver.Solve()

assignments = []
values = []

if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE:
    for i in range(num_custs):
        for j in range(num_treatments):
            # Keep every x[i, j] (solution values are 0 or 1); the 'assigned'
            # column below records which treatment is optimal.
            if x[i, j].solution_value() > -0.5:
                assignments.append([i, j])
                values.append([x[i, j].solution_value(), costs[i][j] * x[i, j].solution_value(), order_value[i][j]])

# Create a DataFrame from the collected data
df = pd.DataFrame(assignments, columns=['customer', 'treatment'])
df['assigned'] = [x[0] for x in values]
df['cost'] = [x[1] for x in values]
df['order_value'] = [x[2] for x in values]

df

User generated image

Whilst keeping to the cost constraint of £3k, we can generate £18k in order value using the optimised treatment strategy. This is 36% higher than the naive approach!

opt_order_value = round(df['order_value'][df['assigned'] == 1].sum(), 0)
opt_cost_check = round(df['cost'][df['assigned'] == 1].sum(), 0)

print(f'The total order value from the optimised treatment strategy is {opt_order_value} with a cost of {opt_cost_check}')

User generated image

Today we covered using Double Machine Learning and Linear Programming to optimise treatment strategies. Here are some closing thoughts:

  • We covered Linear DML; you may want to explore alternative approaches which are more suited to dealing with complex interaction effects in the second stage model:
  • But also remember you don't have to use DML — other methods like the T-Learner or DR-Learner could be used.
  • To keep this article a quick read I didn't tune the hyper-parameters — As we increase the complexity of the problem and approach used, we need to pay closer attention to this part.
  • Linear programming/assignment problems are NP hard, so if you have a large customer base and/or multiple treatments this part of the code can take a long time to run.
  • It can be challenging operationalising a daily pipeline with linear programming/assignment problems — An alternative is running the optimisation periodically and learning the optimal policy based on the results in order to create a segmentation to use in a daily pipeline.
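On that last point, here is a sketch of what learning a policy from the optimiser's output could look like. The customer features and the rule generating the "optimal" assignments are invented stand-ins for the solver's real output; the idea is simply to distil the assignments into an interpretable segmentation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 5000

# Stand-in for the solver's output: customer features plus the optimal
# treatment (0, 1 or 2) the assignment problem chose for each customer.
X = rng.uniform(size=(n, 4))
optimal_treatment = np.where(X[:, 0] > 0.6, 2, np.where(X[:, 1] > 0.5, 1, 0))

# Distil the assignments into a shallow policy tree whose leaves become
# segments we can score cheaply in a daily pipeline.
policy = DecisionTreeClassifier(max_depth=2, random_state=0)
policy.fit(X, optimal_treatment)
print(policy.score(X, optimal_treatment))  # how faithfully the segmentation replicates the policy
```

A shallow tree trades some fidelity to the optimiser's solution for simple, human-readable rules, which is usually the right trade when the policy has to run daily without re-solving the linear programme.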
