Introduction
In this article, I'll build a simple Bayesian logistic regression model using Pyro, a Python probabilistic programming package. The article covers EDA, feature engineering, model building and evaluation. The focus is on providing a simple framework for Bayesian logistic regression, so the depth of the first two sections is limited. The code used in this article can be found here:
Exploratory Data Analysis
I am using the heart failure prediction dataset from Kaggle, linked below. This dataset is available under the Open Data Commons Open Database License (ODbL) v1.0. The full reference to this dataset can be found at the end of this article.
This dataset contains 918 examples and 11 features for predicting heart disease. The target variable is 'HeartDisease'. There are 5 numeric and 6 categorical features in the dataset. To explore the distributions of the numeric features, I generated boxplots using seaborn, such as the one below.
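As a rough sketch of how such a boxplot can be produced (the file name and column name used here are assumptions based on the Kaggle dataset, and the original code may differ):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the heart failure prediction dataset (file name assumed)
df = pd.read_csv("heart.csv")

# Boxplot of one of the numeric features
sns.boxplot(data=df, y="Cholesterol")
plt.title("Cholesterol distribution")
plt.show()
```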
Something to highlight is the presence of outliers in the boxplot above. Outliers were present in most of the numeric features. This is important to note, as it will influence the feature scaling strategy used in the next section. For categorical variables, I produced bar plots containing the count of each category split by the target class.
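A similar sketch for the categorical bar plots, reusing the DataFrame loaded above and again assuming the column name:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Count of each category, split by the target class
sns.countplot(data=df, x="ChestPainType", hue="HeartDisease")
plt.title("ChestPainType split by HeartDisease")
plt.show()
```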
These graphs indicate that both of these variables could be predictive, given the difference in distribution by the target variable, 'HeartDisease'.
Feature Engineering
I used standardisation scaling for the continuous numerical features and one-hot encoding for the categorical features. My decision to use this scaling strategy was due to the presence of outliers in the features. Normalisation scaling is more sensitive to outliers, so using that technique would require handling the outliers or removing them completely. For simplicity, I opted for standardisation scaling, which is less sensitive to outliers.
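A minimal sketch of this preprocessing step with pandas and scikit-learn, assuming the column groupings described above; the original feature engineering code may look different:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("heart.csv")

numeric_cols = ["Age", "RestingBP", "Cholesterol", "MaxHR", "Oldpeak"]
categorical_cols = ["Sex", "ChestPainType", "FastingBS", "RestingECG",
                    "ExerciseAngina", "ST_Slope"]

# Standardise the continuous features (less sensitive to outliers than min-max scaling)
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])

# One-hot encode the categorical features
df = pd.get_dummies(df, columns=categorical_cols, dtype=float)
```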
Test and Training Data
I split the data into training and test sets using an 80/20 split. The function below generates the training and test data. Note that the data is returned as PyTorch tensors.
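A sketch of what such a function might look like, using scikit-learn's train_test_split and converting the arrays to PyTorch tensors (the function and variable names are my own):

```python
import torch
from sklearn.model_selection import train_test_split

def get_train_test_data(df, target="HeartDisease", test_size=0.2, seed=42):
    """Split the data 80/20 and return PyTorch tensors."""
    X = df.drop(columns=[target]).values.astype("float32")
    y = df[target].values.astype("float32")
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    return (torch.tensor(X_train), torch.tensor(y_train),
            torch.tensor(X_test), torch.tensor(y_test))

X_train, y_train, X_test, y_test = get_train_test_data(df)
```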
Building the Logistic Regression Model
The function below defines the logistic regression model.
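The original code is not reproduced here, so the following is a minimal sketch of a Pyro logistic regression model along the lines described in the next two paragraphs. The site names ("weights", "bias", "obs") are my own choices, and .to_event(1) is the current name for the older .independent() method mentioned below:

```python
import torch
import pyro
import pyro.distributions as dist

def logistic_regression_model(X, y=None):
    n_features = X.shape[1]

    # Priors: weights drawn from independent standard Normal distributions
    weights = pyro.sample(
        "weights",
        dist.Normal(torch.zeros(n_features), torch.ones(n_features)).to_event(1),
    )
    bias = pyro.sample("bias", dist.Normal(0.0, 1.0))

    with pyro.plate("data", X.shape[0]):
        # Raw logits from the linear model, squeezed from (m x 1) to a 1-D tensor
        logits = (X @ weights.unsqueeze(-1)).squeeze(-1) + bias
        # Sigmoid maps logits to probabilities; Bernoulli likelihood for the labels
        probs = torch.sigmoid(logits)
        pyro.sample("obs", dist.Bernoulli(probs=probs), obs=y)
    return probs
```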
The code above generates two priors. We generate a sample of weights and a bias variable, which are drawn from Normal distributions. The weights of the logistic regression model are drawn from a standard multivariate normal distribution, with a mean of 0 and a standard deviation of 1. The .independent() method is applied to the normal distribution from which the model weights are sampled. This method tells Pyro that every sample drawn along the first dimension is independent; in other words, the coefficient applied to each feature in the model is independent of the others. Within the pyro.plate() context manager, the raw model logits are generated. These are calculated by the standard linear regression equation, defined below. The .squeeze() method is applied to remove dimensions that are of size 1, e.g. if the tensor shape is (m x 1), the shape will be (m) after applying the method.
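For reference, the linear model referred to here is z = Xw + b, where X is the feature matrix, w the sampled weight vector, b the bias and z the vector of raw logits.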
A sigmoid function is applied to the output of the linear model, which maps the raw logit values into probabilities between 0 and 1. When solving multi-class classification problems with logistic regression, a softmax function should be used instead, so that the probabilities of the classes sum to 1. PyTorch has a built-in function to apply the sigmoid to our raw logits. This produces a one-dimensional tensor, with a length equal to the number of examples in our training data. Within the context manager, we define the likelihood term, which is sampled from a Bernoulli distribution. This term calculates the probability of the observed data given the model we have defined. The Bernoulli distribution is parameterised by the tensor of probabilities that the sigmoid function generates.
MCMC Inference
The function below performs Bayesian inference using the NUTS MCMC sampling algorithm. We use the NUTS sampler, an MCMC algorithm, to intelligently sample the posterior parameter space. The function takes as parameters the training feature and target datasets, the number of samples we wish to draw from the posterior and the number of chains to run.
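A sketch of such an inference function, assuming the model sketched above; the warm-up length here is an arbitrary choice:

```python
from pyro.infer import MCMC, NUTS

def run_inference(X_train, y_train, num_samples=1000, num_chains=4):
    """Sample the posterior with the NUTS algorithm."""
    kernel = NUTS(logistic_regression_model)
    mcmc = MCMC(kernel, num_samples=num_samples,
                warmup_steps=200, num_chains=num_chains)
    mcmc.run(X_train, y_train)
    return mcmc

mcmc = run_inference(X_train, y_train, num_samples=1000, num_chains=4)
```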
We tell Pyro to run x parallel chains to sample the parameter space, where each chain starts with a different set of initial parameter values. Running multiple chains during development lets us assess the convergence of MCMC. Executing the function above, passing in the training data and values for the number of samples and chains, returns an instance of the MCMC class.
Inference Analysis
Applying the .summary() method to the class returned from the function above prints some summary statistics of the sampling. One of the columns printed is r_hat. This is the Gelman-Rubin statistic, which assesses how well the different chains have converged to the same posterior probability distribution after sampling the parameter space for each feature. A value of 1 for the Gelman-Rubin statistic is considered perfect convergence and, generally, any value below 1.1 is considered acceptable. A value greater than 1.2 indicates there is little convergence. I ran inference with 4 chains and 1000 samples; my output looks like this:
The first five columns show descriptive statistics of the samples generated for each parameter. The r_hat values for all features indicate MCMC converged, meaning it is producing consistent estimates for each feature. The method also reports a metric 'n_eff', the effective sample size. A large effective sample size relative to the number of samples taken is a strong sign that we have enough independent samples for reliable statistical inference, and that the samples are informative. The values of n_eff and r_hat here suggest strong model convergence and reliable results.
Plots can be generated to visualise the values sampled for each feature. Taking the first column of the matrix of sampled weights as an example (corresponding to the first feature in the input data) generates the trace and probability density function below.
These plots help visualise uncertainty and convergence in the model. By calling the .get_samples() method and passing in the parameter group_by_chain=True, we can also evaluate the variability in sampling between chains. The plot below regenerates the plot above but groups the samples by the chain from which they were collected.
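A rough sketch of how the per-chain trace and density plots could be produced with matplotlib (a histogram stands in for the kernel density estimate, and the original plotting code may differ):

```python
import matplotlib.pyplot as plt

# Samples grouped by chain: shape (num_chains, num_samples, num_features)
samples = mcmc.get_samples(group_by_chain=True)
weight_chains = samples["weights"][:, :, 0]  # weight of the first feature, per chain

fig, (ax_trace, ax_density) = plt.subplots(1, 2, figsize=(12, 4))
for chain_id, chain in enumerate(weight_chains):
    ax_trace.plot(chain.numpy(), alpha=0.7, label=f"chain {chain_id}")
    ax_density.hist(chain.numpy(), bins=40, density=True, alpha=0.5)
ax_trace.set_title("Trace of first weight")
ax_density.set_title("Posterior density of first weight")
ax_trace.legend()
plt.show()
```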
The subplot on the right demonstrates that the model is consistently converging towards the same posterior distribution of the parameter value.
Generating Predictions
The prediction of the model is calculated by passing every set of samples drawn for the latent variables through the structure of the model. 4000 samples were collected, so we can generate 4000 predictions per example. The function below generates the class prediction for each example scored, a matrix of 4000 predictions per example and a tensor containing the mean prediction over the 4000 samples for each example scored.
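A sketch of what a create_predictions function along these lines might look like, reusing the site names from the model sketch above:

```python
import torch

def create_predictions(mcmc, X, threshold=0.5):
    """Score X with every posterior sample and aggregate the results."""
    samples = mcmc.get_samples()          # flattened across chains (4000 samples here)
    weights = samples["weights"]          # shape (num_samples, n_features)
    bias = samples["bias"].unsqueeze(-1)  # shape (num_samples, 1)

    # One probability per posterior sample and example: (num_samples, n_examples)
    prediction_matrix = torch.sigmoid(weights @ X.T + bias)

    # Average over posterior samples, then apply the classification threshold
    mean_prediction = prediction_matrix.mean(dim=0)
    class_prediction = (mean_prediction >= threshold).int()
    return class_prediction, prediction_matrix, mean_prediction

class_pred_test, pred_matrix_test, mean_pred_test = create_predictions(mcmc, X_test)
```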
The trace and kernel density plots of the predictions for each example can be generated to visualise the uncertainty of the predictions. The plots below illustrate the distribution of probabilities the model has produced for a random example in the test dataset.
Over the 4000 samples, the model consistently predicts that the example belongs to the positive class (does have heart disease).
Model Evaluation
The code below contains a function which produces some evaluation metrics from the scikit-learn metrics module.
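A sketch of such an evaluation function, assuming the outputs of the create_predictions sketch above:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate_model(y_true, class_prediction, mean_prediction):
    """Return a dictionary of evaluation metrics."""
    y_true = y_true.numpy()
    y_class = class_prediction.numpy()
    return {
        "accuracy": accuracy_score(y_true, y_class),
        "precision": precision_score(y_true, y_class),
        "recall": recall_score(y_true, y_class),
        "f1": f1_score(y_true, y_class),
        "roc_auc": roc_auc_score(y_true, mean_prediction.numpy()),
    }

test_metrics = evaluate_model(y_test, class_pred_test, mean_pred_test)
```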
The class_prediction and mean_prediction variables returned from the create_predictions function can be passed into this function to generate a dictionary of metrics for evaluating the performance of the model on the training and test datasets. The table below summarises this information for the test and training data. By the nature of sampling methods, these results will vary for each independent run of the MCMC algorithm. It should be noted that accuracy is not a robust measure of model performance when working with unbalanced datasets; metrics such as the F1 score are more appropriate in that case. Roughly 55% of the examples in the dataset belonged to the positive class, so the imbalance is small.
Precision tells us what proportion of the patients the model predicted to have heart disease actually did. Recall tells us what proportion of patients who had heart disease were correctly identified by the model. The importance of each of these metrics varies by use case. In the medical industry, recall performance would be critical, as you would not want a situation where the model predicted a patient did not have heart disease when they did. In this model, the reduction in recall performance between the training and test data would be a concern. However, these metrics were generated using a standard cut-off of 0.5. The model's threshold, the cut-off for classifying the positive and negative class, can be changed to improve recall. By lowering the threshold, recall performance will improve, as fewer actual heart disease cases will be incorrectly classified. However, this will degrade the precision of the model, as more of the positive predictions will be false. The threshold of a classification model is one way to control the trade-off between these two metrics.
The AUC-ROC score for the training and test datasets is encouraging. As a general rule of thumb, a score above 0.9 indicates strong performance, which is true for both the training and test datasets. The graph below plots the ROC curve for both datasets.
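A minimal sketch of plotting a ROC curve with scikit-learn and matplotlib, shown here for the test set only (the original figure plotted both datasets):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, _ = roc_curve(y_test.numpy(), mean_pred_test.numpy())
auc = roc_auc_score(y_test.numpy(), mean_pred_test.numpy())

plt.plot(fpr, tpr, label=f"test (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```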
Summary
This article aimed to provide a framework for solving binary classification problems using Bayesian methods, which I hope you have found useful. The model performs well across a range of evaluation metrics. However, improvements are possible with a greater focus on feature engineering and selection.
In my previous article, I discussed Bayesian thinking in more depth. If you are interested, I have provided the link below. I have also provided a link to another article, which gives an introduction to logistic regression modelling.
References:
Fedesoriano. (September 2021). Heart Failure Prediction Dataset. Retrieved 2024/02/17 from https://www.kaggle.com/fedesoriano/heart-failure-prediction. License: https://opendatacommons.org/licenses/odbl/1-0/