BigQuery Strategies For Re-Creating Pandas’ Prime EDA Capabilities | by Tom Ellyatt




February 6, 2024


In this guide, we’ll explore how to re-create key Pandas functions used for EDA, such as describe and corr, in BigQuery.

Transitioning from BigQuery/SQL to Python can be quite eye-opening, especially in the context of data analysis. I often find myself writing extensive queries to manipulate and analyze data in BigQuery SQL. It’s a powerful language, but it can get quite heavy.

Now, when I switched to Python, I was surprised by how streamlined certain tasks were. Python’s libraries, like pandas, allow you to perform data manipulations and analyses that would be cumbersome in SQL.

I found a few Pandas functions like DESCRIBE, CORR, and ISNULL().SUM() super useful, and wished they were in BigQuery. This got me exploring other cool EDA functions in pandas and inspired me to write this article. Here, I’m sharing the methods and code I came up with in BigQuery to match some of the best pandas EDA functions.
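To give a flavour of what "matching" a pandas function can look like, here is a minimal sketch (not the code from this article) of a Python helper that builds a BigQuery query counting NULLs per column, roughly what `df.isnull().sum()` reports in pandas. The table name `my_project.my_dataset.mtcars`, the helper `null_count_query`, and the `_nulls` suffix are assumptions made for illustration; `COUNTIF` is a standard BigQuery aggregate function.

```python
# Hypothetical table name and column subset, used only for illustration.
TABLE = "my_project.my_dataset.mtcars"
COLUMNS = ["mpg", "cyl", "hp"]


def null_count_query(table, columns):
    """Build a BigQuery SQL string that counts NULLs per column,
    similar in spirit to df.isnull().sum() in pandas."""
    # COUNTIF counts the rows for which the condition is TRUE,
    # so COUNTIF(col IS NULL) is the number of missing values in col.
    counts = ",\n  ".join(
        f"COUNTIF({col} IS NULL) AS {col}_nulls" for col in columns
    )
    return f"SELECT\n  {counts}\nFROM `{table}`"


print(null_count_query(TABLE, COLUMNS))
```

Generating the query from a column list avoids typing one `COUNTIF` line per column by hand, which is where the SQL version tends to get heavy compared with the pandas one-liner.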

Let’s get stuck in!

In this article, we’ll take a look at these 13 functions:

Head / Tail
Columns
Dtypes
Nunique
Unique
ISNA / ISNULL()
ISNULL().SUM()
DropNA
Shape
Corr
Nlargest
Sample
Describe

All through this text, we’ll mess around with the favored mtcars dataset. The mtcars dataset is a publicly accessible built-in dataset in R. It contains 11 options of 32 cars from the 1974 Motor Pattern US journal.

Image by author, screenshot taken from RStudio

Panda icon source: Flaticon (link)

When you first look at a dataset, consider ‘Head’ and ‘Tail’ as the front and back pages…
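As a rough illustration of the idea (a sketch under stated assumptions, not necessarily the query used in this article), the helpers below build BigQuery SQL for a head-style and a tail-style preview. pandas relies on the DataFrame's row order, whereas a BigQuery table has no guaranteed order, so both queries sort on an explicit column. The function names `head_query` and `tail_query`, the table name, and the ordering column `row_id` are all hypothetical.

```python
def head_query(table, order_by, n=5):
    """First n rows under an explicit ordering, like df.head(n)."""
    return f"SELECT * FROM `{table}` ORDER BY {order_by} ASC LIMIT {n}"


def tail_query(table, order_by, n=5):
    """Last n rows under an explicit ordering, like df.tail(n)."""
    # The inner query sorts in reverse and keeps n rows (the last n);
    # the outer query puts them back into ascending order.
    return (
        "SELECT * FROM ("
        f"SELECT * FROM `{table}` ORDER BY {order_by} DESC LIMIT {n}"
        f") AS last_rows ORDER BY {order_by} ASC"
    )


# Hypothetical table and ordering column, for illustration only.
print(head_query("my_project.my_dataset.mtcars", "row_id"))
print(tail_query("my_project.my_dataset.mtcars", "row_id", n=3))
```

If the table has no natural ordering column, a plain `LIMIT` still returns a quick preview, but which rows come back is not guaranteed to be stable between runs.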
