De-Nesting Google Analytics Data in BigQuery | by Martin Weitzmann | Mar, 2024

The right way to get flat tables

Photo of Singapore by Mike Enerio on Unsplash

BigQuery is an analytics engine optimized to crunch pre-joined (or: nested) data. Sub-relations make sense in analytical scenarios because we don't want to deal with joins over big datasets. Just imagine daily year-over-year comparisons over the last 3 years, aggregating terabytes of data, with joins adding yet another layer of complexity.

A sub-relation, or sub-table, is usually implemented as an array of structs. The array, a list-like data type, provides the rows; the struct, similar to a map or dictionary, provides the columns. The sub-schema is consistent throughout the table, in contrast to JSON types, which can change their schema from row to row.
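To make the array-of-structs idea concrete, here is a minimal sketch in plain Python rather than BigQuery SQL. The rows, the field names (`event_name`, `event_params`, `key`, `value`) and the helper `flatten` are assumptions made up for illustration; they are a simplified stand-in for a GA export, not its actual schema. The list plays the role of the array (rows of the sub-table), each dict plays the role of a struct (its columns), and flattening repeats the parent fields once per nested element.

```python
# Hypothetical, simplified GA-style rows: every event carries a nested
# sub-table `event_params`, modelled as a list (array) of dicts (structs).
events = [
    {
        "event_name": "page_view",
        "event_params": [
            {"key": "page_title", "value": "Home"},
            {"key": "page_location", "value": "https://example.com/"},
        ],
    },
    {
        "event_name": "scroll",
        "event_params": [
            {"key": "percent_scrolled", "value": "90"},
        ],
    },
]


def flatten(rows, array_field):
    """Yield one flat row per element of the nested array.

    Parent rows whose array is empty produce no output, which is the
    behaviour of an inner (cross) join against the nested elements.
    """
    for row in rows:
        # Parent columns are everything except the nested array itself.
        parent = {k: v for k, v in row.items() if k != array_field}
        for struct in row[array_field]:
            # Combine parent columns with the struct's columns.
            yield {**parent, **struct}


flat_rows = list(flatten(events, "event_params"))
for flat_row in flat_rows:
    print(flat_row)
```

Two nested events become three flat rows here, each repeating `event_name` next to one `key`/`value` pair. That repetition of parent fields is the price of a flat table, and it is why the choice of which sub-table to expand matters.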

The only other engine going down this route of nested data seems to be AWS Redshift Spectrum. Yet if we want to use Google Analytics (GA) data in another system, we almost always want to de-join the data into flat tables, because functions to aggregate or modify arrays of structs are quite limited. Most analytical database engines seem to optimize for…
