Don't Fix Bad Data Quality, Do This Instead


People don't know what they mean when they talk about data quality.

Photo by No Revisions on Unsplash

A few years ago, our data platform team set out to pinpoint the main problems of our data users. We ran a survey among people who interact with our data platform, and unsurprisingly, the top concern they raised was data quality.

Our initial response, characteristic of an engineering mindset, was to build data quality tooling. We introduced an internal tool named Contessa. Although it was somewhat cumbersome and required significant manual configuration, Contessa supported checks for the standard dimensions of data quality: consistency, timeliness, validity, uniqueness, accuracy and completeness. After running the tool for a couple of months with hundreds of data quality checks, we concluded that:

  • Data quality checks occasionally helped data users discover sooner that the data was compromised and could not be relied upon.
  • Despite the frequent execution of data quality checks, there was no noticeable improvement in the subjective perception of data quality.
  • For a significant portion of issues, particularly those identified through automated data quality checks such as consistency or validity, no corrective action was ever taken.
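
To make the dimensions mentioned above concrete, here is a minimal sketch of what automated checks for completeness, uniqueness, validity and timeliness might look like. It is not Contessa's configuration or API, which this article does not show. The rows, column names, thresholds and function names are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical order rows; the column names are illustrative assumptions.
rows = [
    {"order_id": 1, "price": 120.0, "updated_at": datetime(2024, 1, 2, 8, 0)},
    {"order_id": 2, "price": None, "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"order_id": 2, "price": -5.0, "updated_at": datetime(2024, 1, 1, 7, 0)},
]


def completeness(rows, column):
    """Share of rows where the column is not None."""
    return sum(r[column] is not None for r in rows) / len(rows)


def uniqueness(rows, column):
    """Share of distinct values among all values of the column."""
    values = [r[column] for r in rows]
    return len(set(values)) / len(values)


def validity(rows, column, predicate):
    """Share of non-null values that satisfy the predicate."""
    values = [r[column] for r in rows if r[column] is not None]
    if not values:
        return 1.0
    return sum(predicate(v) for v in values) / len(values)


def timeliness(rows, column, now, max_age):
    """True if the newest record is no older than max_age."""
    newest = max(r[column] for r in rows)
    return now - newest <= max_age


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 2, 12, 0)
report = {
    "completeness(price)": completeness(rows, "price"),
    "uniqueness(order_id)": uniqueness(rows, "order_id"),
    "validity(price >= 0)": validity(rows, "price", lambda p: p >= 0),
    "timeliness(updated_at)": timeliness(rows, "updated_at", now, timedelta(hours=6)),
}
print(report)
```

Checks like these produce a score or a pass/fail flag per dataset. As the conclusions above suggest, they mostly tell users that something is wrong. They do not say whether anyone will act on it.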

Surveys and objective measurements are useful tools, but nothing can replace a discussion over coffee and cake, as Jane Carruthers writes in her book, “The Chief Data Officer’s Playbook”. Indeed, I recommend this to anybody, as one-on-one conversations helped us uncover another important angle of the situation. Some of these conversations unfolded as follows:

“Hey, you say that data quality is poor. What do you mean by that?”

#1 Pricing business analyst: “We’re working on setting the price for ancillary product X. In the dataset we use, we’re missing data on the actual revenue from product X for each order. We have this dataset, but it contains only the expected value of the revenue from X at the time of purchase. We can also see the actual revenue per product, but not at order granularity.”
