Data Drift

Data Drift is the phenomenon where changes to data degrade model performance.

Also known as distributional shift, data drift occurs when changes in the underlying data cause models to become less accurate over time. Models are typically trained on an initial set of data and then put into production, so if the underlying data changes after training, the model's predictions become inaccurate.

There are many potential causes of data drift.

  • The process that produces the data could change. Example: An IoT sensor could fall out of calibration.
  • Unexpected changes to the data infrastructure. Example: A team serving data in an enterprise setting changes their data definitions.
  • General exogenous changes to the data. Example: Behavioral changes that cause people to drive faster or slower would change a traffic dataset.

To combat data drift, it's important to monitor the data and frequently retrain models.
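
As a rough illustration of what monitoring might look like, the sketch below compares a feature's values at training time with its values in production using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions). The function names, the simulated sensor readings, and the 0.2 threshold are all assumptions made for this example, not a recommended standard; in practice the threshold would need tuning for each feature.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical distributions, 1.0 means no overlap.
    """
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds a threshold (assumed value)."""
    return ks_statistic(reference, current) > threshold


# Simulated sensor readings (hypothetical data): the "shifted" sample
# mimics a sensor whose readings have moved three units upward.
rng = random.Random(0)
training = [rng.gauss(20.0, 2.0) for _ in range(1000)]
stable = [rng.gauss(20.0, 2.0) for _ in range(1000)]
shifted = [rng.gauss(23.0, 2.0) for _ in range(1000)]

if has_drifted(training, shifted):
    print("Drift detected: consider retraining the model.")
```

A check like this could run on a schedule against recent production data, with a positive result triggering an alert or a retraining job.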
