Only one thing is guaranteed when writing ETL pipelines: upstream changes WILL occur and WILL break your pipeline.
As Data Engineers we MUST test well so that we can identify breaking changes rapidly, then fix and deploy just as fast. To do this we need automated testing.
To get started, here are links to a four-part blog series I wrote on testing ETL pipelines. The theory is the same whether your pipelines are SSIS packages, Informatica, Python / Airflow or anything else you can dream of:
- Part one, unit tests
- Upstream breakages, part one
- Upstream breakages, part two
- Testing in production
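To give a flavour of the first two topics, here is a minimal sketch in Python. It is not taken from the blog series; the column names, the `check_schema` helper and the `to_minor_units` transform are all hypothetical and only illustrate the idea: unit test the transform in isolation, and check the shape of upstream data before the transform runs so that a renamed column or changed type fails loudly.

```python
# Hypothetical contract for an upstream "orders" feed (assumed for illustration).
EXPECTED_COLUMNS = {"order_id": int, "amount": float, "currency": str}


def check_schema(rows, expected=EXPECTED_COLUMNS):
    """Return a list of readable schema problems; an empty list means OK."""
    problems = []
    for i, row in enumerate(rows):
        # Columns the contract requires but the row lacks.
        for name in sorted(expected.keys() - row.keys()):
            problems.append(f"row {i}: missing column '{name}'")
        # Columns the row has but the contract does not know about.
        for name in sorted(row.keys() - expected.keys()):
            problems.append(f"row {i}: unexpected column '{name}'")
        # Columns that are present but carry the wrong type.
        for name, typ in expected.items():
            if name in row and not isinstance(row[name], typ):
                actual = type(row[name]).__name__
                problems.append(
                    f"row {i}: column '{name}' is {actual}, expected {typ.__name__}"
                )
    return problems


def to_minor_units(rows):
    """Hypothetical transform: convert 'amount' to integer minor units."""
    return [
        {
            "order_id": r["order_id"],
            "amount_minor": round(r["amount"] * 100),
            "currency": r["currency"],
        }
        for r in rows
    ]


def test_transform_converts_amount():
    # Unit test: the transform is checked on a small, known input.
    rows = [{"order_id": 1, "amount": 12.5, "currency": "GBP"}]
    assert to_minor_units(rows) == [
        {"order_id": 1, "amount_minor": 1250, "currency": "GBP"}
    ]


def test_renamed_upstream_column_is_detected():
    # Upstream breakage: 'amount' has been renamed to 'total'.
    rows = [{"order_id": 1, "total": 12.5, "currency": "GBP"}]
    assert check_schema(rows) == [
        "row 0: missing column 'amount'",
        "row 0: unexpected column 'total'",
    ]
```

The `test_` functions can be run with a test runner such as pytest or called directly. The same pattern applies whatever tool the pipeline is built in: keep the check separate from the transform so that each can be tested on its own.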
Here are slides from a talk I have given on ETL testing:
This is a talk I gave at the awesome GroupBy conference in 2020 about how to test our data pipeline:
Finally, if you want to unit test T-SQL code (whether part of a data pipeline or not) then I have written a self-paced course on unit testing T-SQL using the awesome framework tSQLt: