As organizations unleash the power of the data lake by providing business broader access to more and more data, they are facing a growing IT dilemma—How to keep improperly governed or poor quality data from polluting the data lake.
While IT’s traditional approach to managing data governance and quality have been quite effective over the years, the magnitude of data in today’s data lake is much larger than traditional data warehouse levels. Traditional tools and tactics are being overwhelmed by Big Data in the lake.
There are, however, strategies that organizations can use to reshape data governance and quality standards in the Big Data world. While our tactics and tools are still evolving, I will share some of the efforts we are developing at EMC IT to keep our data lake clean.