As organizations unleash the power of the data lake by providing business broader access to more and more data, they are facing a growing IT dilemma—How to keep improperly governed or poor quality data from polluting the data lake.
While IT’s traditional approach to managing data governance and quality have been quite effective over the years, the magnitude of data in today’s data lake is much larger than traditional data warehouse levels. Traditional tools and tactics are being overwhelmed by Big Data in the lake.
There are, however, strategies that organizations can use to reshape data governance and quality standards in the Big Data world. While our tactics and tools are still evolving, I will share some of the efforts we are developing at EMC IT to keep our data lake clean.
From using analytics to predict how our storage arrays will perform in the field, to engineering product configurations to best meet customers’ future needs, EMC is just beginning to tap into the gold mine of intelligence waiting to be extracted from our new data lake.
In fact, we are currently working on dozens of business use cases that are projected to drive millions in revenue opportunities. And we are just scratching the surface. There’s a lot more data available, more to be harvested, and more analytics to be built out as data scientists and business users hit their stride in exploring a new era of data-driven innovation at EMC.
As I noted in my earlier blog ( The Analytics Journey Leading to the Business Data Lake), EMC IT embarked on creating a data lake to transition from traditional business intelligence to advance analytics more than two years ago. A key focus of this effort was to address the fact that data scientists and business users seeking to leverage our growing amount of data were stifled by the need for such projects to go through IT, which was a costly and slow process that discouraged innovation.
We now have the foundation and tools in place to use data and analytics to create sustainable, long-term competitive differentiation. To get here, we worked closely with EMC affiliate Pivotal Software, Inc. to mature together and leverage the multi-tenancy capabilities of their Big Data Suite.
More than ever, businesses see their futures tied to their ability to harness the explosive growth in data. You may even be familiar with the Business Data Lake concept—a central repository of vast information which can be used across an enterprise to drive all business intelligence, advanced analytics and even, eventually, intelligent applications.
We, at EMC IT, are in the process of creating a Business Data Lake, and I will be sharing insights about our efforts in this blog. To start, let’s trace the vision that’s leading EMC IT and other businesses to the shores of this new data landmark.
The opinions and interests expressed on Dell EMC employee blogs are the employees' own and do not necessarily represent Dell EMC's positions, strategies or views. Dell EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the Dell EMC logo and content regarding Dell EMC products and services, employee blogs are independent of Dell EMC and Dell EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.