The data lake has not only allowed IT to open up Big Data to a broader community of internal business users, it is now helping us channel unprecedented amounts of information to DELL EMC customers as well.
Using data lake technology, for example, IT and our DELL EMC business groups forged a groundbreaking partnership to allow customers to leverage Big Data to monitor and proactively manage their IT environments. We created a tool called MyService360, an on-line solution that gives DELL EMC Support customers and partners easier and faster access to near real-time service information. It features a personalized dashboard that provides customers with a 360-degree view of their environment and customer service experience.
Launched last May, MyService360 only scratches the service of the potential value that is expected to spring-board from leveraging Big Data in the data lake. Having all the data in a centralized location provides easy access and gives developers and data scientists the opportunity to gain data insights that would be extremely difficult to achieve without the data lake. Those insights can then be used to create metrics that we can share to empower our customers.
Whether companies refer to results, outcomes, ROI, or case studies, Big Data and data science are finally moving beyond the hype and proving to deliver dividends over time. Several new Big Data technologies and predictive tools have been launched to meet the growing demand within business and technology groups to harness the constant growth of both structured and unstructured data within and outside of the enterprise. But such technologies and tools won’t be effective unless you define the problem to be addressed.
Most data science initiatives start with a proof of concept (PoC) or in some cases with a proof of value (PoV) if the foundational concept is clearly established. Developing a pipeline of PoC’s can be extremely helpful through working sessions with data scientists, business subject matter experts (SME’s), data experts, and leaders. Following this, prioritize PoCs by stack-ranking each of them based on business value and ease of implementation which factors in availability of data, granularity, and quality.
As organizations unleash the power of the data lake by providing business broader access to more and more data, they are facing a growing IT dilemma—How to keep improperly governed or poor quality data from polluting the data lake.
While IT’s traditional approach to managing data governance and quality have been quite effective over the years, the magnitude of data in today’s data lake is much larger than traditional data warehouse levels. Traditional tools and tactics are being overwhelmed by Big Data in the lake.
There are, however, strategies that organizations can use to reshape data governance and quality standards in the Big Data world. While our tactics and tools are still evolving, I will share some of the efforts we are developing at EMC IT to keep our data lake clean.
From using analytics to predict how our storage arrays will perform in the field, to engineering product configurations to best meet customers’ future needs, EMC is just beginning to tap into the gold mine of intelligence waiting to be extracted from our new data lake.
In fact, we are currently working on dozens of business use cases that are projected to drive millions in revenue opportunities. And we are just scratching the surface. There’s a lot more data available, more to be harvested, and more analytics to be built out as data scientists and business users hit their stride in exploring a new era of data-driven innovation at EMC.
As I noted in my earlier blog ( The Analytics Journey Leading to the Business Data Lake), EMC IT embarked on creating a data lake to transition from traditional business intelligence to advance analytics more than two years ago. A key focus of this effort was to address the fact that data scientists and business users seeking to leverage our growing amount of data were stifled by the need for such projects to go through IT, which was a costly and slow process that discouraged innovation.
We now have the foundation and tools in place to use data and analytics to create sustainable, long-term competitive differentiation. To get here, we worked closely with EMC affiliate Pivotal Software, Inc. to mature together and leverage the multi-tenancy capabilities of their Big Data Suite.
With the expanding volume of information in the digital universe and the increasing number of disk drives required to store that information, disk drive reliability prediction is imperative for EMC and EMC customers.
Figure 1- An illustration of the information expansion in the last years and expected growth
Disk drive reliability analysis, which is a general term for the monitoring and “learning” process of disk drive prior-to-failure patterns, is a highly explored domain both in academia and in the industry. The Holy Grail for any data storage company is to be able to accurately predict drive failures based on measurable performance metrics.
Naturally, improving the logistics of drive replacements is worth big money for the business. In addition, predicting that a drive will fail long enough in advance can facilitate product maintenance, operation and reliability, dramatically improving Total Customer Experience (TCE). In the last few months, EMC’s Data Science as a Service (DSaaS) team has been developing a solution capable of predicting the imminent failures of specific drives installed at customer sites.
From taking charge of healthcare choices to customizing product purchases, today’s consumers are increasingly using self-service, social, and mobile digital capabilities. EMC’s new MyService360 now brings that same personalized, proactive service to our Online Support customers.
Powered by EMC data lake solution, MyService360 (launched at EMC World 2016 on May 2) gives EMC Support customers easier and faster access to real-time information at their fingertips. Using its easy-to-read visual and powerful analytics, customers can view analysis of code levels, health, and risk scoring on their installed EMC products, service activity views by site, incident management, and more.
Darryl Smith an EMC Distinguished Engineer and the Chief Database Architect in EMC's IT organization.
He is responsible for all databases at EMC, including one of the largest Oracle eBusiness Suites and Database Grid deployments in the world. He has been working with Oracle Databases since 1988 starting with version 4 and Oracle clustered databases since version 7.
Over the past 7 years, he has helped EMC capture and document the best practices learned from managing a global deployment of Oracle Applications, Middleware and Database Grids and actively engage with EMC and Oracle customers to share EMC's experience and perform knowledge transfer.
It takes many different best-of-breed technologies to effectively harvest “game-changing” analytics value from the data lake. Getting the right architecture to navigate your data lake requires a deep understanding of both the needs of Big Data and the available technologies in order to match analytics use cases with the appropriate platforms to get results.
Do you need to analyze large amounts of data fast or process many queries simultaneously? Is the data you are using organized in columns and rows, customer records perhaps? Or are you searching document files?
Let’s look at the basics of data lake architecture, some of the technologies and tools you should consider, and how EMC IT is approaching this crucial process.
In the expanding world of Big Data, there is more and more information out there that can help your organization target the right customers with the most effective messages for the right products and services at the right time. EMC IT is using data lake technology to help our Marketing and Sales teams gain unprecedented insights into our customer behaviors, needs and sentiments to drive effective marketing.
At the center of this effort is our Marketing Science Lab, which provides advanced analytics support for Marketing using a shared Marketing and Sales workspace in the data lake. The Lab collaborates with Sales on shared data and models to deliver 360 views of customer behaviors by analyzing a vast array of data from internal and increasingly, external sources.
With the digital universe expected to swell to 44 zettabytes of data by 2020, today’s enterprises need a central data repository that can process increasing volumes of all types of data faster to let business users make better, real-time decisions. In short they need a stronger backbone; they need the data lake!
Not only do traditional databases constrain real-time and shared data analytics due to their siloed nature, they also lack the technology to accommodate the skyrocketing level and types of data being created at an increasing rate. After all, according to IDC research, the growing number of smart devices that analyze everything from home heating systems to consumer information will mean that within four years there will be some 7 billion connected people using an estimated 30 billion devices.
The ease with which we have long been able to retrieve information from the World Wide Web (WWW) using increasingly efficient and high quality search engines underscores a less-than-impressive performance from search engines serving the enterprise environment. Off-the-shelf tools that let organizations retrieve their enterprise information just do not give us the same experience as Google or Bing. But what if you could build your own enterprise information retrieval system by leveraging open source tools and platforms?
In this blog, we will explore the feasibility of doing just that.
The opinions and interests expressed on Dell EMC employee blogs are the employees' own and do not necessarily represent Dell EMC's positions, strategies or views. Dell EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the Dell EMC logo and content regarding Dell EMC products and services, employee blogs are independent of Dell EMC and Dell EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.