With the digital universe expected to swell to 44 zettabytes of data by 2020, today’s enterprises need a central data repository that can process increasing volumes of all types of data faster to let business users make better, real-time decisions. In short they need a stronger backbone; they need the data lake!
Not only do traditional databases constrain real-time and shared data analytics due to their siloed nature, they also lack the technology to accommodate the skyrocketing level and types of data being created at an increasing rate. After all, according to IDC research, the growing number of smart devices that analyze everything from home heating systems to consumer information will mean that within four years there will be some 7 billion connected people using an estimated 30 billion devices.
The ease with which we have long been able to retrieve information from the World Wide Web (WWW) using increasingly efficient and high quality search engines underscores a less-than-impressive performance from search engines serving the enterprise environment. Off-the-shelf tools that let organizations retrieve their enterprise information just do not give us the same experience as Google or Bing. But what if you could build your own enterprise information retrieval system by leveraging open source tools and platforms?
In this blog, we will explore the feasibility of doing just that.
From adapting energy use to maximizing data consolidation, Big Data (BD) analytics has taken the guesswork out of optimizing the modern data center.
More than ever, the modern data center is a living, changing environment, with new technologies coming in, old technologies being cycled out, and evolving energy efficiency strategies to keep it all humming. We have to make sure we have the space and power to install the latest technology, while we still have the old equipment in place.
Up until recently, orchestrating this shifting ecosystem was only partially data-driven and the rest was based on gauging changing needs from past experience. At EMC IT—like most IT organizations—we had long tracked metrics on our data center facilities, including space, power, cooling, humidity, temperature, etc. And we collected storage data—server utilization, virtual machines, growth trends. But we lacked the tools to process this vast amount of data and we were never able to aggregate this information into one data base.
Converged infrastructure (CI)—pre-engineered and deeply integrated blocks of compute, storage and network that deliver mission-critical performance offered as a turnkey solution—has been a game changer in helping IT keep pace with rapidly evolving business demands. And now a more agile technological cousin, called hyper-converged infrastructure (HCI), offers similar plug-and-play efficiencies using building blocks in smaller, more flexible chunks.
So when should you use converged infrastructure and when is hyper-converged technology a better option? The answer depends on what IT workloads you are running, how much resiliency is required, and the need for guaranteed performance verses agility and scalability.
The data lake is proving to be a crucial tool as EMC IT strives to partner more closely with the business clients it serves to help them get the most out of enterprise Big Data. For example, EMC IT is offering a smart data base that lets business users across the company leverage a uniform customer profile for more efficient and effective sales analytics.
Created in collaboration with EMC Global Services, the CAP (Customer Account Profile) is based on information collected and aggregated from multiple sources to provide a holistic customer view—a single version of the truth, if you will, about our customers.
CAP is managed by IT and is one of the enterprise data sets made available via the data lake to business clients seeking to analyze customer trends, opportunities and insights.
Wouldn’t it be great if you could analyze all customer interaction and learn which parts of our services or sales are better than others? Or analyze all of our service request textual descriptions and infer the call volume drivers? Understand the main topics of a chat session? Use the same data to understand how the customers are actually using our products? Or to go beyond customer interactions and help us identify the common bugs in our code by analyzing the text engineers type in a bug tracking system such as Jira or Bugzilla?
Liberating your data is not enough if a big chunk of it remains locked in human generated texts.
EMC’s Data Science as a Service team has created a highly-advanced text analytics technology which can help your organization unlock the value in human generated texts.
The Business Data Lake(BDL) is positioned as the one-stop-shop for all of the organization’s (big) data storage and analytics requirements. It is intended to address the three V’s of Big Data analytics – Volume, Variety and Velocity – by providing a vast amount of storage, ingestion of streaming, mini-batches and batches of data, either structured, semi-structured or unstructured. It fundamentally shifts the paradigm in business data storage and analytics by consolidating the multiple silos of data that can be found in organizations today.
More than ever, businesses see their futures tied to their ability to harness the explosive growth in data. You may even be familiar with the Business Data Lake concept—a central repository of vast information which can be used across an enterprise to drive all business intelligence, advanced analytics and even, eventually, intelligent applications.
We, at EMC IT, are in the process of creating a Business Data Lake, and I will be sharing insights about our efforts in this blog. To start, let’s trace the vision that’s leading EMC IT and other businesses to the shores of this new data landmark.
Viktor Mayer-Schonberger and Kenneth Cukier, authors of Big Data: A Revolution That Will Transform How We Live, Work and Think, wrote, “If big data teaches us anything, it is that just acting better, making improvements – without deeper understanding – is often good enough.”
EMC IT not only recognizes the hidden value of Big Data, but also strives to generate better outcomes. So, we at EMC IT can act better and faster to improve our customers’ experience.
In his November 2013article, Dan Inbar from EMC’s IT organization eloquently presented what IT has been doing to improve the operations of our Exchange email environment. PAITO (Predictive Analytics for IT Operations) is our Big Data analytics solution for outage prediction that allows our IT operations team to collect, analyze, store, and leverage key indicators to predict and prevent interruption in mission-critical operations. The journey that started more than a year ago as a pilot has evolved into a full-fledged IT data lake and analytics platform for various IT managed areas, including applications, servers, devices, licenses, network, storage, security and workloads. (more…)
In an age when most companies invest to become data-driven, the value of data is increasingly a key criteria for making IT decisions, and the protection of the data becomes paramount to those decisions
When making backup-related decisions, price justification involves the potential capital loss to the organization when a data loss or unavailability occurs. Understanding the value of data and access to that data is key when prioritizing backup technology or even for deciding which infrastructure to protect during a cyber-attack. However, estimating this price is not trivial.
I recently worked on a research project with a team of academic partners at Ben-Gurion University for prioritizing data replication to minimize the monetary loss in the case of a disaster. The method we derived can limit the costs of data loss, and could provide a high return on investment (ROI) of up to one million dollars per incident.
The opinions and interests expressed on Dell EMC employee blogs are the employees' own and do not necessarily represent Dell EMC's positions, strategies or views. Dell EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the Dell EMC logo and content regarding Dell EMC products and services, employee blogs are independent of Dell EMC and Dell EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.