Posts Tagged ‘data science’

Framing the Data Science Proof of Concept

Brahma Tangella

Sr. Manager, Service Strategy, Dell IT

Whether companies refer to results, outcomes, ROI, or case studies, Big Data and data science are finally moving beyond the hype and proving that they deliver dividends over time. Several new Big Data technologies and predictive tools have been launched to meet the growing demand within business and technology groups to harness the constant growth of both structured and unstructured data inside and outside the enterprise. But such technologies and tools won’t be effective unless you define the problem to be addressed.

Most data science initiatives start with a proof of concept (PoC) or, in some cases, with a proof of value (PoV) if the foundational concept is clearly established. Developing a pipeline of PoCs through working sessions with data scientists, business subject matter experts (SMEs), data experts, and leaders can be extremely helpful. Following this, prioritize the PoCs by stack-ranking them on business value and ease of implementation, which factors in the availability, granularity, and quality of the data.
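As a rough illustration of that stack-ranking step, here is a minimal Python sketch. The 1-to-5 scoring scale, the averaging of the three data factors into an ease score, the product used as the combined score, and the example PoC names are all assumptions made for illustration; the post does not prescribe a specific formula.

```python
from dataclasses import dataclass


@dataclass
class PoC:
    """A candidate proof of concept, scored on a hypothetical 1 (low) to 5 (high) scale."""
    name: str
    business_value: float
    data_availability: float
    data_granularity: float
    data_quality: float

    @property
    def ease(self) -> float:
        # Assumption: ease of implementation is the plain average of the
        # three data factors named in the text.
        return (self.data_availability + self.data_granularity + self.data_quality) / 3

    @property
    def score(self) -> float:
        # Assumption: combine the two criteria by multiplying them, so a PoC
        # must do reasonably well on both to rank highly.
        return self.business_value * self.ease


def stack_rank(pocs):
    """Return the PoCs ordered from highest to lowest combined score."""
    return sorted(pocs, key=lambda p: p.score, reverse=True)


# Hypothetical pipeline produced by a working session.
pipeline = [
    PoC("churn prediction", business_value=5, data_availability=2,
        data_granularity=3, data_quality=2),
    PoC("ticket routing", business_value=3, data_availability=5,
        data_granularity=4, data_quality=5),
    PoC("demand forecast", business_value=4, data_availability=4,
        data_granularity=4, data_quality=3),
]

ranked = stack_rank(pipeline)
for rank, poc in enumerate(ranked, start=1):
    print(f"{rank}. {poc.name}: value={poc.business_value}, "
          f"ease={poc.ease:.2f}, score={poc.score:.2f}")
```

In this made-up pipeline, the idea with the highest business value ranks last because its data is scarce and of low quality, which is the trade-off the stack-ranking is meant to surface. A weighted sum would work just as well if the team prefers to tune the balance between value and ease.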

(more…)

The Data Science of Predicting Disk Drive Failures

Shiri Gaber

Data Scientist, Dell IT

With the expanding volume of information in the digital universe and the increasing number of disk drives required to store that information, disk drive reliability prediction is imperative for EMC and EMC customers.

Figure 1: An illustration of information expansion in recent years and its expected growth

Disk drive reliability analysis, a general term for monitoring and “learning” the patterns that disk drives show before they fail, is a well-explored domain in both academia and industry. The Holy Grail for any data storage company is to be able to accurately predict drive failures based on measurable performance metrics.

Naturally, improving the logistics of drive replacements is worth big money for the business. In addition, predicting a drive failure far enough in advance can facilitate product maintenance, operation and reliability, dramatically improving Total Customer Experience (TCE). In the last few months, EMC’s Data Science as a Service (DSaaS) team has been developing a solution capable of predicting the imminent failure of specific drives installed at customer sites.

(more…)

Why a Data Lake? Keeping Up with the Digital Universe

Brahma Tangella

Sr. Manager, Service Strategy, Dell IT

With the digital universe expected to swell to 44 zettabytes of data by 2020, today’s enterprises need a central data repository that can process increasing volumes of all types of data faster, so that business users can make better, real-time decisions. In short, they need a stronger backbone; they need the data lake!

Not only do traditional databases constrain real-time and shared data analytics due to their siloed nature, they also lack the technology to accommodate the skyrocketing volume and variety of data being created at an increasing rate. After all, according to IDC research, the growing number of smart devices that analyze everything from home heating systems to consumer information will mean that within four years there will be some 7 billion connected people using an estimated 30 billion devices.

(more…)

Enterprise Information Retrieval: Build vs. Buy Approach

The ease with which we have long been able to retrieve information from the World Wide Web (WWW) using increasingly efficient, high-quality search engines underscores the less-than-impressive performance of search engines serving the enterprise environment. Off-the-shelf tools that let organizations retrieve their enterprise information just do not give us the same experience as Google or Bing. But what if you could build your own enterprise information retrieval system by leveraging open source tools and platforms?

In this blog, we will explore the feasibility of doing just that.

(more…)

Unlock the Textual Content in Your Data Lake

Dr. Raphael Cohen

Principal Data Scientist, EMC IT

Wouldn’t it be great if you could analyze all customer interactions and learn which parts of our services or sales are better than others? Or analyze all of our service request textual descriptions and infer the call volume drivers? Understand the main topics of a chat session? Use the same data to understand how customers are actually using our products? Or go beyond customer interactions and identify the common bugs in our code by analyzing the text engineers type into a bug tracking system such as Jira or Bugzilla?

Liberating your data is not enough if a big chunk of it remains locked in human-generated text.

EMC’s Data Science as a Service team has created a highly advanced text analytics technology that can help your organization unlock the value in human-generated text.

(more…)

The Business Data Lake from a Data Scientist Perspective

Oshry Ben-Harush

Data Science Manager, EMC IT

The Business Data Lake (BDL) is positioned as the one-stop shop for all of the organization’s (big) data storage and analytics requirements. It is intended to address the three V’s of Big Data analytics (Volume, Variety and Velocity) by providing a vast amount of storage and by ingesting streaming data, mini-batches and batches, whether structured, semi-structured or unstructured. It fundamentally shifts the paradigm in business data storage and analytics by consolidating the multiple silos of data that can be found in organizations today.

(more…)

Assessing Data Loss Costs: Value-Driven Protection of the Bottom Line

In an age when most companies invest to become data-driven, the value of data is increasingly a key criterion for making IT decisions, and the protection of the data becomes paramount to those decisions.

When making backup-related decisions, price justification involves the potential capital loss to the organization when a data loss or unavailability occurs. Understanding the value of data and access to that data is key when prioritizing backup technology or even for deciding which infrastructure to protect during a cyber-attack. However, estimating this price is not trivial.

I recently worked on a research project with a team of academic partners at Ben-Gurion University on prioritizing data replication to minimize the monetary loss in the case of a disaster. The method we derived can limit the costs of data loss, and could provide a high return on investment (ROI) of up to one million dollars per incident.

(more…)

Creating New Business Value Around the Business Data Lake

Sean Brown

Director, Big Data Service Strategy, EMC IT

If your organization is like most, you have multiple business groups seeking to leverage pools of segmented Big Data in various ways to improve their operations, gain insight into customers, target marketing efforts, hone product features and more. Maybe you are even one of the few who have gained some significant value from these siloed business analytics using increasingly popular data science techniques.

However, most organizations, including EMC, still have a way to go to become an analytical enterprise, one that bases both tactical and strategic decisions on data and analytics. This does not mean that decision-making is taken out of the hands of the company’s leadership and the years of experience they bring, but it does mean that every decision is critiqued against what your analysis is telling you.

(more…)

From Neuroscience to Data Science

Shiri Gaber

Data Scientist, Dell IT

Project: Root cause analysis of difference in support hours
ROI: Model suggests savings of 500-1,000 support hours per week on average (up to $5M annually)

I have recently made the transition from academic neuroscience to becoming a member of the Data-Science-as-a-Service team in EMC’s IT organization. The change from academia to the business world is far from trivial. I came from a computational neuroscience lab where most of the work involved developing probabilistic models of the activity of neural populations; simulations and implementations were not a top priority. For a data scientist with a mostly theoretical background, coping with implementation, let alone implementation in a Big Data environment, is challenging.

Lucky for me, the change of scientific domains underlying the two disciplines is not as large a “leap” as it may seem at first. When you think about predictive analytics, what is more natural than to think of our brain as a complicated learning machine whose main goal is data compression and interpretation?

(more…)

Talent and Innovation Come Together To Reach the Third Platform

EMC IT Proven

EMC IT
IT Proven allows you to leverage EMC IT’s first-hand knowledge and best practices to accelerate your own IT transformation journeys, transforming operations and delivering IT as a Service through the power of cloud computing. IT Proven highlights how EMC IT transformed into an agile, innovative, and competitive service provider.

The journey to the Third Platform is forging new requirements, skills and expectations of the evolving employee. To stay ahead of the curve and fulfill the opportunities associated with emerging technologies, CIOs must tap new resources to acquire global talent that drives innovation. This is a key mission of the EMC Centers of Excellence, where EMC is constantly breaking new ground to the benefit of the enterprise and our partners and customers.

In the latest edition of Coffee in the Café with Vic Bhagat, Vic traveled to the EMC Israel Center of Excellence. There, he met with Orna Berry, Vice President of Growth and Innovation, to discuss how EMC is finding, fostering and developing talent in a hotbed for technology.
(more…)