Archive for the ‘Data Science’ Category

From Neuroscience to Data Science

Shiri Gaber

Data Scientist, Dell IT

Project: Root cause analysis of differences in support hours
ROI: Model suggests savings of 500 to 1,000 support hours per week on average (up to $5M annually)

I have recently made the transition from academic neuroscience to the Data-Science-as-a-Service team in EMC’s IT organization. The move from academia to the business world is far from trivial. In the computational neuroscience lab I came from, most of the work involved developing probabilistic models of the activity of neural populations, and simulations and implementation were not a top priority. For a data scientist with a mostly theoretical background, coping with implementation, let alone implementation in a Big Data environment, is challenging.

Luckily for me, the shift between the scientific domains underlying the two disciplines is not as large a “leap” as it may seem at first. When you think about predictive analytics, what is more natural than thinking of the brain as a complicated learning machine whose main goal is data compression and interpretation?

Smart Data Visualization: Helping Decision Makers Get the Picture

Dr. Lena Tenenboim-Chekina

Sr. Data Scientist, EMC IT

Smart data visualization is proving to be an essential tool in maintaining increasingly complex Big Data systems in the cloud.

The adoption of Big Data tools and technology relies heavily on distributed, scaled-out computing. One of the main differences in this setting is that it includes systems that operate as a whole on top of several independent hosts. These hosts coordinate their actions with limited information, and as a result, maintenance complexity increases significantly. One way to overcome this challenge is smart data visualization, which helps IT experts and management pinpoint the source of problems quickly.
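As a rough illustration of how a visualization can point IT staff at a problem host, the sketch below summarizes one metric per host, flags hosts that sit far from the cluster mean using a plain z-score rule, and renders a text bar chart with those hosts marked. The host names, latency values, the `flag_outlier_hosts` and `text_bar_chart` helpers, and the 2.0 threshold are all hypothetical. This is not the method described in the post; a production system would more likely use robust statistics and a proper plotting library.

```python
from statistics import mean, pstdev

# Hypothetical per-host metric: median request latency in milliseconds
# for one distributed service. Names and values are illustrative only.
host_latency_ms = {
    "host-1": 50, "host-2": 52, "host-3": 49, "host-4": 51,
    "host-5": 48, "host-6": 50, "host-7": 53, "host-8": 120,
}


def flag_outlier_hosts(metrics, threshold=2.0):
    """Return hosts whose value is more than `threshold` population
    standard deviations away from the mean across all hosts."""
    values = list(metrics.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # every host reports the same value; nothing stands out
        return []
    return sorted(h for h, v in metrics.items() if abs(v - mu) / sigma > threshold)


def text_bar_chart(metrics, flagged, width=40):
    """Render one bar per host, scaled to the largest value, and mark flagged hosts."""
    top = max(metrics.values())
    rows = []
    for host, value in sorted(metrics.items()):
        bar = "#" * round(width * value / top)
        marker = "  <-- check this host" if host in flagged else ""
        rows.append(f"{host:<8}{bar} {value}{marker}")
    return "\n".join(rows)


flagged = flag_outlier_hosts(host_latency_ms)
print(text_bar_chart(host_latency_ms, flagged))
```

The chart does the "pinpointing": instead of reading eight raw numbers, the reader sees one bar that is visibly longer than the rest and is explicitly labeled.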

The need for smart visualization is not unique to this problem. Representing complex data as a concise picture that tells decision-makers a story is a key part of any data analytics or data science project. Valuable results of a rigorous analysis may go unnoticed if no visualization clearly communicates the underlying information to the reader. The importance of data visualization is not new. Visualization tools, along with general interest in data visualization topics, have exploded in popularity in recent years, as evidenced by the proliferation of literature on infographics and the arcana of visualization in both print and online media.

EMC CIO Vic Bhagat: Embracing the Power of Big Data

Dell IT Proven

Dell IT
IT Proven allows you to leverage Dell IT’s first-hand knowledge and best practices to accelerate your own IT transformation journeys, transforming operations and delivering IT as a Service through the power of cloud computing. IT Proven highlights how Dell IT transformed into an agile, innovative, and competitive service provider.

Big Data is changing the way IT organizations operate and deliver solutions to the business. It is a new, contemporary approach for IT to help business users harness and interpret information to drive more efficiency, productivity, performance and value for the business. As EMC IT embraces the Third Platform, we are breaking new ground with Big Data analytics to better position the organization to deliver more competitive solutions.

EMC CIO Vic Bhagat (@VicBhagat) addressed this topic and more in a recent interview with the Pivotal Blog, tackling the questions, challenges and opportunities facing both EMC IT and global CIOs. Where can IT organizations begin? How can they drive new behaviors? How should they address internal clients?

The Price is Right: Predicting Cost of Support Contracts for Complex Products

Oshry Ben-Harush

Data Science Manager, EMC IT

One of the challenges hardware (and software) manufacturers face is estimating the level of support their products will require in the future. Underestimating the support requirements leads to major losses on the support contract, while overestimating them hurts the product’s competitive edge.

The future level of support includes replacements, repairs, and remote and on-site support. To that end, manufacturers develop reliability models for everything from hard and flash drives to cars and aircraft. These models take into account the configuration parameters of the final product and of its internal components.
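To make the link between a reliability model and a contract price concrete, here is a minimal sketch that turns per-component failure rates into an expected replacement cost for one product configuration. It assumes a constant annualized failure rate (AFR) per component and that every failed part is replaced. The `Component` class, the component list, and every number are hypothetical and are not taken from the post.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    quantity: int          # units of this component in one product configuration
    afr: float             # assumed annualized failure rate per unit (0.02 = 2% per year)
    cost_per_event: float  # assumed parts + labor cost of one replacement, in dollars


def expected_support_cost(config, years):
    """Expected replacement cost over a contract of `years` years.

    Assumes a constant failure rate and that every failed part is replaced,
    so the installed population stays the same size for the whole contract.
    """
    return sum(c.quantity * c.afr * years * c.cost_per_event for c in config)


# Hypothetical configuration of a storage system; all numbers are made up.
config = [
    Component("disk drive", quantity=60, afr=0.02, cost_per_event=300.0),
    Component("power supply", quantity=4, afr=0.01, cost_per_event=500.0),
    Component("controller", quantity=2, afr=0.005, cost_per_event=2000.0),
]

print(f"Expected 3-year replacement cost: ${expected_support_cost(config, 3):,.2f}")
```

Pricing the contract below this expected cost corresponds to the underestimation risk described above, and pricing far above it corresponds to the overestimation risk. A realistic reliability model would let the failure rate vary with configuration, age and usage rather than holding it constant.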

In 2007, Google conducted a large-scale analysis of a subset of its disk drive population. It used an environment containing a large number of disk drives, collected different types of data from these drives into a Big Data store (Google’s Bigtable), and analyzed how different Key Performance Indicators (KPIs) correlated with drive mortality:

  • Manufacturer, Models and Vintage
  • Temperature
  • Utilization
  • Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.)

Contrary to expectations, Google’s researchers found that these KPIs are more useful for predicting trends for a large population than for predicting a single drive failure.
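The gap between population trends and single-drive prediction can be shown with a toy calculation. The sketch below computes the failure fraction per drive group and then applies a naive per-drive rule. The records are invented for illustration and are NOT figures from Google’s study; the group labels and the 50% decision rule are hypothetical.

```python
from collections import defaultdict

# Made-up observation records: (drive model, failed during the observation window).
# These are NOT figures from Google's study.
drives = (
    [("model-A", False)] * 970 + [("model-A", True)] * 30
    + [("model-B", False)] * 980 + [("model-B", True)] * 20
    + [("model-C", False)] * 950 + [("model-C", True)] * 50
)


def failure_rate_by_group(records):
    """Fraction of drives in each group that failed."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for group, failed in records:
        totals[group] += 1
        failures[group] += int(failed)
    return {group: failures[group] / totals[group] for group in totals}


rates = failure_rate_by_group(drives)
print(rates)

# Population view: model-C fails 2.5 times as often as model-B, a clear trend.
# Per-drive view: a rule that predicts failure whenever a drive's group rate
# exceeds 50% flags no drive at all, because even the worst group fails rarely.
predicted_to_fail = [group for group, _ in drives if rates[group] > 0.5]
print(len(predicted_to_fail))
```

A group-level difference is easy to measure across thousands of drives. The same signal is far too weak to say which individual drive will be the one that fails.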

Text Analytics: Easy Classification For Routing Service Requests

Dr. Raphael Cohen

Principal Data Scientist, EMC IT

The first question we ask as data scientists when we approach a new project is: what data does the customer have available? Sometimes the answer is a table or file full of nice numbers just waiting to be ingested by a machine learning classifier, but most of the time a big chunk of the information is stored in free text columns or documents.

As a customer-facing organization, we store information describing EMC’s interactions with clients. Some of it is structured, such as time to close and problem codes, but there are also free text fields such as the problem summary or comments from the customer satisfaction survey. These free text fields can be used to route service requests accurately to the right support team, which improves resolution times and customer satisfaction. They can also be used to identify burning issues raised in the customer satisfaction survey or to spot emerging problems.
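As an illustration of text-based routing, here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing, written with the standard library only. It learns word frequencies per support team from past problem summaries and routes a new summary to the most probable team. Naive Bayes is a swapped-in example technique, not necessarily what the team used. The `NaiveBayesRouter` class, the team names, and the training summaries are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesRouter:
    """Multinomial naive Bayes with add-one smoothing over problem-summary words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # team -> word -> count
        self.doc_counts = Counter()              # team -> number of training requests
        self.vocab = set()

    def fit(self, summaries, teams):
        for text, team in zip(summaries, teams):
            words = tokenize(text)
            self.word_counts[team].update(words)
            self.doc_counts[team] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_team, best_score = None, -math.inf
        for team, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior: how often this team received requests in training.
            score = math.log(self.doc_counts[team] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / (total_words + v))
            if score > best_score:
                best_team, best_score = team, score
        return best_team


# Hypothetical historical service requests and the team that resolved them.
summaries = [
    "disk failed in array, replace drive",
    "drive shows predictive failure",
    "power supply fault light on",
    "cannot log in to management console",
    "upgrade of firmware fails with error",
    "license key rejected after upgrade",
]
teams = ["hardware", "hardware", "hardware", "software", "software", "software"]

router = NaiveBayesRouter().fit(summaries, teams)
print(router.predict("replace failed drive"))
print(router.predict("console login error after upgrade"))
```

Naive Bayes is a common first baseline for short free-text fields because it trains in a single counting pass and is easy to inspect. A real deployment would need far more history per team and some evaluation on held-out requests.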

Similarly, Sales would like to use a potential customer’s website to categorize that company’s needs and identify products sold to similar companies.
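One simple way to find "similar companies" is to compare word-count vectors of website text with cosine similarity and look up what the closest existing customer bought. The sketch below uses raw term counts for brevity. A real system would at least drop stop words and weight terms, for example with TF-IDF. The company names, website snippets, and product lists are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words counts for lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_customer(prospect_text, customers):
    """customers maps name -> (website text, products bought); returns the best match."""
    prospect = term_vector(prospect_text)
    return max(
        customers.items(),
        key=lambda item: cosine_similarity(prospect, term_vector(item[1][0])),
    )


# Hypothetical existing customers: website text and products they bought.
customers = {
    "Acme Hospital Group": (
        "patient records imaging archive hospital clinics",
        ["backup appliance", "archive storage"],
    ),
    "Northwind Bank": (
        "online banking payments fraud trading accounts",
        ["all-flash array", "replication"],
    ),
}

name, (_, products) = most_similar_customer(
    "regional hospital network digital imaging and patient records", customers
)
print(name, products)
```

The products bought by the nearest existing customer become candidate recommendations for the prospect.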
