Author Archive

Shiri Gaber

Data Scientist, Dell IT

The Data Science of Predicting Disk Drive Failures

With the expanding volume of information in the digital universe and the increasing number of disk drives required to store it, predicting disk drive reliability is imperative for EMC and its customers.

Figure 1: Information Expansion. An illustration of the growth of information in recent years and its expected future growth.

Disk drive reliability analysis, a general term for monitoring and “learning” the patterns disk drives exhibit before they fail, is a well-studied domain in both academia and industry. The Holy Grail for any data storage company is to accurately predict drive failures from measurable performance metrics.

Naturally, improving the logistics of drive replacements is worth a lot of money to the business. In addition, predicting a drive failure far enough in advance can facilitate product maintenance, operation and reliability, dramatically improving Total Customer Experience (TCE). Over the last few months, EMC’s Data Science as a Service (DSaaS) team has been developing a solution capable of predicting imminent failures of specific drives installed at customer sites.
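The post does not describe the model the DSaaS team built, so what follows is only a minimal sketch of the general idea of scoring drives from measurable performance metrics. It assumes a logistic-regression-style score; the metric names (`reallocated_sectors`, `read_error_rate`, `temperature_c`), the weights, the bias and the 0.5 threshold are all hypothetical placeholders, not EMC’s features or parameters. In practice the weights would be learned from historical drive telemetry labeled with observed failures.

```python
import math
from dataclasses import dataclass


@dataclass
class DriveSnapshot:
    """One observation of a drive. The metric fields are hypothetical examples."""
    drive_id: str
    reallocated_sectors: float
    read_error_rate: float
    temperature_c: float

    def features(self) -> tuple[float, float, float]:
        # Order must match the order of the weights passed to the model.
        return (self.reallocated_sectors, self.read_error_rate, self.temperature_c)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def failure_probability(snap: DriveSnapshot, weights: tuple[float, ...], bias: float) -> float:
    """Logistic score: sigmoid(bias + sum_i w_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, snap.features()))
    return sigmoid(z)


def flag_at_risk(snapshots: list[DriveSnapshot], weights: tuple[float, ...],
                 bias: float, threshold: float = 0.5) -> list[str]:
    """Return the IDs of drives whose estimated failure probability meets the threshold."""
    return [s.drive_id for s in snapshots
            if failure_probability(s, weights, bias) >= threshold]


if __name__ == "__main__":
    # Hypothetical weights and bias, chosen only to illustrate the mechanics.
    weights = (0.05, 2.0, 0.02)
    drives = [
        DriveSnapshot("disk-A", reallocated_sectors=0, read_error_rate=0.1, temperature_c=35),
        DriveSnapshot("disk-B", reallocated_sectors=80, read_error_rate=0.5, temperature_c=45),
    ]
    print(flag_at_risk(drives, weights, bias=-4.0))  # ['disk-B']
```

With these illustrative numbers, disk-A gets a score of sigmoid(-3.1), roughly 0.04, while disk-B gets sigmoid(1.9), roughly 0.87, so only disk-B is flagged. Choosing the threshold trades off missed failures against unnecessary replacements, which ties directly back to the replacement-logistics cost mentioned above.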


From Neuroscience to Data Science

Project: Root cause analysis of differences in support hours
ROI: The model suggests savings of 500-1,000 support hours per week on average (up to $5M annually)

I recently made the transition from academic neuroscience to the Data Science as a Service team in EMC’s IT organization. The move from academia to the business world is far from trivial. In the computational neuroscience lab I came from, most of the work involved developing probabilistic models of the activity of neural populations, and simulation and implementation were not a top priority. As a data scientist with a mostly theoretical background, I find coping with implementation, let alone implementation in a Big Data environment, challenging.

Luckily for me, the shift between the scientific domains underlying the two disciplines is not as large a “leap” as it may first seem. When you think about predictive analytics, what is more natural than thinking of our brain as a complicated learning machine whose main goal is data compression and interpretation?