Author Archive

Oshry Ben-Harush

Data Science Manager, EMC IT

The Business Data Lake from a Data Scientist Perspective

The Business Data Lake (BDL) is positioned as the one-stop shop for all of an organization’s (big) data storage and analytics requirements. It is intended to address the three V’s of Big Data analytics – Volume, Variety and Velocity – by providing vast storage capacity and by ingesting data as streams, mini-batches or batches, whether structured, semi-structured or unstructured. It fundamentally shifts the paradigm in business data storage and analytics by consolidating the multiple silos of data found in organizations today.

(more…)

The Price is Right: Predicting Cost of Support Contracts for Complex Products

One of the challenges hardware (and software) manufacturers face is estimating the future level of support required to maintain their products. Underestimating the support requirements leads to major losses on the support contract, while overestimating them hurts the product’s competitive edge.

The future level of support includes replacements, repairs, and remote and on-site support. To estimate it, manufacturers develop reliability models for everything from hard/flash drives to cars and aircraft. These models take into account different configuration parameters of the final product and its internal components.

In 2007, Google conducted a large-scale analysis of a subset of its disk drive population. It collected several types of data from a large number of drives into a Big Data store (Google’s Bigtable) and analyzed the following Key Performance Indicators (KPIs) and their correlation with drive mortality:

  • Manufacturer, Models and Vintage
  • Temperature
  • Utilization
  • Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.)

Contrary to expectations, Google’s researchers found that these KPIs are more useful for predicting failure trends across a large population than for predicting the failure of a single drive.

(more…)