Archive for the ‘Data Science’ Category

Data Science Lessons: Insights from an Agricultural Proof of Concept

Agriculture has come a long way from ancient times through the industrial revolution to the current digital era. In 2017, modern agricultural organizations have access to increasingly large amounts of data collected from soil quality sensors, weather sensors, GPS-guided machinery, and more. According to a recent USDA survey, more than 60 percent of corn and soybean crops are monitored by data collection devices (source). However, there is still a substantial gap between the potential of this data and what happens in reality. Despite having the data, many companies lack the capability to process and analyze it effectively and to build informative models efficiently in order to make data-driven decisions.

That’s where guidance from data service providers, such as Virtustream, can help. Virtustream provides data management expertise, tools and data science consulting to enable customers across different industries to get value from their data resources.

Our data science team in Dell IT recently initiated a Data-Science-as-a-Service Proof of Concept (PoC) as part of a Virtustream service engagement with a large company that plants crops on thousands of farms across the USA. Virtustream had enabled the company to become more data-driven by harnessing its large amounts of data, as well as developing and implementing different applications that enable scalable, faster, and more accurate operations – operations that couldn’t be executed with existing tools. Our PoC sought to demonstrate the speed and efficiency of those analytics applications.

(more…)

Framing the Data Science Proof of Concept

Brahma Tangella

Sr. Manager, Service Strategy, Dell IT

Whether companies refer to results, outcomes, ROI, or case studies, Big Data and data science are finally moving beyond the hype and proving to deliver dividends over time. Several new Big Data technologies and predictive tools have been launched to meet the growing demand within business and technology groups to harness the constant growth of both structured and unstructured data within and outside of the enterprise. But such technologies and tools won’t be effective unless you define the problem to be addressed.

Most data science initiatives start with a proof of concept (PoC) or, in some cases, with a proof of value (PoV) if the foundational concept is clearly established. Developing a pipeline of PoCs through working sessions with data scientists, business subject matter experts (SMEs), data experts, and leaders can be extremely helpful. Following this, prioritize the PoCs by stack-ranking them on business value and ease of implementation, where ease of implementation factors in the availability, granularity, and quality of the data.
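As a rough illustration of the stack-ranking step, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than the method used by the team: the 1-to-5 scoring scale, the equal weighting of business value and ease of implementation, the simple average of the three data factors, and the candidate PoC names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoC:
    """A candidate proof of concept, scored on a hypothetical 1 (low) to 5 (high) scale."""
    name: str
    business_value: float
    data_availability: float
    data_granularity: float
    data_quality: float

    @property
    def ease_of_implementation(self) -> float:
        # Assumption: ease is the plain average of the three data factors.
        return (self.data_availability + self.data_granularity + self.data_quality) / 3

    @property
    def score(self) -> float:
        # Assumption: business value and ease of implementation weigh equally.
        return 0.5 * self.business_value + 0.5 * self.ease_of_implementation


def stack_rank(pocs):
    """Return the PoCs ordered from highest to lowest combined score."""
    return sorted(pocs, key=lambda p: p.score, reverse=True)


# Hypothetical candidates gathered in the working sessions.
candidates = [
    PoC("churn prediction", business_value=5, data_availability=2,
        data_granularity=2, data_quality=2),
    PoC("ticket routing", business_value=3, data_availability=5,
        data_granularity=4, data_quality=4),
    PoC("demand forecast", business_value=4, data_availability=4,
        data_granularity=4, data_quality=3),
]

ranked = stack_rank(candidates)
for position, poc in enumerate(ranked, start=1):
    print(f"{position}. {poc.name} (score {poc.score:.2f})")
```

In practice the weights and scales would be agreed on with the business SMEs; the point is only that a high-value PoC with poor data can rank below a more modest one that is ready to implement.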

(more…)

The Data Science of Predicting Disk Drive Failures

Shiri Gaber

Data Scientist, Dell IT

With the expanding volume of information in the digital universe and the increasing number of disk drives required to store that information, disk drive reliability prediction is imperative for EMC and EMC customers.

Figure 1: An illustration of information expansion in recent years and its expected growth

Disk drive reliability analysis, a general term for monitoring disk drives and “learning” their prior-to-failure patterns, is a heavily explored domain in both academia and industry. The Holy Grail for any data storage company is to be able to accurately predict drive failures based on measurable performance metrics.

Naturally, improving the logistics of drive replacements is worth big money for the business. In addition, predicting that a drive will fail long enough in advance can facilitate product maintenance, operation and reliability, dramatically improving Total Customer Experience (TCE). In the last few months, EMC’s Data Science as a Service (DSaaS) team has been developing a solution capable of predicting the imminent failures of specific drives installed at customer sites.

(more…)

Why a Data Lake? Keeping Up with the Digital Universe

Brahma Tangella

Sr. Manager, Service Strategy, Dell IT

With the digital universe expected to swell to 44 zettabytes of data by 2020, today’s enterprises need a central data repository that can process increasing volumes of all types of data faster, so that business users can make better, real-time decisions. In short, they need a stronger backbone; they need the data lake!

Not only do traditional databases constrain real-time and shared data analytics due to their siloed nature, they also lack the technology to accommodate the skyrocketing level and types of data being created at an increasing rate. After all, according to IDC research, the growing number of smart devices that analyze everything from home heating systems to consumer information will mean that within four years there will be some 7 billion connected people using an estimated 30 billion devices.

(more…)

Enterprise Information Retrieval: Build vs. Buy Approach

The ease with which we have long been able to retrieve information from the World Wide Web (WWW) using increasingly efficient, high-quality search engines underscores the less-than-impressive performance of search engines serving the enterprise environment. Off-the-shelf tools that let organizations retrieve their enterprise information just do not give us the same experience as Google or Bing. But what if you could build your own enterprise information retrieval system by leveraging open source tools and platforms?

In this blog, we will explore the feasibility of doing just that.

(more…)

Unlock the Textual Content in Your Data Lake

Dr. Raphael Cohen

Principal Data Scientist, EMC IT

Wouldn’t it be great if you could analyze all customer interactions and learn which parts of our services or sales are better than others? Or analyze all of our service request textual descriptions and infer the call volume drivers? Understand the main topics of a chat session? Use the same data to understand how customers are actually using our products? Or go beyond customer interactions and identify the common bugs in our code by analyzing the text engineers type into a bug tracking system such as Jira or Bugzilla?

Liberating your data is not enough if a big chunk of it remains locked in human-generated texts.

EMC’s Data Science as a Service team has created a highly advanced text analytics technology that can help your organization unlock the value in human-generated texts.

(more…)

The Business Data Lake from a Data Scientist Perspective

Oshry Ben-Harush

Data Science Manager, EMC IT

The Business Data Lake (BDL) is positioned as the one-stop shop for all of the organization’s (big) data storage and analytics requirements. It is intended to address the three V’s of Big Data analytics – Volume, Variety and Velocity – by providing a vast amount of storage and by ingesting streaming data, mini-batches, and batches, whether structured, semi-structured, or unstructured. It fundamentally shifts the paradigm in business data storage and analytics by consolidating the multiple silos of data that can be found in organizations today.

(more…)

Predictive Analytics for IT Operations: Continuing the Journey

Bhanu Dhanaraj

Sr. Manager, Enterprise Analytics, EMC IT

Viktor Mayer-Schonberger and Kenneth Cukier, authors of Big Data: A Revolution That Will Transform How We Live, Work and Think, wrote, “If big data teaches us anything, it is that just acting better, making improvements – without deeper understanding – is often good enough.”

EMC IT not only recognizes the hidden value of Big Data, but also strives to generate better outcomes, so that we can act better and faster to improve our customers’ experience.

In his November 2013 article, Dan Inbar from EMC’s IT organization eloquently presented what IT has been doing to improve the operations of our Exchange email environment. PAITO (Predictive Analytics for IT Operations) is our Big Data analytics solution for outage prediction that allows our IT operations team to collect, analyze, store, and leverage key indicators to predict and prevent interruption in mission-critical operations. The journey that started more than a year ago as a pilot has evolved into a full-fledged IT data lake and analytics platform for various IT managed areas, including applications, servers, devices, licenses, network, storage, security and workloads. (more…)

Assessing Data Loss Costs: Value-Driven Protection of the Bottom Line

In an age when most companies invest to become data-driven, the value of data is increasingly a key criterion for making IT decisions, and the protection of the data becomes paramount to those decisions.

When making backup-related decisions, price justification involves the potential capital loss to the organization when data is lost or becomes unavailable. Understanding the value of data, and of access to that data, is key when prioritizing backup technology or even when deciding which infrastructure to protect during a cyber-attack. However, estimating this value is not trivial.

I recently worked on a research project with a team of academic partners at Ben-Gurion University on prioritizing data replication to minimize the monetary loss in the case of a disaster. The method we derived can limit the costs of data loss, and could provide a high return on investment (ROI) of up to one million dollars per incident.

(more…)

Creating New Business Value Around the Business Data Lake

Sean Brown

Director, Big Data Service Strategy, EMC IT

If your organization is like most, you have multiple business groups seeking to leverage pools of segmented Big Data in various ways to improve their operations, gain insight into customers, target marketing efforts, hone product features and more.  Maybe you are even one of the few who have gained some significant value from these siloed business analytics using increasingly popular data science techniques.

However, most organizations, including EMC, still have a way to go to become an analytical enterprise, one that bases both tactical and strategic decisions on data and analytics. This does not mean that decision-making is taken out of the hands of the company’s leadership and the years of experience they bring, but it does mean that every decision has been critiqued based on what your analysis is telling you.

(more…)
