Author Archive

Darryl Smith

Chief Data Platform Architect, Distinguished Engineer, Dell IT

Carving Out a New Data Lake

Creating a single data lake to serve a newly merged Dell Inc. and EMC Corp. is a bit like harnessing the tectonic shifts in the Earth’s crust that form the more traditional lakes some of us would rather be fishing on.

Both companies—united last fall as Dell Technologies, the world’s largest privately held technology company—have relied on somewhat different technologies to perform critical Big Data analytics that are key to their success. Critical data for each company was housed in multiple legacy systems and platforms. The challenge was how to bring everything together in a central repository—i.e. a data lake.

As soon as the groundbreaking merger took place last fall, a newly merged Big Data team, for which I serve as lead architect, began working to develop a world-class data ecosystem that would provide the right data, in the right place, in the right format and at the right time to solve current challenges and position the company for digital transformation.


Architecting a Data Lake: Matching Technology with Your Harvesting Needs

It takes many different best-of-breed technologies to effectively harvest “game-changing” analytics value from the data lake. Getting the right architecture to navigate your data lake requires a deep understanding of both the needs of Big Data and the available technologies in order to match analytics use cases with the appropriate platforms to get results.

Do you need to analyze large amounts of data fast or process many queries simultaneously? Is the data you are using organized in columns and rows, customer records perhaps? Or are you searching document files?

Let’s look at the basics of data lake architecture, some of the technologies and tools you should consider, and how EMC IT is approaching this crucial process.

Data Lake: Core Architectures


Best Practices for Virtualizing Your Oracle Database – Datastores

First off, my apologies for delaying the last part of this four-part blog series for so long. I have been building a fully automated application platform-as-a-service product for EMC IT that lets us deploy entire infrastructure stacks in minutes, all fully wired, protected and monitored, but that is a topic for another blog.

In my last post, Best Practices For Virtualizing Your Oracle Database With VMware, the best practices were all about the virtual machine itself. This post will focus on VMware’s virtual storage layer, called a datastore. A datastore is storage mapped to the physical ESX servers, onto which a VM’s LUNs, or disks, are provisioned. This is a critical component of any virtual database deployment, as it is where the database files reside. It is also a silent killer of performance because there are no metrics that will tell you that you have a problem, just unexplained high I/O latencies.


Best Practices For Virtualizing Your Oracle Database With VMware

In this blog (the third in a series on virtualizing Oracle), I will describe the best practices that EMC IT developed as we virtualized our most mission critical and highly transactional databases. You can find the earlier blogs here: [Running Oracle on Virtual Infrastructure Really Pays Off, Best Practices for Virtualizing Your Oracle Database]

There are two schools of thought when you talk to people about virtualization. From the infrastructure point of view, it is all about getting more efficiency out of the physical infrastructure layer. You can take this approach to the extreme, but it comes at the expense of higher administrative costs to constantly troubleshoot performance and functionality issues. The other point of view is mainly about reserving all of the resources of the underlying servers, just in case the application needs them. Fortunately, with VMware vSphere you can have both by using a more balanced approach.

I promised in my earlier posts that I would publish the secret sauce for achieving both great performance and high efficiency when virtualizing Oracle databases, so here it is. I have broken it up into four categories: memory, networking, CPU and storage (vSphere datastores). Because the datastore best practices are complex, I will save them for the next and last post in this series.


Running Oracle on Virtual Infrastructure Really Pays Off: Save Big on Capital, Operating and License Costs

Chances are your organization has begun virtualizing its application infrastructure (App tier) to gain the revolutionary efficiencies and cost savings this transformation offers. Less common, but every bit as groundbreaking for cost savings as well as plenty of other benefits, is virtualizing your organization’s Oracle database infrastructure.

To visualize the gains of virtualizing Oracle, picture the difference between a parking lot and a parking garage. The parking lot has a finite number of spaces in a given area of land. The parking garage, however, adds more levels to that same area, letting you double or triple the number of cars you can park within the same patch of ground.

Now consider a typical physical database server. It uses a given amount of power for operation as well as for cooling, yet most servers are only 10 to 20 percent utilized. The reality is that most workloads don’t require the full power of today’s servers, but database administrators prefer to maintain excess server capacity rather than risk poor performance due to insufficient compute power.
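
To make the utilization point concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration, not a sizing method from this post: the function name `hosts_needed`, the fleet of 20 servers and the 60 percent target utilization are all assumptions chosen for the example. It also assumes identical hosts and looks at CPU only, ignoring memory, I/O, licensing and headroom for peak load.

```python
def hosts_needed(num_servers: int, avg_util_pct: int, target_util_pct: int) -> int:
    """Estimate how many virtualized hosts could carry the same CPU load.

    Hypothetical helper: assumes every physical server and every
    virtualized host has the same capacity, and considers CPU only.
    Utilization values are whole percentages (for example, 15 for 15%).
    """
    if num_servers < 0:
        raise ValueError("num_servers must be non-negative")
    if not (0 < avg_util_pct <= 100 and 0 < target_util_pct <= 100):
        raise ValueError("utilization percentages must be between 1 and 100")

    # Total load expressed in "percent of one server" units.
    total_load_pct = num_servers * avg_util_pct

    # Ceiling division with integers, so there is no floating-point rounding.
    return -(-total_load_pct // target_util_pct)


if __name__ == "__main__":
    # Assumed fleet: 20 physical servers, consolidated to a 60% target.
    for util in (10, 15, 20):
        print(f"{util}% utilized -> {hosts_needed(20, util, 60)} hosts")
```

Under these assumptions, the same load that occupies 20 lightly used physical servers would fit on a handful of hosts, which is the parking-garage effect described above. Real sizing would need measured peak loads rather than averages.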

