Creating a single data lake to serve a newly merged Dell Inc. and EMC Corp. is a bit like harnessing the tectonic shifts in the Earth’s crust that form the more traditional lakes some of us would rather be fishing on.
Both companies (united last fall as Dell Technologies, the world’s largest privately held technology company) have relied on somewhat different technologies to perform the Big Data analytics that are key to their success. Critical data for each company was housed in multiple legacy systems and platforms. The challenge was how to bring everything together in a central repository, that is, a data lake.
As soon as the groundbreaking merger took place last fall, a newly merged Big Data team, for which I serve as lead architect, began working to develop a world-class data ecosystem that would provide the right data, in the right place, in the right format and at the right time to solve current challenges and position the company for digital transformation.
It takes many different best-of-breed technologies to effectively harvest “game-changing” analytics value from the data lake. Getting the right architecture to navigate your data lake requires a deep understanding of both the needs of Big Data and the available technologies in order to match analytics use cases with the appropriate platforms to get results.
Do you need to analyze large amounts of data fast or process many queries simultaneously? Is the data you are using organized in columns and rows, customer records perhaps? Or are you searching document files?
Let’s look at the basics of data lake architecture, some of the technologies and tools you should consider, and how EMC IT is approaching this crucial process.
First off, my apologies for delaying the last part of this four-part blog series for so long. I have been building a fully automated application platform-as-a-service product for EMC IT that lets us deploy entire infrastructure stacks in minutes, all fully wired, protected and monitored, but that topic is for another blog.
In my last post, Best Practices For Virtualizing Your Oracle Database With VMware, the best practices were all about the virtual machine itself. This post will focus on VMware’s virtual storage layer, called a datastore. A datastore is storage mapped to the physical ESX servers, onto which a VM’s LUNs, or disks, are provisioned. This is a critical component of any virtual database deployment because it is where the database files reside. It is also a silent killer of performance because there are no metrics that will tell you that you have a problem, just unexplained high I/O latencies.
There are two schools of thought when you talk to people about virtualization. The infrastructure point of view is all about getting more efficiency out of the physical infrastructure layer. You can take this approach to the extreme, but it comes at the expense of higher administrative costs from constantly troubleshooting performance and functionality issues. The other point of view is mainly about reserving all of the resources of the underlying servers, just in case the application needs them. Fortunately, with VMware vSphere you can have both by using a more balanced approach.
I promised, in my earlier posts, that I would publish the secret sauce to achieving both great performance and high efficiency when virtualizing Oracle databases – so here it is. I have broken it up into four categories: memory, networking, CPU and storage (vSphere datastores). I will actually save the datastore best practices for the next and last post in this series, due to their complexity.
Chances are your organization has begun virtualizing its application infrastructure (App tier) to gain the revolutionary efficiencies and cost savings this transformation offers. Less common, but every bit as groundbreaking – for cost savings as well as plenty of other benefits – is virtualizing your organization’s Oracle database infrastructure.
To visualize the gains of virtualizing Oracle, picture the difference between a parking lot and a parking garage. The parking lot has a finite number of spaces in a given area of land. The parking garage, however, adds more levels to that same area, letting you double or triple the number of cars you can park within the same patch of ground.
Now consider a typical physical database server. It uses a given amount of power for operation as well as for cooling, yet most servers are only 10 to 20 percent utilized. The reality is that most workloads don’t require the full power of today’s servers, but database administrators prefer to maintain excess server capacity rather than risk poor performance due to insufficient compute power.
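To put rough numbers on that headroom, here is a minimal back-of-the-envelope sketch. It assumes identically sized servers with evenly spread load, and the 60 percent target utilization is purely an illustrative assumption, not a recommendation from this post; the function name `consolidation_ratio` is hypothetical as well.

```python
def consolidation_ratio(avg_utilization_pct: int, target_utilization_pct: int) -> int:
    """Estimate how many equally sized, equally loaded physical servers
    could share one virtualized host without exceeding a target utilization.

    Both arguments are whole percentages (for example 10 for 10 percent).
    This ignores memory, storage I/O and peak-load effects, so treat the
    result as an upper bound for discussion, not a sizing rule.
    """
    if not 0 < avg_utilization_pct <= target_utilization_pct <= 100:
        raise ValueError("expected 0 < average <= target <= 100")
    # Integer division: only whole workloads can be placed on the host.
    return target_utilization_pct // avg_utilization_pct


if __name__ == "__main__":
    # Assumed target of 60 percent host utilization (hypothetical figure).
    for avg in (10, 20):
        ratio = consolidation_ratio(avg, 60)
        print(f"Servers at {avg}% utilization: about {ratio} per host")
```

Under these assumptions, servers running at 10 to 20 percent could in principle be stacked three to six deep on a single host, which is the parking garage effect in numeric form. Real sizing would also have to account for memory, storage and peak workloads.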