Title: Data Lakes For Dummies
Author: Alan R. Simon
Publisher: John Wiley & Sons Limited
Genre: Databases
isbn: 9781119786184
Deciding what to do about your data warehouses
Aligning your data lake plans with your organization’s analytical needs
Setting your data velocity speed limits
Getting a handle on your analytical costs
Suppose that you and about 15 other family members or friends all head to your favorite lake for a weeklong summer vacation.
You love going to the lake because you jump into your sailboat every day and spend hours out on the water. Others in your group, though, have their own favorite pastimes. Some prefer a boat with a little more “oomph” and spend their days in speedboats, zooming up and down the length of the lake. Others prefer leisurely canoeing. Some are into waterskiing, so they take turns latching onto one of those speedboats and zipping along the water. Others in your group are into fishing, and that’s how they spend most of their time at the lake. Still others aren’t all that interested in even going out on the water at all — they plop down on the beach to read, soak up some rays, and even grab a snooze every afternoon.
A data lake is very much like that weeklong trip to your favorite lake. Because a data lake is an enterprise-scale effort, spanning numerous organizations and departments as well as many different business functions, you and your coworkers will likely seek a variety of benefits and outcomes from all that hard work.
The best data lakes are those that satisfy the needs of a broad range of constituencies — basically, something for everyone to make the results well worth the effort.
Carpe Diem: Seizing the Day with Big Data
Maybe your organization has been dabbling in the world of big data for a while, going back to when Hadoop was one of the hottest new technologies. You’ve built some pretty nifty predictive analytics models, and now you’re fairly adept at discovering important patterns buried in mountains of data.
So far, though, your AAA — adventures in advanced analytics — have been highly fragmented. In fact, your analytical data is all over the place. You don’t have consistent approaches to cleansing and refining raw data to get the data ready for analytics; different groups do their own thing. It’s like the Wild West out there!
The concept of a data lake helps you harness the power of big data technology to the benefit of your entire organization. By following emerging best practices, avoiding traps and pitfalls, and building a solidly architected data lake, you can seize the day and help take your organization to new heights when it comes to analytics and data-driven insights.
You’ll achieve economies of scale for the data side of analytics throughout your organization. That means you’ll get “more bang for your buck” by acquiring, consolidating, preparing, and storing your analytical data once, on behalf of your enterprise as a whole, rather than repeatedly doing so for numerous smaller groups.
Managing Equal Opportunity Data
Your data lake’s big data foundation presents you with an opportunity that, not too long ago, was out of reach for most organizations. You can store, manage, and analyze all three types of data — structured, unstructured, and semi-structured — within a single environment, and without having to jump through hoops to do so!
Many of the business questions you ask of your data will only require structured data. Suppose you work in the supply chain organization within your company. You’ll definitely want your data lake to provide insight into the following:
Who among your strategic suppliers has the best combination of on-time component production and very low problem rates?
Which third-party logistics firms have the best — or worst — on-time shipping performance?
What’s the percentage of product spoilage among all internal and third-party warehouses during the past six months?
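To make the structured-data questions above a little more concrete, here is a minimal Python sketch of the on-time shipping question. Everything in it is an assumption made for illustration: the firm names, the record layout (firm, promised day, delivered day), and the sample values are hypothetical and don't come from any real data lake.

```python
from collections import defaultdict

# Hypothetical shipment records: (logistics_firm, promised_day, delivered_day).
# In a real data lake these rows would come from structured shipping data.
SHIPMENTS = [
    ("FirmA", 5, 5),
    ("FirmA", 7, 9),
    ("FirmB", 3, 2),
    ("FirmB", 4, 4),
    ("FirmC", 6, 8),
]


def on_time_rates(shipments):
    """Return {firm: fraction of shipments delivered on or before the promised day}."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for firm, promised, delivered in shipments:
        totals[firm] += 1
        if delivered <= promised:
            on_time[firm] += 1
    return {firm: on_time[firm] / totals[firm] for firm in totals}


rates = on_time_rates(SHIPMENTS)
best = max(rates, key=rates.get)   # firm with the highest on-time rate
worst = min(rates, key=rates.get)  # firm with the lowest on-time rate
print(rates, best, worst)
```

The same group-and-count pattern would apply to the supplier and spoilage questions. Only the fields being grouped and compared change.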
Other critical business analytics may involve unstructured or semi-structured data. You’ll want to know the following:
What percentage of tweets from your customers represent a positive sentiment about your product quality? Negative sentiment? What “hot spots” are showing up in blogs, tweets, and other social media posts, as well as YouTube videos, that can mean profitability and market share problems for you down the road?
Your reports show a dramatic increase in breakage in Warehouse #2. You have surveillance cameras in all your facilities. Is there anything that shows up on video that could indicate one or more root causes for this breakage that you can address through procedural changes?
Your data lake gives you one-stop shopping for structured, unstructured, and semi-structured data in a logically centralized, cohesive environment.
BACK TO THE FUTURE, PART 2
In the first edition of Data Warehousing For Dummies (Wiley), back in 1996, I included a chapter about the future directions of data warehousing. One of the forecasts I made was that the first-generation data warehousing of that time would eventually evolve into what I called “multimedia data warehousing” and would include not only structured data but also video and audio content. I made this prediction on the basis that “not all of the business questions we need to ask out of a data warehouse will come from numbers, dates, and character strings; sometimes we need information from images and other multimedia content as well.”
Guess what? You can think of a data lake as the modern incarnation of that “multimedia data warehouse” that I wrote about more than a quarter-century ago. It’s here!
Building Today’s — and Tomorrow’s — Enterprise Analytical Data Environment
Building an all-new analytical data environment around big data technology sounds like a great idea, right? You may be worried, though, that your organization can invest a ton of money over the next couple of years, only to find that your data lake is obsolete because of an entirely new generation of technology.
In other words, can your data lake be not just today’s but also tomorrow’s go-to platform for more and more analytical data and data-driven insights? Absolutely!
Constructing a bionic data environment
Maybe you’ve heard of a B-52. No, not a member of the American new wave music group (so don’t start singing “Love Shack”) but rather the U.S. Air Force plane.
The …