Title: Data Science For Dummies
Author: Lillian Pierson
Publisher: John Wiley & Sons Limited
Genre: Databases
ISBN: 9781119811619
That said, to lead data science projects, you should know what’s involved in implementing them — you’ll lead a team of data implementers, after all. See Part 2 — it covers all the basics on data science implementation. You also need to know prominent data science use cases, which you can explore over in Part 3.
The data entrepreneur
The third data superhero archetype that has evolved over the past decade is the data entrepreneur. If you’re a data entrepreneur, your secret superpower is building up businesses by delivering exceptional data science services and products.
You have the same type of focus and drive as the data implementer, but you apply it toward bringing your business vision to reality. Like the data leader, however, your love for data science is inspired mostly by the incredible outcomes that it makes possible. A data entrepreneur shares many traits with the other two archetypes and has a greater affinity for either the data implementer or the data leader, but with one important difference:
Data entrepreneurs crave the creative freedom that comes with being a founder.
Data entrepreneurs are more risk-tolerant than their data implementer or data leader counterparts. This risk tolerance and desire for freedom allow them to do what they do: create a vision for a business and then use their data science expertise to turn that vision into reality.
For more information on how to transform data science expertise into a profitable product or business, jump over to Part 3.
To illustrate what this framework looks like in action, consider my own data science career: As mentioned earlier in this chapter, I started off as a data science implementer and quickly turned into a data entrepreneur. Within my data business, however, my focus has been on data science training services, data strategy services, and mentoring data entrepreneurs to build world-class businesses. I’ve helped educate more than a million data professionals on data science and helped grow existing data science communities to more than 650,000 data professionals and counting. Stepping back, you could say that although I call myself a data entrepreneur, the work I do has a higher degree of affinity to data leadership than to data implementation.
I encourage you to go to the companion site to this book at
https://businessgrowth.ai/
and take the career path quiz I mentioned earlier in this section. The quiz can give you a head start in determining where you best fit within the spectrum of data science superhero archetypes.
Chapter 2
Tapping into Critical Aspects of Data Engineering
IN THIS CHAPTER
Unraveling the big data story
Looking at important data sources
Differentiating data science from data engineering
Storing data on-premise or in a cloud
Exploring other data engineering solutions
Though data and artificial intelligence (AI) are extremely interesting topics in the eyes of the public, most laypeople aren’t aware of what data really is or how it’s used to improve people’s lives. This chapter tells the full story about big data, explains where big data comes from and how it’s used, and then outlines the roles that machine learning engineers, data engineers, and data scientists play in the modern data ecosystem. In this chapter, I introduce the fundamental concepts related to storing and processing data for data science so that this information can serve as the basis for laying out your plans for leveraging data science to improve business performance.
Defining Big Data and the Three Vs
I am reluctant to even mention big data in this, the third, edition of Data Science For Dummies. About a decade ago, the industry hype was huge over what people called big data, a term that characterizes data that exceeds the processing capacity of conventional database systems because it’s too big, it moves too fast, or it lacks the structural requirements of traditional database architectures.
My reluctance stems from a tragedy I watched unfold across the second decade of the 21st century. Back then, the term big data was so overhyped across industry that countless business leaders made misguided impulse purchases. The narrative in those days went something like this: “If you’re not using big data to develop a competitive advantage for your business, the future of your company is in great peril. And, in order to use big data, you need to have big data storage and processing capabilities that are available only if you invest in a Hadoop cluster.”
Hadoop is a data processing platform that is designed to boil down big data into smaller datasets that are more manageable for data scientists to analyze. For reasons you’re about to see, Hadoop’s popularity has been in steady decline since 2015.
Despite its significant drawbacks, Hadoop is, and was, powerful at satisfying one requirement: batch-processing and storing large volumes of data. That’s great if your situation requires precisely this type of capability, but the fact is that technology is never a one-size-fits-all sort of thing. If I learned anything from the years I spent building technical and strategic engineering plans for government institutions, it’s this: Before investing in any sort of technology solution, you must always assess the current state of your organization, select an optimal use case, and thoroughly evaluate competing alternatives, all before even considering whether a purchase should be made. This process is so vital to the success of data science initiatives that I cover it extensively in Part 4.
Unfortunately, in almost all cases back then, business leaders bought into Hadoop before having evaluated whether it was an appropriate choice. Vendors sold Hadoop and made lots of money. Most of those projects failed. Most Hadoop vendors went out of business. Corporations got burned on investing in data projects, and the data industry got a bad rap. For any data professional who was working in the field between 2012 and 2015, the term big data represents