Data Science For Dummies. Lillian Pierson

Publisher: John Wiley & Sons Limited

ISBN: 9781119811619

      Chapters 7 through 9 and Chapters 15 through 17 in this book are dedicated to showing you the basics of the data science leadership-and-strategy skills you need in order to nail down a job as a data science leader.

      That said, to lead data science projects, you should know what’s involved in implementing them — you’ll lead a team of data implementers, after all. See Part 2 — it covers all the basics on data science implementation. You also need to know prominent data science use cases, which you can explore over in Part 3.

      The data entrepreneur

      The third data superhero archetype that has evolved over the past decade is the data entrepreneur. If you’re a data entrepreneur, your secret superpower is building up businesses by delivering exceptional data science services and products.

      You have the same type of focus and drive as the data implementer, but you apply it toward bringing your business vision to reality. Like the data leader, though, your love for data science is inspired mostly by the incredible outcomes that it makes possible. A data entrepreneur shares many traits with both archetypes, and usually has a greater affinity for one or the other, but with one important difference:

      Data entrepreneurs crave the creative freedom that comes with being a founder.

      Data entrepreneurs are more risk-tolerant than their data implementer or data leader counterparts. This risk tolerance and desire for freedom allow them to do what they do, which is to create a vision for a business and then use their data science expertise to guide the business to turn that vision into reality.

      For more information on how to transform data science expertise into a profitable product or business, jump over to Part 3.

      

I encourage you to go to the companion site to this book at https://businessgrowth.ai/ and take the career path quiz I mention earlier in this section. The quiz can give you a head start in determining where you best fit within the spectrum of data science superhero archetypes.

      Tapping into Critical Aspects of Data Engineering

      IN THIS CHAPTER

      Unraveling the big data story
      Looking at important data sources
      Differentiating data science from data engineering
      Storing data on-premise or in a cloud
      Exploring other data engineering solutions

      Though data and artificial intelligence (AI) are extremely interesting topics in the eyes of the public, most laypeople aren’t aware of what data really is or how it’s used to improve people’s lives. This chapter tells the full story about big data, explains where big data comes from and how it’s used, and then outlines the roles that machine learning engineers, data engineers, and data scientists play in the modern data ecosystem. In this chapter, I introduce the fundamental concepts related to storing and processing data for data science so that this information can serve as the basis for laying out your plans for leveraging data science to improve business performance.

      My reluctance stems from a tragedy I watched unfold across the second decade of the 21st century. Back then, the term big data was so overhyped across industry that countless business leaders made misguided impulse purchases. The narrative in those days went something like this: “If you’re not using big data to develop a competitive advantage for your business, the future of your company is in great peril. And, in order to use big data, you need to have big data storage and processing capabilities that are available only if you invest in a Hadoop cluster.”

      

Hadoop is a data processing platform that is designed to boil down big data into smaller datasets that are more manageable for data scientists to analyze. For reasons you’re about to see, Hadoop’s popularity has been in steady decline since 2015.
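      To make the idea of boiling down a large dataset a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; it only imitates, loosely, the batch-by-batch style of summarizing data and then merging the summaries. The function names (read_in_batches, summarize_batch, boil_down) and the sales records are hypothetical, invented purely for illustration.

```python
from collections import Counter
from itertools import islice


def read_in_batches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def summarize_batch(batch):
    """Reduce one batch of (region, amount) rows to per-region totals."""
    totals = Counter()
    for region, amount in batch:
        totals[region] += amount
    return totals


def boil_down(records, batch_size=2):
    """Merge the per-batch summaries into one small, analyzable dataset."""
    combined = Counter()
    for batch in read_in_batches(records, batch_size):
        combined.update(summarize_batch(batch))  # adds totals region by region
    return dict(combined)


# Hypothetical records standing in for a dataset too large to analyze directly.
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
summary = boil_down(sales)
print(summary)
```

      The point of the sketch is only the shape of the work: each batch is summarized on its own, and the small summaries are combined at the end, so the analyst never has to hold the full dataset at once. A real cluster would spread those batches across many machines, which this single-process example makes no attempt to do.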

      Despite its significant drawbacks, Hadoop is, and was, powerful at satisfying one requirement: batch-processing and storing large volumes of data. That's great if your situation requires precisely this type of capability, but the fact is that technology is never a one-size-fits-all sort of thing. If I learned anything from the years I spent building technical and strategic engineering plans for government institutions, it’s this: Before investing in any sort of technology solution, you must always assess the current state of your organization, select an optimal use case, and thoroughly evaluate competing alternatives, all before even considering whether a purchase should be made. This process is so vital to the success of data science initiatives that I cover it extensively in Part 4.

      Unfortunately, in almost all cases back then, business leaders bought into Hadoop before having evaluated whether it was an appropriate choice. Vendors sold Hadoop and made lots of money. Most of those projects failed. Most Hadoop vendors went out of business. Corporations got burned on investing in data projects, and the data industry got a bad rap. For any data professional who was working in the field between 2012 and 2015, the term big data represents