Title: Smarter Data Science
Author: Cole Stryker
Publisher: John Wiley & Sons Limited
Genre: Databases
ISBN: 9781119693420
You can connect with Cole on LinkedIn at linkedin.com/in/colestryker.
Acknowledgments
I want to express my sincere gratitude to Jim Minatel at John Wiley & Sons for giving me this opportunity. I would also like to sincerely thank my editor, Tom Dinse, for his attention to detail and for his excellent suggestions in helping to improve this book. I am very appreciative of the input provided by Tarik El-Masri, Alex Baryudin, and Elis Gitin. I would also like to thank Matt Holt, Devon Lewis, Pete Gaughan, Kenyon Brown, Kathleen Wisor, Barath Kumar Rajasekaran, Steven Stansel, Josephine Schweiloch, and Betsy Schaefer.
During my career, there have been several notable giants with whom I have worked and upon whose shoulders I clearly stand. Without these people, my career would not have taken the right turns: John Zachman, Warren Selkow, Ronald Ross, David Hay, and the late John Hall. I would like to recognize the renowned Grady Booch for his graciousness and kindness in contributing the Foreword. Finally, I would like to acknowledge the efforts of Cole Stryker for helping take this book to the next level.
Neal Fishman
Thanks to Jim Minatel, Tom Dinse, and the rest of the team at Wiley for recognizing the need for this book and for enhancing its value with their editorial guidance. I'd also like to thank Elizabeth Schaefer for introducing me to Neal and giving me the opportunity to work with him. Thanks also to Jason Oberholtzer and the folks at Gather for enabling my work at IBM. Lastly, I'm grateful to Neal Fishman for sharing his vision and inviting me to contribute to this important book.
Cole Stryker
Foreword for Smarter Data Science
There have been remarkable advances in artificial intelligence over the past decade, owing to a perfect storm at the confluence of three important forces: the rise of big data, the exponential growth of computational power, and the discovery of key algorithms for deep learning. IBM's Deep Blue beat the world's best chess player, Watson bested every human on Jeopardy, and DeepMind's AlphaGo and AlphaZero have dominated the field of Go and video games. On the one hand, these advances have proven useful in commerce and in science: AI has found an important role in manufacturing, banking, and medicine, to name a few domains. On the other hand, these advances raise some difficult questions, especially with regard to privacy and the conduct of war.
While discoveries in the science of artificial intelligence continue, the fruits of that science are now being put to work in the enterprise in very tangible ways, ways that are not only economically interesting but that also contribute to the human condition. As such, enterprises that want to leverage AI must turn their focus to engineering pragmatic systems of value that contain cognitive components.
That's where Smarter Data Science comes in.
As the authors explain, data is not an afterthought in building such systems; it is a forethought. To leverage AI for predicting, automating, and optimizing enterprise outcomes, the science of data must be made an intentional, measurable, repeatable, and agile part of the development pipeline. Here, you'll learn about best practices for collecting, organizing, analyzing, and infusing data in ways that make AI real for the enterprise. What I celebrate most about this book is that not only are the authors able to explain these best practices from a foundation of deep experience, they do so in a manner that is actionable. Their emphasis on results-driven methodology that is agile yet enables a strong architectural framework is refreshing.
I'm not a data scientist; I'm a systems engineer, and increasingly I find myself working with data scientists. Believe me, this is a book that has taught me many things. I think you'll find it quite informative as well.
Grady Booch
ACM, IEEE, and IBM Fellow
Epigraph
“There is no AI without IA.”
Seth Earley
IT Professional, vol. 18, no. 03, 2016.
(info.earley.com/hubfs/EIS_Assets/ITPro-Reprint-No-AI-without-IA.pdf)
In 2016, IT consultant and CEO Seth Earley wrote an article titled “There is no AI without IA” in an IEEE magazine called IT Professional. Earley put forth an argument that enterprises seeking to fully capitalize on the capabilities of artificial intelligence must first build out a supporting information architecture. Smarter Data Science provides a comprehensive response: an IA for AI.
Preamble
“What I'm trying to do is deliver results.”
Lou Gerstner
Business Week
Why You Need This Book
“No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely…”
So begins H. G. Wells' The War of the Worlds (Harper & Brothers, 1898). In the last years of the 20th century, such disbelief also prevailed. But unlike the fictional watchers from the 19th century, the late-20th-century watchers were real: pioneering, digitally enabled corporations. In The War of the Worlds, simple bacteria proved to be a defining weapon for both offense and defense. Today, the ultimate weapon is data. When data is misused, a corporate entity can implode. When data is used appropriately, a corporate entity can thrive.
Ever since the establishment of hieroglyphs and alphabets, data has been useful. The term business intelligence (BI) can be traced as far back as 1865 (ia601409.us.archive.org/25/items/cyclopaediacomm00devegoog). However, it wasn't until Herman Hollerith, whose company would eventually become known as International Business Machines, developed the punched card that data could be harvested at scale. Hollerith initially developed his punched card–processing technology for the 1890 U.S. government census. Later, in 1937, the U.S. government contracted IBM to use its punched card–reading machines for a new, massive bookkeeping project that involved 26 million Social Security numbers.
In 1965, the U.S. government built its first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic computer tape. With the advent of the Internet, and later mobile devices and IoT, it became possible for private companies to truly use data at scale, building massive stores of consumer data based on the growing number of touchpoints they now shared with their customers. Taken as an average, data is created at a rate of more than 1.7 MB every second for every person (www.domo.com/solution/data-never-sleeps-6). That equates to approximately 154,000,000,000,000 punched cards. By coupling the volume of data with the capacity to meaningfully process that data, data can be used at scale for much more than simple record keeping.
Clearly,