Title: Smarter Data Science
Author: Cole Stryker
Publisher: John Wiley & Sons Limited
Genre: Databases
ISBN: 9781119693420
The six interrogatives (what, how, where, who, when, why) provide a methodical means of inquiry. In the Zachman Framework, however, the interrogatives serve as a structural device. Because the Zachman Framework is structural in nature and is not a methodology, the framework is actually an ontology for describing the enterprise.
The Zachman Framework is not a methodology because it is neither prescriptive nor anchored on a process. The framework is concerned with creating, operating, or changing essential components that are of interest to an enterprise. The components can be big or small and include the enterprise itself, a department, a cloud, an app, a container, a schema, or an AI model.
Democratizing Data and Data Science
Despite the explosion of interest in data collection and storage, many organizations intentionally relegate data science knowledge to a small, discrete number of employees. While organizations must foster areas of specialization, the urge to reserve the data scientist label for a small cohort of employees seems to stem from a misguided belief that AI is somehow magic.
In the long term, neither data science nor AI should be the sole purview of the data scientist. The democratization of data science involves opening up the fundamentals of data science to a broader set of employees, paving the way for the establishment of new roles, including the citizen data scientist.
For example, a citizen data scientist would “create or generate [AI] models that use advanced diagnostic analytics or predictive and prescriptive capabilities, and whose primary job function is outside the field of statistics and analytics” (www.gartner.com/en/newsroom/press-releases/2017-01-16-gartner-says-more-than-40-percent-of-data-science-tasks-will-be-automated-by-2020). Citizen data scientists would extend the type of analytics that can be associated with self-service paradigms that are offered by organizations.
A citizen data scientist is still able to make use of advanced analytics without having all of the skills that characterize the conventional data scientist. Skills associated with a conventional data scientist would include proficiency in a programming language, such as Python or R, and applied knowledge of advanced-level math and statistics. By contrast, a citizen data scientist may possess intrinsic and expert domain knowledge that the data scientist does not possess. When a data scientist does additionally possess domain knowledge, they are jokingly referred to as a unicorn.
Attempting to relegate the handling of data associated with AI to a small specialized team of people within a company can be fraught with challenges. Some data scientists may find it wearisome to communicate insight and nuance to other employees who lack specific data literacy skills, such as the ability to read and work with digitized data. Business stakeholders can become frustrated because data requests are not addressed quickly and may appear to fail at addressing their questions.
Many software tools that have been designed for use by the data scientist community end up residing solely within each data science team. But while logical, creating a silo of data software tools and restricting tool access to a small team (such as a team of data scientists) can create its own dilemma. All departments across an organization can generate analytical needs. Each need can span a spectrum of complexity: from ultra-simple to insanely tricky. But realistically, not every requirement is going to be anchored on the insanely tricky end of the analytical need spectrum. Many needs may be solvable or addressable by someone with basic analytical training. By instituting the citizen data scientist, organizations can better tailor the initiatives suited for the deep expertise of the data scientist.
Democratizing data science empowers as many people as possible to make data-driven decisions. Empowering begins with education and is sustained through continual education. If AI is to impact 100% of future jobs, education on AI and data literacy should be viewed as a requisite offering that begins in grade school and becomes part of the new-collar worker's continual learning within the workplace. (Data literacy is addressed in Chapter 7, “Maximizing the Use of Your Data: Being Value Driven”; statistical literacy is covered in Chapter 6, “Addressing Operational Disciplines on the AI Ladder.”)
Building the organization's collective skills must encompass education in the use of collaborative software tools and socially oriented communication tools. Through being connected, employees can see who needs help and who can provide help, what problems need to be addressed, and how problems have been resolved. In democratizing data, organizations should notice that speed and value are moving in a positive direction, because sharing skills and knowledge can improve mutual understanding and business performance.
The impact of democratizing data and AI will circle back to refine existing job roles and responsibilities. Data scientists and citizen data scientists alike should be able to access and understand the curated datasets that are most relevant to support their own job functions. In building a workforce that is enabled to be data-driven through democratization, a new-collar workforce emerges. Organizations are faced with an unknown unknown: democratization introduces a new way to work for which optimal organizational structures have not yet been established. Change is upon the organization, but how that change manifests is not self-evident ahead of time. The new organizational structure of the enterprise is going to require frequent tuning.
Whether data science is applied by the data scientist or the citizen data scientist, sufficient oversight is necessary to ensure outcomes are not biased against the objectives of the organization. By empowering employees with essential skills, organizations can expand upon the opportunity to innovate and to find the next point of leverage. Sufficient oversight is also a concept that is distinct from sufficient insight. Sufficient insight would help to explain or articulate the what, how, where, who, when, and why of a singular outcome, whereas sufficient oversight would be the means to address causality across a series of outcomes.
Figure 2-4 shows the entwinement of the democratization of data and AI with data literacy and the ability to self-serve. The intersections should promote organizational collaboration, empowerment, and enablement of individuals and teams. The overall result is outcome-based in that the time-to-value proposition realized by an organization should be progressive and ultimately fair to all constituents.
Figure 2-4: Data and AI democratization
DEMOCRATIZATION
Four critical elements for enabling data democratization are that the data a person or a machine is entitled to see should be:
Easy to find
Understandable
Consumable
Of sufficient quality
For the most part, being easy to find means that you'll need to unilaterally catalog (or inventory) all of the data that exists within the enterprise and all of the applicable data that exists outside of the enterprise. The other elements are potentially nonunilateral in that understandability, consumability, and data quality are contextual and may vary for different people or different machines. For example, the names Kneel Fischman and Coal Striker may be of insufficient quality for the payroll department but be of sufficient quality for the internal fraud department.
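The contextual nature of data quality can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the reference spellings, the two rule functions, the 0.75 similarity threshold, and the idea that payroll needs an exact match while internal fraud review can work with a close match. The sketch only shows that one value can pass one consumer's quality rule and fail another's.

```python
from difflib import SequenceMatcher

# Hypothetical reference spellings, assumed here only for illustration.
REFERENCE_NAMES = ["Neal Fishman", "Cole Stryker"]


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def good_enough_for_payroll(name: str) -> bool:
    """Payroll (assumed rule): the name must match a reference spelling exactly."""
    return name in REFERENCE_NAMES


def good_enough_for_fraud(name: str, threshold: float = 0.75) -> bool:
    """Fraud review (assumed rule): a close match to any reference name is enough."""
    return any(similarity(name, ref) >= threshold for ref in REFERENCE_NAMES)


# Each consumer of the data supplies its own definition of "sufficient quality".
QUALITY_RULES = {
    "payroll": good_enough_for_payroll,
    "internal_fraud": good_enough_for_fraud,
}


def assess(name: str) -> dict:
    """Evaluate one value against every context's quality rule."""
    return {context: rule(name) for context, rule in QUALITY_RULES.items()}


if __name__ == "__main__":
    for candidate in ["Kneel Fischman", "Coal Striker", "Cole Stryker"]:
        print(candidate, assess(candidate))
```

Keeping the rules in a per-consumer mapping, rather than in a single global check, mirrors the point that understandability, consumability, and quality may vary for different people or machines, while the catalog of what exists stays common to all.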