Title: Smarter Data Science
Author: Cole Stryker
Publisher: John Wiley & Sons Limited
Genre: Databases
ISBN: 9781119693420
When: A point in time, potentially associated with triggers that are fired or signals that are raised
Why: A goal or subgoal revealing motivation
NOTE
Zachman's article “A Framework for Information Systems Architecture” can be found at ieeexplore.ieee.org/document/5387671. “Extending and Formalizing the Framework for Information Systems Architecture” is available at ieeexplore.ieee.org/document/5387433.
By using Zachman's basic concepts of the six interrogatives, an organization can begin to understand or express how much the organization knows about something in order to infer a degree of trust and to help foster data-driven processes.
If a person or a machine has access to a piece of information or an outcome from an AI model, the person or machine can begin a line of inquiry to determine trust. For example, if the person or machine is given a score (representing the interrogative what), they can then ask, “How was this information produced? Where was this information produced? Who produced this information? When was this information produced? Is this information appropriate to meet my needs (why)?”
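The line of inquiry above can be sketched in code. The following is a minimal illustration, not something taken from the book: it assumes that each piece of information travels with a small record holding an answer for each of the six interrogatives, and that a missing answer is a reason to withhold trust. The names `Provenance` and `unanswered`, and the sample values, are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Provenance:
    """Hypothetical record: one optional answer per interrogative."""
    what: Optional[str] = None   # the information itself, e.g. a model score
    how: Optional[str] = None    # how it was produced
    where: Optional[str] = None  # where it was produced
    who: Optional[str] = None    # who produced it
    when: Optional[str] = None   # when it was produced
    why: Optional[str] = None    # the need it is meant to serve


def unanswered(record: Provenance) -> List[str]:
    """Return the interrogatives that have no answer yet, in field order."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Illustrative values only: a score arrives with partial provenance.
score = Provenance(
    what="churn score 0.82",
    how="output of a classification model",
    when="2020-01-15",
)
print(unanswered(score))
```

A person or a machine consuming the score could use the returned list to decide which questions still need to be asked before relying on it.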
The Trust Matrix
To help visualize how the six interrogatives, taken as a whole, can assist in establishing trust and becoming data-driven, the interrogatives can be mapped to the x-axis of a trust matrix (shown in Figure 2-1). The y-axis reflects the time horizons: past, present, and future.
Figure 2-1: Trust matrix
The past represents something that has occurred. The past is a history and can inform as to what happened, what was built, what was bought, what was collected (in terms of money), and so on. The present is about the now and can inform us as to things that are underway or in motion. The present addresses what is happening, what is being built, who is buying, etc. The future is about things to be. We can prepare for the future by planning or forecasting. We can budget, and we can predict.
Revealing the past can yield hindsight; the present, insight; and the future, foresight. The spectrum across the time horizons provides the viewpoints for what happened, is happening, and could or will happen. While the divisions are straightforward, the concept of the present can actually span the past and the future. Consider “this year.” This year is part of the present, but the days gone are also part of the past, and the days to come are also part of the future. Normally, the context of inquiry can help to remove any untoward temporal complications.
At each x-y intersection lies what the organization can reasonably know. What is knowable has two dimensions, breadth and depth, as shown in Figure 2-2. Breadth is a reflection of scope and represents a means to understand how much is known about a given topic. For example, some organizations may have a retention policy that requires information to be expunged after a given number of years, say seven. In this example, the breadth of information an organization has access to is constrained to the most recent seven years.
Figure 2-2: Breadth and depth slivers
Conversely, depth is a reflection of detail, and this is where ethnography comes into play. For example, a person may purchase a product, and if that product is gifted to some other person, the organization may not have any indication as to the actual consumer of the product, representing a lack of depth.
Breadth and depth can be approximated in terms of percentages and mapped to an intersection. Figure 2-2 shows an example where a breadth sliver is shown to be approximately 75%, and a depth sliver is approximately 25%. The third box combines the breadth and depth slivers together.
In Figure 2-3, the quality of the information is graded against the breadth and depth. The diamond grid pattern indicates that the data quality is known to be poor. The diagonal stripes pattern indicates that the data quality is moderate, which means that the information in specific conditions may prove to be unreliable, while the square grid pattern is used to indicate that the information is of high quality and is reliable.
Figure 2-3: Grading
Therefore, even if the breadth and depth are not both 100%, the available data can be graded in the context of the information that is at hand.
Across the overall trust matrix, if the information for a particular need could be measured in terms of breadth and depth for each aspect across each time horizon and then graded, a person or a machine could evaluate an aspect of risk in terms of consuming information. By accurately quantifying a risk in terms of how much is known and at what level of detail, an organization can pursue being data-driven with confidence, knowing that all subsequent actions or decisions are being made on the merit of the data.
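One way to picture such an evaluation is the sketch below. It is an illustration under stated assumptions rather than a method given in the text: each cell of the matrix holds a breadth and a depth expressed as fractions between 0 and 1 plus one of three quality grades, the two slivers are combined by multiplying them, and a cell is flagged when it is missing, graded poor, or falls below a chosen threshold. The names `Cell`, `Grade`, `weak_cells`, and `min_coverage`, as well as the sample cells, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple

INTERROGATIVES = ("what", "how", "where", "who", "when", "why")  # x-axis
HORIZONS = ("past", "present", "future")                         # y-axis


class Grade(Enum):
    POOR = 1      # quality known to be poor
    MODERATE = 2  # may be unreliable in specific conditions
    HIGH = 3      # high quality and reliable


@dataclass(frozen=True)
class Cell:
    breadth: float  # share of the scope that is known, 0.0 to 1.0
    depth: float    # share of the detail that is known, 0.0 to 1.0
    grade: Grade

    def coverage(self) -> float:
        # Assumption: combine the two slivers by multiplying them.
        return self.breadth * self.depth


Key = Tuple[str, str]  # (interrogative, horizon)


def weak_cells(matrix: Dict[Key, Cell], min_coverage: float) -> List[Key]:
    """List intersections that are missing, graded poor, or thinly covered."""
    weak = []
    for interrogative in INTERROGATIVES:
        for horizon in HORIZONS:
            cell = matrix.get((interrogative, horizon))
            if (cell is None or cell.grade is Grade.POOR
                    or cell.coverage() < min_coverage):
                weak.append((interrogative, horizon))
    return weak


# Illustrative matrix with only two intersections filled in.
matrix = {
    ("what", "past"): Cell(breadth=0.75, depth=0.25, grade=Grade.MODERATE),
    ("who", "past"): Cell(breadth=1.0, depth=0.5, grade=Grade.HIGH),
}
flagged = weak_cells(matrix, min_coverage=0.2)
print(len(flagged), ("who", "past") in flagged)
```

The first sample cell reuses the 75% breadth and 25% depth slivers from Figure 2-2. Multiplication is only one possible way to combine them; an organization could just as well report the two values separately or weight them differently.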
Furthermore, a data-driven organization that uses its data as critical evidence to help inform and influence strategy will need a means to weigh options against any inherent risk. A data-driven organization must develop an evidence-based culture in which data can be evaluated against a means to establish trust and in which the analytics and AI performed against the data are deemed to be highly relevant, informative, and useful in determining next steps.
The Importance of Metrics and Human Insight
For organizations that make gut-feel decisions and are apprehensive about pursuing data-driven means, the ability to measure is vital. The ends and means model shown in Figure 1-4 of Chapter 1, “Climbing the AI Ladder,” illustrates the need to balance what ultimately needs to be measured against a means that can produce something measurable and aligned to that end.
The use of AI requires an organization to become data-driven, especially when a person is in a decision-making loop. Machine-to-machine communication lets a machine act independently and make decisions based purely on the information at hand. Orchestrating a person into a communication flow allows for decision augmentation and lets that person act as a gatekeeper.
In the 1960s, the nickname Mad Men was supposedly coined by those working in the field of advertising, an industry that in the United States was heavily centered around Madison Avenue in New York City (the men of Madison Avenue). Mad Men created messages for the masses. Whether messages were regionally appropriate or whether an advertisement resonated exceptionally well with the discrete needs of each individual was not the core focus. Eventually, the gut feel of the Mad Men approach gave way to the focus group–oriented view of the Media Men. In turn, the Media Men have given way to the Math Men. The Math Men are the men and women of data science whose provinces are the hoards of big data and thick data, and the algorithms and machine learning that derive insight from data. As the new-collar worker expands into all aspects of corporate work by using model-based outcomes, each decision is going to be based on data. New-collar workers are data-driven, and so are their decisions.