Smarter Data Science. Cole Stryker

Title: Smarter Data Science

Author: Cole Stryker

Publisher: John Wiley & Sons Limited

Genre: Databases

ISBN: 9781119693420

should be simplified and made available regardless of the form of the data and where it resides. Since the data used with advanced analytics and AI can be dynamic and fluid, not all data can be managed in a physical central location. With the ever-expanding number of data sources, virtualizing how data is collected is one of the critical activities that must be considered in an information architecture.

      These are key themes included in the Collect rung:

       Collecting data with a common SQL engine, using APIs for NoSQL access, and supporting data virtualization across a broad ecosystem of data that can be referred to as a data estate

       Deploying data warehouses, data lakes, and other analytics repositories with always-on resiliency and scalability

       Scaling with real-time data ingestion and advanced analytics simultaneously

       Storing or extracting all types of business data whether structured, semistructured, or unstructured

       Optimizing collections with AI that may include graph databases, Python, machine learning SQL, and confidence-based queries

       Tapping into open source data stores that may include technologies such as MongoDB, Cloudera, PostgreSQL, Cloudant, or Parquet
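      To make the idea of a common SQL engine over separately stored data more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is only a toy analogy for data virtualization, assuming two attached SQLite databases stand in for independent sources. The schema names `crm` and `sales`, the tables, and the functions `build_sources` and `revenue_by_region` are hypothetical. A single query joins both sources in place, without first copying them into one central table.

```python
import sqlite3


def build_sources() -> sqlite3.Connection:
    """Create one SQL engine with two separately attached databases.

    Each ATTACH of ':memory:' creates an independent database. In practice
    these could be separate files or, behind a real virtualization layer,
    entirely different systems. All names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")  # the single, common SQL engine
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS sales")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC"), (3, "EMEA")],
    )
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0)],
    )
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Join across both sources in one query, leaving the data where it resides."""
    return conn.execute(
        """
        SELECT c.region, SUM(o.amount)
        FROM crm.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_sources()))
```

      In this sketch, `revenue_by_region(build_sources())` returns `[('APAC', 50.0), ('EMEA', 135.0)]`. A production virtualization layer federates queries across genuinely different systems, such as warehouses, NoSQL stores, and files. The principle is the same, though: one query surface over data that stays where it resides.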

      The Organize rung implies a need to create a trusted data foundation. At a minimum, the trusted data foundation must catalog what is knowable to your organization. All forms of analytics are highly dependent upon digital assets. The assets that have been digitized form the basis for what an organization can reasonably know: the corpus of the business is the basis for the organizational universe of discourse, meaning the totality of what is knowable through digitized assets.

      Having data that is business-ready for analytics is foundational to having data that is business-ready for AI, but simply having access to data does not imply that the data is prepared for AI use cases. Bad data can paralyze AI and misguide any process that consumes output from an AI model. To organize, organizations must develop the disciplines to integrate, cleanse, curate, secure, catalog, and govern the full lifecycle of their data.

      These are key themes included in the Organize rung:

       Cleansing, integrating, and cataloging all types of data, regardless of where the data originates

       Automating virtual data pipelines that can support and provide for self-service analytics

       Ensuring data governance and data lineage for the data, even across multiple clouds

       Deploying self-service data lakes with persona-based experiences that provide for personalization

       Gaining a 360-degree view by combining business-ready views from multicloud repositories of data

       Streamlining data privacy, data policy, and compliance controls
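      The cleansing, integrating, cataloging, and lineage themes can be illustrated with a small Python sketch. The `Catalog` and `CatalogEntry` classes, the `cleanse` function, and the data set names are all hypothetical, and real catalog and governance tools track far richer metadata. The point is that every derived data set records where it came from, so lineage questions can be answered by walking upstream.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one data set (a hypothetical, deliberately small schema)."""
    name: str
    origin: str                                            # system the data came from
    derived_from: list[str] = field(default_factory=list)  # upstream data sets
    steps: list[str] = field(default_factory=list)         # transformations applied


class Catalog:
    """In-memory catalog that can answer "where did this data come from?"."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def lineage(self, name: str) -> list[str]:
        """All upstream data sets of `name`, breadth-first (nearest first)."""
        result: list[str] = []
        frontier = list(self._entries[name].derived_from)
        while frontier:
            parent = frontier.pop(0)
            if parent not in result:
                result.append(parent)
                frontier.extend(self._entries[parent].derived_from)
        return result


def cleanse(rows: list[dict]) -> list[dict]:
    """Trim names, normalize emails, drop rows without one, and de-duplicate."""
    seen: set[str] = set()
    clean: list[dict] = []
    for row in rows:
        email = (row.get("email") or "").strip().lower()
        if not email or email in seen:
            continue
        seen.add(email)
        clean.append({"name": row["name"].strip(), "email": email})
    return clean


if __name__ == "__main__":
    raw = [
        {"name": " Ada ", "email": "ADA@example.com"},
        {"name": "Ada", "email": "ada@example.com "},
        {"name": "Bob", "email": None},
    ]
    print(cleanse(raw))  # one clean, de-duplicated row for Ada

    catalog = Catalog()
    catalog.register(CatalogEntry("crm_raw", origin="crm"))
    catalog.register(CatalogEntry("web_raw", origin="web"))
    catalog.register(CatalogEntry(
        "customers_clean", origin="pipeline",
        derived_from=["crm_raw", "web_raw"],
        steps=["trim", "normalize_email", "drop_missing_email", "dedupe_email"],
    ))
    catalog.register(CatalogEntry(
        "customer_360", origin="pipeline", derived_from=["customers_clean"],
    ))
    print(catalog.lineage("customer_360"))
```

      Running the script prints the single cleaned record for Ada, followed by the lineage `['customers_clean', 'crm_raw', 'web_raw']`. That lineage traces the integrated `customer_360` view back to both raw sources.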

      The Analyze rung incorporates essential business and planning analytics capabilities that are key for achieving sustained success with AI. The Analyze rung further encapsulates the capabilities needed to build, deploy, and manage AI models within an integrated organizational technology portfolio.

      These are key themes included in the Analyze rung:

       Preparing data for use with AI models; building, running, and managing AI models within a unified experience

       Lowering the required skill levels to build an AI model with automated AI generation

       Applying predictive, prescriptive, and statistical analysis

       Allowing users to choose their own open source frameworks to develop AI models

       Continuously evolving models based upon accuracy analytics and quality controls

       Detecting bias, ensuring linear decision explanations, and adhering to compliance requirements
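      Bias detection can take many forms. The minimal Python sketch below shows one commonly used check, the disparate impact ratio. It is not presented as the book's method. The ratio divides the rate of favorable outcomes for an unprivileged group by the rate for a privileged group. The 0.8 threshold follows the widely cited four-fifths rule of thumb, and the group labels and outcome data are invented for illustration.

```python
from collections import defaultdict


def selection_rates(outcomes: list[int], groups: list[str]) -> dict[str, float]:
    """Fraction of favorable outcomes (1 = favorable, 0 = not) for each group."""
    totals: dict[str, int] = defaultdict(int)
    favorable: dict[str, int] = defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        totals[group] += 1
        favorable[group] += outcome
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact(outcomes: list[int], groups: list[str],
                     unprivileged: str, privileged: str) -> float:
    """Selection rate of the unprivileged group relative to the privileged group."""
    rates = selection_rates(outcomes, groups)
    return rates[unprivileged] / rates[privileged]


def is_flagged(ratio: float, threshold: float = 0.8) -> bool:
    """Four-fifths rule of thumb: flag ratios below 0.8 for review."""
    return ratio < threshold


if __name__ == "__main__":
    # Hypothetical model decisions (1 = approved) and the group of each applicant.
    outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    ratio = disparate_impact(outcomes, groups, unprivileged="B", privileged="A")
    print(f"disparate impact = {ratio:.2f}, flagged = {is_flagged(ratio)}")
```

      In the sample data, group A is approved 75% of the time and group B 25% of the time, which gives a ratio of about 0.33, so the check flags it. A ratio near 1.0 indicates similar selection rates. In practice, the threshold and the choice of groups would be set by the applicable compliance requirements.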

      Infuse is a discipline involving the integration of AI into a meaningful business function. While many organizations are able to create useful AI models, they quickly find that they must address operational challenges to achieve sustained and viable business value. The Infuse rung of the ladder highlights the disciplines that must be mastered to achieve trust and transparency in model-recommended decisions, explain decisions, detect untoward bias or ensure fairness, and provide a sufficient data trail for auditing. The Infuse rung seeks to operationalize AI use cases by addressing a time-to-value continuum.

      These are key themes included in the Infuse rung:

       Improving the time to value with prebuilt AI applications for common use cases such as customer service and financial planning or bespoke AI applications for specialized use cases such as transportation logistics

       Optimizing knowledge work and business processes

       Employing AI-assisted business intelligence and data visualization

       Automating planning, budgeting, and forecasting analytics

       Customizing with industry-aligned AI-driven frameworks

       Innovating with new business models that are intelligently powered through the use of AI
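      The Infuse rung calls for explaining decisions and leaving a sufficient data trail for auditing. The minimal Python sketch below assumes a simple linear scoring model and shows both ideas together. Each decision is explained by per-feature contributions (weight times value). An audit record captures the inputs, model version, outcome, and a SHA-256 digest that makes later tampering detectable. The weights, feature names, threshold, and `MODEL_VERSION` are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical linear model: score = BIAS + sum(weight * feature value).
WEIGHTS = {"income": 0.5, "debt_ratio": -2.0, "years_employed": 0.25}
BIAS = -0.5
THRESHOLD = 0.0
MODEL_VERSION = "linear-demo-v1"  # hypothetical identifier


def explain(features: dict[str, float]) -> dict[str, float]:
    """Per-feature contribution (weight * value) to the linear score."""
    return {name: weight * features[name] for name, weight in WEIGHTS.items()}


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (key-sorted) JSON rendering of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def decide_and_record(features: dict[str, float]) -> dict:
    """Score one case and return an audit record that explains the decision."""
    contributions = explain(features)
    score = BIAS + sum(contributions.values())
    record = {
        "model_version": MODEL_VERSION,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": features,
        "contributions": contributions,
        "score": score,
        "approved": score >= THRESHOLD,
    }
    record["digest"] = _digest(record)  # computed before the digest key exists
    return record


def verify(record: dict) -> bool:
    """Recompute the digest; False means the record was altered after the fact."""
    body = {key: value for key, value in record.items() if key != "digest"}
    return _digest(body) == record["digest"]


if __name__ == "__main__":
    rec = decide_and_record({"income": 2.0, "debt_ratio": 0.5, "years_employed": 4.0})
    print(rec["approved"], rec["score"], rec["contributions"])
    print("intact:", verify(rec))
```

      Storing the model version alongside the inputs and contributions lets an auditor later reconstruct why a given decision was made. A digest on its own only detects edits, so a real system would also keep such records in append-only or access-controlled storage.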

      Once each rung is mastered to the degree that new efforts repeat prior patterns and are no longer considered bespoke or deemed to require heroic effort, the organization can earnestly work toward a future state. The pinnacle of the ladder, the journey to AI, is to constantly modernize: essentially, to reinvent oneself at will. The Modernize rung is simply an attained future state of being. Once reached, however, this state becomes the organization's current state. Upon reaching the pinnacle, dynamic organizations will begin the ladder's journey anew. This cycle is depicted in Figures 1-2 and 1-3.

      Figure 1-2: The ladder is part of a repetitive climb to continual improvement and adaptation.

      Figure 1-3: Current state ⇦ future state ⇦ current state

      These are key themes included in the Modernize rung:

       Deploying a multicloud information architecture for AI

       Leveraging a uniform platform of choice across any private or public cloud

       Virtualizing data as a means of collecting data regardless of where the data is sourced

       Using DataOps and MLOps to establish trusted virtual data pipelines for self-service

       Using unified data and AI cloud services that are open and easily extensible

       Scaling dynamically and in real time to accommodate changing needs

      Modernize