Practical Data Analysis with JMP, Third Edition. Robert Carver

Title: Practical Data Analysis with JMP, Third Edition

Author: Robert Carver

Publisher: Ingram

Genre: Software

isbn: 9781642956122

       Goals of Data Analysis: Description and Inference

       Types of Data

       Starting JMP

       A Simple Data Table

       Graph Builder: An Interactive Tool to Explore Data

       Using an Analysis Platform

       Row States

       Exporting and Sharing JMP Reports

       Saving and Reproducing Your Work

       Leaving JMP

      Statistical analysis and visualization of data have become an important foundation of decision making and critical thinking. Professionals in numerous walks of life—from medicine to government, from science to sports, from commerce to public health—all rely on the analysis of data to inform their work. In this first chapter, we take our first steps into the important and rapidly growing practice of data analysis.

      The central goal of this book is to help you build your capacity as a statistical thinker through progressive experience with the techniques and approaches of data analysis, specifically by using the features of JMP. Before using JMP, we begin with some remarks about activities that require data analysis.

      People gather and analyze data for many different reasons. Engineers test materials or new designs to determine their utility or safety. Coaches and owners of professional sports teams track their players’ performance in different situations to structure rosters and negotiate salary offers. Chemists and medical researchers conduct clinical trials to investigate the safety and efficacy of new treatments. Demographers describe the characteristics of populations and market segments. Investment analysts study recent market data to fine-tune investment portfolios. Increasingly, “smart” devices continuously generate high volumes of data touching on varying topics. All of the individuals who are engaged in these activities have consequential, pressing needs for information, and they turn to the techniques of statistics to meet those needs.

      There are two basic types of statistical analysis: description and inference. We perform descriptive analysis to summarize or describe an ongoing process or the current state of a population, that is, a group of individuals or items that is of interest to us. Sometimes we can collect data from every individual in a population (every professional athlete in a sport, or every firm in which we currently own stock), but more often we work with a sample, which is simply a subset of the population. When we study ongoing processes, we nearly always deal with samples.

      If a company reviews the records of all its client firms to summarize last month’s sales to all customers, the summary will describe the population of customers. If the same company wants to use that summary information to make a forecast of sales for next month, the company needs to engage in inference. When we use available data to make a conclusion about something that we cannot observe, or about something that has not happened yet, we are drawing an inference. As we will come to understand, inferential thinking requires risk-taking. Learning to measure and minimize the risks involved in inference is a central part of the study of statistics.
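
The contrast between describing a population and inferring from a sample can be sketched in a few lines of Python (not JMP). This is only an illustration under stated assumptions: the sales figures are invented, and the sample size of four and the fixed random seed are arbitrary choices made so that the sketch is repeatable.

```python
import random
import statistics

# Hypothetical population: last month's sales (in dollars) to every client firm.
# These numbers are invented purely for illustration.
population_sales = [1200, 950, 1830, 760, 1410, 1990, 1105, 880, 1675, 1320]

# Description: every member of the population is observed, so the summary is exact.
population_mean = statistics.mean(population_sales)

# Inference: suppose we could observe only a random sample of four clients.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population_sales, k=4)
sample_mean = statistics.mean(sample)

# The standard error is one way to measure the risk in treating the
# sample mean as an estimate of the unobserved population mean.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"Population mean (description): {population_mean:.1f}")
print(f"Sample mean (inference): {sample_mean:.1f} +/- {standard_error:.1f}")
```

The first figure carries no uncertainty because nothing is left unobserved. The second is an estimate, and the standard error attached to it is a simple example of measuring the risk that inference involves.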

      The practice of statistical analysis requires data—when we “do” analysis, we are analyzing data. It’s important to understand that analysis is just one phase in a statistical study. Later in this chapter, we will look at some data collected and reported by the World Population Division of the United Nations. Specifically, we will analyze the estimated life expectancy at birth for nations around the world in 2017. This set of data is a portion of a considerably larger collection spanning many years and assembled by numerous national and international agencies.

      In this example, we have five variables that are represented as five columns within a data table. A variable is an attribute that we can count, measure, or record. The variables in this example are a 3-letter code, country name, region, year, and life expectancy. Typically, we will capture multiple observations of each variable—whether we are taking repeated measurements of stock prices or recording facts from numerous respondents in a survey or individual countries around the globe. Each observation (often called a case or subject in survey data) occupies a row in a data table. In this example, the observational units are countries.

      Whenever we analyze a data set in JMP, we will work with a data table. The columns of the table contain different variables, and the rows of the table contain observations of each variable. In your statistics course, you will probably use the terms data set, variable, and observation (or case). In JMP, we more commonly speak of data tables, columns, and rows.
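
As a rough analogy outside JMP, the same row-and-column layout can be sketched in Python. The `Observation` class and its field names are hypothetical names chosen to mirror the five variables of the life expectancy example, and the three rows are placeholders with invented values, not the United Nations figures.

```python
from dataclasses import dataclass, fields


@dataclass
class Observation:
    """One row of the data table: a single country in a single year."""
    code: str               # 3-letter code
    country: str            # country name
    region: str
    year: int
    life_expectancy: float  # estimated life expectancy at birth, in years


# Placeholder rows; the names and values are invented for illustration.
table = [
    Observation("AAA", "Country A", "Region 1", 2017, 70.0),
    Observation("BBB", "Country B", "Region 1", 2017, 75.5),
    Observation("CCC", "Country C", "Region 2", 2017, 80.0),
]

# Columns correspond to variables; rows correspond to observations.
column_names = [f.name for f in fields(Observation)]
print(column_names)
print(f"{len(table)} rows x {len(column_names)} columns")

# Reading down one column gathers every observation of a single variable.
life_expectancies = [row.life_expectancy for row in table]
print(life_expectancies)
```

Each `Observation` plays the role of a row, and each field plays the role of a column, which is the vocabulary JMP uses for cases and variables.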

      Throughout this book, we will work with data organized into tables. The columns of the tables contain variables (for example, year, name, price) and the rows of the tables represent the individual items in the sample.

      One of the organizing principles that you will notice in JMP is the differentiation among data types and modeling types. Most columns that you will work with in this book are either numeric or character data types, much as the cells of a spreadsheet hold either numbers or labels. JMP has two other major data types (Row States and Expressions), to be discussed later.

      In your statistics course, you might be learning about the distinctions among different types of quantitative and qualitative (or categorical) data. Before we analyze any data, we will want to understand clearly whether a column is quantitative or categorical. JMP helps us keep these distinctions straight by using different modeling types. In the first several chapters, we will work with three modeling types:

      ● Continuous columns are inherently quantitative. They are numeric so that you can meaningfully compute sums, averages, and so on. Continuous variables can assume an infinite number of values. Most measurements and financial figures are continuous data. Estimated average life expectancies (in years) are continuous.

      ● Ordinal columns reflect attributes that are sequential in nature or have some implicit ordering (for example, small, medium, large). Ordinal columns can be either numeric or character data.

      ● Nominal columns simply identify individuals or groups within the data. For example, if we are analyzing health data from different countries, we might want to label the nations and/or compare figures by continent. With our Life Expectancy 2017 data, both the names of countries and their continental regions are nominal columns. Nominal variables can also be numeric or character data. Names are nominal, as are postal codes or telephone numbers.

      As we will soon see, understanding the differences among these modeling types clarifies how JMP treats our data and presents us with choices.
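
One way to see why the three modeling types matter is to ask which summaries are meaningful for each. The Python sketch below is an illustration only and does not show how JMP works internally. All values are invented, and the variable names are hypothetical.

```python
import statistics
from collections import Counter

# Invented values for illustration only.
life_expectancy = [70.0, 75.5, 80.0, 72.5]                  # continuous
shirt_size = ["small", "large", "medium", "small"]          # ordinal
region = ["Region 1", "Region 2", "Region 1", "Region 1"]   # nominal

# Continuous: arithmetic such as sums and averages is meaningful.
mean_life = statistics.mean(life_expectancy)

# Ordinal: order is meaningful, but it must be stated explicitly,
# because alphabetical order would put "large" before "small".
size_order = ["small", "medium", "large"]
sorted_sizes = sorted(shirt_size, key=size_order.index)
ranks = [size_order.index(size) for size in shirt_size]
median_size = size_order[statistics.median_low(ranks)]

# Nominal: values only label groups, so counting is the natural summary.
region_counts = Counter(region)

print(mean_life)
print(sorted_sizes, median_size)
print(region_counts.most_common(1))
```

Averaging the shirt sizes or the region labels would be meaningless, which is the kind of mistake that assigning the right modeling type to a column helps us avoid.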