Practical Data Analysis with JMP, Third Edition. Robert Carver

Publisher: Ingram

ISBN: 9781642956122

... results now look like Figure 4.5. On your screen, you need to scroll to see the results for San Francisco; the figure below shows only L.A. and Sacramento.

      Figure 4.5: Distribution of Phase of Flight BY City

      Figure 4.5 shows the results for two of the three airports. In general, the relative frequency of strikes is similar at both airports, though strikes during the Take-off run were more common in Sacramento than in Los Angeles, and just the opposite is true for strikes during the Landing Roll.

      Based on Figure 4.4, we might have had the impression that none of the airports experienced half of the strikes during approach, despite the fact that just over 50% of all strikes occur during approach. We now see that for both L.A. and Sacramento, more than half of the strikes do occur during approach.

      Why the difference between the two graphs? Missing observations. In L.A., we have Phase of Flight information about 821 strikes, but not for another 292 strikes. In other words, we have complete pairs of data for slightly less than three-quarters of all strikes (there were 821 + 292 = 1,113 strikes in L.A.). In contrast, the Phase of Flight was recorded for 91% of the strikes in Sacramento and 80% in San Francisco.
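
      The completeness arithmetic for L.A. can be checked with a short sketch. The function name `completeness` is hypothetical and chosen only for illustration; the counts are the ones quoted in the paragraph above.

```python
def completeness(recorded, missing):
    """Return the total number of strikes and the share with a recorded value."""
    total = recorded + missing
    return total, recorded / total

# L.A.: Phase of Flight recorded for 821 strikes, missing for 292.
total, share = completeness(recorded=821, missing=292)
print(f"L.A.: {total:,} strikes, {share:.1%} with Phase of Flight recorded")
```

      The share comes out a little under 75%, which matches the "slightly less than three-quarters" description.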

      Another common way to display covariation in categorical variables is a crosstabulation (also known as a two-way table, a crosstab, a joint-frequency table, or a contingency table). JMP provides two different platforms that create crosstabs, and we will look at one of them here.

      6. Select Analyze ► Fit Y by X. Select PHASE_OF_FLT 2 as Y, Response and Airport as X, Factor as shown in Figure 4.6.

      Figure 4.6: Fit Y by X Contextual Platform

      The Fit Y by X platform is contextual in the sense that JMP selects an appropriate bivariate analysis depending on the modeling types of the columns that you cast as Y and X. The four possibilities are shown in the lower left of the dialog box.

      Figures 4.7 and 4.8 show the results of this analysis. We find a mosaic plot—essentially a visual crosstab—and a contingency table. In the mosaic plot, the vertical axis represents phase of flight, and the horizontal represents airports. The width and height of the rectangles in the mosaic are determined by the number of events in each category, and colors indicate the different phases of flight.

      Figure 4.7: A Mosaic Plot of Wildlife Strikes at Three Airports

      In this graph, the width of the vertical bars makes it clear that Sacramento had the greatest number of reported strikes that included phase of flight information, and San Francisco had the fewest. Though the data table includes 10 distinct phases of flight, we see labels for only the seven most commonly appearing in the data. Finally, the height of boxes is comparatively consistent across cities, indicating that strikes tend to occur during the same phases of flight at the three airports, with some minor variations.
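
      To make the geometry of the mosaic concrete, here is a minimal sketch of how bar widths and box heights follow from counts. It is an illustration, not JMP's own code. The L.A. count (821), the San Francisco count (675), the landing-roll count at San Francisco (113), and the table total (2,818) are quoted in this section; the Sacramento count is an assumption, inferred as the remainder of the total.

```python
# Strikes with phase-of-flight information, per airport. The L.A. and SFO
# counts are quoted in this section; the Sacramento count is an assumption,
# inferred as the remainder of the 2,818-incident table total.
counts = {
    "Los Angeles": 821,
    "Sacramento": 2818 - 821 - 675,
    "San Francisco": 675,
}

total = sum(counts.values())

# Bar width = that airport's share of all incidents in the table.
widths = {airport: n / total for airport, n in counts.items()}

for airport, width in widths.items():
    print(f"{airport:<14}{counts[airport]:>6}  width = {width:.3f}")

# Box height = share of that airport's incidents in one phase,
# for example the 113 landing-roll strikes at SFO.
sfo_landing_roll_height = 113 / counts["San Francisco"]
print(f"SFO landing roll height = {sfo_landing_roll_height:.3f}")
```

      Under that assumption, Sacramento gets the widest bar and San Francisco the narrowest, in line with the description of the plot.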

      The plot provides a clear visual impression, but if we want to dig into the specific numerical differences across airports, we turn to the Contingency Table (Figure 4.8). You might have heard tables like these called crosstabs or crosstabulations, or perhaps joint-frequency tables or two-way tables. All of these terms are synonymous.

      Figure 4.8: A Crosstabulation of the Bird Strike Data

      Across the top of this contingency table, we find the values of the PHASE_OF_FLT 2 column, and the rows correspond to the three airports. Each cell of the table contains four numbers: the number of incidents classified within the cell, followed by that count as a percentage of the full table, of the column, and of the row.

      For example, look at the highlighted cell of the table in Figure 4.8. The numbers and their meanings are as follows:

113 reported incidents were strikes during the landing roll at San Francisco (SFO).
4.01% of all 2,818 reported incidents fall into this cell (landing roll at SFO).
23.59% of the 479 landing roll events occurred at SFO.
16.74% of the 675 events at SFO occurred during landing roll.
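
      The three percentages in a cell are the same count divided by three different totals. A minimal sketch, using the numbers from the highlighted cell (the function name `cell_percentages` and the dictionary keys are hypothetical, chosen only for illustration):

```python
def cell_percentages(count, grand_total, column_total, row_total):
    """Return the four numbers shown in one cell of a contingency table."""
    return {
        "count": count,
        "total_pct": 100 * count / grand_total,   # share of the full table
        "col_pct": 100 * count / column_total,    # share of the column (phase)
        "row_pct": 100 * count / row_total,       # share of the row (airport)
    }

# Landing roll at SFO: 113 incidents; 2,818 in the table,
# 479 landing roll events, 675 events at SFO.
cell = cell_percentages(count=113, grand_total=2818, column_total=479, row_total=675)
for name, value in cell.items():
    print(f"{name:<10}{value:>8.2f}")
```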

      We have just been treating phase of flight as a response variable, inquiring whether the prevalence of strikes during a particular phase varies depending on which airport was the site of the incident. With only minor differences, we found that strikes occur most often during the approach phase of the flight. We might wonder what it is about the approach phase that might account for the relatively high proportion of bird strikes—perhaps the altitude? The speed? With the data available to us here, we can begin to answer such questions.

      To illustrate, let’s examine the distribution of flight speeds during the different flight phases. Consider that the observational units in our data table are incidents involving wildlife strikes. We are not observing all flights, and we aren’t observing incident-free flights (which constitute the vast majority of air travel). Hence, we can only describe aspects of those flights that did strike birds.

      We have mentioned missing data before. This is another form of missing data, and it highlights another good habit of statistical thinking. For any set of data, it is wise to ask “what is not here?” The FAA Wildlife Strike database, by its nature, only contains data from flights that did have a bird strike. These flights are extraordinary in the context of all flights in the U.S.

      1. Clear all row states (Rows menu).

      2. Open the Graph Builder (Graph menu). Drag SPEED to the Y drop zone, and PHASE_OF_FLT 2 to the Group X drop zone at the top.

      What do you see in your graph? As we might expect, the columns of jittered points tend to rise and fall with the familiar phases of flight. You might also notice that each of the speed distributions is asymmetric; there tend to be concentrations of many points at a relatively high or low speed. To visualize this more distinctly, we could make box plots or histograms of Speed by Phase of Flight. Instead, let’s learn a new type of graph:

      3. In the menu bar at the top of Graph Builder, click the Contour icon. This will create a violin plot, so called because some of the resulting shapes (see Figure 4.9) look a bit like violins.

      Figure 4.9: Violin Plot of Aircraft Speeds by Phase of Flight

      A violin plot shows the range of values for a variable; for example, those strikes that occurred during take-off run had recorded speeds between 50 and 200 mph. The narrow portion of the violin indicates very few takeoff strikes below 75 mph. The bulges in the violin indicate higher frequencies. It appears that strikes during takeoff are relatively common in the vicinity of 120 to 140 mph. Looking across all phases, one might say that except during descent, strikes are relatively frequent at speeds between approximately 120 and 160 mph regardless of flight phase.
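
      The outline of a violin is generally a smoothed density estimate: wide where observations cluster, narrow where they are sparse. The sketch below illustrates the idea with a simple Gaussian kernel density estimate. It rests on several assumptions: the speeds are made-up toy values rather than FAA data, the bandwidth of 10 mph is arbitrary, and the smoother that JMP actually uses is not described in this section.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density of `values` at a point x."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

# Hypothetical take-off run speeds in mph (toy values, NOT from the FAA data).
speeds = [55, 90, 110, 120, 125, 130, 130, 135, 140, 140, 150, 190]
density = gaussian_kde(speeds, bandwidth=10.0)

# Print a sideways text "half violin": longer bars mean more strikes nearby.
for x in range(50, 201, 25):
    bar = "#" * round(density(x) * 2000)
    print(f"{x:>3} mph {bar}")
```

      In this toy example the bars are longest between roughly 125 and 150 mph and short at the extremes, which is the same reading we give to the bulges and narrow sections of the violins in Figure 4.9.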

      For a more refined set of numerical summaries, do this:

      4. Select Analyze ► ...