Title: SAS Statistics by Example
Author: Ron Cody, EdD
Publisher: Ingram
Genre: Software
ISBN: 9781612900124
Notice that some of the observations contain missing values, represented by periods for numeric values and blanks for character values.
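The short sketch below shows how SAS represents missing values. It builds a hypothetical WORK data set, bp_demo, from made-up values (these are not the book's Blood_Pressure data). With list input, a period in the data lines marks a missing value for both numeric and character variables. PROC PRINT then displays a period for each missing number and a blank for the missing character value.

```sas
/* Hypothetical data set with made-up values (not the book's data) */
data work.bp_demo;
   input Subj Drug $ SBP DBP;   /* Drug is character ($); the rest are numeric */
   datalines;
1 Placebo 138 88
2 . 142 92
3 DrugA . 84
4 DrugA 126 .
5 DrugB 131 81
6 Placebo 129 85
;
run;

/* Missing SBP and DBP values print as periods; the missing Drug prints as a blank */
proc print data=work.bp_demo noobs;
run;
```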
Computing Descriptive Statistics Using PROC MEANS
One way to compute means and standard deviations is to use PROC MEANS. Here is a program to compute some basic descriptive statistics on the two variables SBP and DBP:
Program 2.1: Generating Descriptive Statistics with PROC MEANS
libname example 'c:\books\statistics by example';
title "Descriptive Statistics for SBP and DBP";
proc means data=example.Blood_Pressure n nmiss mean std median maxdec=3;
   var SBP DBP;
run;
Because the Blood_Pressure data set is a permanent SAS data set (when it was created, it was placed in a folder on a disk drive instead of in the temporary WORK library, which disappears when you end your SAS session), you need a LIBNAME statement to tell SAS where to find the data set. In this example, the data set is located in the c:\books\statistics by example folder. Remember that a permanent SAS data set name has two parts: the part before the period is a library reference (libref for short) that tells SAS where to find the data set, and the part after the period is the actual data set name (in this case, Blood_Pressure). If you were to use your operating system to list the contents of the c:\books\statistics by example folder, you would see a file called:
Blood_Pressure.sas7bdat
This file is the actual SAS data set and contains both the descriptor portion and the individual observations. The extension sas7bdat indicates that the data set is compatible with SAS 7 and later. This file is not a text file, and you cannot view it using a word processor or other Windows programs.
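The .sas7bdat file is binary, so one way to inspect its descriptor portion (variable names, types, lengths, number of observations) is PROC CONTENTS. The sketch below uses a hypothetical data set, bp_demo, with made-up values so that it runs without the book's files. It also shows that a one-level name is shorthand for a data set in the WORK library.

```sas
/* Hypothetical data set with made-up values (not the book's data).
   The one-level name bp_demo is shorthand for work.bp_demo, a data
   set in the temporary WORK library. */
data bp_demo;
   input Subj Drug $ SBP DBP;
   datalines;
1 Placebo 138 88
2 . 142 92
3 DrugA . 84
4 DrugA 126 .
5 DrugB 131 81
6 Placebo 129 85
;
run;

/* PROC CONTENTS prints the descriptor portion: variable names,
   types, lengths, number of observations, and so on */
proc contents data=work.bp_demo;
run;

/* With the LIBNAME statement from Program 2.1 in effect, the same
   step would describe the permanent data set:
      proc contents data=example.Blood_Pressure; run;            */
```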
The TITLE statement causes SAS to print the title at the top of every page of output until you change the title or turn off all titles. In this program, the title is placed in double quotes. You can also use single quotation marks (as long as there are no single quotation marks in the title) or, for that matter, no quotation marks at all (SAS is smart enough to realize that the text following a TITLE statement is the title text).
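The following sketch illustrates the quoting rules and shows how to turn all titles off. It assumes the SASHELP.CLASS sample data set that ships with SAS. The title text itself is only illustrative.

```sas
/* The apostrophe in the text requires double quotes */
title "Students' Heights and Weights";
proc print data=sashelp.class(obs=3);
run;

/* Single quotes work when the text contains no apostrophe;
   a new TITLE statement replaces the previous title */
title 'Heights and Weights from SASHELP.CLASS';
proc print data=sashelp.class(obs=3);
run;

/* A TITLE statement with no text turns off all titles */
title;
proc print data=sashelp.class(obs=3);
run;
```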
PROC MEANS is a popular SAS procedure that produces a number of useful statistics. In this program, the DATA= option tells the procedure to compute descriptive statistics for the Blood_Pressure data set.
You can control what statistics this procedure produces by using procedure options. These options are placed between the procedure name and the semicolon ending the statement, and you can place them in any order. If you omit these options, PROC MEANS will, by default, print the number of nonmissing observations, the mean, standard deviation, the minimum value, and the maximum value.
The first two options in this program, N and NMISS, report the number of nonmissing and missing values for each variable. The next three options, MEAN, STD, and MEDIAN, request the mean, standard deviation, and median. The last option, MAXDEC=n, specifies the maximum number of digits to the right of the decimal point in your report. In this program, you are requesting that all the statistics be reported to three decimal places.
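Running PROC MEANS twice on the same data shows the difference between the default statistics and an explicit list of options. The data set work.bp_demo and its values are made up for this sketch.

```sas
/* Hypothetical data set with made-up values (not the book's data) */
data work.bp_demo;
   input Subj Drug $ SBP DBP;
   datalines;
1 Placebo 138 88
2 . 142 92
3 DrugA . 84
4 DrugA 126 .
5 DrugB 131 81
6 Placebo 129 85
;
run;

/* No statistic options: PROC MEANS reports N, Mean, Std Dev,
   Minimum, and Maximum */
proc means data=work.bp_demo;
   var SBP DBP;
run;

/* Options go between PROC MEANS and the semicolon, in any order;
   MAXDEC=1 prints the statistics with at most one decimal place */
proc means data=work.bp_demo median nmiss n maxdec=1;
   var SBP DBP;
run;
```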
The following list describes some of the more useful options:
Option | Description |
n | Number of nonmissing observations |
nmiss | Number of observations with missing values |
mean | Arithmetic mean |
std | Standard deviation |
stderr | Standard error of the mean |
min | Minimum value |
max | Maximum value |
median | Median |
maxdec= | Maximum number of decimal places to display |
clm | Lower and upper confidence limits for the mean (95% by default) |
cv | Coefficient of variation (standard deviation as a percentage of the mean) |
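The sketch below requests several of the statistics in the list in a single step. It runs on a made-up data set (work.bp_demo is a hypothetical name). The comments state the usual formulas. The standard error is the standard deviation divided by the square root of N. The confidence limits are the mean plus or minus the 0.975 quantile of the t distribution with N - 1 degrees of freedom, multiplied by the standard error (95% is the default level; the ALPHA= option changes it, for example ALPHA=0.10 for 90% limits). CV is 100 times the standard deviation divided by the mean.

```sas
/* Hypothetical data set with made-up values (not the book's data) */
data work.bp_demo;
   input Subj Drug $ SBP DBP;
   datalines;
1 Placebo 138 88
2 . 142 92
3 DrugA . 84
4 DrugA 126 .
5 DrugB 131 81
6 Placebo 129 85
;
run;

/* STDERR = Std Dev / sqrt(N)
   CLM    = Mean -/+ t(0.975, N - 1) * StdErr  (lower and upper limits)
   CV     = 100 * Std Dev / Mean                                    */
proc means data=work.bp_demo n nmiss mean std stderr clm cv min max maxdec=2;
   var SBP DBP;
run;
```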
The VAR statement tells the procedure which variables you want to analyze. If you omit a VAR statement, PROC MEANS produces statistics on all of the numeric variables in the specified data set (usually not a good idea).
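A data set that carries a numeric ID quickly shows why omitting VAR is usually a bad idea. In this sketch, which uses made-up data, PROC MEANS without a VAR statement also summarizes Subj, a subject number whose mean is meaningless. The character variable Drug is skipped because PROC MEANS analyzes only numeric variables.

```sas
/* Hypothetical data set with made-up values (not the book's data) */
data work.bp_demo;
   input Subj Drug $ SBP DBP;
   datalines;
1 Placebo 138 88
2 . 142 92
3 DrugA . 84
4 DrugA 126 .
5 DrugB 131 81
6 Placebo 129 85
;
run;

/* No VAR statement: Subj, SBP, and DBP are all summarized,
   including the meaningless "mean subject number" */
proc means data=work.bp_demo n nmiss mean;
run;
```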
Finally, the PROC step ends with a RUN statement. Here is the output:
Descriptive Statistics Broken Down by a Classification Variable
The data set Blood_Pressure also contains a variable called Drug. You might want to see the same statistics computed separately for each level of Drug. One way to do this is to add a CLASS statement to PROC MEANS, like this:
Program 2.2: Statistics Broken Down by a Classification Variable
title "Descriptive Statistics Broken Down by Drug";
proc means data=example.Blood_Pressure n nmiss mean std median maxdec=3;
   class Drug;
   var SBP DBP;
run;
The CLASS statement tells the procedure to produce the selected statistics for each unique value of Drug. This is a good time to tell you that when you have more than one statement in a PROC step (in this case, the CLASS and VAR statements), the order of these statements does not usually matter. The exceptions are certain statistical procedures in which you must specify your model before you ask for certain statistics.
Here is the output:
You should always request both the N and NMISS options when you run PROC MEANS, because missing values are a possible source of bias.
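A small made-up data set is enough to check both points above: statement order and missing values. Here VAR comes before CLASS, which produces the same table as listing CLASS first. Subject 2 has no Drug value. By default, PROC MEANS drops observations with a missing CLASS value from the analysis, so that subject's SBP and DBP appear in no group and are not counted by NMISS either. Adding the MISSING option to the PROC MEANS statement reports them as a separate group.

```sas
/* Hypothetical data set with made-up values (not the book's data) */
data work.bp_demo;
   input Subj Drug $ SBP DBP;
   datalines;
1 Placebo 138 88
2 . 142 92
3 DrugA . 84
4 DrugA 126 .
5 DrugB 131 81
6 Placebo 129 85
;
run;

/* VAR listed before CLASS: same statistics as with CLASS first.
   Subject 2 (missing Drug) is left out of every group by default. */
proc means data=work.bp_demo n nmiss mean std median maxdec=3;
   var SBP DBP;
   class Drug;
run;

/* The MISSING option keeps subject 2 as its own (blank) Drug group */
proc means data=work.bp_demo n nmiss mean std median maxdec=3 missing;
   class Drug;
   var SBP DBP;
run;
```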
What if you want to see the grand mean, as well as the means broken down by Drug, all in one listing? The PROC MEANS option PRINTALLTYPES does this for you when you include a CLASS statement. Here is the modified program:
Program 2.3: Demonstrating the PRINTALLTYPES Option with PROC MEANS
title "Descriptive Statistics Broken Down by Drug";
proc means data=example.Blood_Pressure n nmiss mean std median
           printalltypes maxdec=3;
   class Drug;
   var SBP DBP;
run;
Here is the corresponding output:
Now you see statistics for each value of Drug and for all subjects, in the same listing.
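One detail is worth checking with PRINTALLTYPES: the overall row is computed from the same observations as the Drug groups. In this made-up data set, subject 2 has a missing Drug value and is therefore excluded by default. As a result, the overall SBP mean printed here is 131 (four values), not the 133.2 you would get from all five nonmissing SBP values. The data set name work.bp_demo is hypothetical.

```sas
/* Hypothetical data set with made-up values (not the book's data) */
data work.bp_demo;
   input Subj Drug $ SBP DBP;
   datalines;
1 Placebo 138 88
2 . 142 92
3 DrugA . 84
4 DrugA 126 .
5 DrugB 131 81
6 Placebo 129 85
;
run;

/* The overall row added by PRINTALLTYPES uses the same observations as
   the Drug groups, so subject 2 (missing Drug) is not part of it */
proc means data=work.bp_demo n nmiss mean maxdec=2 printalltypes;
   class Drug;
   var SBP DBP;
run;
```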
Computing a 95% Confidence Interval and the Standard Error
A 95% confidence interval for the mean (often abbreviated as 95% CI) is useful in helping