End-to-end Data Analytics for Product Development. Chris Jones

Title: End-to-end Data Analytics for Product Development

Author: Chris Jones

Publisher: John Wiley & Sons Limited

Genre: Mathematics

ISBN: 9781119483700

…always a chance that the confidence interval won't contain the true population mean.

      We quantify how sure we need to be with a value called the confidence level, usually denoted by (1 − α).

      The confidence level is set by the researcher before calculation of a confidence interval.

      The most common confidence level is 95% (0.95). Other common levels are 90% and 99%.

      The confidence level is how sure we are that the confidence interval contains the actual population parameter value.

       Example 1.4. To illustrate the meaning of the confidence level, let's return to the previous example and suppose we drew 100 samples from the same population and calculated the confidence interval for each sample. If we used 95% confidence intervals, on average 95 out of 100 of the confidence intervals would contain the population parameter, while 5 out of 100 would not. In practice, when we calculate a 95% confidence interval for our sample, we are confident that our sample is one of the 95% of samples for which the CI covers the true parameter value.
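      The repeated-sampling idea in Example 1.4 can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with a hypothetical mean of 50 and a known standard deviation of 5, each sample has 30 observations, and the interval is the z-interval for a known standard deviation. None of these values come from the text.

```python
import random
from statistics import NormalDist, mean


def coverage_count(n_intervals=100, n=30, mu=50.0, sigma=5.0,
                   confidence=0.95, seed=1):
    """Count how many z-intervals for the mean contain the true mean mu.

    mu, sigma and n are hypothetical illustration values; sigma is
    assumed to be known, which is why a z-interval is used.
    """
    rng = random.Random(seed)
    alpha = 1 - confidence
    # Critical value of the standard normal distribution for (1 - alpha)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    covered = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered


print(coverage_count(), "of 100 intervals contain the true mean")
```

      The count varies from run to run around 95; it is the long-run proportion, not any single batch of 100 intervals, that equals the confidence level.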

      Stat Tool 1.15 Hypothesis Testing

      A common task in statistical studies is the comparison of mean values, variances, proportions, and so on, to a hypothesized value of interest or among different groups, for example:

       What is the performance of a new product compared with the industry standard or products currently on the market?

      To investigate any such questions, we can conduct an inferential procedure called a hypothesis test. Hypothesis tests allow us to make decisions on business problems based on statistically significant results, not based on intuition alone.

      To begin with, the researchers need to determine which question they want to focus on and then define the hypotheses. A statistical hypothesis is a claim about a population parameter (e.g. about the mean or the standard deviation of a variable of interest).

      Hypotheses should be based on our knowledge of the process, such as how a process has performed in the past or customers' expectations.

      To perform a hypothesis test, we need to define the null hypothesis and the alternative hypothesis.

       Null hypothesis H0: usually states that a population parameter, such as the population mean, equals a specified value or the parameters of other populations. E.g. H0: the mean performance of the new product is equal to the industry standard.

       Alternative hypothesis H1: is the opposite of the null hypothesis, so it usually states that the population parameter does not equal a specified value or the parameters of other populations. E.g. H1: the mean performance of the new product is NOT equal to the industry standard.

      Sometimes the alternative hypothesis is directional or one‐sided; that is, we suspect the population parameter is greater than or less than a given value.

       Null hypothesis H0: the mean performance of the new product is equal to the industry standard.

       One‐sided alternative hypothesis H1: the mean performance of the new product is GREATER THAN the industry standard. Or H1: the mean performance of the new product is LESS THAN the industry standard.
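      In symbols, the two-sided and one-sided cases for a mean can be summarized as follows. The notation is an assumption introduced here for illustration: μ stands for the population mean (the mean performance of the new product) and μ0 for the specified value (the industry standard).

```latex
\begin{align*}
\text{Two-sided:} \quad & H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \neq \mu_0 \\
\text{One-sided (greater):} \quad & H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu > \mu_0 \\
\text{One-sided (less):} \quad & H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu < \mu_0
\end{align*}
```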

      A hypothesis test has two possible outcomes:

      1 We can reject the null hypothesis in favor of the alternative hypothesis. If we reject the null hypothesis, we say that our result is statistically significant. E.g. H0 (the mean performance of the new product is equal to the industry standard) is REJECTED.

      2 We can fail to reject the null hypothesis and conclude that we do not have enough evidence to claim that the alternative hypothesis is true. We then say that our results are NOT statistically significant. E.g. H0 (the mean performance of the new product is equal to the industry standard) is NOT REJECTED.

      Because we are using sample data, decisions based on those hypothesis tests could be wrong.

      Let's consider the outcomes of a hypothesis test. If the null hypothesis is true and, based on our sample data, we fail to reject it, we make the correct decision; if we reject it, we make an error. In hypothesis testing, the probability of rejecting a null hypothesis that is actually true is called the significance level and is denoted by α. We always select it before performing the hypothesis test, and it is usually set to 5% (0.05) or 1% (0.01).
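      The meaning of α as an error probability can also be illustrated by simulation. The sketch below generates many samples from a population for which H0 is true and counts how often a two-sided z-test rejects it anyway (an error commonly called a Type I error). The population values (mean 50, known standard deviation 5, sample size 30) are hypothetical, and the z-test is chosen only because it needs nothing beyond the Python standard library.

```python
import random
from statistics import NormalDist, mean


def false_rejection_rate(n_tests=5000, n=30, mu0=50.0, sigma=5.0,
                         alpha=0.05, seed=1):
    """Fraction of two-sided z-tests that reject H0 although H0 is true."""
    rng = random.Random(seed)
    # Reject when |z| exceeds the critical value for significance level alpha
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_tests):
        # Data are generated with mean mu0, so H0: mu = mu0 holds
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a wrong decision
    return rejections / n_tests


print(false_rejection_rate())
```

      With α = 0.05 the returned fraction should be close to 0.05; lowering α to 0.01 makes wrong rejections rarer, at the price of demanding stronger evidence before H0 is rejected.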

      Confidence level and significance level are tools to quantify the uncertainty about our inferential conclusions.

      Stat Tool 1.16 The p‐Value

      After establishing the null and alternative hypotheses and setting the significance level α, how do we decide to reject the null hypothesis?

      When we conduct a hypothesis test, the results include a probability called the p‐value.

      We use the p‐value to determine whether we should reject or fail to reject the null hypothesis, by comparing it to the significance level α.

      If the p‐value is less than α, we reject the null hypothesis in favor of the alternative hypothesis (our result is statistically significant):

       E.g. H0 (the mean performance of the new product is equal to the industry standard) is REJECTED.

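      If the p‐value is greater than or equal to α, we fail to reject the null hypothesis (our result is not statistically significant):

       E.g. H0 (the mean performance of the new product is equal to the industry standard) is NOT REJECTED.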
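      The decision rule can be written out in a few lines. This is a minimal sketch, assuming a one-sample z-test with a known population standard deviation; the function names and the numbers (industry standard 100, sample mean 102, standard deviation 6, sample size 36) are hypothetical and serve only to show how a p‐value is compared with α.

```python
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """p-value of a one-sample z-test of H0: mu = mu0 (sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    cdf = NormalDist().cdf
    if alternative == "two-sided":  # H1: mu is NOT equal to mu0
        return 2 * (1 - cdf(abs(z)))
    if alternative == "greater":    # H1: mu is GREATER THAN mu0
        return 1 - cdf(z)
    if alternative == "less":       # H1: mu is LESS THAN mu0
        return cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


def decide(p_value, alpha=0.05):
    """Reject H0 only when the p-value is smaller than alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical data: z = (102 - 100) / (6 / sqrt(36)) = 2.0
p = z_test_p_value(sample_mean=102.0, mu0=100.0, sigma=6.0, n=36)
print(round(p, 4), decide(p))              # 0.0455 reject H0
print(round(p, 4), decide(p, alpha=0.01))  # 0.0455 fail to reject H0
```

      The same p‐value leads to different decisions at α = 0.05 and α = 0.01, which is why the significance level has to be fixed before the test is run.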

       Example 1.5. A researcher wants to investigate whether the performance of a new product differs from the industry standard.

      Suppose they set the significance level α equal to 0.05 (5%). The two hypotheses are:

       H0: The mean performance of the new product is equal to the industry standard.

       H1: The mean performance of the new product is NOT equal to the industry standard.

      What…