## Six Sigma Tools for Testing Statistical Significance

“Suppose you want to know if a new design of a product is actually better than the current product. For example, your design team is working on increasing the speed of the KX Speed Drill. You have produced a small batch of the new design, the KX2, and you want to know if this speed is faster than the current speed of 17.5 revs per second.

To test the new design, QA has taken a sample of 13 KX2s and clocked their speed. The results of this test are shown below.

Is it safe to conclude that the new design for the KX2 has a significantly higher speed at the 1% significance level?”

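DATA

| Unit | Speed (revs per second) |
| --- | --- |
| 1 | 18.4 |
| 2 | 19.3 |
| 3 | 20.5 |
| 4 | 18.6 |
| 5 | 17.8 |
| 6 | 21.0 |
| 7 | 19.4 |
| 8 | 19.2 |
| 9 | 18.9 |
| 10 | 15.9 |
| 11 | 17.7 |
| 12 | 20.5 |
| 13 | 19.1 |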
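One way to approach the KX2 question is a one-sample, one-tailed t-test of H0: μ = 17.5 against H1: μ > 17.5. The sketch below is only an illustration under that assumption; the variable names are hypothetical, and the critical value 2.681 is taken from a standard t-table (α = 0.01, 12 degrees of freedom) rather than computed.

```python
import math
import statistics

# Speeds (revs per second) of the 13 sampled KX2 drills, copied from the data above.
speeds = [18.4, 19.3, 20.5, 18.6, 17.8, 21.0, 19.4,
          19.2, 18.9, 15.9, 17.7, 20.5, 19.1]
mu0 = 17.5       # current speed, the value assumed under H0
t_crit = 2.681   # t-table value: one-tailed, alpha = 0.01, df = 12

n = len(speeds)
mean = statistics.mean(speeds)
sd = statistics.stdev(speeds)                 # sample standard deviation (n - 1)
t_stat = (mean - mu0) / (sd / math.sqrt(n))   # one-sample t statistic

print(f"mean = {mean:.3f}, s = {sd:.3f}, t = {t_stat:.3f}")
print("reject H0" if t_stat > t_crit else "fail to reject H0")
```

With these data the statistic lands well above the critical value, so under the stated assumptions the sample supports a higher speed for the KX2.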

## Hypothesis test, Decision Rule, and Test Statistics

A consumer magazine randomly selected 100 new hybrid car dealers from across the country, and created a list of retail prices. They want to know if the price differs from $25,000.

RQ: Is the average hybrid car dealer retail price different than $25,000 with a 95% confidence level? Use Alpha = .05

A. Choose the Hypothesis

H0: μ = $25,000. The mean hybrid car retail price is equal to $25,000.

H1: μ ≠ $25,000. The mean hybrid car retail price is different from $25,000 (tested at α = .05).

B. Specify the Decision Rule

C. Calculate the Test Statistic, P-value

D. Make the Decision

E. Give an interpretation of the Decision

Attach your Excel document with calculations.

## Hypothesis testing, Correlation testing, Anova & Regression

Please see attached for better formatting on the tables.

8. Flat Tire and Missed Class. A classic tale involves four carpooling students who missed a test and gave as an excuse a flat tire. On the makeup test, the instructor asked the students to identify the particular tire that went flat. If they really didn’t have a flat tire, would they be able to identify the same tire? The author asked 41 other students to identify the tire they would select. The results are listed in the following table (except for one student who selected the spare). Use a 0.05 significance level to test the author’s claim that the results fit a uniform distribution. What does the result suggest about the ability of the four students to select the same tire when they really didn’t have a flat?

| Tire | Left front | Right front | Left rear | Right rear |
| --- | --- | --- | --- | --- |
| Number selected | 11 | 15 | 8 | 6 |
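A chi-square goodness-of-fit test against a uniform distribution is one natural reading of this problem. The following is a minimal sketch under that assumption; the dictionary keys are illustrative labels, and the critical value 7.815 comes from a chi-square table (α = 0.05, 3 degrees of freedom).

```python
# Observed tire choices from the table above (the spare is excluded).
observed = {"left front": 11, "right front": 15, "left rear": 8, "right rear": 6}

n = sum(observed.values())
expected = n / len(observed)      # uniform distribution: equal count per tire
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
chi2_crit = 7.815                 # chi-square table: alpha = 0.05, df = 3

print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject uniform fit" if chi2 > chi2_crit else "fail to reject uniform fit")
```

Because the statistic falls below the critical value, this sketch does not find evidence against a uniform distribution of choices.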

18. Do World War II Bomb Hits Fit a Poisson Distribution? In analyzing hits by V-1 buzz bombs in World War II, South London was subdivided into regions, each with an area of 0.25 km². Shown below is a table of actual frequencies of hits and the frequencies expected with the Poisson distribution. (The Poisson distribution is described in Section 5-5.) Use the values listed and a 0.05 significance level to test the claim that the actual frequencies fit a Poisson distribution.

| Number of bomb hits | 0 | 1 | 2 | 3 | 4 or more |
| --- | --- | --- | --- | --- | --- |
| Actual number of regions | 229 | 211 | 93 | 35 | 8 |
| Expected number of regions (from Poisson distribution) | 227.5 | 211.4 | 97.9 | 30.5 | 8.7 |
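Since the expected frequencies are already given, the goodness-of-fit statistic can be sketched directly. The degrees of freedom are an assumption here: the sketch uses k − 1 = 4 (critical value 9.488 from a chi-square table), while a stricter treatment that accounts for the estimated Poisson mean would use 3 (critical value 7.815).

```python
# Actual and expected region counts for 0, 1, 2, 3, and 4-or-more hits.
actual = [229, 211, 93, 35, 8]
expected = [227.5, 211.4, 97.9, 30.5, 8.7]

chi2 = sum((o - e) ** 2 / e for o, e in zip(actual, expected))
chi2_crit_df4 = 9.488   # chi-square table: alpha = 0.05, df = 4 (assumed df = k - 1)
chi2_crit_df3 = 7.815   # chi-square table: alpha = 0.05, df = 3 (if the mean is estimated)

print(f"chi-square = {chi2:.3f}")
print("reject Poisson fit" if chi2 > chi2_crit_df4 else "fail to reject Poisson fit")
```

The statistic is far below both critical values, so the conclusion does not depend on which degrees-of-freedom convention is adopted.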

11-3

18. Global Warming Survey. A Pew Research poll was conducted to investigate opinions about global warming. The respondents who answered yes when asked if there is solid evidence that the earth is getting warmer were then asked to select a cause of global warming. The results for two age brackets are given in the table below. Use a 0.01 significance level to test the claim that the age bracket is independent of the choice for the cause of global warming.

Do respondents from both age brackets appear to agree, or is there a substantial difference?

| Age bracket | Human activity | Natural patterns | Don’t know or refused to answer |
| --- | --- | --- | --- |
| Under 30 | 108 | 41 | 7 |
| 65 and over | 121 | 71 | 43 |
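A chi-square test of independence fits this table. The helper below, `chi_square_independence`, is a hypothetical name introduced for illustration; it builds expected counts from the row and column totals. The critical value 9.210 is read from a chi-square table (α = 0.01, 2 degrees of freedom).

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell: (row total * column total) / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Rows: under 30, 65 and over.
# Columns: human activity, natural patterns, don't know or refused.
survey = [[108, 41, 7], [121, 71, 43]]
chi2, df, expected = chi_square_independence(survey)
chi2_crit = 9.210   # chi-square table: alpha = 0.01, df = 2

print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject independence" if chi2 > chi2_crit else "fail to reject independence")
```

On these counts the statistic exceeds the critical value, which points to a substantial difference between the two age brackets.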

12-2

14. Car Emissions Listed below are measured amounts of greenhouse gas emissions from cars in three different categories (from Data Set 16 in Appendix B). The measurements are in tons per year, expressed as CO2 equivalents. Use a 0.05 significance level to test the claim that the different car categories have the same mean amount of greenhouse gas emissions. Based on the results, does the number of cylinders appear to affect the amount of greenhouse gas emissions?

Four cylinder: 7.2, 7.9, 6.8, 7.4, 6.5, 6.6, 6.7, 6.5, 6.5, 7.1, 6.7, 5.5, 7.3

Six cylinder: 8.7, 7.7, 7.7, 8.7, 8.2, 9.0, 9.3, 7.4, 7.0, 7.2, 7.2, 8.5

Eight cylinder: 9.3, 9.3, 8.6, 8.6, 8.7, 9.3, 9.3
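The claim of equal means across the three categories is usually tested with one-way ANOVA. The sketch below computes the F statistic by hand from the between-group and within-group sums of squares; the critical value of about 3.33 is an F-table value (α = 0.05, degrees of freedom 2 and 29), and the group labels are illustrative.

```python
import statistics

# Greenhouse gas emissions (tons per year, CO2 equivalents) by number of cylinders.
groups = {
    "four": [7.2, 7.9, 6.8, 7.4, 6.5, 6.6, 6.7, 6.5, 6.5, 7.1, 6.7, 5.5, 7.3],
    "six": [8.7, 7.7, 7.7, 8.7, 8.2, 9.0, 9.3, 7.4, 7.0, 7.2, 7.2, 8.5],
    "eight": [9.3, 9.3, 8.6, 8.6, 8.7, 9.3, 9.3],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_values)

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum((x - statistics.mean(g)) ** 2
                for g in groups.values() for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
f_crit = 3.33   # F table (approximate): alpha = 0.05, df = (2, 29)

print(f"F = {f_stat:.2f} with df = ({df_between}, {df_within})")
print("reject equal means" if f_stat > f_crit else "fail to reject equal means")
```

The F statistic is much larger than the critical value, which suggests that the number of cylinders is associated with the amount of emissions.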

9-5

12. Home Size and Selling Price. Using the sample data from Data Set 23 in Appendix B, 21 homes with living areas under 2000 ft² have selling prices with a standard deviation of $32,159.73. There are 19 homes with living areas greater than 2000 ft² and they have selling prices with a standard deviation of $66,628.50. Use a 0.05 significance level to test the claim of a real estate agent that homes larger than 2000 ft² have selling prices that vary more than the smaller homes.
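A right-tailed F-test for two variances is a common way to handle this claim. In the sketch below the larger sample variance goes in the numerator. The comparison value 2.20 is the F-table entry for degrees of freedom (15, 20) at α = 0.05, used here as a conservative stand-in because the exact (18, 20) entry is slightly smaller; treat that substitution as an assumption.

```python
# Summary statistics from the problem statement.
s_small, n_small = 32159.73, 21   # homes under 2000 square feet
s_large, n_large = 66628.50, 19   # homes over 2000 square feet

f_stat = (s_large / s_small) ** 2          # ratio of sample variances
df_num, df_den = n_large - 1, n_small - 1  # (18, 20)
f_crit_bound = 2.20   # F table: alpha = 0.05, df = (15, 20); (18, 20) is a bit lower

print(f"F = {f_stat:.3f} with df = ({df_num}, {df_den})")
print("reject equal variation" if f_stat > f_crit_bound else "fail to reject")
```

Since the statistic is roughly twice the bound, the data support the agent's claim that prices of larger homes vary more.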

10-2

16. Heights of Presidents and Runners-Up. Theories have been developed about the heights of winning candidates for the U.S. presidency and the heights of candidates who were runners-up. Listed below are heights (in inches) from recent presidential elections. Is there a linear correlation between the heights of candidates who won and the heights of the candidates who were runners-up?

Winner: 69.5, 73, 73, 74, 74.5, 74.5, 71, 71

Runner up: 72, 69.5, 70, 68, 74, 74, 73, 76
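The linear correlation can be checked with Pearson's r. The problem does not state a significance level, so the sketch assumes α = 0.05, for which a table of critical r values gives 0.707 at n = 8. The function name `pearson_r` is illustrative.

```python
import math

winner = [69.5, 73, 73, 74, 74.5, 74.5, 71, 71]
runner_up = [72, 69.5, 70, 68, 74, 74, 73, 76]

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(winner, runner_up)
r_crit = 0.707   # critical r table: n = 8, alpha = 0.05 (assumed level)

print(f"r = {r:.3f}")
print("significant" if abs(r) > r_crit else "not significant")
```

The coefficient is small and negative, well inside the critical bounds, so this sketch finds no significant linear correlation.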

10-3

16. Heights of Presidents and Runners-Up Find the best predicted height of runner-up Goldwater, given that the height of the winning presidential candidate Johnson is 75 in. Is the predicted height of Goldwater close to his actual height of 72 in.?

Winner: 69.5, 73, 73, 74, 74.5, 74.5, 71, 71

Runner-up: 72, 69.5, 70, 68, 74, 74, 73, 76
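For the prediction, a least-squares line can be fitted and evaluated at 75 in. The sketch below also reports the mean runner-up height, because a common textbook convention is to use the sample mean as the best prediction when the linear correlation is not significant; which value to report is therefore an assumption about the convention in use.

```python
winner = [69.5, 73, 73, 74, 74.5, 74.5, 71, 71]
runner_up = [72, 69.5, 70, 68, 74, 74, 73, 76]

n = len(winner)
mean_x = sum(winner) / n
mean_y = sum(runner_up) / n

# Least-squares slope and intercept for runner_up = b0 + b1 * winner.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(winner, runner_up))
sxx = sum((x - mean_x) ** 2 for x in winner)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

line_prediction = b0 + b1 * 75   # regression-line prediction for a 75 in. winner
mean_prediction = mean_y         # fallback when the correlation is not significant

print(f"regression line: y = {b0:.2f} + ({b1:.4f}) x")
print(f"line prediction at 75 in.: {line_prediction:.2f}")
print(f"mean of runner-up heights: {mean_prediction:.2f}")
```

Both candidate predictions sit close to Goldwater's actual height of 72 in.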

## Hypothesis Testing and Population Analysis

I would like the responses to the questions below in Excel format, so that I can see the formulas used.

1. In a recent year, the FCC reported that the mean wait for repairs for AT&T customers was 25.3 hours. In an effort to improve this service, suppose that a new repair service process was developed. This new process when used with a sample of 100 repairs, resulted in a sample mean of 22.3 hours and a sample standard deviation of 8.3 hours.

a. Is there evidence that the population mean amount is less than 25.3 hours? (Use a 0.05 level of significance.)

b. Determine the p-value and interpret its meaning.
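Question 1 can be sketched as a left-tailed one-sample t-test from summary statistics. The critical value −1.660 comes from a t-table (α = 0.05, 99 degrees of freedom). The p-value below uses the standard normal distribution as a large-sample approximation, which is an assumption; an exact t-based p-value would be slightly larger.

```python
import math

# Summary statistics from question 1.
n, sample_mean, sample_sd = 100, 22.3, 8.3
mu0 = 25.3        # reported mean wait under H0
t_crit = -1.660   # t-table: left-tailed, alpha = 0.05, df = 99

t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_approx = normal_cdf(t_stat)   # left-tail area, normal approximation

print(f"t = {t_stat:.3f}, approximate p-value = {p_approx:.5f}")
print("reject H0" if t_stat < t_crit else "fail to reject H0")
```

The statistic is far into the left tail, so the sample gives evidence that the mean wait is below 25.3 hours.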

2. The U.S. Department of Education reports that 46% of full-time college students are employed while attending college (data extracted from “The Condition of Education 2009,” National Center for Education Statistics, nces.ed.gov). A recent survey of 60 full-time students at Miami University found that 29 were employed.

a. Use the five-step p-value approach to hypothesis testing and a 0.05 level of significance to determine whether the proportion of full-time students at Miami University is different from the national norm of 0.46.

b. Assume that the study found that 36 of the 60 full-time students were employed and repeat (a). Are the conclusions the same?
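Both parts of question 2 reduce to a two-sided one-proportion z-test. The helper `one_proportion_z` is a hypothetical name used only in this sketch; it uses the hypothesized proportion in the standard error, as is usual for this test.

```python
import math

def one_proportion_z(successes, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z_a, p_a = one_proportion_z(29, 60, 0.46)   # part (a): 29 of 60 employed
z_b, p_b = one_proportion_z(36, 60, 0.46)   # part (b): 36 of 60 employed

for label, z, p in (("a", z_a, p_a), ("b", z_b, p_b)):
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"({label}) z = {z:.3f}, p-value = {p:.4f}, {decision}")
```

The two parts lead to different decisions at the 0.05 level: the first sample is consistent with 0.46, while the second is not.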

3. Studies conducted by a manufacturer of “Boston” and “Vermont” asphalt shingles have shown product weight to be a major factor in the customer’s perception of quality. Moreover, the weight represents the amount of raw materials being used and is therefore very important to the company from a cost standpoint. The last stage of the assembly line packages the shingles before the packages are placed on wooden pallets. A full pallet holds 16 squares of shingles. The data file contains the weight in pounds for a sample of 368 pallets of Boston shingles and 330 pallets of Vermont shingles.

a. For the Boston shingles, is there evidence that the population mean weight is different from 3,150 pounds?

b. Interpret the meaning of the p-value in (a).

c. For the Vermont shingles, is there evidence that the population mean weight is different from 3,700 pounds?

d. Interpret the meaning of the p-value in (c).

e. In (a) through (d), do you have to worry about the normality assumption? Explain.

## Defining the hypothesis tests

1)

a) Which variable in an experiment determines whether to use parametric or nonparametric procedures?

b) In terms of the dependent variable, what are the two categories into which all nonparametric procedures can be grouped?

2)

a) Why, if possible, should we design a study that meets the assumptions of a parametric procedure?

b) Why shouldn’t you use parametric procedures for data that clearly violate their assumptions?

3) A survey finds that, given a choice, 34 females prefer males much taller than themselves, and 55 females prefer males only slightly taller than themselves.

a) What procedure should we perform?

b) What are the Ho and Ha?

c) With α= .05, what do you conclude about the preference of females in the population?

d) Describe how you would graph these results
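Question 3 is often treated as a one-way chi-square (goodness-of-fit) test with equal expected frequencies in the two categories. The sketch below assumes that reading; the critical value 3.841 is from a chi-square table (α = .05, 1 degree of freedom).

```python
# Observed preferences from question 3.
observed = {"much taller": 34, "slightly taller": 55}

n = sum(observed.values())
f_e = n / len(observed)    # expected frequency per category under H0
chi2_obt = sum((f_o - f_e) ** 2 / f_e for f_o in observed.values())
chi2_crit = 3.841          # chi-square table: alpha = .05, df = 1

print(f"f_e = {f_e}, chi-square = {chi2_obt:.3f}")
print("reject H0" if chi2_obt > chi2_crit else "retain H0")
```

The obtained value is above the critical value, so under this reading the two preferences are not equally common in the population.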

4) The following data reflect the frequency with which people voted in the last election and were satisfied with the officials elected:

| | Yes | No |
| --- | --- | --- |
| Yes | 48 | 35 |
| No | 33 | 52 |

a) What procedure should we perform?

b) What are the Ho and Ha?

c) What is f_e in each cell?

d) Compute χ²_obt.

e) With α = .05, what do you conclude about these variables?

f) How consistent is this relationship?
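For the 2 × 2 table in question 4, a two-way chi-square test gives the expected frequencies and the obtained statistic, and the phi coefficient is one common way to describe how consistent the relationship is. This is a sketch under those assumptions; the critical value 3.841 is from a chi-square table (α = .05, 1 degree of freedom).

```python
import math

# Observed frequencies from question 4, rows then columns in the order Yes, No.
table = [[48, 35], [33, 52]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)

# Expected frequency per cell: (row total * column total) / N.
f_e = [[r * c / total for c in col_totals] for r in row_totals]
chi2_obt = sum((table[i][j] - f_e[i][j]) ** 2 / f_e[i][j]
               for i in range(2) for j in range(2))
phi = math.sqrt(chi2_obt / total)   # phi coefficient for a 2 x 2 table
chi2_crit = 3.841                   # chi-square table: alpha = .05, df = 1

print("expected frequencies:", [[round(e, 3) for e in row] for row in f_e])
print(f"chi-square = {chi2_obt:.3f}, phi = {phi:.3f}")
print("reject H0" if chi2_obt > chi2_crit else "retain H0")
```

The statistic exceeds the critical value, while the phi coefficient is small, which suggests a significant but weak relationship.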


## Analysis of Liberals & Conservatives with Hypothesis Test

From the 2008 GSS, the average educational attainment for liberals is 13.90 years (Sy = 3.27) and the average educational attainment for conservatives is 13.55 years (Sy = 2.82). Data are based on responses from 187 liberals and 227 conservatives.

a. Test research hypotheses that there is a difference in level of education between liberals and conservatives, set alpha at .01.

b. Would your decision have been different if alpha were set at .05?
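With samples this large, a two-sample test of the difference in means can be sketched with a z statistic and an unpooled standard error. That choice is an assumption: a pooled-variance t-test gives a very similar statistic here. The critical values 2.576 and 1.960 are the usual two-tailed normal values for α = .01 and α = .05.

```python
import math

# Summary statistics from the 2008 GSS figures above.
mean_lib, sd_lib, n_lib = 13.90, 3.27, 187
mean_con, sd_con, n_con = 13.55, 2.82, 227

# Unpooled standard error of the difference in means.
se_diff = math.sqrt(sd_lib ** 2 / n_lib + sd_con ** 2 / n_con)
z_stat = (mean_lib - mean_con) / se_diff

for alpha, z_crit in ((0.01, 2.576), (0.05, 1.960)):
    decision = "reject H0" if abs(z_stat) > z_crit else "fail to reject H0"
    print(f"alpha = {alpha}: z = {z_stat:.3f}, critical = {z_crit}, {decision}")
```

The statistic falls short of both critical values, so the decision is the same at either alpha level.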

## Cross tabulation and analyzing relationships among variables

Do women and men have different opinions about affirmative action? Based on a subsample of the 2008 GSS, the output figure below shows respondent’s sex (sex) and attitudes toward affirmative action (DISCAFF: Are whites hurt by affirmative action).

Whites hurt by affirmative action * respondent’s sex crosstabulation

| Whites hurt by affirmative action | Male | Female | Total |
| --- | --- | --- | --- |
| Very likely | 66 | 93 | 159 |
| Somewhat likely | 193 | 260 | 453 |
| Not very likely | 172 | 179 | 351 |
| Total | 431 | 532 | 963 |

a. What is the independent variable?

b. What are the differences in attitudes between men and women?

c. What might be some other reasons that influence attitudes toward welfare spending? Suggest at least two reasons.
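Differences between men and women are easiest to read from column percentages, that is, percentages computed within each sex. The sketch below derives them from the crosstabulation counts; the variable names are illustrative.

```python
# Counts from the crosstabulation, ordered: very likely, somewhat likely, not very likely.
responses = ["very likely", "somewhat likely", "not very likely"]
counts = {
    "Male": [66, 193, 172],
    "Female": [93, 260, 179],
}

# Column percentages: each count divided by the total for that sex.
percentages = {
    sex: [100 * c / sum(column) for c in column]
    for sex, column in counts.items()
}

for i, response in enumerate(responses):
    male = percentages["Male"][i]
    female = percentages["Female"][i]
    print(f"{response:>16}: male {male:5.1f}%  female {female:5.1f}%")
```

Comparing percentages across a row shows, for example, that a larger share of men than women chose "not very likely".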

## Calculating Cramér’s V and lambda using chi-square

Construct a cross tabulation to examine the relationship between the two nominal/ordinal level variables. Assume that you selected a random sample of two hundred cases and collected data on these cases. Calculate chi-square and, as appropriate, lambda, gamma, and Cramér’s V. Briefly state your conclusion about the hypothesized relationship.

Hypothesis 1

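| Robbery arrests | Convictions |
| --- | --- |
| 44 | 3 |
| 50 | 6 |
| 38 | 8 |
| 96 | 12 |
| 42 | 14 |
| 42 | 18 |
| 47 | 22 |
| 40 | 22 |
| 39 | 28 |
| 46 | 32 |
| 50 | 40 |

| Statistic | Robbery arrests | Convictions |
| --- | --- | --- |
| Mean | 49.2 | 18.2 |
| Standard deviation | 16.95542422 | 12.09269752 |
| Median | 45 | 16 |
| Mode | 50 | N/A |
| Observations | 10 | 10 |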

## T-Test – statistics

We have been working with one-sided and two-sided t tests but I do not know how to do this problem:

The observed time that 20 individuals spend standing and walking per day is reported in the following table:

| Group | n | Mean | Std. dev. |
| --- | --- | --- | --- |
| Group 1 (lean) | 10 | 525.751 | 107.121 |
| Group 2 (obese) | 10 | 373.269 | 67.498 |

(a) Test the null hypothesis that there is no difference in mean time per day spent standing or walking between the two groups against a two-sided alternative at the α = 0.01 significance level. Report your test statistic, a bound for your p-value, and your conclusions regarding the null hypothesis.

(b) Verify your results with an appropriate confidence interval. Does this confidence interval support your conclusions in part (a)?
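One way to work this problem from the summary statistics is a two-sample t-test with unpooled variances and the conservative degrees of freedom min(n1, n2) − 1 = 9. That choice of degrees of freedom is an assumption; software would typically use a larger, fractional value. The t-table entries below (df = 9) are 3.250 for a two-sided α of 0.01, and 3.690 and 4.297 for upper-tail areas of 0.0025 and 0.001.

```python
import math

# Summary statistics from the table above.
n1, mean1, sd1 = 10, 525.751, 107.121   # Group 1 (lean)
n2, mean2, sd2 = 10, 373.269, 67.498    # Group 2 (obese)

diff = mean1 - mean2
se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # unpooled standard error
t_stat = diff / se

t_crit = 3.250   # t-table: df = 9, two-sided alpha = 0.01
# t_stat lies between 3.690 and 4.297, so the two-sided p-value
# is bounded between 2 * 0.001 = 0.002 and 2 * 0.0025 = 0.005.

# 99% confidence interval for the difference in means.
ci_low = diff - t_crit * se
ci_high = diff + t_crit * se

print(f"t = {t_stat:.3f}, reject H0: {abs(t_stat) > t_crit}")
print(f"99% CI for the difference: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval excludes zero, which agrees with rejecting the null hypothesis in part (a).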

## Solving various questions on hypothesis testing

______ 1. The variable about which the investigator wishes to make predictions or estimates is called the ____.

a. dependent variable b. unit of association

c. independent variable d. discrete variable

______ 2. In regression analysis, the quantity that gives the amount by which Y changes for a unit change in X is called the _____.

a. coefficient of determination b. slope of the regression line

c. Y intercept of the regression line d. correlation coefficient

______ 3. In the equation y = b0 +b1 (x), b0 is the _____.

a. coefficient of determination b. slope of the regression line

c. y intercept of the regression line d. correlation coefficient

______ 4. In the equation y = b0 + b1 (x), b1 is the _____.

a. coefficient of determination b. slope of the regression line

c. y intercept of the regression line d. correlation coefficient

______ 5. In regression and correlation analysis, the measure whose values are restricted to the range 0 to 1, inclusive, is the _____.

a. coefficient of determination b. slope of the regression line

c. y intercept of the regression line d. correlation coefficient

______ 6. In regression and correlation analysis, the measure whose values are restricted to the range -1 to +1, inclusive, is the _____.

a. coefficient of determination b. slope of the regression line

c. y intercept of the regression line d. correlation coefficient

______ 7. The quantity is called the _______________ sum of squares.

a. least b. explained

c. total d. unexplained

______ 8. If, in the regression model, b1 = 0, we say there is _____________ linear relationship between X and Y.

a. an inverse b. a significant

c. a direct d. no

______ 9. If, in the regression model, b1 is negative, we say there is _____________ linear relationship between X and Y.

a. an inverse b. a significant

c. a direct d. no

______ 10. If two variables are not related, we know that ________________.

a. their correlation coefficient is equal to zero.

b. the variability in one of them cannot be explained by the other.

c. the slope of the regression line for the two variables is equal to zero.

d. all of the above statements are true.

True or False

_______ 11. The usual objective of regression analysis is to predict or estimate the value of one variable when the value of another variable is known.

_______ 12. Correlation analysis is concerned with measuring the strength of the relationship between two variables.

_______ 13. In the least squares model, the explained sum of squares is always smaller than the regression sum of squares.

_______ 14. The sample correlation coefficient and the sample slope will always have the same sign.

_______ 15. An important relationship in regression analysis is = .

_______ 16. If zero is contained in the 95% confidence interval for b, we may reject Ho: b = 0 at the 0.05 level of significance.

_______ 17. If in a regression analysis the explained sum of squares is 75 and the unexplained sum of squares is 25, r² = 0.33.

_______ 18. In general, the smaller the dispersion of observed points about a fitted regression line, the larger the value of the coefficient of determination.

_______ 19. When small values of Y tend to be paired with small values of X, the relationship between X and Y is said to be inverse.

_______ 20. Other things being equal, decreasing α increases β.

The purpose of hypothesis testing is to aid the manager or researcher in reaching a (an) _____________________ concerning a (an) _____________________ by examining the data contained in a (an) _____________________ from that _____________________.

The _____________________ hypothesis is the hypothesis that is tested.

If the null hypothesis is not rejected, we conclude that the alternative __________________.

If the null hypothesis is not rejected, we conclude that the null hypothesis ______________.

A Type I error occurs when the investigator ______________________________________.

A Type II error occurs when the investigator ______________________________________.

Values of the test statistic that separate the acceptance region from the rejection region are called _________________ values.

Given, H0: µ= µ0, then Ha: ___________________________________.

Given H0: µ ≤ µ0, then Ha: ___________________________________.

Given H0: µ ≥ µ0, then Ha: ___________________________________.

When one is testing H0: µ= µ0 on the basis of data from a sample of size n from a normally distributed population with a known variance of σ2, the test statistic is _________________.

When one is testing H0: µ= µ0 on the basis of data from a sample of size n from a normally distributed population with an unknown variance, the test statistic is _________________.

Given: H0: µ = 100; Ha: µ ≠ 100; α = 0.03; computed z = 2.25, p = 0.0244. The null hypothesis should be rejected because __________________________________________.

The following is a general statement of a decision rule: If, when the null hypothesis is true, the probability of obtaining a value of the test statistic as____________ as or more _______ than that actually obtained is less than or equal to α, the null hypothesis is_______________. Otherwise, the null hypothesis is ______________________.

The probability of obtaining a value of the test statistic as extreme as or more extreme than that actually obtained, given that the tested null hypothesis is true, is called ______________ for the ________________test.

What is the null hypothesis?

What is the alternative hypothesis?

Explain the p-value.

Show three ways to calculate r².

What is the importance of having a critical value?
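For the question about calculating r², three equivalent routes in simple linear regression are: squaring the correlation coefficient, dividing the explained sum of squares by the total sum of squares, and subtracting the unexplained share from 1. The sketch below checks that they agree on a small data set that is purely hypothetical and chosen only for easy arithmetic.

```python
import math

# Hypothetical illustration data, not taken from any question above.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
sst = sum((b - mean_y) ** 2 for b in y)          # total sum of squares

b1 = sxy / sxx                                    # slope
b0 = mean_y - b1 * mean_x                         # intercept
fitted = [b0 + b1 * a for a in x]
ssr = sum((f - mean_y) ** 2 for f in fitted)      # explained sum of squares
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # unexplained sum of squares

r2_from_r = (sxy / math.sqrt(sxx * sst)) ** 2     # way 1: square of r
r2_explained = ssr / sst                          # way 2: explained / total
r2_unexplained = 1 - sse / sst                    # way 3: 1 - unexplained / total

print(r2_from_r, r2_explained, r2_unexplained)
```

All three expressions return the same value up to floating-point rounding because the total sum of squares splits into explained and unexplained parts.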


## Death Penalty as Moral Acceptance

In 2010, 65% of adult Americans thought that the death penalty was morally acceptable. In a poll conducted by the Gallup Organization, a simple random sample of 1005 adult Americans resulted in 704 respondents stating that they believe the death penalty was morally acceptable when asked, “Do you believe the death penalty is morally acceptable or morally wrong?” (The choices “morally acceptable” and “morally wrong” were randomly interchanged in the question for each interview.) Is there significant evidence at the 5% level of significance to indicate that the proportion of adult Americans who believe the death penalty is morally acceptable has increased from the level reported in 2010?
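This question can be sketched as a right-tailed one-proportion z-test of H0: p = 0.65 against H1: p > 0.65. The variable names are illustrative, and the p-value relies on the normal approximation, which is reasonable for a sample of this size.

```python
import math

# Figures from the problem statement.
n, successes, p0 = 1005, 704, 0.65
alpha = 0.05

p_hat = successes / n
se = math.sqrt(p0 * (1 - p0) / n)    # standard error under H0
z_stat = (p_hat - p0) / se

# Right-tail p-value from the standard normal distribution.
p_value = 0.5 * math.erfc(z_stat / math.sqrt(2))

print(f"p_hat = {p_hat:.4f}, z = {z_stat:.3f}, p-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value is far below 0.05, so the sample gives evidence that the proportion has increased from the 2010 level.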

## Hypothesis testing for means – t scores

I don’t understand the formulas and rationales for the attached. I also don’t have the correct version of Excel to use the ToolPak I need, so I am doing everything manually. I am doing the work on my own but want to make sure it is somewhat correct prior to turning it in. Any help would be appreciated. Thank you.

## Rocky University: How prevalent is Cheating?

Note: The attached Excel file provides the data required for this question.

During the global recession of 2008 and 2009, there were many accusations of unethical behavior by Wall Street executives, financial managers, and other corporate officers. At that time, an article appeared that suggested that part of the reason for such unethical business behavior may stem from the fact that cheating has become more prevalent among business students. The article reported that 86% of business students admitted to cheating at some time during their academic career as compared to 77% of non-business students.

Cheating has been a concern of the dean of the College of Business at Rocky University for several years. Some faculty members in the college believe that cheating is more widespread at Rocky than at other universities, while other faculty members think that cheating is not a major problem in the college. To resolve some of these issues, the dean commissioned a study to assess the current ethical behavior of business students at Rocky. As part of this study, an anonymous exit survey was administered to a sample of 90 business students from this year’s graduating class. Responses to the following questions were used to obtain data regarding three types of cheating.

- During your time at Rocky, did you ever present work copied off the Internet as your own? (Yes / No)

- During your time at Rocky, did you ever copy answers off another student’s exam? (Yes / No)

- During your time at Rocky, did you ever collaborate with other students on projects that were supposed to be completed individually? (Yes / No)

Any student who answered Yes to one or more of these questions was considered to have been involved in some type of cheating. The complete data set is in the file named Rocky.

Managerial Report

Prepare a report (see below) for the dean of the college that summarizes your assessment of the nature of cheating by business students at Rocky University. Be sure to include the following seven (7) items in your report.

1. To summarize the data, compute the proportion of all students, male and female, who presented work copied off the Internet as their own, copied answers off another student’s exam, or collaborated with other students on projects that were supposed to be completed individually. Then comment on your findings.

2. Develop 95% confidence intervals for the proportion of all students, the proportion of male students, and the proportion of female students who were involved in some type of cheating.

3. Develop 95% confidence intervals for the proportion of all students, the proportion of male students, and the proportion of female students who were involved in copying off the Internet.

4. Develop 95% confidence intervals for the proportion of all students, the proportion of male students, and the proportion of female students who were involved in copying off another’s exam.

5. Develop 95% confidence intervals for the proportion of all students, the proportion of male students, and the proportion of female students who were involved in collaborating on what was meant to be an individual project.

6. Conduct a hypothesis test to determine if the proportion of business students at Rocky University who were not involved in some type of cheating is less than that of business students elsewhere. Use α = 0.05.

7. What advice would you give to the dean based upon your analysis of the data?