## Statistical analysis: Hypothesis Testing

Use the traditional method of hypothesis testing unless otherwise specified.

1. A traffic safety expert's report indicated that, for the 21-24 age group, 31.58% of traffic fatalities were victims who had used a seat belt. Victims who were not wearing a seat belt accounted for 59.83% of the deaths, and the status of the rest was unknown. A study of 120 traffic fatalities in a particular region showed that, for this age group, 35 of the victims had used a seat belt, 78 had not, and the status of the rest was unknown. At α = 0.05, is there sufficient evidence that the proportions differ from those in the report?
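One way to set up problem 1 is as a chi-square goodness-of-fit test. The sketch below is illustrative only: the variable names are made up, the "unknown" share is taken as the remainder of the two reported percentages, and the critical value 5.991 is assumed from a standard chi-square table (α = 0.05, 2 degrees of freedom).

```python
# Goodness-of-fit sketch for the seat-belt problem (standard library only).
observed = [35, 78, 120 - 35 - 78]   # used belt, no belt, status unknown
reported = [0.3158, 0.5983]          # proportions stated in the report
reported.append(1 - sum(reported))   # remaining share: status unknown

n = sum(observed)
expected = [n * p for p in reported]

# Chi-square statistic: sum of (O - E)^2 / E over the three categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Assumption: table value for alpha = 0.05 with df = 2.
CRITICAL_VALUE = 5.991
reject_h0 = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the statistic falls below the critical value, so the sample would not show a significant difference from the reported proportions.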

3. Tire Labeling. The federal government has proposed labeling tires by fuel efficiency to save fuel and cut emissions. A survey was taken to see who would use these labels. At α = 0.10, is the gender of the individual related to whether or not a person would use these labels? The data from a sample are shown here.

| Gender | Yes | No | Undecided |
|---|---|---|---|
| Men | 114 | 30 | 6 |
| Women | 136 | 16 | 8 |
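Problem 3 is a chi-square test of independence on a 2 × 3 table. The following is a minimal sketch, not a prescribed solution: `chi_square_independence` is a hypothetical helper name, and the critical value 4.605 is assumed from a standard chi-square table (α = 0.10, 2 degrees of freedom).

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: men, women. Columns: yes, no, undecided.
tire_labels = [[114, 30, 6], [136, 16, 8]]
statistic, df = chi_square_independence(tire_labels)

# Assumption: table value for alpha = 0.10 with df = 2.
CRITICAL_VALUE = 4.605
reject_h0 = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

The same helper could be reused for the other contingency-table problems in this set by passing in their counts and looking up the matching critical value.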

5. Pension Investments. A survey was taken on how a lump-sum pension would be invested by 45-year-olds and 65-year-olds. The data are shown here. At α = 0.05, is there a relationship between the age of the investor and the way the money would be invested?

| Age | Large company stock funds | Small company stock funds | International stock funds | CDs or money market funds | Bonds |
|---|---|---|---|---|---|
| 45 | 20 | 10 | 10 | 15 | 45 |
| 65 | 42 | 24 | 24 | 6 | 24 |

7. Employment of High School Females. A guidance counselor wishes to determine if the proportions of high school girls in his school district who have jobs are equal to the national average of 36%. He surveys 80 female students, ages 16 through 18, to determine if they work. The results are shown. At α = 0.01, test the claim that the proportions of girls who work are equal. Use the P-value method.

| Age | Work | Don't work | Total |
|---|---|---|---|
| 16 | 45 | 35 | 80 |
| 17 | 31 | 49 | 80 |
| 18 | 38 | 42 | 80 |

9. Health Insurance Coverage. Based on the following data showing the numbers of people (in thousands) with and without health insurance, can it be concluded at the α = 0.01 level of significance that the proportion with or without health insurance is related to the state chosen?

| Status | Arkansas | Montana | North Dakota | Wyoming |
|---|---|---|---|---|
| With insurance | 52 | 793 | 553 | 447 |
| Without insurance | 123 | 146 | 61 | 70 |

## Century National Bank

Refer to the Century National Bank data (attached).

Is it reasonable that the distribution of checking account balances approximates a normal probability distribution? Determine the mean and the standard deviation for the sample of 60 customers. Compare the actual distribution with the theoretical distribution. Cite some specific examples and comment on your findings.

Divide the account balances into three groups, of about 20 each, with the smallest third of the balances in the first group, the middle third in the second group, and those with the largest balances in the third group. Next, develop a table that shows the number in each of the categories of the account balances by branch. Does it appear that account balances are related to the branch? Cite some examples and prepare a short written report of your findings in Microsoft Word.

## Fortune 500 CEO Salaries

The stocks included in the S&P 500 are those of large publicly held companies that trade on either the New York Stock Exchange or the NASDAQ. In 2008, the S&P 500 was down 38.5%, but what about financial compensation (salary, bonuses, stock options, etc.) to the 500 CEOs that run the companies? To learn more about the mean CEO compensation, an alphabetical list of the 500 companies was obtained and ordered from 1 (3M) to 500 (Zions Bancorp). Next, the random number table was used to select a random number from 1 to 50. The number selected was 10. Then, the companies numbered 10, 60, 110, 160, 210, 260, 310, 360, 410, and 460 were investigated and the total CEO compensation recorded. The data, stored in CEO, are as follows.

| Number | Company | Compensation |
|---|---|---|
| 10 | Aflac | 10,783,232 |
| 60 | Big Lots | 9,862,262 |
| 110 | Comerica | 4,108,245 |
| 160 | EMC | 13,874,262 |
| 210 | Harley-Davidson | 6,048,027 |
| 260 | Kohl’s | 11,638,049 |
| 310 | Molson Coors Brewing | 5,558,499 |
| 360 | Pfizer | 6,629,955 |
| 410 | Sigma-Aldrich | 3,983,596 |
| 460 | United Parcel Service | 5,168,664 |

A. Construct a 95% confidence interval estimate for the mean 2008 compensation for CEOs of S&P 500 companies.

B. Construct a 99% confidence interval estimate for the mean 2008 compensation for CEOs of S&P 500 companies.

C. Comment on the effect that changing the level of confidence had on your answers in (a) and (b).
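Parts A through C can be explored with a t-based confidence interval, since the population standard deviation is unknown and n = 10. The sketch below is one possible setup: the variable names are illustrative, and the critical values 2.262 and 3.250 are assumed from a standard t table (9 degrees of freedom, two-sided 95% and 99%).

```python
import math
import statistics

# Total 2008 CEO compensation for the ten sampled companies.
compensation = [
    10_783_232, 9_862_262, 4_108_245, 13_874_262, 6_048_027,
    11_638_049, 5_558_499, 6_629_955, 3_983_596, 5_168_664,
]

n = len(compensation)
mean = statistics.mean(compensation)
std_error = statistics.stdev(compensation) / math.sqrt(n)

# Assumption: two-sided t table values for n - 1 = 9 degrees of freedom.
T_CRITICAL = {0.95: 2.262, 0.99: 3.250}

intervals = {}
for level, t_value in T_CRITICAL.items():
    margin = t_value * std_error
    intervals[level] = (mean - margin, mean + margin)
    low, high = intervals[level]
    print(f"{level:.0%} CI: {low:,.0f} to {high:,.0f}")
```

Because the 99% critical value is larger than the 95% one while the standard error is unchanged, the 99% interval comes out wider, which is the effect part C asks about.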

–

In New York State, savings banks are permitted to sell a form of life insurance called savings bank life insurance (SBLI). The approval process consists of underwriting, which includes a review of the application, a medical information bureau check, possible requests for additional medical information and medical exams, and a policy compilation stage in which the policy pages are generated and sent to the bank for delivery. The ability to deliver approved policies to customers in a timely manner is critical to the profitability of this service. During a period of one month, a random sample of 27 approved policies is selected, and the total processing time, in days, is recorded (and stored in Insurance):

73 19 16 64 28 28 31 90 60 56 31 56 22 18

45 48 17 17 17 91 92 63 50 51 69 16 17

a. In the past, the mean processing time was 45 days. At the 0.05 level of significance, is there evidence that the mean processing time has changed from 45 days?

b. What assumption about the population distribution is needed in order to conduct the t test in (a)?

c. Construct a boxplot or a normal probability plot to evaluate the assumption made in (b).

d. Do you think that the assumption needed in order to conduct the t test in (a) is valid? Explain.
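Part (a) can be set up as a two-sided one-sample t test of H0: μ = 45. The following is a minimal sketch with illustrative names; the critical value 2.056 is assumed from a standard t table (α = 0.05, two-sided, 26 degrees of freedom).

```python
import math
import statistics

# Processing times in days for the 27 sampled policies.
processing_days = [
    73, 19, 16, 64, 28, 28, 31, 90, 60, 56, 31, 56, 22, 18,
    45, 48, 17, 17, 17, 91, 92, 63, 50, 51, 69, 16, 17,
]
MU_0 = 45  # historical mean processing time under H0

n = len(processing_days)
sample_mean = statistics.mean(processing_days)
sample_sd = statistics.stdev(processing_days)  # n - 1 in the denominator

t_stat = (sample_mean - MU_0) / (sample_sd / math.sqrt(n))

# Assumption: two-sided table value for alpha = 0.05 with df = 26.
T_CRITICAL = 2.056
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"mean = {sample_mean:.2f}, s = {sample_sd:.2f}, t = {t_stat:.4f}")
print(f"reject H0: {reject_h0}")
```

The test only speaks to part (a); parts (b) through (d) still require looking at the shape of the data, for example with a boxplot.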

## Bottled water study

4.44) A four-year study of bottled water brands conducted by the Natural Resources Defense Council found that 25% of bottled water is just tap water packaged in a bottle. Consider a sample of five bottled water brands, and let x equal the number of these brands that use tap water.

a. Explain why x is a binomial random variable.

b. Give the probability distribution for x as a formula.

c. Find P(x = 2).

d. Find P(x ≤ 1).
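Parts (b) through (d) follow from the binomial formula P(x = k) = C(n, k) p^k (1 − p)^(n − k) with n = 5 and p = 0.25. A short sketch, with `binomial_pmf` as a hypothetical helper name:

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


N_BRANDS = 5   # brands sampled
P_TAP = 0.25   # share of bottled water that is tap water

p_exactly_2 = binomial_pmf(2, N_BRANDS, P_TAP)
p_at_most_1 = sum(binomial_pmf(k, N_BRANDS, P_TAP) for k in range(2))

print(f"P(x = 2)  = {p_exactly_2:.4f}")
print(f"P(x <= 1) = {p_at_most_1:.4f}")
```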

4.54) Suppose you are a purchasing officer for a large company. You have purchased 5 million electrical switches, and your supplier has guaranteed that the shipment will contain no more than 0.1% defectives. To check the shipment, you randomly sample 500 switches, test them, and find that four are defective. Based on this evidence, do you think the supplier has complied with the guarantee? Explain.
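One way to reason about problem 4.54 is to ask how likely four or more defectives would be if the guarantee held exactly at 0.1%. The sketch below assumes a binomial model, which is a reasonable approximation here because the sample of 500 is tiny relative to the 5 million switches; the names are illustrative.

```python
from math import comb

SAMPLE_SIZE = 500
GUARANTEED_RATE = 0.001   # 0.1% defectives, the supplier's upper bound
DEFECTIVES_FOUND = 4


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# P(X >= 4) = 1 - P(X <= 3) under the guaranteed defect rate.
p_three_or_fewer = sum(
    binomial_pmf(k, SAMPLE_SIZE, GUARANTEED_RATE) for k in range(DEFECTIVES_FOUND)
)
p_four_or_more = 1 - p_three_or_fewer

print(f"P(X >= {DEFECTIVES_FOUND}) = {p_four_or_more:.5f}")
```

A very small tail probability means the observed result would be rare if the guarantee were true, which is the evidence the question asks you to weigh.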

## Children with ADHD and TV

Recent results suggest that children with ADHD also tend to watch more TV than children who are not diagnosed with the disorder. To examine this relationship, a researcher obtains a random sample of n = 36 children, 8 to 12 years old, who have been diagnosed with ADHD. Each child is asked to keep a journal recording how much time is spent watching TV. The average daily time for the sample is M = 4.9 hours. It is known that the average time for the general population of 8- to 12-year-old children is μ = 4.1 hours with σ = 1.8. Are the data sufficient to conclude that children with ADHD watch significantly more TV than children without the disorder? Use a two-tailed test with α = .05. If the researcher had used a sample of n = 9 children and obtained the same sample mean, would the results be sufficient to reject H0? Compute Cohen's d for this study.
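Because σ is known, this problem can be worked as a z test. The sketch below compares the two sample sizes and computes Cohen's d as the mean difference divided by σ; the names are illustrative.

```python
import math
from statistics import NormalDist

POP_MEAN = 4.1     # mu for the general population (hours)
POP_SD = 1.8       # sigma for the general population
SAMPLE_MEAN = 4.9  # M for the ADHD sample
ALPHA = 0.05

# Two-tailed critical value from the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)


def z_statistic(sample_mean, n):
    """z = (M - mu) / (sigma / sqrt(n))."""
    return (sample_mean - POP_MEAN) / (POP_SD / math.sqrt(n))


results = {n: z_statistic(SAMPLE_MEAN, n) for n in (36, 9)}
for n, z in results.items():
    print(f"n = {n}: z = {z:.3f}, reject H0: {abs(z) > z_critical}")

# Cohen's d does not depend on the sample size.
cohens_d = (SAMPLE_MEAN - POP_MEAN) / POP_SD
print(f"Cohen's d = {cohens_d:.3f}")
```

The comparison shows how the same mean difference can be significant with n = 36 but not with n = 9, while the effect size stays the same.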

## Cold Medication Effect on Mental Alertness

A researcher would like to determine whether an over-the-counter cold medication has an effect on mental alertness. A sample of n = 16 participants is obtained, and each person is given a standard dose of the medication one hour before being tested in a driving simulation task. For the general population, reaction time scores on the simulation task are normally distributed with μ = 210 and σ = 20. The individuals in the sample had an average score of M = 222. Can the researcher conclude that the medication has a significant effect on mental alertness as measured by the driving simulation task? Use a two-tailed test with α = 0.05. Compute Cohen’s d to measure the size of the effect.
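As an alternative to comparing z with a critical value, the same decision can be reached through a two-tailed p-value. This is a sketch with illustrative names, assuming the z test applies because σ is known and the population is normal.

```python
import math
from statistics import NormalDist

POP_MEAN, POP_SD = 210, 20   # mu and sigma for the general population
SAMPLE_MEAN, N = 222, 16     # M and n for the medicated sample
ALPHA = 0.05

z = (SAMPLE_MEAN - POP_MEAN) / (POP_SD / math.sqrt(N))

# Two-tailed p-value: probability of a z at least this extreme in either direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < ALPHA

cohens_d = (SAMPLE_MEAN - POP_MEAN) / POP_SD

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
print(f"Cohen's d = {cohens_d:.2f}")
```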

## Transition pressure of bismuth as a function of temperature

Question 1

An experiment was conducted to investigate the transition pressure of bismuth as a function of temperature. Listed below in Table 3.1 are the temperature (x) in degrees Centigrade and the difference in pressure (y) from 25,000 bars in hundreds of bars for 23 samples. (So if y = 3.66 then the pressure is 25,000 + 3.66 × 100 = 25,366 bars.)

Table 3.1 The temperature and pressure difference for bismuth transition.

See attached

You can assume the following calculations:

See attached

a) Produce an ANOVA table to test the ‘lack of fit’ and establish if it is reasonable to assume that the expected pressure is a linear function of the temperature.

b) Test the hypothesis that the slope is zero.

c) Obtain a point estimate and 95% confidence interval for the expected decrease in pressure due to an increase of 10 degrees centigrade.

Question 2

A study was conducted to see how the amount of heat, which is generated when cement sets, is influenced by the composition of the cement. The response variable is the heat generated (y) in calories per gram. The explanatory variables are the percentages of cement by weight of the constituents tricalcium-silicate (x1) and tetracalcium-alumino-ferrite (x2) and the data for 13 cement samples are given in Table 3.2.

Table 3.2 Heat generated in calories per gram and percentages of cement by weight of the constituents tricalcium-silicate and tetracalcium-alumino-ferrite for 13 cement samples.

See attached

A multiple regression model was fitted to these data with heat generated as a linear function of x1 and x2 with an intercept term using MINITAB.

See attached

a) Obtain the ANOVA table for the model with predictors x1 and x2.

b) Which percentage of the variability of y is explained by the overall model?

c) Test the hypothesis that β1 = β2 = 0. What do you conclude?

d) Compare the model with two linear predictors to the one with a single predictor x1 and decide which one is the best.

e) Using the Minitab listing, calculate β̂0.

f) Test the hypothesis βi = 0 for each i = 0,1,2 individually. What do you conclude?

g) Calculate the 95% confidence intervals for β0, β1, β2.

## Statistical Hypothesis Testing

1. State the null and alternative hypothesis for the following situations.

a) A federal auditor believes that a health care company has overcharged its patients.

b) The editor of a magazine believes that the mean income of subscribers to its magazine is $75,000.

c) An operations manager must maintain machines that produce 50 pound bags of fertilizer.

d) A manufacturer believes that the average life of its competitor’s battery is less than 10 hours.

2. A crime reporter was told that, on the average, 3,000 burglaries per month occurred in his city. The reporter examined past data, which were used to compute a 95% confidence interval for the number of burglaries per month. The confidence interval was from 2,176 to 2,784. At a 5% level of significance, do these data tend to support the alternative hypothesis, Ha: μ ≠ 3,000?

3. A movie theater complex will raise its ticket price if the average ticket price of theaters in southern California exceeds $7.50. A random sample of 36 theaters resulted in a mean of $7.80. The population standard deviation is $1.00. What conclusion can be made at the 10% significance level? How about at the 5% significance level?

4. The producer of Take-a-Bite, a snack food, claims that each package weighs 175 grams. A representative of a consumer advocate group selected a random sample of 70 packages. From this sample, the mean is 172 grams. The population standard deviation is known to be 8 grams. Find and interpret the p-value for testing that the mean weight of Take-a-Bite is less than 175 grams.
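Problems 3 and 4 are both one-sided z tests with a known population standard deviation. The sketch below treats problem 3 as an upper-tail test (critical-value approach) and problem 4 as a lower-tail test (p-value approach); `z_statistic` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_statistic(sample_mean, hypothesized_mean, pop_sd, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (pop_sd / math.sqrt(n))


# Problem 3: Ha: mu > 7.50, so reject H0 when z exceeds the upper-tail critical value.
z_ticket = z_statistic(7.80, 7.50, 1.00, 36)
ticket_decisions = {}
for alpha in (0.10, 0.05):
    critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    ticket_decisions[alpha] = z_ticket > critical
    print(f"alpha = {alpha}: z = {z_ticket:.2f}, critical = {critical:.3f}, "
          f"reject H0: {ticket_decisions[alpha]}")

# Problem 4: Ha: mu < 175, so the p-value is the area to the left of z.
z_snack = z_statistic(172, 175, 8, 70)
p_value_snack = STANDARD_NORMAL.cdf(z_snack)
print(f"z = {z_snack:.3f}, p-value = {p_value_snack:.5f}")
```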

## Levels of significance questions

Many times researchers preselect the value of alpha of .01 or .001. What are the factors that allow selection of either of those levels of significance?

When computing an Independent Samples Test, the SPSS output contains a section labeled Confidence Interval of the Difference. What is a confidence interval, and what is the practical use or benefit of knowing its lower or upper value, especially when conducting t-test analysis?

How do you determine if the hypothesis is nondirectional or directional; is this determination made before or after you view the data?

When utilizing z-scores, is there a limit to the number of variables? Does it affect the mean and/or the standard deviation?

In what types of situations or research are continuous variables and discrete data used and how will a corporation utilize the data in a report?

## Inequalities in the annual salary for employees with similar performance ratings, years of service, and certifications

The personnel director for a local manufacturing firm has received complaints from the employees in a certain shop regarding what they perceive to be inequities in the annual salary for employees who have similar performance ratings, years of service, and relevant certifications. The personnel director believes that an employee’s pay in this particular shop should be positively correlated to their prior performance rating, years of service, and relevant certifications. The personnel director has collected the data shown in the following table pertaining to the employees within the shop:

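| Employee | Current Annual Salary (Thousands) | Average Performance Rating for Past 3 Years (5-point scale) | Years of Service | Number of Relevant Certifications |
|---|---|---|---|---|
| 1 | 56.1 | 2.18 | 9 | 6 |
| 2 | 55.3 | 3.31 | 21 | 7 |
| 3 | 48.9 | 3.18 | 18 | 7 |
| 4 | 61.8 | 3.75 | 36 | 7 |
| 5 | 56.4 | 2.62 | 31 | 6 |
| 6 | 52.5 | 3.75 | 15 | 6 |
| 7 | 52.6 | 4.25 | 25 | 6 |
| 8 | 62.6 | 2.43 | 30 | 5 |
| 9 | 45.1 | 1.93 | 7 | 6 |
| 10 | 71.1 | 3.50 | 47 | 8 |
| 11 | 53.2 | 2.81 | 26 | 6 |
| 12 | 44.3 | 3.06 | 11 | 6 |
| 13 | 55.3 | 5.00 | 19 | 6 |
| 14 | 59.1 | 4.06 | 35 | 7 |
| 15 | 60.0 | 4.12 | 38 | 9 |
| 16 | 48.6 | 5.00 | 21 | 4 |
| 17 | 50.4 | 3.87 | 9 | 6 |
| 18 | 63.0 | 4.37 | 41 | 8 |
| 19 | 53.0 | 2.50 | 35 | 3 |
| 20 | 50.9 | 2.81 | 23 | 4 |
| 21 | 55.4 | 3.68 | 33 | 4 |
| 22 | 51.8 | 5.00 | 27 | 4 |
| 23 | 62.0 | 3.00 | 37 | 8 |
| 24 | 50.1 | 2.43 | 15 | 5 |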

The personnel director is interested in creating a linear regression model that can be used to estimate the annual salary an employee might expect to receive based upon his or her past performance, years of service, and/or number of relevant certifications. The regression model will be used as a basis for determining whether or not there is any validity to the employees’ complaints regarding salary inequities.

Perform each of the following seven regression analyses using a 95% confidence level.

– Annual salary vs. average performance rating for the past 3 years

– Annual salary vs. years of service

– Annual salary vs. number of relevant certifications

– Annual salary vs. average performance rating for the past 3 years and years of service

– Annual salary vs. average performance rating for the past 3 years and number of relevant certifications

– Annual salary vs. years of service and number of relevant certifications

– Annual salary vs. average performance rating for the past 3 years, years of service and number of relevant certifications

Use the results for the univariate regression analysis for annual salary vs. average performance rating for the past 3 years in order to answer questions 1 through 8.

1. What is the degree of correlation between the dependent variable and the independent variable?

2. Is the statistical significance of the model as a whole less than the desired statistical significance for the regression model? Explain the basis for your answer.

3. Is the statistical significance of the linear relationship between the dependent and independent variables less than the desired statistical significance for the regression model? Explain the basis for your answer.

4. What percentage of the observed variation between the actual values of the dependent variable and the mean value of the dependent variable in the sample data set is explained by the regression model?

5. What is the amount by which we will be off on average when predicting values for the dependent variable using the regression model?

6. What is the coefficient for the y-intercept for the regression model?

7. What is the coefficient for the independent variable for the regression model?

8. What is the point estimate for the predicted salary for an employee with an average performance rating of 3.9?
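For the univariate model behind questions 1 through 8, the least-squares slope, intercept, and correlation can be computed directly from the table. The sketch below covers only salary vs. performance rating; the names are illustrative, and the significance questions (2 and 3) would still need the F and t statistics from a regression output.

```python
import math
import statistics

# Current annual salary (thousands) and average performance rating, employees 1-24.
salary = [56.1, 55.3, 48.9, 61.8, 56.4, 52.5, 52.6, 62.6, 45.1, 71.1, 53.2, 44.3,
          55.3, 59.1, 60.0, 48.6, 50.4, 63.0, 53.0, 50.9, 55.4, 51.8, 62.0, 50.1]
rating = [2.18, 3.31, 3.18, 3.75, 2.62, 3.75, 4.25, 2.43, 1.93, 3.50, 2.81, 3.06,
          5.00, 4.06, 4.12, 5.00, 3.87, 4.37, 2.50, 2.81, 3.68, 5.00, 3.00, 2.43]

mean_x = statistics.mean(rating)
mean_y = statistics.mean(salary)

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in rating)
syy = sum((y - mean_y) ** 2 for y in salary)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rating, salary))

slope = sxy / sxx                      # coefficient of the independent variable
intercept = mean_y - slope * mean_x    # y-intercept
r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
r_squared = r ** 2                     # share of variation explained

predicted_salary = intercept + slope * 3.9  # point estimate for a 3.9 rating

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
print(f"salary = {intercept:.3f} + {slope:.3f} * rating")
print(f"predicted salary at rating 3.9: {predicted_salary:.2f} thousand")
```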

## Is there a difference between the two groups concerning age, children, and education?

Please see the attached MS Word document for the comparison charts.

Investigate whether there is a significant difference between these two groups in terms of their age, number of children, and education. Assume that α is .05 for a two-tailed test. Based on your analysis, write three 5-type statements summarizing your findings.

## Difference in variability and P value

Sample Question:

A bank with a branch located in a commercial district of a city has developed an improved process for serving customers during the noon-to-1 p.m. lunch period. The waiting time (defined as the time elapsed from when the customer enters the line until he or she reaches the teller window) needs to be shortened to increase customer satisfaction. A random sample of 15 customers is selected (and stored in Bank 1), and the results (in minutes) are as follows:

4.21 5.55 3.02 5.13 4.77 2.34 3.54

4.5 6.1 0.38 5.12 6.46 6.19 3.79

Suppose that another branch, located in a residential area, is also concerned with the noon-to-1 p.m. lunch period. A random sample of 15 customers is selected (and stored in Bank 2), and the results (in minutes) are as follows:

9.66 5.9 8.02 5.79 8.73 3.82 8.01 8.35

10.49 6.68 5.64 4.08 6.17 9.91 5.47

A. Is there evidence of a difference in the variability of the waiting time between the two branches? (Use α = 0.05.)

B. Determine the p-value in (a) and interpret its meaning.

C. What assumption about the population distribution of the two banks is necessary in (a)? Is the assumption valid for these data?

D. Based on the results of (a), is it appropriate to use the pooled-variance t-test to compare the means of the two branches?
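Part A is usually approached with an F test for the ratio of two variances. The sketch below computes only the F statistic and its degrees of freedom, using the values exactly as listed above; note that the Bank 1 listing contains 14 values even though the text describes a sample of 15, so the result is illustrative. The critical value or p-value would come from an F table or statistical software.

```python
import statistics

# Waiting times in minutes, as listed in the problem (Bank 1 shows 14 values).
bank1 = [4.21, 5.55, 3.02, 5.13, 4.77, 2.34, 3.54,
         4.5, 6.1, 0.38, 5.12, 6.46, 6.19, 3.79]
bank2 = [9.66, 5.9, 8.02, 5.79, 8.73, 3.82, 8.01, 8.35,
         10.49, 6.68, 5.64, 4.08, 6.17, 9.91, 5.47]

var1 = statistics.variance(bank1)  # sample variance, n - 1 in the denominator
var2 = statistics.variance(bank2)

# Convention used here: put the larger sample variance in the numerator.
if var2 >= var1:
    f_stat, df_num, df_den = var2 / var1, len(bank2) - 1, len(bank1) - 1
else:
    f_stat, df_num, df_den = var1 / var2, len(bank1) - 1, len(bank2) - 1

print(f"s1^2 = {var1:.4f}, s2^2 = {var2:.4f}")
print(f"F = {f_stat:.4f} with ({df_num}, {df_den}) degrees of freedom")
```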

## Production process of average weight

1. The task of all hypothesis testing is to ____ H0 or ____ H0.

Answer

reject, fail to reject

reject, fail to accept

accept, fail to accept

fail to reject, discredit

accept, reject

12. When testing H0: µ = µ0 versus Ha: µ ≠ µ0, if H0 is rejected then the conclusion is:

Answer

Based on the sample data, there is sufficient evidence to conclude that µ is equal to µ0.

Based on the sample data, there is sufficient evidence to conclude that µ is different from µ0.

Based on the sample data, there isn’t sufficient evidence to conclude that µ is equal to µ0.

Based on the sample data, there isn’t sufficient evidence to conclude that µ is different from µ0.

none of the above

17. Exhibit 8-2: A production process is considered to be under control if the machine parts it makes have a mean length of 35.50 millimeters. Experience shows that the standard deviation of the lengths of the machine parts is .45 mm. Whether or not the process is in control is decided each morning, when the quality control technician takes a sample of 40 machine parts and tests, at the 5% level, whether μ = 35.50 mm. Refer to Exhibit 8-2. Is the process in control if the sample mean is 35.62 mm? What are the null and the alternative hypotheses?

Answer

H0: µ ≥ 35.50 Ha: µ < 35.50

H0: µ = 35.50 Ha: µ = 36.62

H0: µ = 35.50 Ha: µ ≠ 35.50

H0: µ ≤ 35.50 Ha: µ > 35.50

H0: µ > 35.50 Ha: µ = 35.50

18. Exhibit 8-3: Two hundred people are randomly chosen at a shopping mall to taste-test a new brand of fruit drink. They are asked to rate the drink on a scale from 1 to 5, with 1 being very good and 5 being very bad. The results of the survey reveal that the average rating is 3.63 with a standard deviation of 1.22. The marketing division of the fruit drink distributor is only interested in selling this drink if the average rating is more than 3.5. Refer to Exhibit 8-3. What is the alternative hypothesis for testing whether the fruit drink distributor should sell this drink?

Answer

Ha: µ< 3.5

Ha: µ = 3.5

Ha: µ > 3.5

Ha: µ < 3.63

Ha: µ ≤ 3.5

19. Exhibit 8-8: A production process is working normally if the average weight of a manufactured steel bar is at least 1.3 pounds. A sample of 50 steel bars yields a mean of 1.26 pounds with a standard deviation of .10 pounds. The question is whether there is sufficient evidence to indicate that the production process needs adjusting, using a .05 significance level. Refer to Exhibit 8-8. (Since the sample size is sufficiently large, use z.) The test procedure is to reject H0 if the value of the test statistic is

Answer

< -1.28 or > 1.28

> 1.646

< -1.645 or > 1.645

< -1.645

< -1.96 or > 1.96

20. Exhibit 8-7: Records of student performance show that, in 1992, the average score in a statistics class was 79. In 2002, a statistics class of size 36 had an average score of 71 with a standard deviation of 19. Has the average score declined? Refer to Exhibit 8-7. Testing at the .05 level of significance, what is the conclusion? (Since the sample size is sufficiently large, use z.)

Answer

Based on the sample data, there is sufficient evidence to conclude that the average score in 2002 was less than 79.

Based on the sample data, there is sufficient evidence to conclude that the average score in 2002 was 79.

Based on the sample data, there isn’t sufficient evidence to conclude that the average score in 2002 was less than 79.

Based on the sample data, there isn’t sufficient evidence to conclude that the average score in 2002 was 79.

none of the above

## Scientific Method in Practice

#2 Choose a research article from your field. Write the five hypotheses that this study is testing. Write both the null and alternative hypotheses for each one.

#3 Imagine that you are going to conduct a study. Write the purpose of the study, research questions, and main hypotheses. Write both the null and alternative hypotheses for each one.

## Homosexual relations and church attendance

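| Homosexual relations | Church attendance: Never | Church attendance: Several times a year | Church attendance: Every week | Total |
|---|---|---|---|---|
| Always wrong | 62 | 40 | 109 | 211 |
| Not wrong at all | 114 | 50 | 35 | 199 |
| Total | 176 | 90 | 144 | 410 |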

A. Which is the dependent variable in the table? Which is the independent variable?

B. Calculate the percentages using church attendance as the independent variable for each cell in the table. Is there a relationship between church attendance and views about homosexual relations? If so, how strong is it?

C. Suppose that you respond to your classmate by stating that it is not church attendance that explains views about homosexual relations; rather, it is one’s opinion about the nature of right and wrong (i.e., morality) that explains attitudes about homosexual relations. Why might there be a potential problem with your argument? Think in terms of assigning variables to the dependent and independent categories.
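For part B, percentaging within each category of the independent variable means dividing every cell by its column total. A minimal sketch using the cell counts from the table, with illustrative names:

```python
attendance_levels = ["Never", "Several times a year", "Every week"]
counts = {
    "Always wrong": [62, 40, 109],
    "Not wrong at all": [114, 50, 35],
}

# Column totals: number of respondents at each level of church attendance.
column_totals = [sum(column) for column in zip(*counts.values())]

# Column percentages: each cell as a share of its attendance category.
percentages = {
    view: [100 * cell / total for cell, total in zip(row, column_totals)]
    for view, row in counts.items()
}

for view, row in percentages.items():
    cells = ", ".join(
        f"{level}: {pct:.1f}%" for level, pct in zip(attendance_levels, row)
    )
    print(f"{view} -> {cells}")
```

Comparing the percentages across a row, for example "Always wrong" among those who never attend versus those who attend every week, is one simple way to judge the strength of the relationship.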