## Dow Jones Industrial Average (DJIA)

The data below represent the closing value of the Dow Jones Industrial Average (DJIA) over the 29-year period from 1979 through 2007.

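| Year | Coded Year | DJIA |
|------|------------|------|
| 1979 | 0 | 838.7 |
| 1980 | 1 | 964.0 |
| 1981 | 2 | 875.0 |
| 1982 | 3 | 1046.5 |
| 1983 | 4 | 1258.6 |
| 1984 | 5 | 1211.6 |
| 1985 | 6 | 1546.7 |
| 1986 | 7 | 1896.0 |
| 1987 | 8 | 1938.8 |
| 1988 | 9 | 2168.6 |
| 1989 | 10 | 2753.2 |
| 1990 | 11 | 2633.7 |
| 1991 | 12 | 3168.8 |
| 1992 | 13 | 3301.1 |
| 1993 | 14 | 3754.1 |
| 1994 | 15 | 3834.4 |
| 1995 | 16 | 5117.1 |
| 1996 | 17 | 6448.3 |
| 1997 | 18 | 7908.3 |
| 1998 | 19 | 9181.4 |
| 1999 | 20 | 11497.1 |
| 2000 | 21 | 10788.0 |
| 2001 | 22 | 10021.5 |
| 2002 | 23 | 8341.6 |
| 2003 | 24 | 10453.9 |
| 2004 | 25 | 10788.0 |
| 2005 | 26 | 10717.5 |
| 2006 | 27 | 12463.2 |
| 2007 | 28 | 13264.8 |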

1. When fitting a third order autoregressive model to the DJIA, what is the parameter estimate a3 for A3?

a) 0.0863

b) 0.0076

c) 0.0017

2. When testing for the significance of the third-order autoregressive parameter with α = 0.05, can A3 be deleted?

a) Yes

b) No

c) Inconclusive

3. If necessary, when fitting a second order autoregressive model to the DJIA, what is the parameter estimate a2 for A2?

a) 0.2034

b) 0.0886

c) 0.0863

4. When testing for the significance of the second-order autoregressive parameter with α = 0.05, can A2 be deleted?

a) Yes

b) No

c) Inconclusive

5. If necessary, when fitting a first order autoregressive model to the DJIA, what is the parameter estimate a1 for A1?

a) 1.0213

b) 1.1018

c) 1.0959

6. When testing for the significance of the first-order autoregressive parameter with α = 0.05, can A1 be deleted?

a) Yes

b) No

c) Inconclusive

7. Forecast the DJIA for 2008 using the most appropriate model considered in 1) through 6).

a) 11040.1273

b) 13856.9935

c) 13879.4358
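One way to approach questions 1 through 7 is sketched below, assuming the autoregressive models are fitted by ordinary least squares on lagged values of the series (the usual textbook approach). The helper names `fit_ar` and `forecast_next` are illustrative and not part of the original problem; the sketch shows the mechanics only and does not settle which answer choice is correct.

```python
import numpy as np

# DJIA closing values for 1979-2007, copied from the table in the problem.
djia = np.array([
    838.7, 964.0, 875.0, 1046.5, 1258.6, 1211.6, 1546.7, 1896.0,
    1938.8, 2168.6, 2753.2, 2633.7, 3168.8, 3301.1, 3754.1, 3834.4,
    5117.1, 6448.3, 7908.3, 9181.4, 11497.1, 10788.0, 10021.5,
    8341.6, 10453.9, 10788.0, 10717.5, 12463.2, 13264.8,
])


def fit_ar(y, p):
    """Fit Y_t = a0 + a1*Y_{t-1} + ... + ap*Y_{t-p} by least squares.

    Returns (coefficients, t_statistics), intercept first.
    """
    n = len(y)
    # Design matrix: a column of ones plus one column per lag.
    lag_columns = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_columns)
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = len(target) - (p + 1)          # equals n - 2p - 1
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stats = coef / np.sqrt(np.diag(cov))
    return coef, t_stats


def forecast_next(y, coef):
    """One-step-ahead forecast from the most recent p observations."""
    p = len(coef) - 1
    recent = y[::-1][:p]                 # Y_n, Y_{n-1}, ...
    return coef[0] + coef[1:] @ recent


for order in (3, 2, 1):
    coef, t_stats = fit_ar(djia, order)
    print(order, coef[-1], t_stats[-1])  # highest-order estimate and its t statistic
```

To decide whether the highest-order parameter can be deleted, compare the absolute value of its t statistic with the two-tailed critical t value for n − 2p − 1 degrees of freedom; if it is not significant, drop to the next lower order and repeat. The 2008 forecast then comes from `forecast_next` applied to the model that survives.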

## Dummy Variables Required

Choose Data1 or Data2, and work the following problems:

The number of dummy variables is the number of levels of the categorical variable less one because the one left out is quantified by the intercept. Each coefficient of the dummy variables effects a shift in the intercept to reflect the effect of a different level of the categorical variable. If a categorical variable has 5 levels (A, B, C, D, E), how many dummy variables are required? If the levels are professor, teaching assistant, and student, how many dummy variables are required?

*Lay out the selected data for regression by adding a dummy variable for Category and a variable for the interaction of variables Category and X.

*Perform multiple regression analysis of Y on X, the dummy variable, and the interaction variable.

*How significant is the regression model?

*Is the interaction significant?

*If the interaction is significant, can the regression coefficients be interpreted (trick question)?

*Predict Y in the highlighted cells labeled Yhat.

*If the interaction is significant, interpret the regression by examination of a plot of Yhat vs. X with separate series for Categories A and B.

*When interaction is significant, why do we examine group means or a plot of the regression equation instead of interpreting the regression coefficients directly?

*Ignoring the interaction effect, how much does Y change when X changes +1 unit? How much does Y change when Category changes +1 unit?

*Are any of the predictions invalid?
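The layout step can be sketched as follows. Data1 and Data2 are not reproduced here, so the arrays below are hypothetical and noise-free, chosen only so that the fitted coefficients can be checked against known values; `dummy_count` is an illustrative helper name.

```python
import numpy as np


def dummy_count(levels):
    """Dummy variables needed for a categorical variable: levels minus one."""
    return len(set(levels)) - 1


# Hypothetical data (NOT Data1 or Data2): two categories, four X values each.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
category = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Category A is the baseline absorbed by the intercept; B gets the dummy.
dummy = (category == "B").astype(float)
interaction = dummy * x

# Noise-free response built from known coefficients so the fit is checkable.
y = 1.0 + 2.0 * x + 3.0 * dummy + 0.5 * interaction

# Regress Y on X, the dummy, and the interaction.
X = np.column_stack([np.ones_like(x), x, dummy, interaction])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ coef

print(dummy_count(["A", "B", "C", "D", "E"]))
print(dummy_count(["professor", "teaching assistant", "student"]))
print(coef)
```

With a significant interaction the slope on X differs by category (here 2.0 for A and 2.5 for B), which is why a plot of Yhat against X with one series per category is easier to read than the individual coefficients.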

## Statistical Analysis and P-Value

A clinical psychologist is treating 25 patients with clinical depression. She wants to find out whether these patients score differently than the general population on an emotional response scale with a population mean, µ = 9.5. She is only interested in whether there is a difference, not in the direction of the difference at this point. Perform a two-tailed single-sample t-test to evaluate this hypothesis.

Are the following answers correct, can you help me find the sample t, and can you help me with the following questions?

1. Are your results statistically significant? How did you make this decision?

2. What is your decision about your hypotheses based on these results? Include detailed statements about each hypothesis.

(range 0-15) Null Hypothesis (H0) µ = 25

Research Hypothesis (H1) µ ≠ 25

One- or Two-tailed? TWO

Degrees of Freedom (df = N-1) 14.00

Population Mean (µ) 25.00

Sample Standard Deviation (s) 2.29

Sample Mean (M) 5.68

Alpha 0.01

Critical t value (cut-off score) 2.98 , -2.98

Sample t __________

Critical p-value 0.01

Sample p-value 0.01
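The missing sample t can be computed from the summary statistics. Note that the worksheet lists µ = 25 and df = 14, while the problem statement gives µ = 9.5 and 25 patients (df = 24). The sketch below assumes the problem statement's values; the critical value 2.797 is the standard t-table entry for a two-tailed test at α = 0.01 with 24 degrees of freedom, not the 2.98 shown in the worksheet.

```python
import math


def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """t = (M - mu) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Assumed inputs: M and s from the worksheet, mu and n from the problem text.
t = one_sample_t(sample_mean=5.68, pop_mean=9.5, sample_sd=2.29, n=25)
df = 25 - 1
t_crit = 2.797  # two-tailed, alpha = 0.01, df = 24 (standard t table)

print(round(t, 2), df, abs(t) > t_crit)
```

Under these assumptions the sample t falls far beyond the cut-off, so the null hypothesis would be rejected; if the worksheet's µ = 25 was intended instead, the inputs should be changed accordingly.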

## Weight Analyzation of Boston vs Vermont Shingles

The manufacturer of Boston and Vermont asphalt shingles knows that product weight is a major factor in the customer’s perception of quality. Moreover, the weight represents the amount of raw materials being used and is therefore very important to the company from a cost standpoint. The last stage of the assembly line packages the shingles before they are placed on wooden pallets. Once a pallet is full (a pallet for most brands holds 16 squares of shingles), it is weighed, and the measurement is recorded. The data file “Pallet” (attached) contains the weight (in pounds) from a sample of 368 pallets of Boston shingles and 330 pallets of Vermont shingles. Completely analyze the differences in the weights of the Boston and Vermont shingles, using α = 0.05.

## Statistics: Sex, Weight, GPA, Smoker and Arm and Leg Length

1) Test of two means. You should select a hypothesis you are interested in testing and then use a test of two means to test this hypothesis. For example, you may be interested in testing whether the GPA of females is higher than that of males in that class. Draw the inference by using a random sample with replacement of size 25 from each group. Report the p-value of the test. Perform the test using a significance level α = .05. Use graphical methods to present the two populations of interest.

2) Paired difference test. You should select another hypothesis that you are interested in and select it in such a way that the paired difference test is appropriate. For example, suppose one is interested in testing whether the right arm length is equal to the left arm length. In this case, a paired difference test is appropriate. Draw the inference by using a random sample with replacement of 25 from each group. Perform the test using a significance level α that you choose.

3) Regression and correlation. Pick any two columns that have a correlation coefficient greater than 0.6 or less than -0.6. Make sure to pick the one with the highest absolute value.

a. Draw the scatter diagram of Y against X, and explain any noted significance.

b. Compute the correlation coefficient (ρ or r). What do you find? Make sure to explain thoroughly what you mean.

c. Obtain a and b of the regression equation defined as Y = a + bX, and the coefficient of determination (r²), from the Excel regression output. What can you tell? What is the relationship between r² and ρ?

d. Compute the above statistics in 4) step by step using ΣXiYi, ΣXi, ΣYi, ΣXi², ΣYi² from Excel, and compare them with the results in c).

e. Draw the fitted regression line on the scatter diagram, obtain the residuals and plot them on the scatter diagram too. Explain your findings.

f. Write a paragraph or so on any observations you may have on the data, regression estimates or the regression residuals;

g. Calculate the additional y values for at least five other x values that do not appear in our data. Include that information in your report above and comment on whether you believe the calculated y values seem realistic and consistent with the other information you have calculated in each of the parts above.
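For part d, the step-by-step calculation from the five sums can be written out as below. These are the standard computational formulas for simple linear regression, with n denoting the number of (X, Y) pairs; they are offered as a reference sketch rather than as the required presentation.

```latex
b = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2},
\qquad
a = \frac{\sum Y_i - b\sum X_i}{n}

r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}
         {\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]
                \left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}
```

In simple regression with one predictor, the coefficient of determination reported by Excel equals the square of r, and r carries the same sign as b.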

## Statistics – Testing with SPSS

A home improvement store recently purchased a new paint color-mixing machine. The machine is rated to produce 6 gallons of mixed paint every minute. The store’s manager suspects that the machine is underperforming. In order to test his hypothesis, the manager tests the machine’s output by mixing 10 randomly chosen colors and measuring the output rate of the machine. The data is in mix.sav. State the hypothesis, conduct the appropriate test using SPSS and interpret the results.

## Infant Death and Mortality Rates

Is there a significant relationship between the death of infants by race and the cause of death?

Infant deaths and mortality rates for the top 3 leading causes of death for African Americans, 2007. (Rates per 100,000 live births)

| Cause of Death (by rank) | African American Deaths | African American Death Rate | Non-Hispanic White Deaths | Non-Hispanic White Death Rate | African American / Non-Hispanic White Rate Ratio |
|---|---|---|---|---|---|
| (1) Low birthweight | 1,864 | 297.2 | 1,767 | 76.5 | 3.9 |
| (2) Congenital malformations | 1,037 | 165.3 | 2,867 | 124.1 | 1.3 |
| (3) Sudden infant death syndrome (SIDS) | 677 | 107.9 | 1,341 | 58.0 | 1.9 |
| (4) Maternal complications | 751 | 95.5 | 751 | 32.5 | 2.9 |

Please include hypotheses, research design, statistical methods, results, and interpretation with this project.
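One candidate statistical method is a chi-square test of independence on the death counts by race and cause. The sketch below assumes that design (the project may call for something else) and uses the counts exactly as listed in the table; `chi_square_independence` is an illustrative helper name.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: cause of death; columns: African American, Non-Hispanic White deaths.
deaths = [
    [1864, 1767],   # low birthweight
    [1037, 2867],   # congenital malformations
    [677, 1341],    # SIDS
    [751, 751],     # maternal complications
]

chi2, df = chi_square_independence(deaths)
critical_05 = 7.815  # chi-square table value for df = 3, alpha = 0.05
print(chi2, df, chi2 > critical_05)
```

The null hypothesis here is that cause of death is independent of race; a statistic above the critical value is evidence against it.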

## Hypothesis Testing and Statistical Calculations

1. H0: µ = 70

H1: µ > 70

σ = 20, n = 100, x̄ = 80, α = .01

a) calculate the value of the test statistic

b) set up the rejection region.

c) determine the p-value

d) interpret the results

attach file please

2. Draw the operating characteristic curve for n = 10, 50, and 100 for the following test:

H0: µ = 400

H1: µ > 400

α = .05, σ = 50

Attach file please

3. Determine the sample size necessary to estimate a population proportion to within .03 with 90% confidence assuming you have no knowledge of the approximate value of the sample proportion.

4. Some traffic experts believe that the major cause of highway collisions is the differing speeds of cars. That is, when some cars are driven slowly while others are driven at speeds well in excess of the speed limit, cars tend to congregate in bunches, increasing the probability of accidents. Thus, the greater the variation in speeds, the greater will be the number of collisions that occur. Suppose that one expert believes that when the variance exceeds 18 mph, the number of accidents will be unacceptably high. A random sample of the speeds of 245 cars on a highway with one of the highest accident rates in the country is taken. Can we conclude at the 10% significance level that the variance in speeds exceeds 18 mph? Data is in file below.

5. How much time do executives spend each day reading and sending e-mail? A survey was conducted to obtain this information, and the responses (in minutes) are in the file below. Can we infer from these data that the mean amount of time spent reading and sending e-mail differs from 60 minutes each day?
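Problems 1 through 3 can be worked from the given numbers alone. The sketch below assumes a known population standard deviation (so the z distribution applies) and uses only the Python standard library; the function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_test_upper(xbar, mu0, sigma, n, alpha):
    """Upper-tailed z test: returns (z, critical z, p-value)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = Z.inv_cdf(1 - alpha)
    p_value = 1 - Z.cdf(z)
    return z, z_crit, p_value


def oc_upper(mu, mu0, sigma, n, alpha):
    """Operating characteristic: P(fail to reject H0) when the true mean is mu."""
    z_crit = Z.inv_cdf(1 - alpha)
    return Z.cdf(z_crit - (mu - mu0) * math.sqrt(n) / sigma)


def sample_size_proportion(margin, confidence, p=0.5):
    """Sample size to estimate a proportion; p = 0.5 is the no-knowledge case."""
    z = Z.inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Problem 1: H0 mu = 70 vs H1 mu > 70.
z, z_crit, p_value = z_test_upper(xbar=80, mu0=70, sigma=20, n=100, alpha=0.01)
print(z, z_crit, p_value)

# Problem 2: points on the OC curve for each sample size.
for n in (10, 50, 100):
    print(n, [round(oc_upper(mu, 400, 50, n, 0.05), 3) for mu in range(400, 461, 10)])

# Problem 3: margin .03 at 90% confidence.
print(sample_size_proportion(0.03, 0.90))
```

Plotting the `oc_upper` values against the true mean for each n gives the three operating characteristic curves; each starts at 1 − α when the true mean equals 400 and falls faster as n grows.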

## Level of Significance (Human Body Temperature Test)

In an article in the Journal of Statistics Education (vol. 4, no. 2, 1996), Allen Shoemaker describes a study that was reported in the Journal of the American Medical Association.* It is generally accepted that the mean body temperature of adult humans is 98.6°F. In his article, Shoemaker uses the data from the JAMA article to test this hypothesis. Here is a summary of his test.

Claim: The body temperature of adults is 98.6°F.

H0: µ = 98.6°F (claim); Ha: µ ≠ 98.6°F

Sample size N = 130

Population: Adult human temperatures (Fahrenheit)

Distribution: Approximately normal

Sample statistics: x̄ = 98.25, s = 0.73

Men’s Temperatures

96 | 3

96 | 7 9

97 | 0 1 1 1 2 3 4 4 4 4

97 | 5 5 6 6 6 7 8 8 8 8 9 9

98 | 0 0 0 0 0 0 1 1 2 2 2 2 3 3 4 4 4 4

98 | 5 5 6 6 6 6 6 6 7 7 8 8 8 9

99 | 0 0 0 1 2 3 4

99 | 5

100 | 0

Key: 96 | 3 = 96.3

Women’s Temperatures

96 | 4

96 | 7 8

97 | 2 2 4

97 | 6 7 7 8 8 8 9 9 9

98 | 0 0 0 0 0 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4

98 | 5 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 8 9

99 | 0 0 1 1 2 2 3 4

99 | 9

100 | 8

Key: 96 | 4 = 96.4

1. Complete the hypothesis test for all adults (men and women) by performing the following steps. Use a level of significance of α = 0.05.

(a) Sketch the sampling distribution.

(b) Determine the critical values and add them to your sketch.

(c) Determine the rejection regions and shade them in your sketch.

(d) Find the standardized test statistic. Add it to your sketch.

(e) Make a decision to reject or fail to reject the null hypothesis.

(f ) Interpret the decision in the context of the original claim.

2. If you lower the level of significance to .01 does your decision change? Explain your reasoning.

3. Test the hypothesis that the mean temperature of men is 98.6°F. What can you conclude at a level of significance of .01?

4. Test the hypothesis that the mean temperature of women is 98.6°F. What can you conclude at a level of significance of .01?

5. Use the sample of 130 temperatures to form a 99% confidence interval for the mean body temperature of adult humans.

6. The conventional “normal” body temperature was established by Carl Wunderlich over 100 years ago. What, in Wunderlich’s sampling procedure, do you think might have led him to an incorrect conclusion?
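Questions 1, 2, and 5 for the combined sample can be checked from the summary statistics given above. The sketch assumes that, with n = 130, the standard normal distribution is an adequate stand-in for the t distribution with 129 degrees of freedom; a t table gives nearly identical critical values.

```python
import math
from statistics import NormalDist

n, xbar, s, mu0 = 130, 98.25, 0.73, 98.6

standard_error = s / math.sqrt(n)
z = (xbar - mu0) / standard_error      # standardized test statistic


def two_sided_critical(alpha):
    """Positive critical value for a two-tailed test at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


reject_at_05 = abs(z) > two_sided_critical(0.05)
reject_at_01 = abs(z) > two_sided_critical(0.01)

# 99% confidence interval for the mean body temperature.
half_width = two_sided_critical(0.01) * standard_error
ci_99 = (xbar - half_width, xbar + half_width)

print(round(z, 2), reject_at_05, reject_at_01)
print(tuple(round(v, 3) for v in ci_99))
```

Questions 3 and 4 follow the same steps once the men's and women's means and standard deviations are computed from the stem-and-leaf plots.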

## Develop the statistical assumption

What is the statistical assumption?

What would be considered the null hypothesis?

The alternative hypothesis?

Use the following information:

The test of interest in this problem is the t-test for comparing two population means: the population mean hours worked last week for males and the population mean hours worked last week for females. The first values to note are in the “F” and “Sig.” columns under Levene’s test for equality of variances. The F value is 1.175, and the significance level of this test, also known as the p-value, is .279. This is insufficient evidence that the variances are different. To analyze the data, we therefore refer to the lines labeled “Number of hours worked last week” and “Equal variances assumed.” This assumption, together with the assumption that the sample sizes are large enough for the sampling distributions of the sample means to be approximately normal, supports the use of the two-sample t-test.

The Sig. (two-tailed) value is reported as 0.000. This means that if the population means were the same, we would have seen an event that occurs less than once in 1,000 tries. This is very strong evidence that the two population means are different.

For the 95% confidence interval, with 1488 degrees of freedom, the two endpoints were 8.374 and 13.050. Because the entire interval is positive, there is strong evidence that the mean of population 1 is greater than the mean of population 2.

In summary, both a hypothesis test and a confidence interval were used to compare the two population means. The procedure was the independent-samples t-test, assuming equal variances and normally distributed sample means. The p-value for the hypothesis test was very small, which is strong evidence that the two means are different. The confidence interval did not contain zero, which at the 95% level of confidence is further evidence that the two means are different.

## Hypothesis Testing With Two Samples

10.9 A problem with a telephone line that prevents a customer from receiving or making calls is disconcerting to both the customer and the telephone company. The data in the file PHONE represent samples of 20 problems reported to two different offices of a telephone company and the time to clear these problems (in minutes) from the customers’ lines:

Central Office I Time to Clear Problems (minutes)

1.48 1.75 0.78 2.85 0.52 1.60 4.15 3.97 1.48 3.10

1.02 0.53 0.93 1.60 0.80 1.05 6.32 3.93 5.45 0.97

Central Office II Time to Clear Problems (minutes)

7.55 3.75 0.10 1.10 0.60 0.52 3.30 2.10 0.58 4.02

3.75 0.65 1.92 0.60 1.53 4.23 0.08 1.48 1.65 0.72

a. Assuming that the population variances from both offices are equal, is there evidence of a difference in the mean waiting time between the two offices? (use alpha = 0.05.)

b. Find the p-value in (a) and interpret its meaning.

c. What other assumption is necessary in (a)?
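Parts a and b of problem 10.9 can be worked directly from the listed times. The sketch below computes the pooled-variance t statistic using only the standard library; the critical value 2.0244 is the t-table entry for a two-tailed test at α = 0.05 with 38 degrees of freedom, and `pooled_t` is an illustrative name.

```python
import math
from statistics import mean, variance

office_1 = [1.48, 1.75, 0.78, 2.85, 0.52, 1.60, 4.15, 3.97, 1.48, 3.10,
            1.02, 0.53, 0.93, 1.60, 0.80, 1.05, 6.32, 3.93, 5.45, 0.97]
office_2 = [7.55, 3.75, 0.10, 1.10, 0.60, 0.52, 3.30, 2.10, 0.58, 4.02,
            3.75, 0.65, 1.92, 0.60, 1.53, 4.23, 0.08, 1.48, 1.65, 0.72]


def pooled_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


t, df = pooled_t(office_1, office_2)
t_crit = 2.0244  # two-tailed, alpha = 0.05, df = 38 (t table)
print(round(t, 4), df, abs(t) > t_crit)
```

The p-value in part b is the two-tailed area beyond the computed t under a t distribution with 38 degrees of freedom, which a statistics package or t table can supply.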

10.11 Digital cameras have taken over the majority of the point-and-shoot camera market. One of the important features of a camera is the battery life as measured by the number of shots taken until the battery needs to be recharged.

The data in the file Digitalcameras contain the battery life of 31 subcompact cameras and 15 compact cameras (data extracted from “Cameras,” Consumer Reports, November 2006, pp. 20-21).

a. Assuming that the population variances from both types of digital cameras are equal, is there evidence of a difference in the mean battery life between the two types of digital cameras?

b. Determine the p-value in (a) and interpret its meaning.

10.21 In industrial settings, alternative methods often exist for measuring variables of interest. The data in the file Measurement (coded to maintain confidentiality) represent measurements in-line that were collected from an analyzer during the production process and from an analytical lab (extracted from M. Leitnaker, “Comparing Measurement Processes: In-line Versus Analytical Measurements,” Quality Engineering, 13, 2000-2001, pp. 293-298).

a. At the 0.05 level of significance, is there evidence of a difference in the mean measurements in-line and from an analytical lab?

b. What assumption is necessary about the population distribution in order to perform this test?

10.23 A newspaper article discussed the opening of a Whole Foods Market in the Time-Warner building in New York City. The following data (stored in the file Wholefoods1) compared the prices of some kitchen staples at the new Whole Foods Market and at the Fairway supermarket located about 15 blocks from the Time-Warner building:

a. At the 0.01 level of significance, is there evidence that the mean price is higher at Whole Foods Market than at the Fairway supermarket?

b. Interpret the meaning of the p-value in (a).

c. What assumption is necessary about the population distribution in order to perform the test in (a)?

10.49 The director of training for a company that manufactures electronic equipment is interested in determining whether different training methods have an effect on the productivity of assembly-line employees. She randomly assigns 21 of the 42 recently hired employees to a computer-assisted, individual-based training program. The other 21 are assigned to a team-based training program. Upon completion of the training, the employees are evaluated on the time (in seconds) it takes to assemble a part. The results are in the data file Training.

a. Using a 0.05 level of significance, is there evidence of a difference between the variances in assembly times (in seconds) of employees trained in a computer-assisted, individual-based program and those trained in a team-based program?

b. On the basis of the results in (a), which t test defined in Section 10.1 should you use to compare the means of the two training programs? Discuss.

10.45 A professor in the accounting department of a business school claims that there is much more variability in the final exam scores of students taking the introductory accounting course who are not majoring in accounting than for students taking the course who are majoring in accounting.

Random samples of 13 non-accounting majors and 10 accounting majors are taken from the professor’s class roster in his large lecture, and the following results are computed based on the final exam scores:

Non-Accounting: n = 13, S² = 210.2

Accounting: n = 10, S² = 36.5

a. At the 0.05 level of significance, is there evidence to support the professor’s claim?

b. Interpret the p-value.

c. What assumption do you need to make in (a) about the two populations in order to justify your use of the F test?
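For problem 10.45, the F statistic is the ratio of the two sample variances. The sketch below takes n = 10 for the accounting majors from the problem text and uses 3.07 as the upper-tail F-table value for α = 0.05 with 12 and 9 degrees of freedom; treat that table value as an assumption to confirm against your own table.

```python
def f_ratio(var_a, n_a, var_b, n_b):
    """F statistic with the variance claimed to be larger in the numerator."""
    return var_a / var_b, n_a - 1, n_b - 1


f_stat, df_num, df_den = f_ratio(var_a=210.2, n_a=13, var_b=36.5, n_b=10)
f_crit = 3.07  # assumed upper-tail F table value, alpha = 0.05, df = (12, 9)
print(round(f_stat, 4), df_num, df_den, f_stat > f_crit)
```

Because the professor's claim is directional (more variability among non-accounting majors), the test is upper-tailed and the whole 0.05 goes in the right tail.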

## College Basketball

College basketball is big business, with coaches’ salaries, revenues, and expenses in millions of dollars. The data in the file Colleges-basketball (attached) contains the coaches’ salaries and revenues for college basketball at selected schools in a recent year (data extracted from R. Adams, “Pay for Playoffs,” The Wall Street Journal, March 11-12, 2006, pp. P1, P8). You plan to develop a regression model to predict a coach’s salary based on revenue.

A) Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1.

B) Interpret the meaning of the Y intercept, b0, and the slope, b1, in this problem.

C) Use the prediction line developed in (a) to predict the coach’s salary for a school that has revenue of $7 million.

D) Compute the coefficient of determination, r², and interpret its meaning.

E) Perform a residual analysis on your results and evaluate the regression assumptions.

F) At the 0.05 level of significance, is there evidence of a linear relationship between the coach’s salary and revenue?

G) Construct a 95% confidence interval estimate of the population slope.

## Variety of Statistics Problems

I have attached a series of statistical problems. Please provide as much detail as possible. I want to double check my answers for each question.

1. A chain of health-food stores is determining the relationship between the number of times its commercial is broadcast on radio or television weekly and the weekly sales volume. It randomly selects nine weeks and determines the number of times the commercial was broadcast and the corresponding weekly volume of sales as shown below:

| Number of Times Commercial Is Broadcast (x) | Weekly Sales Volume in Thousands of Dollars (y) |
|---|---|
| 3 | 42 |
| 4 | 47 |
| 5 | 52 |
| 7 | 72 |
| 8 | 85 |
| 9 | 100 |
| 10 | 115 |
| 12 | 185 |
| 20 | 225 |

a. Determine the least-squares prediction equation for the line of best fit.

b. Calculate the standard error of the estimate.

c. What is the predicted sales volume when the commercial is broadcast 15 times weekly on the radio or television?

2. The U.S. Food and Drug Administration (FDA) bans the use of hormones in poultry production. Nevertheless, a recent University of California study found that 10% of the consumers surveyed said that they ate less poultry because of concern over hormones. To check on this claim, 84 consumers are randomly selected and it is found that ten of them eat less poultry because of their concern over hormones. Should we reject the results of the University of California study? (Use a 5% level of significance and be sure to include the critical values and test statistic in your answer.)

a. Give the null and alternative hypothesis.

b. Give the test statistic and critical values.

c. Would you reject the null or fail to reject the null hypothesis?

3. A 1993 editorial in the Journal of the American Medical Association cited a review of 43 recently published studies. In 26 of them, researchers had found that low calcium intake by humans was linked to bone mass, bone loss, or fractures. Find a 95% confidence interval for the true proportion of studies linking low calcium intake by humans to bone mass, bone loss, or fractures.

4. A publisher wishes to determine the list price for a new algebra book. A survey of the list price of eight competing books sold by other companies showed an average price of $39.95 with a standard deviation of $2.85. Construct a 95% confidence interval for the average list price of a new algebra book.

5. An obstetrician wants to learn whether the amount of prenatal care and the wantedness of the pregnancy are associated. He randomly selects 939 women who had recently given birth and asks them to disclose whether their pregnancy was intended, unintended or mistimed. In addition, they were to disclose when they started receiving prenatal care, if ever. The results of the survey are as follows:

Months pregnant before prenatal care began, by wantedness of pregnancy:

| Wantedness of Pregnancy | Less than 3 mos. | 3 to 5 mos. | More than 5 mos. |
|---|---|---|---|
| Intended | 593 | 26 | 33 |
| Unintended | 64 | 8 | 11 |
| Mistimed | 169 | 19 | 16 |

a. Using a 5% level of significance, test the null hypothesis that the wantedness of the pregnancy is independent of when prenatal care began.

b. Compute the chi-square test statistic.

6. The following data represent the flight time (in minutes) of a random sample of seven flights from Las Vegas, Nevada, to Newark, New Jersey, on Continental Airlines.

282, 270, 260, 266, 257, 260, 267

a. Compute the range, mode, mean, and median.

b. Compute the variance and standard deviation.

7. The following data represent the hemoglobin (in g/dL) for 20 randomly selected cats:

5.7 8.9 9.6 10.6 11.7

7.7 9.4 9.9 10.7 12.9

7.8 9.5 10.0 11.0 13.0

8.7 9.6 10.3 11.2 13.4

a. Compute the z-score corresponding to the hemoglobin of Buttercup, 7.8 g/dL.

b. Determine the quartiles.

c. Compute the interquartile range, IQR.

8. A large trucking company that delivers fresh fruit wishes that its truck drivers be forced to work overtime. The union claims that the more hours that a truck driver works, the greater the risk of an accident (due to fatigue). To support its claim, the union has gathered the following statistics on the average number of hours worked by a truck driver (per week) and the average number of accidents (per week).

| # of Hours Worked | 35 | 37 | 39 | 42 | 44 | 46 | 50 |
|---|---|---|---|---|---|---|---|
| # of Accidents | 1.6 | 2.2 | 3.8 | 4.3 | 5.6 | 6.1 | 7.3 |

a. Determine the least-squares prediction equation.

b. Calculate the standard error of the estimate.

c. What is the predicted number of accidents when a truck driver is forced to work 48 hours a week?

9. The manager of the Night-All Corporation recently conducted a survey of 196 of its employees to determine the average number of hours that each employee sleeps at night. The company statistician submitted the following information to the management:

∑x = 1479.8 and ∑(x − x̄)² = 1755

Where x is the number of hours slept by each employee. Find a 95% confidence interval estimate for the average number of hours each employee sleeps at night (be sure to include the critical values in your answer).

10. The following data represent the annual number of days over 100°F for Dallas – Ft. Worth from 1905 to 2004.

| Number of Days | Number of Years |
|---|---|
| 0 – 9 | 31 |
| 10 – 19 | 39 |
| 20 – 29 | 17 |
| 30 – 39 | 6 |
| 40 – 49 | 4 |
| 50 – 59 | 2 |
| 60 – 69 | 1 |

a. Find the mean number of days over 100°F.

b. Find the standard deviation.
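Problems 1 and 8 above both ask for a least-squares prediction equation, a standard error of the estimate, and a prediction. A minimal sketch using the sums-of-squares formulas is shown below; `least_squares` is an illustrative name, and the standard error uses n − 2 degrees of freedom.

```python
import math


def least_squares(x, y):
    """Return (intercept b0, slope b1, standard error of the estimate)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b0, b1, math.sqrt(sse / (n - 2))


# Problem 1: commercials broadcast (x) vs weekly sales in $ thousands (y).
ads = [3, 4, 5, 7, 8, 9, 10, 12, 20]
sales = [42, 47, 52, 72, 85, 100, 115, 185, 225]
b0_ads, b1_ads, se_ads = least_squares(ads, sales)
sales_at_15 = b0_ads + b1_ads * 15

# Problem 8: hours worked (x) vs accidents per week (y).
hours = [35, 37, 39, 42, 44, 46, 50]
accidents = [1.6, 2.2, 3.8, 4.3, 5.6, 6.1, 7.3]
b0_hrs, b1_hrs, se_hrs = least_squares(hours, accidents)
accidents_at_48 = b0_hrs + b1_hrs * 48

print(round(b0_ads, 4), round(b1_ads, 4), round(se_ads, 4), round(sales_at_15, 2))
print(round(b0_hrs, 4), round(b1_hrs, 4), round(se_hrs, 4), round(accidents_at_48, 2))
```

Both predictions fall inside the observed range of x (15 is between 12 and 20, and 48 is between 46 and 50), so they are interpolations rather than extrapolations.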

## Hypothesis Testing: Playbill Magazine

Given the following case scenario:

In 2010, Playbill Magazine contacted Boos Allen to determine the mean annual household income of its readers. Using a list of customers provided by Playbill, Boos Allen randomly sampled 300 Playbill customers. From that sample, Boos Allen is confident that the population mean household income of Playbill readers is $119,155, with a population standard deviation of household income of $30,000.

Recently two Playbill executives suggested that the mean household income of Playbill’s readers has increased and that the magazine price should be raised. As Playbill’s new marketing manager, you convince Playbill’s chief operating officer to complete a second survey with Boos Allen to confirm that assertion. Yesterday the new Boos Allen report appeared on your desk. From another sample of Playbill customers, taken from a recent list of customers you e-mailed to Boos Allen, the 2012 Playbill customer profile shows a mean annual household income of $124,450, with the population standard deviation of household income unchanged at $30,000.

Now answer the following:

– What is the null hypothesis? Give both an explanation and the math equation.

– What is the alternative hypothesis? Give both an explanation and the math equation.

– Solve the equation. Would you accept or reject the null hypothesis?

– Does the p-value indicate acceptance or rejection of the null hypothesis when alpha is .05?

– Explain why you can be statistically confident that the average amount a food court’s customer spends has increased, decreased, or remains the same, and what would happen if alpha were .01 or .10.
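The calculation can be sketched as an upper-tailed z test with H0: µ ≤ 119,155 and H1: µ > 119,155. The scenario does not state the size of the second sample, so the value of 300 below is an assumption carried over from the first survey; the z statistic and p-value change if the actual sample size differs.

```python
import math
from statistics import NormalDist


def upper_tail_z(new_mean, old_mean, sigma, n):
    """z statistic and upper-tail p-value for H1: mean has increased."""
    z = (new_mean - old_mean) / (sigma / math.sqrt(n))
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


n_assumed = 300  # ASSUMPTION: second sample size is not given in the scenario
z, p_value = upper_tail_z(124_450, 119_155, 30_000, n_assumed)

for alpha in (0.01, 0.05, 0.10):
    print(alpha, p_value < alpha)
print(round(z, 4), round(p_value, 4))
```

Comparing the p-value against each alpha in turn shows whether the decision is sensitive to the significance level under the assumed sample size.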

## Business Statistics: Vacation Occupancy Rates

Vacation occupancy rates were expected to be up during March 2008 in Myrtle Beach, SC. Data in the file Occupancy will allow you to replicate the findings presented in the newspaper. The data show units rented and not rented for a random sample of vacation properties during the first week of March 2007 and 2008.

1) Estimate the proportion of units rented during the first week of March 2007 and the first week of March 2008.

2) Provide a 95% confidence interval for the difference in proportions.

3) On the basis of your findings, does it appear March rental rates for 2008 will be up from those a year earlier?