## SPSS Experimental Analysis – Diet of Rats

A researcher interested in studying the effects of three experimental diets with varying fat contents on the total lipid (fat) level in plasma collected the data in the file FatInDiet.txt.

Fifteen male subjects who were within 20 percent of their ideal body weight were grouped into five blocks according to age. Within each block the experimental diets were randomly assigned to the three subjects. The outcome measure was the total reduction in lipid level after the subjects were on the diet for a fixed period of time.

The columns of the data set correspond to (1) reduction in lipid level; (2) block (1=15-24 years; 2=25-34 years; 3=35-44 years; 4=45-54 years; and 5=55-64 years); and (3) fat content of diet (1=extremely low; 2=fairly low; and 3=moderately low).

Perform a complete analysis of the randomized complete block experiment to determine whether fat content affects the mean reduction in lipid level of male subjects. As part of your analysis, determine whether there is evidence that the effect of diet varies by the age of the subject. Report confidence intervals for the difference in mean lipid reduction between all pairs of treatment conditions for the oldest group of subjects in the study.

## Interpreting the P-Value

In 280 trials with a professional touch therapist, correct responses to a question were obtained 123 times. The P-value of 0.979 is obtained when testing the claim that p > 0.5 (the proportion of correct responses is greater than the proportion of 0.5 that would be expected with random chance). What is the value of the sample proportion? Based on the P-value of 0.979, what should we conclude about the claim that p > 0.5?

a. Refer to Exercise 3 and distinguish between the value of p and the P-value.

b. "If the P is low, the null must go. If the P is high, the null will fly." What does this mean?
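The relationship between the sample proportion and the quoted P-value can be checked with a short sketch. It assumes 123 correct responses in 280 trials (a count larger than the number of trials is impossible, and 123 reproduces the quoted P-value) and uses the normal approximation for a right-tailed test; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: 123 correct responses out of 280 trials.
correct, trials = 123, 280
p0 = 0.5  # proportion expected under random guessing (null value)

p_hat = correct / trials              # sample proportion
se0 = sqrt(p0 * (1 - p0) / trials)    # standard error under the null
z = (p_hat - p0) / se0                # test statistic
p_value = 1 - NormalDist().cdf(z)     # right tail, because the claim is p > 0.5

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

Here the lowercase `p_hat` estimates the population proportion p, while `p_value` is the probability of a sample proportion at least this large if p were 0.5. The two quantities answer different questions.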

## Using Statistics to Analyze a Scenario

Scenario: You are interested in researching a new ambulance dispatching method. The new method is supposed to be more efficient and uses half the communications personnel of the old method, saving over $300,000 annually. You want to know how the new method compares to the old method in terms of how many minutes it takes ambulances to get to dispatched calls, since you are not willing to switch if the new method increases the dispatch time significantly. For the research study, calls are randomly assigned to one dispatch method or the other. The Old and the New methods are then to be compared. Your level of significance is set at α = .05. You measure the dispatch time on the next 500 calls. At the completion of the study, you obtain the following results for the comparison of the two methods.

Results:

– The p value you obtain is p = .061.

– Mean dispatch time “Old” method=2.5 minutes.

– Mean dispatch time “New” method=2.8 minutes.

– Mean difference .3 minutes.

– 95% CI of difference -.5 to + 1.1 minutes.

Complete the following exercises based on this scenario. Use only 1-2 sentences to answer each question!

1. Write a potential null hypothesis for the study.

2. Write a potential non-directional alternative hypothesis for the study.

3. Write a potential directional alternative hypothesis for the study.

4. Do the two dispatch methods differ significantly? How do you know?

5. Based on your result in #4, which dispatch method should you use? Why?

6. If the true situation is that the two methods do not differ significantly (null is true) and your research shows that the “Old” method is significantly faster (reject null), what type of error have you made? What are the potential consequences in this case?

7. If the true situation is that the “Old” method is significantly faster (null is false) and your research shows that the two methods do not differ significantly (accept null), what type of error have you made? What are the potential consequences in this case?

## Statistics Problem Set: Confidence Level

1) For the following sequence: 2, 4, 7, 3, 9, 4, 7, 11, find the range, mode, median, mean, IQR, standard deviation, and variance, and state whether it is normally distributed (with explanation).

2) An investor wants to assess, at a confidence level of 98%, whether a medication can improve students' marks. He notices that for daily doses of 2, 2.5, 3, 4.5, and 7 mg, the improvements over students who studied the same amount but took no mental boosters were 3, 4, 4.6, 6, and 8.5 percent. At a confidence level of 99%, what results would you expect for students who take a daily dose of 6.5 mg? Show full calculations.

3) A T 95 tank weighs 75 tons. A businessman wants to buy a couple of thousand tanks for a hunting expedition in Texas. He will buy only if the tanks are really as specified and will refuse to buy them if, statistically, they weigh less. He takes a sample of 8 tanks and finds their average weight to be 74.7 tons with a standard deviation of 0.3 tons. He wants to use a significance level of 2%. What will be his best course of action? Show all the work and justify your answer.

4) A car has a gas mileage of 500 miles per gallon of gasoline, with a standard deviation of 10 miles per gallon. What would be the gas mileage of the top 40% of such cars? What would be the mileage range for 70% of the cars? What % of cars will have a mileage of 530 miles per gallon or less?

5) The probability that a student answers a multiple-choice question correctly is 27%, in an exam with 6 questions. What would be the probability that the student answers correctly at random: no question, all questions, 6 questions, less than two, and more than 4 questions? Answer all these questions individually, show all the work, and explain your logic.

6) Mr. G. just won the jackpot with one ticket. He correctly chose 6 numbers out of 52 numbers and 2 stars out of 12 stars. What was his probability of winning when he bought the ticket? Show full work and explain your answer.
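Problems 5 and 6 both reduce to counting with binomial coefficients, which can be sketched as follows. The sketch assumes the six exam answers are independent with the same 27% success probability, and that the lottery draws 6 of 52 numbers and 2 of 12 stars without regard to order; the helper `binom_pmf` is a hypothetical name introduced here.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Problem 5: n = 6 questions, success probability 0.27 per question.
n, p = 6, 0.27
p_none = binom_pmf(0, n, p)
p_all = binom_pmf(6, n, p)  # "all questions" and "6 questions" coincide
p_less_than_two = binom_pmf(0, n, p) + binom_pmf(1, n, p)
p_more_than_four = binom_pmf(5, n, p) + binom_pmf(6, n, p)

# Problem 6: one winning combination among all equally likely tickets
# (assumes numbers and stars are drawn independently, order ignored).
jackpot_combinations = comb(52, 6) * comb(12, 2)
p_jackpot = 1 / jackpot_combinations

print(f"P(none) = {p_none:.4f}, P(all) = {p_all:.6f}")
print(f"P(<2) = {p_less_than_two:.4f}, P(>4) = {p_more_than_four:.5f}")
print(f"Jackpot: 1 in {jackpot_combinations:,}")
```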

## Conducting an Econometrics Analysis

An investigator analysing the relationship between food expenditure, disposable income and prices in the US using annual data over the period 1959-83 computes the following regression

$$\log(\text{FOOD}) = \underset{(0.6805)}{4.7377} + \underset{(0.0033)}{0.1069}\,\text{TIME} + \underset{(0.0899)}{0.3506}\,\log(\text{PDI}) - \underset{(0.1010)}{0.5086}\,\log(\text{PRICE})$$

– FOOD: total household expenditure on food

– TIME: a time trend

– PDI: personal disposable income

– PRICE: the price of food deflated by a general price index

Figures in parentheses are standard errors.

(i) Give an economic interpretation of the coefficients on log(PDI) and log(PRICE)

(ii) Test the hypothesis (using a 5% significance level) that the coefficient of log(PRICE) is equal to zero against the alternative that it is nonzero.

(iii) Test the hypothesis (using a 5% significance level) that the coefficient of log(PDI), the income variable, is equal to 1 against the alternative that it is different from 1.

You are now given the following extra information

$$\text{SST} = \sum_t (y_t - \bar{y})^2 = 0.53876$$

$$\text{SSR} = \sum_t e_t^2 = 0.0046276$$

(iv) Compute the SSE and R^2 for the above regression

(v) Test the joint hypothesis (at the 5% level) that the three ‘slope’ coefficients are all equal to zero against the alternative that at least one ‘slope’ coefficient is non-zero.
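The test statistics in parts (ii) to (v) follow directly from the reported numbers, as the sketch below illustrates. It assumes n = 25 annual observations (1959 to 1983 inclusive), k = 4 estimated parameters, that SSR denotes the residual sum of squares, and that SSE denotes the explained sum of squares, so that SSE = SST − SSR. Critical values still need to be read from t and F tables.

```python
# Assumption: annual data for 1959-83 inclusive gives 25 observations.
n = 1983 - 1959 + 1
k = 4  # intercept plus three slope coefficients

# (ii) H0: coefficient on log(PRICE) = 0
t_price = (-0.5086 - 0.0) / 0.1010

# (iii) H0: coefficient on log(PDI) = 1
t_income = (0.3506 - 1.0) / 0.0899

# (iv) Explained sum of squares and R^2
sst, ssr = 0.53876, 0.0046276   # total and residual sums of squares
sse = sst - ssr                 # explained sum of squares
r_squared = sse / sst

# (v) F statistic for H0: all three slopes are zero, df = (k - 1, n - k)
f_stat = (sse / (k - 1)) / (ssr / (n - k))

print(f"t(PRICE) = {t_price:.3f}, t(PDI vs 1) = {t_income:.3f}, df = {n - k}")
print(f"SSE = {sse:.5f}, R^2 = {r_squared:.4f}")
print(f"F = {f_stat:.1f} on ({k - 1}, {n - k}) degrees of freedom")
```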

## Statistics Problem: Bacteria in Carpeted Rooms

Researchers wanted to determine if carpeted rooms contained more bacteria than uncarpeted rooms. To determine the amount of bacteria in a room, researchers pumped the air from the room over a Petri dish at a rate of 1 cubic foot per minute for eight carpeted rooms and eight uncarpeted rooms. Colonies of bacteria were allowed to form in the 16 Petri dishes. The results were collected. A normal probability plot and box plot indicate the data are approximately normally distributed with no outliers. The data is as follows in bacteria per cubic foot:

Carpeted: 11.8, 10.8, 8.2, 10.1, 7.1, 14.6, 13.0, 14.0

Uncarpeted: 12.1, 12.0, 8.3, 11.1, 3.8, 10.1, 7.2, 13.7

Determine using the appropriate hypothesis testing technique if carpeted rooms have more bacteria than uncarpeted rooms at the .05 level of significance.
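One way to set up this comparison is a two-sample t statistic for independent samples. The sketch below uses the unpooled (Welch) form with its approximate degrees of freedom; that choice is an assumption, since the problem does not say whether equal variances may be assumed. The hypotheses are H0: μ_carpeted = μ_uncarpeted against H1: μ_carpeted > μ_uncarpeted.

```python
from math import sqrt
from statistics import mean, variance

carpeted = [11.8, 10.8, 8.2, 10.1, 7.1, 14.6, 13.0, 14.0]
uncarpeted = [12.1, 12.0, 8.3, 11.1, 3.8, 10.1, 7.2, 13.7]

n1, n2 = len(carpeted), len(uncarpeted)
m1, m2 = mean(carpeted), mean(uncarpeted)
v1, v2 = variance(carpeted), variance(uncarpeted)  # sample variances (n - 1)

# Welch t statistic: difference in means over its unpooled standard error.
a, b = v1 / n1, v2 / n2
t_stat = (m1 - m2) / sqrt(a + b)

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

print(f"means: {m1:.4f} vs {m2:.4f}")
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

The resulting statistic is then compared with the upper-tail critical value of the t distribution at the .05 level for the computed degrees of freedom.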

## Do actively managed funds fail to outperform the overall market?

It is believed that most actively managed funds fail to outperform the overall market. To empirically test this statement, an analyst has collected the following secondary data to examine whether or not the mean return of an index which represents all the actively managed funds in the market is statistically different from the average return of the overall market.

| Index | # Observations | Average Annualized Return | Annualized Standard Deviation |
|---|---|---|---|
| Active Fund Index | 101 | 12.10% | 15.73% |
| Benchmark Market Index | 101 | 4.60% | 9.98% |

a. Conduct an appropriate test to examine whether the two samples have the same population variances at the 2% level of significance.

b. Assume that the two samples have the same variances; conduct an appropriate test to examine whether the two indexes have the same mean at the level of 0.01.

c. Conduct an appropriate test to examine whether the annualized mean return for the active fund index is different from 9% at the 0.01 significance level.

d. Construct a 95% confidence interval for the mean monthly returns of the benchmark market index.

e. Compare and contrast the use of the test statistic/critical value approach and the p-value approach to conduct a hypothesis test.
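The test statistics for parts (a) to (c) can be computed from the summary figures alone. The sketch below assumes the two indexes are independent samples of 101 observations each and works in percentage points; the variable names are illustrative.

```python
from math import sqrt

n1 = n2 = 101
mean_active, sd_active = 12.10, 15.73   # percent
mean_market, sd_market = 4.60, 9.98     # percent

# (a) F statistic for equal variances, larger variance in the numerator,
#     with (n1 - 1, n2 - 1) = (100, 100) degrees of freedom.
f_stat = sd_active**2 / sd_market**2

# (b) Pooled-variance t statistic for equal means, df = n1 + n2 - 2.
sp2 = ((n1 - 1) * sd_active**2 + (n2 - 1) * sd_market**2) / (n1 + n2 - 2)
t_pooled = (mean_active - mean_market) / sqrt(sp2 * (1 / n1 + 1 / n2))

# (c) One-sample t statistic for H0: mean active return = 9%, df = n1 - 1.
t_one_sample = (mean_active - 9.0) / (sd_active / sqrt(n1))

print(f"(a) F = {f_stat:.3f}")
print(f"(b) t = {t_pooled:.3f} with {n1 + n2 - 2} df")
print(f"(c) t = {t_one_sample:.3f} with {n1 - 1} df")
```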

## Statistics Problem Set: Buena School District Bus

1. In a market test of a new chocolate raspberry coffee, a poll of 400 people from Dobbs Ferry showed 250 preferred the new coffee. In Irvington, 170 out of 350 people preferred the new coffee. To test the hypothesis that there is no difference in preferences between the two villages, what is the alternate hypothesis?

a. H1: p1 < p2

b. H1: p1 > p2

c. H1: p1 = p2

d. H1: p1 ≠ p2

2. The regression equation is Y = 29.29 – 0.96X, the sample size is 8, and the standard error of the slope is 0.22. What is the test statistic to test the significance of the slope?

a. z = -4.364

b. z = 4.364

c. t = -4.364

d. t = -0.96

3. Which condition must be met to conduct a test for the difference in two sample means using a z-statistic?

a. Data must be at least of nominal scale

b. Populations must be normal

c. Standard deviations of the two populations must be known

d. Samples are dependent

4. What chart helps to identify the relatively few factors that impact the performance of a manufacturing or service process?

a. SPC

b. Pareto analysis

c. Fishbone chart analysis

d. Diagnostic chart

5. Using a 5% level of significance and a sample size of 25, what is the critical value for a one-tailed hypothesis test?

a. 1.708

b. 1.711

c. 2.060

d. 2.064

6. Assuming the population variances are known, the population variance of the difference between two sample means is

a. The sums of the two means

b. The sum of the variances for each population

c. The sum of the standard deviations for each population

d. The sum of the sample sizes for each population

7. Which of the following can be used to test the hypothesis that two nominal variables are related?

a. A contingency table

b. A chi-square table

c. An ANOVA table

d. A scatter diagram
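For question 2, the test statistic for a regression slope is the estimated slope divided by its standard error, referred to a t distribution with n − 2 degrees of freedom. A minimal sketch using the figures quoted in that question:

```python
# Figures from question 2: Y = 29.29 - 0.96X, n = 8, SE(slope) = 0.22.
slope = -0.96
se_slope = 0.22
n = 8

t_stat = slope / se_slope   # H0: slope = 0
df = n - 2                  # simple linear regression

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```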

Homework help:

Refer to Buena School District bus data.

a. Find the median maintenance cost and the median age of the buses. Organize the data into a two-by-two contingency table, with buses above and below the median of each variable. Determine whether the age of the bus is related to the amount of the maintenance cost. Use the .05 significance level.

b. Is there a relationship between the maintenance costs and the manufacturer of the bus? Use the breakdown in part (a) for the buses above and below the median maintenance costs and the bus manufacturers to create a contingency table. Use the .05 significance level.

c. Use statistical software and the .05 significance level to determine whether it is reasonable to assume that the age of the bus, the maintenance cost, and the miles traveled last month each follow a normal distribution.

## Testing the Different Performance of Students with and without Sleep

a. Research has shown that losing a few hours of sleep can have a significant effect on the performance of some simple tasks. To demonstrate this phenomenon, a sample of n = 25 college students was given a simple problem-solving task at noon one day and again at noon on the following day. The students were not permitted any sleep during the full night between the two tests. For each student, the difference between the first and second score was recorded. For this sample, the students averaged M_D = 4.7 points better on the first test, with a variance of 64 for the difference scores.

b. Do the data indicate a significant change in problem-solving ability? Use a two-tailed test with α = .01. Be sure to show all formulas, steps, processes and calculations for all parts of the
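Because each student is measured twice, the natural setup is a repeated-measures (paired) t test on the difference scores. The sketch below computes the test statistic from the summary values given above; it assumes the reported variance of 64 is the sample variance of the differences.

```python
from math import sqrt

n = 25            # students
mean_diff = 4.7   # mean of (first score - second score)
var_diff = 64.0   # assumed to be the sample variance of the differences

se_diff = sqrt(var_diff / n)    # estimated standard error of the mean difference
t_stat = mean_diff / se_diff    # H0: population mean difference = 0
df = n - 1

print(f"standard error = {se_diff:.2f}, t = {t_stat:.4f}, df = {df}")
```

The statistic is compared with the two-tailed critical t value for 24 degrees of freedom at the .01 level.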

## Statistical T-Tests for Means

The publisher of Celebrity Living claims that the mean sales for personality magazines that feature people such as Angelina Jolie or Paris Hilton are 1.5 million copies per week. A sample of 10 comparable titles shows mean weekly sales last week of 1.3 million copies with a standard deviation of 0.9 million copies. Does this data contradict the publisher’s claim? Use a 0.01 significance level. Show all work.
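A one-sample t statistic for the publisher's claim can be sketched as below, working in millions of copies. It assumes a two-sided alternative (the question asks only whether the data contradict the claim) and a sample standard deviation of 0.9 million.

```python
from math import sqrt

mu0 = 1.5      # claimed mean weekly sales (millions)
x_bar = 1.3    # sample mean (millions)
s = 0.9        # sample standard deviation (millions), as assumed above
n = 10

t_stat = (x_bar - mu0) / (s / sqrt(n))
df = n - 1

print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```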

## “t” Value & Developing a Research Question

Calculate the “t” value for independent groups for the following data using the formula presented in the module. Check the accuracy of your calculations. Using the raw measurement data presented below, determine whether or not there exists a statistically significant difference between the salaries of female and male human resource managers using the appropriate t-test.

Develop a research question, testable hypothesis, confidence level, and degrees of freedom. Draw the appropriate conclusions with respect to female and male HR salary levels. Report the required “t” critical values based on the degrees of freedom.

Salary Level

| Female HR Directors | Male HR Directors |
|---|---|
| $50,000 | $58,000 |
| $75,000 | $69,000 |
| $72,000 | $73,000 |
| $67,000 | $67,000 |
| $54,000 | $55,000 |
| $58,000 | $63,000 |
| $52,000 | $53,000 |
| $68,000 | $70,000 |
| $71,000 | $69,000 |
| $55,000 | $60,000 |

*Do not forget what we all learned in high school about “0”s.
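As a cross-check on a hand calculation, the independent-groups t value can be computed from the raw salaries. The sketch below uses the pooled-variance formula, which is an assumption: the formula from the module is not reproduced here and may be written differently.

```python
from math import sqrt
from statistics import mean, variance

female = [50000, 75000, 72000, 67000, 54000, 58000, 52000, 68000, 71000, 55000]
male = [58000, 69000, 73000, 67000, 55000, 63000, 53000, 70000, 69000, 60000]

n1, n2 = len(female), len(male)
m1, m2 = mean(female), mean(male)

# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * variance(female) + (n2 - 1) * variance(male)) / (n1 + n2 - 2)

t_stat = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(f"female mean = {m1:.0f}, male mean = {m2:.0f}")
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```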

## Statistics Variance Problem

We are interested in determining whether or not the variances of the sales at two music stores (A & B) are equal. A sample of 10 days of sales at store A has a standard deviation of 30, while a sample of 16 days of sales from store B has a standard deviation of 20.

1) What is the value of the Observed test statistic?

2) What are the results of the hypothesis test at a significance level of 0.05?
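The observed test statistic for comparing two variances is the ratio of the sample variances. A brief sketch, assuming the convention of placing the larger sample variance in the numerator:

```python
n_a, sd_a = 10, 30.0   # store A: days sampled, standard deviation
n_b, sd_b = 16, 20.0   # store B: days sampled, standard deviation

f_stat = sd_a**2 / sd_b**2          # larger variance on top
df_num, df_den = n_a - 1, n_b - 1   # degrees of freedom for the F distribution

print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

For a two-sided test at the 0.05 level, this value is compared with the upper 0.025 critical value of the F distribution with those degrees of freedom.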

## Distribution Analysis: Statistics for Business Textbook Problems

6.8 For each of the following rejection regions, sketch the sampling distribution for z and indicate the location of the rejection region.

a. z>1.96

b. z>1.645

c. z>2.575

d. z<-1.28

e. z<-1.645 or z>1.645

f. z<-2.575 or z>2.575

g. For each of the rejection regions specified in parts a-f, what is the probability that a Type I error will be made?
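For part g, the probability of a Type I error is the area of the rejection region under the standard normal curve. The sketch below computes those areas; it assumes the boundary in part f is −2.575, and the helper names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def upper_tail(c: float) -> float:
    """P(Z > c)."""
    return 1 - Z.cdf(c)

def lower_tail(c: float) -> float:
    """P(Z < c)."""
    return Z.cdf(c)

alphas = {
    "a": upper_tail(1.96),
    "b": upper_tail(1.645),
    "c": upper_tail(2.575),
    "d": lower_tail(-1.28),
    "e": lower_tail(-1.645) + upper_tail(1.645),
    "f": lower_tail(-2.575) + upper_tail(2.575),  # assumes -2.575 in part f
}

for part, alpha in alphas.items():
    print(f"{part}. P(Type I error) = {alpha:.4f}")
```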

6.10 Play Golf America Program. The Professional Golf Association (PGA) and Golf Digest have developed the Play Golf America program, in which teaching professionals at participating golf clubs provide a free 10-minute lesson to new customers. According to Golf Digest (July 2008), golf facilities that participate in the program gain, on average, $2,400 in greens fees, lessons, or equipment expenditures. A teaching professional at a golf club believes that the average gain in greens fees, lessons, or equipment expenditures for participating golf facilities exceeds $2,400.

a. In order to support the claim made by the teaching professional, what null and alternative hypotheses should you test?

b. Suppose you select alpha = 0.05. Interpret this value in the words of the problem.

c. For alpha = 0.05, specify the rejection region of a large sample test.

6.20 A random sample of 100 observations from a population with standard deviation 60 yielded a sample mean of 110.

a. Test the null hypothesis that μ = 100 against the alternative hypothesis that μ > 100 using alpha = 0.05. Interpret the results of the test.

b. Test the null hypothesis that μ = 100 against the alternative hypothesis that μ ≠ 100 using alpha = 0.05. Interpret the results of the test.

c. Compare the results of the two tests you conducted. Explain why the results differ.
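The contrast in Exercise 6.20 between the one-tailed and two-tailed tests can be seen by computing both p-values from the same z statistic. The sketch assumes the standard deviation of 60 may be treated as known, so that a z test applies.

```python
from math import sqrt
from statistics import NormalDist

n, sigma = 100, 60.0
x_bar, mu0 = 110.0, 100.0
alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))

p_one_tailed = 1 - NormalDist().cdf(z)              # H1: mu > 100
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))   # H1: mu != 100

print(f"z = {z:.3f}")
print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

The same z value falls inside the one-tailed rejection region but outside the two-tailed one, because the two-tailed test splits alpha between both tails.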

6.22 Accounting and Machiavellianism. Refer to the Behavioral Research in Accounting (Jan. 2008) study of Machiavellian traits in accountants, Exercise 5.17 (p. 279). A Mach rating score was determined for each in a random sample of 122 purchasing managers with the following results: x-bar = 99.6, s = 12.6. Recall that a director of purchasing at a major firm claims that the true mean Mach rating score of all purchasing managers is 85.

a. Suppose you want to test the director’s claim. Specify the null and alternative hypothesis for the test.

b. Give the rejection region for the test using alpha = 0.10.

c. Find the value of the test statistic.

d. Use the result, part c, to make the appropriate conclusion.
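Parts b and c of Exercise 6.22 can be sketched as a large-sample z test. The sketch assumes a sample mean of 99.6 (read as a lost decimal point in the printed value, given s = 12.6 and a claimed mean of 85) and a two-sided alternative, since the aim is to test the director's claim rather than a direction.

```python
from math import sqrt
from statistics import NormalDist

n = 122
x_bar = 99.6   # assumption: sample mean of 99.6
s = 12.6
mu0 = 85.0     # director's claimed mean
alpha = 0.10

# Two-sided rejection region: |z| > z_crit
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

z = (x_bar - mu0) / (s / sqrt(n))

print(f"reject H0 if |z| > {z_crit:.3f}")
print(f"z = {z:.2f}, reject H0: {abs(z) > z_crit}")
```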

## Statistical Testing – T Tests and Proportions

1. Do men and women have different beliefs on the ideal number of children in a family? Based on the following GSS2008 data and the obtained t statistic, what would you conclude? Use α = 0.05.

| | Men | Women |
|---|---|---|
| Mean ideal number of children | 3.06 | 3.22 |
| Standard deviation | 1.92 | 1.99 |
| N | 610 | 610 |

2. Data from the MTF2008 reveal that 75.7% (493 out of 651) of males and 62.2% (405 out of 651) of females reported trying alcohol. You wonder whether there is any difference between males and females in the population trying alcohol. Use a test of the difference between proportions when answering this question. What is the research hypothesis? Should you conduct a one- or a two-tailed test? Why? Test your hypothesis at the 0.05 level. What do you conclude?
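Both questions reduce to a test statistic built from summary figures. The sketch below computes a two-sample t statistic for question 1 (using unpooled variances, which is an assumption) and a two-proportion z statistic with a pooled proportion for question 2, with two-sided p-values from the normal distribution, which is a reasonable approximation at these sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(stat: float) -> float:
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))

# Question 1: ideal number of children, men vs. women.
m_men, sd_men, n_men = 3.06, 1.92, 610
m_women, sd_women, n_women = 3.22, 1.99, 610
t_stat = (m_men - m_women) / sqrt(sd_men**2 / n_men + sd_women**2 / n_women)

# Question 2: proportion who tried alcohol, males vs. females.
x_m, n_m = 493, 651
x_f, n_f = 405, 651
p_m, p_f = x_m / n_m, x_f / n_f
p_pool = (x_m + x_f) / (n_m + n_f)   # pooled proportion under H0
z_stat = (p_m - p_f) / sqrt(p_pool * (1 - p_pool) * (1 / n_m + 1 / n_f))

print(f"Q1: t = {t_stat:.3f}, two-sided p = {two_sided_p(t_stat):.3f}")
print(f"Q2: z = {z_stat:.3f}, two-sided p = {two_sided_p(z_stat):.2g}")
```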

## Non-Parametric Test on Software Adoption

Company W is testing a sales software. Their sales force of 500 people is divided into four regions: Northeast, Southeast, Central and West. Each sales person is expected to sell the same amount of products. During the last 3 months, only half of the sales representatives in each region were given the software program to help them manage their contacts.

The VP of Sales at WidgeCorp, who is comfortable with statistics, wants to know the possible null and alternative hypotheses for a non-parametric test on this data using the chi-square distribution. A non-parametric test is used on data that is qualitative or categorical, such as gender, age group, region, and color. It is used when it doesn’t make sense to look at the mean of such variables.