## Research Paper: Major, Age and Gender

Examine the variables in the data file 2004GSS.sav in terms of their labels and values. Develop a research paper with the five sections described below. You should submit your report as one MSWord document with all data and tables copied into that document.

Introduction: The purpose of the paper. Rewrite this section after completing Sections 2-5.

Research Hypotheses: Choose one from the following three variables to be the dependent variable for three alternative hypotheses you will establish:

– Grass

– Fear

– Gunlaw

Choose three other variables in 2004GSS.sav to establish three research hypotheses with the dependent variable you selected above. Each hypothesis should state clearly the direction of the relationship between the pair of variables.

Methods – Secondary Data Analysis: Provide a brief description, in about half a page, of the GSS data in terms of 1) who collected the data, 2) the purpose of the data collection program, 3) the data collection method (experimentation, self-administered survey, personal interview, or existing data), 4) the study population (i.e., whom the sample represents), 5) the sample type (e.g., probability/random or nonprobability/nonrandom), and 6) the sample size. (See 2.2 – 2.5 in the textbook.)

Report descriptive statistics in a table including AGE, RACE, SEX, EDC, and INCOME of the respondents. (Hint: Perform descriptive statistics on these variables according to the nature of each variable. For INCOME, you may want to recode it into fewer categories.) The following is an example of how to structure the table.

Table 1

Descriptive statistics of the variables

| Variable | Frequency | % | Mean | Std. Dev. |
|---|---|---|---|---|
| Major | | | | |
| Business | 253 | 47.1 | | |
| Nonbusiness | 284 | 52.9 | | |
| Gender | | | | |
| Male | 253 | 47.3 | | |
| Female | 282 | 52.7 | | |
| Age | | | | |
| Under 30 | 324 | 60.6 | | |
| 30 and over | 211 | 39.4 | | |
| Income | | | $38,620 | $17,261 |

Describe the statistical methods you will use to test your three research hypotheses. (Hint: Determine the level of measurement, categorical/discrete or continuous/scale, for the variables in each of your three hypotheses.)

Findings: First, perform the proper bivariate statistical analysis to test each alternative hypothesis against H0. Then report the observed existence, strength, and direction of each relationship, inserting the relevant table where you report the statistics.

Discussion and conclusion: Do the data provide evidence that supports your hypotheses? Were there any surprises or unexpected results? Give your suggestions or recommendations for future studies in terms of data and methods.

## Statistics: College Writing Scores

State College is evaluating a new English composition course for freshmen. A random sample of n = 25 freshmen is obtained and the students are placed in the course during their first semester. One year later, a writing sample is obtained for each student and the writing samples are graded using a standard evaluation technique. The average score for the sample is M = 76. For the general population of college students, writing scores form a normal distribution with a mean of μ = 70.

A. If the writing scores for the population have a standard deviation of 20, does the sample provide enough evidence to conclude that the new composition course has a significant effect? Assume a two-tailed test with an alpha level of 0.05.

B. If the population standard deviation is 10, is the sample sufficient to demonstrate a significant effect? Again assume a two-tailed test with an alpha level of 0.05.

C. Comparing your answers for parts A and B, explain how the magnitude of the standard deviation influences the outcome of a hypothesis test.
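The arithmetic behind parts A and B can be sketched as a one-sample z-test with a known population standard deviation. This is a minimal illustration, not a prescribed solution; the function name `one_sample_z` is hypothetical, and it assumes the sample size is 25, the sample mean is 76, and the population mean is 70, as stated in the problem.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed one-sample z-test; returns (z, critical value, reject H0?)."""
    standard_error = pop_sd / sqrt(n)              # sigma / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper critical value
    return z, z_crit, abs(z) > z_crit


# Part A uses sigma = 20, part B uses sigma = 10.
for sigma in (20, 10):
    z, z_crit, reject = one_sample_z(76, 70, sigma, 25)
    print(f"sigma={sigma}: z={z:.2f}, critical=+/-{z_crit:.2f}, reject H0: {reject}")
```

Halving the standard deviation halves the standard error, which doubles z for the same 6-point mean difference. That is the comparison part C asks about.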

## Goodness of Fit Test: Telephone Calls Received

8. During 200 randomly selected minutes, the number of telephone calls received each minute was recorded. At the .05 level of significance, is there any evidence that these data are not Poisson with a mean of 2.8?

| Number of Calls Per Minute | Observed Frequency |
|---|---|
| 0 | 11 |
| 1 | 46 |
| 2 | 50 |
| 3 | 47 |
| 4 | 28 |
| 5 | 11 |
| 6 or more | 7 |

(a) What are the null and alternate hypotheses?

(b) Let X = number of calls per minute. What is the expected frequency of the class X = 3?

9. The following distribution shows the frequencies of aptitude test scores for a random sample of 60 test takers. At the .10 level of significance, is there any evidence that these data are NOT normally distributed with a mean of 72 and a standard deviation of 8?

| Class | Observed Frequency |
|---|---|
| 50 ≤ X < 60 | 10 |
| 60 ≤ X < 70 | 18 |
| 70 ≤ X < 80 | 24 |
| 80 ≤ X < 90 | 6 |
| 90 ≤ X < 100 | 2 |

(a) What are the null and alternate hypotheses?

(b) Let X = an aptitude test score. Find the expected frequency of the class: 80 ≤ X < 90.
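For part (b) of problems 8 and 9, the expected frequency of a class is the sample size multiplied by the class probability under the hypothesized distribution. The sketch below illustrates that calculation; the helper names `poisson_expected` and `normal_expected` are hypothetical, and the parameters are the ones stated in the two problems.

```python
from math import exp, factorial
from statistics import NormalDist


def poisson_expected(k, mean, n):
    """Expected count of the class X = k under Poisson(mean) for n observations."""
    return n * exp(-mean) * mean**k / factorial(k)


def normal_expected(lower, upper, mu, sigma, n):
    """Expected count of the class lower <= X < upper under Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return n * (dist.cdf(upper) - dist.cdf(lower))


# Problem 8(b): 200 minutes, hypothesized Poisson mean of 2.8, class X = 3.
poisson_x3 = poisson_expected(3, 2.8, 200)

# Problem 9(b): 60 test takers, hypothesized mean 72 and SD 8, class 80 <= X < 90.
normal_80_90 = normal_expected(80, 90, 72, 8, 60)

print(f"Poisson, X = 3: {poisson_x3:.2f}")
print(f"Normal, 80 <= X < 90: {normal_80_90:.2f}")
```

The same helpers give the expected counts for the remaining classes, which are then compared with the observed counts in the chi-square statistic.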

## Sensitivity and Specificity

Hypertension

As part of a study performed in Norway, 70,000 people in the general population had their blood pressure measured; two readings were obtained and the second reading was used in the analysis. The people were followed for mortality outcome over a 10-year period after the blood-pressure measurement using death files in the Norwegian Central Bureau of Statistics. The results shown below were obtained from the subgroup of 5,034 men ages 50-59 at baseline:

10-year Mortality Outcome

| DBP (mm Hg) | Dead | Alive | Total |
|---|---|---|---|
| 100+ | 124 | 295 | 419 |
| <=99 | 764 | 3,851 | 4,615 |
| Total | 888 | 4,146 | 5,034 |

1. If we regard a diastolic blood pressure of >=100 mm Hg as a screening test for predicting mortality over the next 10 years, then what is the sensitivity of the test?

2. What is the specificity of the test?

3. If the subjects in the study sample are considered representative of the general population, then what is the predictive value positive and negative of the test?

4. Suppose the threshold for positivity were changed from 100+ to 95+. Would the sensitivity and specificity increase, decrease, or remain the same?
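Questions 1 through 3 reduce to ratios of cells in the 2x2 table. The sketch below is one way to lay them out, treating DBP of 100+ as a positive test and death within 10 years as the condition being predicted; the variable names are illustrative only.

```python
# Cell counts from the table (men ages 50-59 at baseline).
true_pos = 124     # DBP 100+, dead
false_pos = 295    # DBP 100+, alive
false_neg = 764    # DBP <=99, dead
true_neg = 3851    # DBP <=99, alive

sensitivity = true_pos / (true_pos + false_neg)   # P(test positive | dead)
specificity = true_neg / (true_neg + false_pos)   # P(test negative | alive)
ppv = true_pos / (true_pos + false_pos)           # P(dead | test positive)
npv = true_neg / (true_neg + false_neg)           # P(alive | test negative)

print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"predictive value positive = {ppv:.3f}")
print(f"predictive value negative = {npv:.3f}")
```

The predictive values can be read directly from the rows only because question 3 treats the sample as representative of the general population, so the sample prevalence stands in for the population prevalence.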

## Testing the Different Performance of Students with and without Sleep

a. Research has shown that losing a few hours of sleep can have a significant effect on performance of some simple tasks. To demonstrate this phenomenon, a sample of n = 25 college students was given a simple problem-solving task at noon one day and again at noon on the following day. The students were not permitted any sleep during the night between the two tests. For each student, the difference between the first and second score was recorded. For this sample, the students averaged M_D = 4.7 points better on the first test, with a variance of 64 for the difference scores.

b. Do the data indicate a significant change in problem-solving ability? Use a two-tailed test with α = .01. Be sure to show all formulas, steps, processes and calculations for all parts of the
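A repeated-measures (paired) t statistic for this problem can be sketched from the summary values alone. The critical value below is an assumption taken from a standard t table for df = 24 at a two-tailed alpha of .01; it is not given in the problem.

```python
from math import sqrt

n = 25             # number of students (paired observations)
mean_diff = 4.7    # mean difference score
var_diff = 64      # sample variance of the difference scores

std_error = sqrt(var_diff / n)   # estimated standard error of the mean difference
t_stat = mean_diff / std_error   # H0: mean difference in the population is 0
df = n - 1

T_CRIT = 2.797     # assumed table value: df = 24, two-tailed alpha = .01

print(f"t({df}) = {t_stat:.3f}, reject H0: {abs(t_stat) > T_CRIT}")
```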

## Statistical T-Tests for Means

The publisher of Celebrity Living claims that the mean sales for personality magazines that feature people such as Angelina Jolie or Paris Hilton are 1.5 million copies per week. A sample of 10 comparable titles shows mean weekly sales last week of 1.3 million copies with a standard deviation of 0.9 million copies. Do these data contradict the publisher’s claim? Use a 0.01 significance level. Show all work.

## “t” Value & Developing a Research Question

Calculate the “t” value for independent groups for the following data using the formula presented in the module. Check the accuracy of your calculations. Using the raw measurement data presented below, determine whether or not there exists a statistically significant difference between the salaries of female and male human resource managers using the appropriate t-test.

Develop a research question, testable hypothesis, confidence level, and degrees of freedom. Draw the appropriate conclusions with respect to female and male HR salary levels. Report the required “t” critical values based on the degrees of freedom.

Salary Level

| Female HR Directors | Male HR Directors |
|---|---|
| $50,000 | $58,000 |
| $75,000 | $69,000 |
| $72,000 | $73,000 |
| $67,000 | $67,000 |
| $54,000 | $55,000 |
| $58,000 | $63,000 |
| $52,000 | $53,000 |
| $68,000 | $70,000 |
| $71,000 | $69,000 |
| $55,000 | $60,000 |

*Do not forget what we all learned in high school about “0”s.
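The module's formula is not reproduced here, so the sketch below assumes the usual pooled-variance t statistic for two independent groups. Salaries are entered in thousands of dollars, which leaves the t value unchanged and keeps the arithmetic manageable.

```python
from math import sqrt
from statistics import mean, variance

# Salaries in $1,000s, copied from the table.
female = [50, 75, 72, 67, 54, 58, 52, 68, 71, 55]
male = [58, 69, 73, 67, 55, 63, 53, 70, 69, 60]

n_f, n_m = len(female), len(male)
df = n_f + n_m - 2

# Pooled variance: sample variances weighted by their degrees of freedom.
pooled_var = ((n_f - 1) * variance(female) + (n_m - 1) * variance(male)) / df
std_error = sqrt(pooled_var * (1 / n_f + 1 / n_m))
t_stat = (mean(female) - mean(male)) / std_error

print(f"mean female = {mean(female):.1f}, mean male = {mean(male):.1f}")
print(f"t({df}) = {t_stat:.3f}")
```

The computed t is then compared with the critical value for 18 degrees of freedom at the chosen confidence level.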

## Statistics Variance Problem

We are interested in determining whether or not the variances of the sales at two music stores (A & B) are equal. A sample of 10 days of sales at store A has a standard deviation of 30, while a sample of 16 days of sales from store B has a standard deviation of 20.

1) What is the value of the Observed test statistic?

2) What are the results of the hypothesis test at a significance level of 0.05?
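For question 1, a common choice of test statistic is the ratio of the two sample variances with the larger one in the numerator. The sketch below assumes that convention; the critical value for question 2 would come from an F table for the degrees of freedom shown.

```python
# Summary data from the problem.
sd_a, n_a = 30, 10   # store A
sd_b, n_b = 20, 16   # store B

# F ratio with the larger sample variance on top (assumed convention).
f_stat = sd_a**2 / sd_b**2
df_num = n_a - 1     # numerator degrees of freedom (store A)
df_den = n_b - 1     # denominator degrees of freedom (store B)

print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```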

## Distribution Analysis: Statistics for Business Textbook Problems

6.8 For each of the following rejection regions, sketch the sampling distribution for z and indicate the location of the rejection region.

a. z>1.96

b. z>1.645

c. z>2.575

d. z<-1.28

e. z<-1.645 or z>1.645

f. z<-2.575 or z>2.575

g. For each of the rejection regions specified in parts a-f, what is the probability that a Type I error will be made?
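For part g, the probability of a Type I error equals the area of the rejection region under the standard normal curve. A minimal sketch follows, assuming the lower bound in part f is -2.575; the helper names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def upper_tail(c):
    """P(Z > c)."""
    return 1 - Z.cdf(c)


def lower_tail(c):
    """P(Z < c)."""
    return Z.cdf(c)


alphas = {
    "a": upper_tail(1.96),
    "b": upper_tail(1.645),
    "c": upper_tail(2.575),
    "d": lower_tail(-1.28),
    "e": lower_tail(-1.645) + upper_tail(1.645),
    "f": lower_tail(-2.575) + upper_tail(2.575),
}

for part, alpha in alphas.items():
    print(f"{part}. alpha = {alpha:.3f}")
```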

6.10 Play Golf America Program. The Professional Golf Association (PGA) and Golf Digest have developed the Play Golf America program, in which teaching professionals at participating golf clubs provide a free 10-minute lesson to new customers. According to Golf Digest (July 2008), golf facilities that participate in the program gain, on average, $2,400 in greens fees, lessons, or equipment expenditures. A teaching professional at a golf club believes that the average gain in greens fees, lessons, or equipment expenditures for participating golf facilities exceeds $2,400.

a. In order to support the claim made by the teaching professional, what null and alternative hypotheses should you test?

b. Suppose you select alpha = 0.05. Interpret this value in the words of the problem.

c. For alpha = 0.05, specify the rejection region of a large sample test.

6.20 A random sample of 100 observations from a population with standard deviation 60 yielded a sample mean of 110.

a. Test the null hypothesis that μ = 100 against the alternative hypothesis that μ > 100 using alpha = 0.05. Interpret the results of the test.

b. Test the null hypothesis that μ = 100 against the alternative hypothesis that μ ≠ 100 using alpha = 0.05. Interpret the results of the test.

c. Compare the results of the two tests you conducted. Explain why the results differ.
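The contrast in exercise 6.20 between the one-tailed and two-tailed tests can be sketched as follows. The test statistic is the same in both parts; only the critical value changes. The population standard deviation of 60 is treated as known, as stated in the exercise.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, x_bar, mu0, alpha = 100, 60, 110, 100, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))   # same statistic for parts a and b

upper_crit = NormalDist().inv_cdf(1 - alpha)         # part a: reject if z > this
two_sided_crit = NormalDist().inv_cdf(1 - alpha / 2)  # part b: reject if |z| > this

reject_one_tailed = z > upper_crit
reject_two_tailed = abs(z) > two_sided_crit

print(f"z = {z:.3f}")
print(f"a. critical = {upper_crit:.3f}, reject H0: {reject_one_tailed}")
print(f"b. critical = {two_sided_crit:.3f}, reject H0: {reject_two_tailed}")
```

Splitting alpha across two tails pushes the critical value outward, so a z that clears the one-tailed cutoff can still fall short of the two-tailed one.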

6.22 Accounting and Machiavellianism. Refer to the Behavioral Research in Accounting (Jan. 2008) study of Machiavellian traits in accountants, Exercise 5.17 (p. 279). A Mach rating score was determined for each in a random sample of 122 purchasing managers with the following results: x-bar = 99.6, s = 12.6. Recall that a director of purchasing at a major firm claims that the true mean Mach rating score of all purchasing managers is 85.

a. Suppose you want to test the director’s claim. Specify the null and alternative hypotheses for the test.

b. Give the rejection region for the test using alpha = 0.10.

c. Find the value of the test statistic.

d. Use the result, part c, to make the appropriate conclusion.

## Statistical Testing – T Tests and Proportions

1. Do men and women have different beliefs on the ideal number of children in a family? Based on the following GSS2008 data and the obtained t statistic, what would you conclude? (Use alpha = 0.05.)

| | Men | Women |
|---|---|---|
| Mean ideal number of children | 3.06 | 3.22 |
| Standard Deviation | 1.92 | 1.99 |
| N | 610 | 610 |

2. Data from the MTF2008 reveal that 75.7% (493 out of 651) of males and 62.2% (405 out of 651) of females reported trying alcohol. You wonder whether there is any difference between males and females in the population trying alcohol. Use a test of the difference between proportions when answering this question. What is the research hypothesis? Should you conduct a one- or two-tailed test? Why? Test your hypothesis at the 0.05 level. What do you conclude?
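For question 2, the difference between two sample proportions can be tested with a z statistic. The sketch below assumes the pooled-proportion form of the standard error and a two-tailed critical value, which suits a research hypothesis that simply states the proportions differ.

```python
from math import sqrt
from statistics import NormalDist

x_m, n_m = 493, 651   # males who reported trying alcohol
x_f, n_f = 405, 651   # females who reported trying alcohol

p_m, p_f = x_m / n_m, x_f / n_f
p_pooled = (x_m + x_f) / (n_m + n_f)   # combined proportion under H0

std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n_m + 1 / n_f))
z = (p_m - p_f) / std_error
z_crit = NormalDist().inv_cdf(0.975)   # two-tailed, alpha = 0.05

print(f"p_m = {p_m:.3f}, p_f = {p_f:.3f}")
print(f"z = {z:.2f}, critical = +/-{z_crit:.2f}, reject H0: {abs(z) > z_crit}")
```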

## Non-Parametric Test on Software Adoption

Company W is testing a sales software program. Their sales force of 500 people is divided into four regions: Northeast, Southeast, Central and West. Each salesperson is expected to sell the same amount of products. During the last 3 months, only half of the sales representatives in each region were given the software program to help them manage their contacts.

The VP of Sales at WidgeCorp, who is comfortable with statistics, wants to know the possible null and alternative hypotheses for a non-parametric test on this data using the chi-square distribution. A non-parametric test is used on data that is qualitative or categorical, such as gender, age group, region, and color. It is used when it doesn’t make sense to look at the mean of such variables.

## Baby Birth Weights and Mother Cocaine Use Study

A random sample of the birth weights of 186 babies has a mean of 3103g and a standard deviation of 696g (based on data from “Cognitive Outcomes of Preschool Children with Prenatal Cocaine Exposure,” by Singer et al., Journal of the American Medical Association, Vol. 291, No. 20). These babies were born to mothers who did not use cocaine during their pregnancies. Further, a random sample of the birth weights of 190 babies born to mothers who used cocaine during their pregnancies has a mean of 2700g and a standard deviation of 645g. Does cocaine use appear to affect the birth weight of a baby? Substantiate your conclusion.
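One way to substantiate a conclusion is a test statistic for the difference between the two independent sample means. The sketch below uses the unpooled standard error; with samples this large, the statistic can reasonably be compared with standard normal critical values, though a course may call for a t distribution instead.

```python
from math import sqrt

# Summary data from the problem (grams).
mean_no, sd_no, n_no = 3103, 696, 186      # mothers who did not use cocaine
mean_coc, sd_coc, n_coc = 2700, 645, 190   # mothers who used cocaine

# Unpooled standard error of the difference between means.
std_error = sqrt(sd_no**2 / n_no + sd_coc**2 / n_coc)
test_stat = (mean_no - mean_coc) / std_error

print(f"difference = {mean_no - mean_coc} g, SE = {std_error:.2f} g")
print(f"test statistic = {test_stat:.2f}")
```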

## Standard Deviation, Average Number & P-Value Questions

1) The club professional at a difficult public course boasts that his course is so tough that the average golfer loses a dozen or more golf balls during a round of golf. A dubious golfer sets out to show that the pro is fibbing. He asks a random sample of 15 golfers who just completed their rounds to report the number of golf balls each lost. Assuming that the number of golf balls lost is normally distributed with a standard deviation of 3, can we infer at the 10% significance level that the average number of golf balls lost is less than 12?

1 14 8 15 17 10 12 6

14 21 15 9 11 4 4 8

2) A random sample of 12 second-year university students enrolled in a business statistics course was drawn. At the course’s completion, each student was asked how many hours he or she spent doing homework in statistics. The data are listed here. It is known that the population standard deviation is σ = 8.0. The instructor has recommended that students devote 3 hours per week for the duration of the 12-week semester, for a total of 36 hours. Test to determine whether there is evidence that the average student spent less than the recommended amount of time. Compute the p-value of the test.

31 40 26 30 36 38 29 40 38 30 35 38

3) Spam e-mail has become a serious and costly nuisance. An office manager believes that the average amount of time spent by office workers reading and deleting spam exceeds 25 minutes per day. To test this belief, he takes a random sample of 18 workers and measures the amount of time each spends reading and deleting spam. The results are listed here. If the population of times is normal with a standard deviation of 12 minutes, can the manager infer at the 1% significance level that he is correct?

35 48 29 44 17 21 32 28 34

23 13 9 11 30 42 37 43 48
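Problems 2 and 3 both give raw data with a known population standard deviation, so each can be sketched as a one-tailed z-test with a p-value. The helper `z_test_known_sigma` is a hypothetical name, and the data lists are copied from the two problems.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_known_sigma(data, mu0, sigma, tail):
    """One-tailed z-test; tail is 'lower' (H1: mu < mu0) or 'upper' (H1: mu > mu0)."""
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    below = NormalDist().cdf(z)
    p_value = below if tail == "lower" else 1 - below
    return z, p_value


# Problem 2: homework hours, H1: mu < 36, sigma = 8.0.
hours = [31, 40, 26, 30, 36, 38, 29, 40, 38, 30, 35, 38]
z_hours, p_hours = z_test_known_sigma(hours, 36, 8.0, "lower")

# Problem 3: minutes spent on spam, H1: mu > 25, sigma = 12.
spam = [35, 48, 29, 44, 17, 21, 32, 28, 34, 23, 13, 9, 11, 30, 42, 37, 43, 48]
z_spam, p_spam = z_test_known_sigma(spam, 25, 12, "upper")

print(f"Problem 2: z = {z_hours:.3f}, p-value = {p_hours:.4f}")
print(f"Problem 3: z = {z_spam:.3f}, p-value = {p_spam:.4f}")
```

In each case the null hypothesis is rejected only if the p-value falls below the stated significance level.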

## Hypothesis Testing – Milk Volume

Quart cartons of milk should contain at least 32 ounces. A sample of 22 cartons contained the following amounts in ounces.

31.5 32.2 31.9 31.8 31.7 32.1 31.5 31.6 32.4 31.6 31.8

32.2 32.1 32.1 31.6 32.0 31.6 31.7 32.0 31.5 31.9 32.8

a) What set of hypotheses should be tested if we want to demonstrate the mean amount of milk in all cartons of this brand is actually less than 32 ounces?

b) Select the distribution to use. Explain briefly why you selected it.

c) Assuming that they wish to test the claim at alpha = 0.025, determine the rejection and non-rejection regions based on your hypotheses in a). State the critical value.

d) Calculate the value of the test statistic. What does the p-value mean for this problem? Explain.

e) Applying the hypothesis test, can we conclude that there is sufficient evidence to claim that the mean amount is less than 32 ounces in all cartons of this brand?
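For part d), the test statistic can be sketched directly from the 22 measurements. Because the population standard deviation is not given, this sketch assumes a one-sample t statistic with n - 1 degrees of freedom, using the sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

# Carton contents in ounces, copied from the problem.
ounces = [
    31.5, 32.2, 31.9, 31.8, 31.7, 32.1, 31.5, 31.6, 32.4, 31.6, 31.8,
    32.2, 32.1, 32.1, 31.6, 32.0, 31.6, 31.7, 32.0, 31.5, 31.9, 32.8,
]

n = len(ounces)
sample_mean = mean(ounces)
sample_sd = stdev(ounces)             # sample standard deviation (n - 1 divisor)

t_stat = (sample_mean - 32) / (sample_sd / sqrt(n))   # H0: mu = 32 (at least 32)
df = n - 1

print(f"n = {n}, mean = {sample_mean:.4f}, s = {sample_sd:.4f}")
print(f"t({df}) = {t_stat:.3f}")
```

The statistic is then compared with the lower-tail critical value for 21 degrees of freedom at alpha = 0.025.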

## Statistics Problem: Mean Wait Time

A bank manager has developed a new system to reduce the time customers spend waiting for teller service during peak hours. The manager hopes that the new system will reduce waiting times from the current 9 to 10 minutes to less than 6 minutes. Suppose that the manager wishes to use 100 waiting times to support the claim that the mean waiting time under the new system is shorter than six minutes. The random sample of 100 waiting times yields a sample mean of 5.46 minutes. Further, let’s assume that the population standard deviation is 2.475.

a) State the null and alternative hypotheses, letting μ represent the mean waiting time under the new system.

b) Select the distribution to use. Explain briefly why you selected it.

c) Assuming that she wishes to test the claim at alpha = 0.05, determine the rejection and non-rejection regions based on your hypotheses in (a). State the critical value.

d) Calculate the value of the test statistic.

e) What do you conclude about whether the new system has reduced the mean waiting time to below six minutes? Explain your conclusion in words.