## ANOVA (Tukey HSD)

Please help me with this question if possible. Thank you!

The following is hypothetical data similar to the actual research results about birds. The numbers represent relative brain size for the individual birds in each sample.

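| | Non-migrating | Short-distance migrants | Long-distance migrants |
|---|---|---|---|
| | 18 | 6 | 4 |
| | 13 | 11 | 9 |
| | 19 | 7 | 5 |
| | 12 | 9 | 6 |
| | 16 | 8 | 5 |
| | 12 | 13 | 7 |
| M | 15 | 9 | 6 |
| T | 90 | 54 | 36 |
| SS | 48 | 34 | 16 |

N = 18, G = 180, ΣX² = 2150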

Use an ANOVA with α = .05 to determine whether there are any significant mean differences among the three groups of birds. (Use two decimal places.)

| Source | SS | df | MS | F | F-critical |
|---|---|---|---|---|---|
| Between | ______ | _____ | _____ | _____ | _______ |
| Within | ______ | _____ | _____ | | |
| Total | ______ | ______ | | | |

F distribution

numerator degrees of freedom=6

denominator degrees of freedom=16

conclusion

* fail to reject the null hypothesis; there are significant differences among the three groups of birds

* reject the null hypothesis; there are significant differences among the three groups of birds

* reject the null hypothesis; there are no significant differences among the three groups of birds

* fail to reject the null hypothesis; there are no significant differences among the three groups of birds.

η² =

the results show significant differences in the 3 groups of birds______________.
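A minimal Python sketch of how the ANOVA table could be filled in from the summary values given above (T, SS, n = 6 per group, ΣX² = 2150). The function name `anova_from_summary` and the dictionary keys are my own labels, not part of the assignment, and the formulas assume equal group sizes.

```python
def anova_from_summary(totals, group_ss, n_per_group, sum_of_squares):
    """One-way ANOVA table from treatment totals (T), per-group SS,
    a common group size n, and the sum of squared scores."""
    k = len(totals)                      # number of treatment groups
    N = k * n_per_group                  # total number of scores
    G = sum(totals)                      # grand total
    ss_total = sum_of_squares - G ** 2 / N
    ss_within = sum(group_ss)
    ss_between = sum(T ** 2 / n_per_group for T in totals) - G ** 2 / N
    df_between, df_within = k - 1, N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within, "df_total": N - 1,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
        "eta_squared": ss_between / ss_total,  # proportion of variance explained
    }

# Bird data: T = 90, 54, 36; SS = 48, 34, 16; n = 6; sum of X^2 = 2150
table = anova_from_summary([90, 54, 36], [48, 34, 16], 6, 2150)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the sketch gives SS between = 252, SS within = 98, SS total = 350, F ≈ 19.29 and η² = 0.72. The F-ratio is then compared with the tabled critical value for df = (2, 15) at α = .05.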

Use the Tukey HSD post hoc test to determine which groups are significantly different. (Use two decimal places.)

q=________

non-migrating vs. short distance migrants:

* not enough information

* no significant mean difference

* significant mean difference

non-migrating vs. long distance migrants

* no significant mean difference

* not enough information

* significant mean difference

short distance migrants vs. long distance migrants

* no significant mean difference

* not enough information

* significant mean difference.
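The Tukey comparison for the bird data can be sketched as below. The value `q = 3.67` is an assumption on my part: it is the studentized range statistic I read from a standard table for k = 3 groups, df within = 15 and α = .05, so check it against the table your course uses. MS within is taken as SS within / df within = (48 + 34 + 16) / 15 from the summary values in the question.

```python
import math
from itertools import combinations

def tukey_hsd(ms_within, n_per_group, q):
    """Minimum mean difference needed for significance (equal group sizes)."""
    return q * math.sqrt(ms_within / n_per_group)

means = {"non-migrating": 15, "short-distance": 9, "long-distance": 6}
ms_within = (48 + 34 + 16) / 15   # SS within / df within
q = 3.67                          # ASSUMPTION: table value for k=3, df=15, alpha=.05

hsd = tukey_hsd(ms_within, n_per_group=6, q=q)
print(f"HSD = {hsd:.2f}")

# A pair differs significantly when its mean difference exceeds HSD.
significant = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    significant[(a, b)] = diff > hsd
    print(f"{a} vs. {b}: difference = {diff}, significant = {diff > hsd}")
```

Under that assumed q, HSD comes out near 3.83, so the differences of 6 and 9 points exceed it while the 3-point difference between the two migrant groups does not.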

## One Factor ANOVA Using Excel

Based on the amount of time spent on Facebook, students are classified into 3 groups and their grade point averages are recorded. The following data show the typical pattern of results.

Facebook use while studying

| Non-user | Rarely use | Regularly use |
|---|---|---|
| 3.70 | 3.51 | 3.02 |
| 3.45 | 3.42 | 2.84 |
| 2.98 | 3.81 | 3.42 |
| 3.94 | 3.15 | 3.10 |
| 3.82 | 3.64 | 2.74 |
| 3.68 | 3.20 | 3.22 |
| 3.90 | 2.95 | 2.58 |
| 4.00 | 3.55 | 3.07 |
| 3.75 | 3.92 | 3.31 |
| 3.88 | 3.45 | 2.80 |

Use an ANOVA with α = .05 to determine whether there are significant mean differences among the three groups. Complete the following table by selecting the correct values in each cell (rounding).

| Source | SS | df | MS | F | F-critical |
|---|---|---|---|---|---|
| Between treatments | ________ | _____ | _____ | _____ | _______ |
| Within treatments | ________ | _____ | _____ | | |
| Total | ________ | _____ | | | |

F DISTRIBUTION

NUMERATOR DEGREES OF FREEDOM=6

DENOMINATOR DEGREES OF FREEDOM=16

* Reject the null hypothesis. There is a significant mean difference among the three groups.

* Fail to reject the null hypothesis. There is a significant mean difference among the three groups.

* Fail to reject the null hypothesis. There is no significant mean difference among the three groups.

η² = _____

THE RESULTS SHOW SIGNIFICANT DIFFERENCE IN MEAN GRADE POINT AVERAGES BETWEEN GROUPS, ________________.
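If Excel is not at hand, the same one-factor ANOVA can be sketched directly from the raw GPAs in plain Python. This is only an illustration of the sums-of-squares arithmetic; the function name `one_way_anova` and the variable names are mine.

```python
from statistics import mean

def one_way_anova(groups):
    """Return SS, df, MS, F and eta squared for a list of samples."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    # Between-treatments SS: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatments SS: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within,
        "F": ms_between / ms_within,
        "eta_squared": ss_between / (ss_between + ss_within),
    }

non_user = [3.70, 3.45, 2.98, 3.94, 3.82, 3.68, 3.90, 4.00, 3.75, 3.88]
rarely = [3.51, 3.42, 3.81, 3.15, 3.64, 3.20, 2.95, 3.55, 3.92, 3.45]
regularly = [3.02, 2.84, 3.42, 3.10, 2.74, 3.22, 2.58, 3.07, 3.31, 2.80]

result = one_way_anova([non_user, rarely, regularly])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The degrees of freedom here are 2 and 27, so the critical value has to be looked up for those df rather than for the defaults shown in the F distribution tool.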

Solution Preview

I entered the data and ran a one-factor ANOVA; the result is as follows:

One factor ANOVA

| Group | Mean | n | Std. Dev |
|---|---|---|---|
| Non-user | 3.710 | 10 | 0.3013 |
| Rarely use | 3.460 | 10 | 0.2986 |
| Regularly … | 3.010 | 10 | 0.2677 |

## Business Statistics – Null and Alternative Hypothesis

Writing Hypotheses

Read the situation. Then write the hypotheses in correct mathematical notation. Do not conduct any statistical tests. Just write the hypotheses. Insert your answers between the problems.

Here are some things to keep in mind:

1) On the Hypothesis Testing Worksheet, all you need to do is write the null and alternative hypotheses for each situation.

2) The null hypothesis will always be “=”.

3) You can use either “≠” or “not =” for “does not equal”. Greater than and less than are “>” and “<”, respectively.

4) The alternative hypothesis will be “not =” (two-tailed test) or “>” or “<” (one-tailed test).

5) When determining what the null and alternative hypotheses are, realize that the alternative is the new information, what you are trying to prove. The null is what has been believed to be true up until now.
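As a hedged illustration of the notation these tips describe, here is how two of the situations below might be written. I am assuming μ stands for the bowler's true mean score with the new ball (situation 1, where "improved" suggests a one-tailed test) and p for the councilman's true proportion of support (situation 4, where "changed" suggests a two-tailed test); the symbols are my choice, not part of the worksheet.

$$
H_0: \mu = 196 \qquad H_1: \mu > 196
$$

$$
H_0: p = 0.63 \qquad H_1: p \neq 0.63
$$

In both cases the null carries the "=" sign and the alternative carries the new claim being tested.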

==================================================================

1) A bowler who has averaged 196 pins in the past year is asked to experiment with a ball made of a new kind of material. He rolls several games with the new ball. Has the new ball improved his game?

2) An advertisement claims that chewing NoCav gum reduces cavities. To test the claim, you conduct a study in which participants who chew the gum are compared to the national average of 3 cavities found per year.

3) In a speech to the Chamber of Commerce, a city councilman claims that in his city less than 15% of the adult male population are unemployed. An opponent in the upcoming election wants to test the councilman’s claim.

4) The councilman is starting to get worried about the upcoming election. He has enjoyed 63% support for several years, but the political climate has been changing. He wants to know if his support has changed.

5) A production process is considered to be under control if the machine parts it makes have a mean length of 35.50 mm with a standard deviation of 0.45 mm. Whether or not the process is under control is decided each morning by a quality control engineer who bases his decision on a random sample of size 36. Should he ask for an adjustment of the machine on a day when he obtains a mean of 35.62 mm?

6) Jim, the owner of Jim’s Grocery, knows that Plain Chips have always outsold Spicy chips. However, sales of Spicy chips have been increasing. Jim wants to determine if the average weekly sales of Spicy chips have indeed surpassed that of Plain chips.

7) Jim now wants to know if Plain and Spicy chips have the same percentage of defective product (i.e. underfilled bags, torn bags, wrong flavor in the bags, etc.).

8) The Great Vehicle Co. just introduced New SUV, claiming it can pull more weight than Old SUV. After testing 150 vehicles of each model, Old SUV had a mean pull weight of 5032 pounds with a standard deviation of 72 pounds. New SUV had a mean pull weight of 5462 pounds with a standard deviation of 154 pounds. Is the claim valid at a .05 level of significance?

9) The Great Vehicle Co. has a competitor, Amazing Autos, that claims people who purchase its competing vehicle, the Sport Off Road Vehicle (SORV), have higher customer satisfaction than New SUV. Out of 736 people who purchased the SORV last month, 534 said they were satisfied. Out of 521 people who purchased New SUV last month, 375 said they were satisfied. Is there a higher percentage of people who are satisfied with the SORV than with New SUV?

10) The Great Vehicle Company wants to counter Amazing Autos’s claim by making its own claim that New SUV has a lower percentage of defective vehicles. The research team tested 536 vehicles of each model and found that SORV had 53 defective units, while New SUV had only 51 defective units.

Statistics Problems

1) Ask 10 people (get 5 males and 5 females) the following questions

A) Their ages

B) How many vitamins they take daily

C) How many carbonated sodas they drink each day

D) How many alcoholic beverages they drink per month

E) Write your own question. Ask your participants if they agree with something or if they do something. For example, you may want to ask them if they eat popcorn when they go to the movies or if they support a political issue. It must be a yes/no question.

SHOW & SAVE YOUR DATA – You will use the data you gathered above for the problems below

Practice Problems:

1) Use your data from above. This week assume that historically the average person takes 3 vitamins on a daily basis. Conduct a hypothesis test analysis to determine if 3 is still the correct average number. Write your hypotheses in correct statistical notation. Finally use the important numbers from your output to explain your results. Use alpha = 0.05. Post only the relevant numbers, not all of the output; then explain your results.

2) Use your data from above. Analyze if more than 58% support an issue or partake in an activity. (Question E above). Write the hypotheses. Show the relevant numbers. Then explain your results. Use alpha = 0.05.

## MCQs: ANOVA

These are some of the questions I am working on. If possible, can you please let me know if my answers are correct, or solve them? Thank you!

For an ANOVA comparing three treatment conditions, what is stated by the null hypothesis (Ho)?

* at least one of the 3 population means is different from another mean

* none of these choices are correct (this is the one I chose)?

* there are no differences between any of the population means

* all 3 of the population means are different from each other

For an ANOVA comparing 3 treatment conditions, what is stated by the alternative hypothesis (H1)?

* all 3 of the population means are different from each other

* none of these choices is correct

* at least one of the 3 population means is different from another mean (this is the one I had chosen?)

* there are no differences between any of the population means

When comparing more than 2 treatment means, why should you use an analysis of variance instead of using several t tests?

* there is no advantage to using an analysis of variance instead of several t tests

* the analysis of variance is more likely to detect a treatment effect

* using several t tests increases the risk of a Type I error (this is the one I had chosen)?

* using several t tests increases the risk of a Type II error
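To see why running several t tests matters, here is a rough sketch of how the chance of at least one false positive grows with the number of comparisons. It assumes the tests are independent, which pairwise t tests on shared groups are not, so treat the number as an approximation rather than an exact rate. The function name is mine.

```python
from math import comb

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

k = 3                      # number of treatment conditions
n_pairs = comb(k, 2)       # pairwise t tests needed to compare every pair
rate = familywise_error_rate(0.05, n_pairs)
print(f"{n_pairs} tests at alpha = .05 -> familywise rate of about {rate:.3f}")
```

With three groups there are three pairwise tests, and the approximate familywise rate rises from .05 to about .143, whereas a single ANOVA keeps the overall α at .05.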

In an analysis of variance, differences between participants contribute to which of the following variances?

* neither between treatments variance nor within treatment variance (this is the one I had chosen?)

* between treatments variance but not within treatment variance

* both between treatments variance and within treatment variance

* within treatments variance but not between treatments variance

In an analysis of variance differences caused by treatment effects contribute to which of the following variances?

* between treatment variance but not within treatment variance

* neither between treatments variance nor within treatments variance

* within treatments variance but not between treatments variance

* both between treatments variance and within treatments variance (this is the one I had chosen?)

On average, what value is expected for the F-ratio if the null hypothesis is true?

* k-1

* 1.00

* N-k

* 0 (this is the one I had chosen)?

On average what value is expected for the F-ratio if the null hypothesis is false?

* much greater than 1.00

* between 0 and 1.00

* 1.00 (this is the one I had chosen)?

* 0

A research study comparing three treatments with n = 5 in each treatment produces T1 = 5, T2 = 10, T3 = 15, with SS1 = 6, SS2 = 9, SS3 = 9 and ΣX² = 94. For this study, what is SS total?

* 34

* 10 (?)

* 68

* 24
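The arithmetic for this last question can be laid out as follows, using the usual computational formulas with G as the grand total and N as the total number of scores (both derived from the values given in the question):

$$
G = T_1 + T_2 + T_3 = 5 + 10 + 15 = 30, \qquad N = 3 \times 5 = 15
$$

$$
SS_{\text{total}} = \Sigma X^2 - \frac{G^2}{N} = 94 - \frac{900}{15} = 34
$$

$$
SS_{\text{within}} = 6 + 9 + 9 = 24, \qquad SS_{\text{between}} = \sum \frac{T^2}{n} - \frac{G^2}{N} = \frac{25 + 100 + 225}{5} - 60 = 10
$$

As a check, the two components add back up to the total: 24 + 10 = 34.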


## Testing Hypotheses and Confidence Intervals

A random sample of size 64 has sample mean 24 and sample standard deviation 4.

d. Is it appropriate to use the t distribution to compute a confidence interval for the population mean? Why or why not?

e. Construct a 95% confidence interval for the population mean.

f. Explain the meaning of the confidence interval you just constructed.

How much do adult male grizzly bears weigh in the wild? Six adult males were captured, tagged and released in California and here are their weights:

480, 580, 470, 510, 390, 550

g. What is the point estimate for the population mean?

h. Construct a 90% confidence interval for the population average weight of all adult male grizzly bears in the wild.

i. Interpret the confidence interval in the context of this problem.
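For the grizzly bear weights, a small-sample t interval could be sketched as below. The critical value `t_crit = 2.015` is an assumption taken from a standard t table (df = 5, 90% two-sided) because the Python standard library has no t distribution; the method also assumes the weights come from a roughly normal population.

```python
from math import sqrt
from statistics import mean, stdev

weights = [480, 580, 470, 510, 390, 550]
n = len(weights)

point_estimate = mean(weights)              # sample mean
standard_error = stdev(weights) / sqrt(n)   # s / sqrt(n), s uses n - 1

t_crit = 2.015   # ASSUMPTION: t table value for df = 5, 90% confidence
margin = t_crit * standard_error
lower, upper = point_estimate - margin, point_estimate + margin

print(f"point estimate = {point_estimate:.2f}")
print(f"90% CI = ({lower:.2f}, {upper:.2f})")
```

Under those assumptions the point estimate is about 496.67 and the interval runs from roughly 441.7 to 551.6.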

After going to a fast food restaurant, customers are asked to take a survey. Out of a random sample of 340 customers, 290 said their experience was “satisfactory.” Let p represent the proportion of all customers who would say their experience was “satisfactory.”

j. What is the point estimate for p?

k. Construct a 99% confidence interval for p.

l. Give a brief interpretation of this interval.
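The fast food survey interval can be sketched with the normal approximation for a proportion. This assumes the sample is large enough for that approximation (here both 290 successes and 50 failures are well above the usual minimum); variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 290, 340
p_hat = successes / n                        # point estimate for p

confidence = 0.99
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.576
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.4f}")
print(f"99% CI = ({lower:.4f}, {upper:.4f})")
```

This gives a point estimate near 0.853 and an interval of roughly 0.803 to 0.902.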

Suppose the p-value for a right-tailed test is .0245.

a. What would be your conclusion at the .05 level of significance?

b. What would the p-value have been if it were a two-tailed test?

A random sample has 42 values. The sample mean is 9.5 and the sample standard deviation is 1.5. Use a level of significance of 0.02 to conduct a left-tailed test of the claim that the population mean is 10.0.

a. Are the requirements met to run a test like this?

b. What are the hypotheses for this test?

c. Compute the test statistic and the p-value for this test.

d. What is your conclusion at the 0.02 level of significance?

MTV states that 75% of all college students have seen at least one episode of their TV show “Jersey Shore”. Last month, a random sample of 120 college students was selected and asked if they had seen at least one episode of the show. Out of the 120, 85 of them said they had seen at least one episode. Is there enough evidence to claim the population proportion of all college students who have watched at least one episode is less than 75% at the 0.05 level of significance?

a. Are the requirements met to run a test like this?

b. What are the hypotheses for this test?

c. Compute the test statistic and the p-value for this test.

d. What is your conclusion at the 0.05 level of significance?
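A sketch of the one-proportion z test for the "Jersey Shore" question, using the normal approximation. The requirement check shown (expected successes and failures under the null both at least 5) is one common rule of thumb; your textbook may use 10 instead.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 120, 85
p_0 = 0.75                       # claimed proportion under the null
p_hat = successes / n

# Rule-of-thumb requirement for the normal approximation
requirements_met = n * p_0 >= 5 and n * (1 - p_0) >= 5

# Test statistic uses the null proportion in the standard error
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
p_value = NormalDist().cdf(z)    # left-tailed test: H1 is p < 0.75

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Here z is about -1.05 and the p-value about 0.146, which is above 0.05, so this sketch does not reject the null hypothesis.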

## Nonparametric Statistics, Chi-Square, etc.

1. Which national park has more bears? Random samples of plots of ten square miles were taken in different parts of Yellowstone National Park, Yosemite National Park and Glacier National Park. The bear counts per square mile were recorded as shown below:

Yellowstone Yosemite Glacier

2 3 8

1 0 3

4 4 5

2 1 8

We want to test whether there is a difference in the mean number of bears per ten-square-mile plot in these three different parks using a 5% level of significance.

a. State the hypotheses.

b. Calculate the SSTOTAL, SSBETWEEN, and SSWITHIN.

c. Using these values, create the summary table for your ANOVA test.

d. From the table, state the test statistic and p-value, and state your conclusion at the 5% level of significance.

## Calculate probability, building confidence interval and performing hypothesis testing

* * * For any hypothesis testing problems in this set, please do only the following: (1) set up the hypotheses in plain English, (2) set up the hypotheses in statistical terms, (3) specify type I and type II errors for the hypothesis testing context for the problem, and (4) provide a brief description of the cost implications of each of the two errors for the problem context * * *

1. In 1993, women took an average of 8.5 weeks unpaid leave from their jobs after the birth of a child (U.S. News & World Report, 12/27/1993). Assume that 8.5 weeks is the population mean and 2.2 weeks is the population standard deviation. What is the probability that a simple random sample of 50 women provides a sample mean unpaid leave of 7.5 to 9.5 weeks after the birth of a child?

2. A sample of 532 Business Week subscribers showed that the mean time a subscriber spends using the Internet and online services is 6.7 hours per week (Business Week 1996 Worldwide Subscriber Study). If the sample standard deviation is 5.8 hours, what is the 95% confidence interval for the mean time that the Business Week subscriber population spends on the Internet and online services?

3. Suppose that scores on an aptitude test used for determining admission to graduate study in business are distributed with a mean of 500 and a population standard deviation of 100. If a random sample of 64 applicants from Stephan College has a sample mean of 537, is there any evidence that their mean score is higher than the mean expected of all applicants? (Use α = .01)

4. To help your restaurant marketing campaign target the right age levels, you want to find out if there is a statistically significant difference, on the average, between the age of your customers and the age of the general population in town, 43.1 years. A random sample of 50 customers shows an average age of 33.6 years with a standard deviation of 16.2 years. What can you conclude? (Use α = .05)

5. External organizational development (OD) professionals provide consulting services in such areas as human resources, training, planning, skills education, industrial psychology, and organizational behavior. The Training and Development Journal (Feb. 1984) conducted a survey of OD professionals. A random sample of 440 external OD consultants yielded the following summary statistics on daily fees charged:

Sample mean = $720 S = $275

Suppose it is known that other management consultants charge, on average, $800 per day. Do the data provide sufficient evidence to indicate that the mean daily fee charged by external OD consultants is less than $800? Test using α = .10.
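Problem 1 in this set is a probability question rather than a hypothesis test, so it can be worked numerically. A minimal sketch, assuming the sampling distribution of the mean is approximately normal (reasonable for n = 50) with standard error σ/√n:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.5, 2.2, 50
standard_error = sigma / sqrt(n)

# Sampling distribution of the sample mean
sampling_dist = NormalDist(mu=mu, sigma=standard_error)
probability = sampling_dist.cdf(9.5) - sampling_dist.cdf(7.5)

print(f"standard error = {standard_error:.4f}")
print(f"P(7.5 <= sample mean <= 9.5) = {probability:.4f}")
```

The limits sit a little over three standard errors on either side of 8.5, so the probability comes out close to 1.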

## Hypothesis Testing: P-Value and Null Value

This problem contains 4 parts based on the following information (also see attachment).

The Florida Home Energy Commission lists the mean annual air-conditioning (A/C) expenditure for a well-insulated Central Florida home as $1.20 per square foot. Cool Crib Insulation (a home insulation company) is working hard to attract new customers. In the company’s advertising, they claim that homes with their exclusive, patented insulation material have A/C costs that are below average for the region. A consumer-affairs investigator has surveyed a sample of 80 recent Cool Crib customers and listed their A/C expenditures (per square foot) in the following tab. The investigator is interested in examining whether the mean A/C expenditures for Cool Crib’s clients really is less than the $1.20 per square foot average for the entire region. Based on these data, and using a significance level of 5% (i.e., α = .05), provide the answers to the following questions in the spaces provided:

Report all answers to this problem to a minimum of 4 decimal places.

Use the provided sample data (in the attached Excel spreadsheet) to answer the following questions:

a. Write the expression for the correct alternative hypothesis statement for the test that must be carried out.

Ha: __________________ _________ _________

(or H1): (Population Parameter) (Operator) (Null Value)

b. Perform the hypothesis test and record the “p-value” for this test.

The p-value for this hypothesis test is: _________________ 4 decimal places

c. Which of the following statements is correct?

i) The mean annual A/C expenditure for Cool Crib clients appears to be significantly less than the regional average of $1.20 per square foot.

ii) The mean annual A/C expenditure for Cool Crib clients appears NOT to be significantly less than the regional average of $1.20 per square foot.

Copy and paste your choice from the statements above in the space below.

d. The investigator would next (separately) like to estimate, with 99% confidence, the mean annual A/C expenditures for all Cool Crib clients. Based on the same sample data (used in parts a. thru c. above), find the range in which he can be 99% confident that the actual mean annual A/C expenditures (for all Cool Crib clients) will fall?

Upper Limit __________________

Lower Limit __________________

This entire problem in excel is attached. It has ‘data and workspace’ spreadsheet to do all calculations.

## Assumptions for a Statistical Test

Why does it matter whether the assumptions required for a statistical test are met?

## Design of Experiments: Variables, Blocking, Factors, and Effects

Explain the difference between multiple independent variables and multiple levels of independent variables. Which is better? What is blocking and how does it reduce “noise”? What is a disadvantage of blocking? What is a factor? How can the use of factors benefit a design? Explain main effects and interaction effects. How does a covariate reduce noise? Describe and explain three trade-offs present in experiments.

## Constructing a 95% confidence interval for the population mean

Question 1: Clothing for runners. Your company sells exercise clothing and equipment on the Internet. To design the clothing, you collect data on the physical characteristics of your different types of customers. Here are the weights for a sample of 24 male runners. Assume that these runners can be viewed as a random sample of your potential customers. The weights are expressed in kilograms.

67.8 61.9 63.0 53.1 62.3 59.7 55.4 58.9

60.9 69.2 63.7 68.3 64.7 65.6 56.0 57.8

66.0 62.9 53.6 65.0 55.8 60.4 69.3 61.7

Exercise 6.20 asks you to find a 95% confidence interval for the mean weight of the population of all such runners, assuming that the population standard deviation is sigma = 4.5 kg

(a) Give the confidence interval from that exercise or calculate the interval if you did not do the exercise.

(b) Based on this confidence interval, does a test of:

H0: μ = 61.3 kg

Ha: μ ≠ 61.3 kg

reject H0 at the 5% significance level?

(c) Would H0: μ = 63 be rejected at the 5% level if tested against a two-sided alternative?
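Question 1 can be sketched in Python as a z interval with known σ = 4.5 kg, followed by the usual shortcut that a two-sided test at the 5% level rejects a hypothesized mean exactly when it falls outside the 95% interval. The helper name `rejects_at_5_percent` is mine.

```python
from math import sqrt
from statistics import NormalDist, mean

weights = [
    67.8, 61.9, 63.0, 53.1, 62.3, 59.7, 55.4, 58.9,
    60.9, 69.2, 63.7, 68.3, 64.7, 65.6, 56.0, 57.8,
    66.0, 62.9, 53.6, 65.0, 55.8, 60.4, 69.3, 61.7,
]
sigma = 4.5                      # population standard deviation (given)
n = len(weights)
x_bar = mean(weights)

z_star = NormalDist().inv_cdf(0.975)      # about 1.96 for 95% confidence
margin = z_star * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")

def rejects_at_5_percent(mu_0):
    """Two-sided z test at the 5% level: reject when mu_0 is outside the CI."""
    return not (lower <= mu_0 <= upper)

for mu_0 in (61.3, 63):
    print(f"H0: mu = {mu_0} rejected? {rejects_at_5_percent(mu_0)}")
```

The interval comes out at roughly 59.99 to 63.59 kg, and both 61.3 and 63 lie inside it, so neither null hypothesis would be rejected at the 5% level.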

Question 2: Hypotheses. Translate each of the following research questions into appropriate H0 and Ha.

(a) Census Bureau data show that the mean household income in the area served by a shopping mall is $62,500 per year. A market research firm questions shoppers at the mall to find out whether the mean household income of mall shoppers is higher than that of the general population.

(b) Last year your company’s service technicians took an average of 2.6 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a different average response time?

Question 3: Apartment rental rates. You want to rent an unfurnished one-bedroom apartment for next semester. The mean monthly rent for a random sample of 10 apartments advertised in the local newspaper is $640. Assume that the standard deviation is $90. Find a 95% confidence interval for the mean monthly rent for unfurnished one-bedroom apartments available for rent in this community.

## Problem on Confidence interval of proportion

A study was carried out to understand the amount of time put in by the students in three management institutes. One hundred students from different institutions were surveyed. Information regarding the hours put in by the students per week is summarized in the following contingency table:

| Institution | <= 40 hours/week | Between 40 and <= 45 hours/week | More than 45 and <= 50 hours/week | More than 50 hours/week |
|---|---|---|---|---|
| A | 2 | 8 | 20 | 20 |
| B | 4 | 6 | 10 | 10 |
| C | 3 | 4 | 7 | 6 |

a) What is the proportion of students who put in more than 45 hours a week? Calculate a 95 percent confidence interval for the population proportion of students who work more than 45 hours a week.

b) Is there any statistically significant evidence of any dependence between the number of hours put in by the students and the institution they belong to?

## Statistical Analysis Techniques: ANOVA and Post-Hoc Tests

A market researcher is interested in knowing the type of training that works best for DVD users. Thirty consumers are randomly selected from a population of known DVD owners (i.e., users). Ten users are trained by giving them the DVD user’s manual and allowing them to read it. Another ten users are trained from a 30 minute DVD user training video. Another ten users are trained from a self-paced computer tutorial. The users are then timed in their ability to setup and program the DVD by performing a series of operations. Which statistical analysis technique should be used? What is the null hypothesis? Can the market researcher get an answer? Why or why not?

## Conducting a one-way ANOVA

A medical researcher wants to determine whether there is a difference in the mean length of time it takes three types of pain relievers to provide relief from headache pain. Several headache sufferers are randomly selected and given one of three medications. Each headache sufferer records the time (in minutes) it takes the medication to begin working. The results are shown in the following table. At alpha=0.01, can you conclude that the mean times are different?

Medication 1 Medication 2 Medication 3

12 16 14

15 14 17

17 21 20

12 15 15

19

## Biostat Case Study

There are seven questions at the end of this article. This assignment is worth 80 points. Place answers in this table:

1.

2.

3.

4.

5.

6.

7.

BIOSTAT Case Study: Tests of Association for Categorical Data

LEARNING OBJECTIVES

At the completion of this Case Study, participants should be able to:

Compare two or more proportions

Calculate and interpret confidence intervals for proportions

Understand the impact of expected values on the choice of statistical test used to compare proportions

Interpret the results of tests of association

Interpret logistic regression results.

Suggested Citation: New Jersey Medical School Global Tuberculosis Institute. /Incorporating Tuberculosis into Public Health Core Curriculum./ 2009: BIOSTATISTICS CASE STUDY 2: Tests of Association for Categorical Data STUDENT Version 1.0.

Introduction

This exercise is based on the following study. Sections of this document have been reprinted with permission of the journal.

Factors influencing the successful treatment of infectious pulmonary tuberculosis W-S. Chung,*† Y-C. Chang,† M-C. Yang†, * Department of Internal Medicine, Hualien General Hospital, Hualien, † Institute of Health Care Int J Tuberc Lung Dis 11:59-64 © 2007 The Union

The abstract states that “(t)his study used a population-based…design. All PTB [pulmonary TB] patients residing in southern Taiwan recorded in the tuberculosis registry from 1 January to 30 June 2003 were identified. Each patient’s medical record was requested from treating hospitals and retrospectively reviewed for 15 months after the date PTB was confirmed.” 1

Following is the methods section of this article1.

METHODS

We carried out a population-based medical record review in southern Taiwan, where the only chest specialty hospital geared towards specialized thoracic disease care, mainly for TB, is located. Hospitals and primary practitioners that provided TB care in the same region can be used as comparative care providers. Study areas include Chiayi County, Chiayi City, Tainan County and Tainan City. As mandated by law in Taiwan, all suspected and confirmed TB cases must be reported in a timely manner to the national computerized registry maintained by the Taiwan Center for Disease Control (CDC). Reporting of cases has been encouraged and reinforced through the implementation of a no-notification, no-reimbursement policy and a notification-for-fee policy since 1997. 7 We requested data on all suspected and confirmed TB patients residing in the studied areas and recorded in the registry for the period 1 January to 30 June 2003. The study team, including four registered nurses (each with a minimum of 6 years’ clinical experience), two head nurses (each with a minimum of 12 years’ clinical experience) and one pulmonologist, had undergone a series of training courses designed to ensure proper validation of data consistency. Site visits were arranged to review the medical record of each patient, and the 15-month follow-up of medical records after start of treatment was reviewed.

Health care institutions

Health care institutions that had ever reported cases in the study areas included the chest hospital, two academic medical centers, 11 regional hospitals and 15 district hospitals and primary practitioners (district hospitals and primary practitioners are regarded as being at the same level in terms of TB treatment). In Taiwan, institutions are classified by the government as follows: ‘medical centers’ are health care, training and research facilities that house over 500 acute-care beds; ‘regional hospitals’ have no fewer than 250 acute care beds and are staffed by physicians of various specialties with the purpose of providing health care services to patients and training for specialists; and ‘district hospitals’ provide primary health care services similar to those offered by primary practitioners but with the added availability of in-patient care.

Infectious PTB

Infectious PTB is defined as sputum culture-confirmed disease caused by Mycobacterium tuberculosis, or two sputum smear examinations positive for acid-fast bacilli (AFB) or one positive sputum examination, radiological signs and a clinician’s decision to treat.8

Directly observed treatment

For directly observed treatment (DOT), a health worker or other trained person who is not a family member watches as the patient swallows anti-tuberculosis medicines for at least the first 2 months of treatment.1 DOT thus shifts the responsibility for cure from the patient to the health care system. In Taiwan, whether or not the patient is receiving DOT, TB is treated using WHO-recommended regimens; the initial phase consists of 2 months of isoniazid (H), ethambutol (E), rifampicin (R) and pyrazinamide (Z), followed by a 4-month continuation phase consisting of H, E and R (2HERZ/4HER).9,10

Treatment success

Treatment success is defined as a patient who has been cured or has received a complete course of treatment. A cured case is defined as a PTB patient who has finished treatment with a negative bacteriology result during and at the end of treatment. A case recorded as completed treatment is defined as a PTB patient who has finished treatment, but who has not met the criteria to be defined as a cure or a failure.11,12

Ethical consideration

The study was approved by the Taiwan CDC. All staff members involved in the study signed a statement of agreement to maintain patient confidentiality.

Data analysis

Bivariate analyses with χ² tests were used to compare differences in proportions of dichotomous and categorical variables, which extracted potential predictors of successful treatment. We then performed multivariate logistic regression analyses on the potential predictors with P < 0.10 obtained from bivariate analyses. We constructed a full model that included all the potential predictors identified through bivariate analyses and then applied the forward substitution model building procedure to construct a reduced model in which all the predictors were statistically significant. Odds ratios (ORs) and 95% confidence intervals (CIs) of dichotomous and categorical risk variables on the binary outcome variables were calculated. All analyses were conducted using SPSS 10.0 software (SPSS Inc, Chicago, IL, USA), and all the tests were performed at the two-tailed significance level of 0.05.

References that appear in the excerpt from this article:

1 World Health Organization. Tuberculosis Fact Sheet. Geneva,Switzerland: WHO. http://www.who.int/mediacentre/factsheets/fs104/en/index.html Accessed August 2006.

7 Chiang C Y, Enarson D A, Yang S L, Suo J, Lin T P. The impactof National Health Insurance on the notification of tuberculosis in Taiwan. Int J Tuberc Lung Dis 2002; 6: 974-979.

8 Migliori G B, Raviglione M C, Schaberg T, et al. Tuberculosis management in Europe. Task Force of the European Respiratory Society, the World Health Organization and the International Union Against Tuberculosis and Lung Disease, EuropeRegion. Eur Respir J 1999; 14: 978-992.

9 National Tuberculosis and Lung Disease Research Institute/World Health Organization Collaborating Centre for Tuberculosis. Report on the Second Meeting of National TB Programme managers from Central and Eastern Europe and the former USSR. Bulletin No 3. Warsaw, Poland: WHO Collaborating Centre for Tuberculosis, 1997: 1-30.

10 American Thoracic Society/Centers for Disease Control and Prevention/Infectious Diseases Society of America. Treatment of tuberculosis. Am J Respir Crit Care Med 2003; 167: 603-662.

11 World Health Organization. Global tuberculosis control. WHO Report 1999. WHO/CDS/CPC/TB/99.259. Geneva, Switzerland: WHO, 1999.

12 Farah M G, Tverdal A, Steen T W, Heldal E, Brantsaeter A B, Bjune G. Treatment outcome of new culture positive pulmonary tuberculosis in Norway. BMC Public Health 2005; 5: 14.735-739.

Table 1, on the next page, presents the characteristics of the 399 patients eligible for this study.1

Question 1

What type of study design is described in the abstract? (10 pts)

a. Observational

b. Case Control

c. Retrospective

d. Cross Sectional

e. a and c

Question 2

What proportion of patients was successfully treated? (10 pts)

Question 3

Calculate a 95% Confidence Interval (CI) for the true population proportion with successful treatment. Hint: The SE of p is the square root of (pq)/n. (10 pts)

Upper limit CI = _____

Lower Limit CI = _____
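The hint's formula can be sketched in a few lines of Python. This is only an illustration of the normal-approximation (Wald) interval, assuming the 275 successes out of 399 patients shown in the totals row of the Question 5 table and the usual z value of 1.96 for 95% confidence; the function name `proportion_ci` is made up for this example.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Wald (normal-approximation) confidence interval for a proportion.

    z=1.96 is the standard normal multiplier assumed for a 95% interval.
    """
    p = successes / n          # sample proportion
    q = 1 - p                  # complement of p
    se = math.sqrt(p * q / n)  # SE of p = sqrt(pq / n), as in the hint
    return p, p - z * se, p + z * se

# Assumed inputs: totals row of the Question 5 table (275 of 399 treated successfully).
p, lower, upper = proportion_ci(275, 399)
print(f"p = {p:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The Wald interval is the one the hint points to; other intervals (for example Wilson's) would give slightly different limits.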

Question 4

Which of the following is true with regard to the confidence interval computed in Question 3 above: (10 pts)

a. 95 times out of 100, one would expect a sample of 399 taken from the same population to have a proportion of successfully treated patients between the upper and lower limits of the confidence interval computed in Question 3.

b. 95 times out of 100, one would expect a sample of 399 taken from the same population to have a proportion of successfully treated patients outside the upper and lower limits of the confidence interval computed in Question 3.

c. 5 times out of 100, one would expect a sample of 399 taken from the same population to have a proportion of successfully treated patients outside the upper and lower limits of the confidence interval computed in Question 3.

d. a and c.

Question 5 (15 pts)

Using the information from Table 1, construct a 3 x 2 table to test the association between DOT status and successful treatment.

Observed Treatment Success

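| DOT | Yes | No | Total |
|---|---|---|---|
| Yes | _____ | _____ | 250 |
| No | _____ | _____ | 146 |
| Unknown | _____ | _____ | 3 |
| Total | 275 | 124 | 399 |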

Generate the expected values for the empty cells below. Hint: the expected value for any cell is the row total x column total divided by the grand (overall) total.

Expected Treatment Success Values

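| DOT | Yes | No | Total |
|---|---|---|---|
| Yes | _____ | _____ | 250 |
| No | _____ | _____ | 146 |
| Unknown | _____ | _____ | 3 |
| Total | 275 | 124 | 399 |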
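The hint (row total x column total divided by the grand total) can be applied mechanically to the marginal totals given in the table. The following is a minimal sketch under that assumption; the helper name `expected_counts` is hypothetical and not part of any library.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / grand total."""
    grand = sum(row_totals)
    if grand != sum(col_totals):
        raise ValueError("row and column totals must sum to the same grand total")
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Marginal totals copied from the table: DOT rows and treatment-success columns.
row_labels = ["Yes", "No", "Unknown"]
row_totals = [250, 146, 3]
col_totals = [275, 124]  # success Yes, success No

expected = expected_counts(row_totals, col_totals)
for label, (exp_yes, exp_no) in zip(row_labels, expected):
    print(f"DOT {label:8s} expected Yes = {exp_yes:7.2f}, expected No = {exp_no:6.2f}")
```

Each row of expected values adds back up to its row total, which is a quick check on the arithmetic. The expected counts in the Unknown row come out well below 5, which is worth keeping in mind when judging the chi-squared approximation.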

Question 6

Using the DOT status groups, generate the chi-squared test statistic, by hand, using a calculator, or using a computer.

Alpha = 0.05

df = _____ (5 pts)

Critical value = _____ (5 pts)

Chi-squared test statistic = _____ (5 pts)

Based on comparing the chi-squared statistic to the critical value, which of the following is true? (5 pts)

a. Successful outcome is dependent on DOT status.

b. Successful outcome is independent of DOT status.

c. No conclusion can be made.

d. The Chi square test is invalid because of only 1 degree of freedom.
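For anyone doing the computation by computer, the test statistic is the sum of (observed - expected)² / expected over all cells, with df = (rows - 1) x (columns - 1). The sketch below illustrates this on made-up counts, since the observed cells from Table 1 are not reproduced here; the function names and the `hypothetical` table are assumptions for illustration only. The critical-value shortcut uses the fact that, for df = 2 specifically, the chi-squared upper-tail probability is exp(-x/2).

```python
import math

def chi_squared(observed):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def critical_value_df2(alpha):
    """Critical value valid ONLY for df = 2, where P(X > x) = exp(-x / 2)."""
    return -2 * math.log(alpha)

# Hypothetical 3 x 2 counts, NOT the study data; substitute the cells from Table 1.
hypothetical = [[30, 10], [20, 20], [5, 15]]
stat, df = chi_squared(hypothetical)
crit = critical_value_df2(0.05)
print(f"chi-squared = {stat:.2f}, df = {df}, critical value = {crit:.2f}")
print("reject independence" if stat > crit else "fail to reject independence")
```

For tables with other degrees of freedom, the critical value would come from a chi-squared table or a statistics package rather than this closed form.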

Multiple logistic regression analysis allows us to look at the impact of independent variables (potential predictor variables) on a dichotomous outcome variable such as successful treatment completion (yes/no) when controlling for other independent variables. Table 3 presents some of the results of the multiple logistic regression model.1 The outcome is successful treatment.

Note:

Other footnotes are intentionally excluded from this table.

One way to assess the importance of a potential predictor variable is to examine the odds ratios (ORs) and associated 95% CIs that are estimated from the logistic regression model.
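As a rough illustration of how such ORs and CIs are read, the sketch below converts a logistic regression coefficient and its standard error into an OR with a 95% CI, then checks whether the interval excludes 1. The coefficient and standard error are hypothetical values, not taken from Table 3, and the function names are made up for this example.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """OR = exp(beta); 95% CI = exp(beta -/+ z * se) for a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def describe(lower, upper):
    """Interpret a 95% CI for an OR relative to the null value of 1."""
    if lower > 1:
        return "significant positive association"
    if upper < 1:
        return "significant negative association"
    return "not significant at the 0.05 level"

# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_ci(beta=0.9, se=0.3)
print(f"OR = {or_:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}): {describe(lower, upper)}")
```

A CI that lies entirely above 1 indicates higher odds of the outcome, one entirely below 1 indicates lower odds, and one that contains 1 is not statistically significant at the 0.05 level.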

Question 7 (10 pts)

Which independent variable(s) listed below is (are) positively and significantly associated with successful treatment?

a. Institutions

b. Physician

c. DOT

d. CXR

e. a, b, and c.

References

1. Chung,*† Y-C. Chang,† M-C. Yang†, * Department of Internal Medicine, Hualien General Hospital, Hualien, † Institute of Health Care Int J Tuberc Lung Dis 11:59-64 © 2007 The Union

2. Dawson, B and Trapp, R. Basic & Clinical Biostatistics, 4th edition, Lange Basic Science, 2004, page 152.