# MST256 Section A, Final Exam, Fall

(Please do NOT share this exam with anyone else. My other section's exam will be different, but the topics overlap enough that I do not want that section to have an advantage over this one.)

1. Discuss the 4-step procedure used throughout the book, including how the steps relate to one another.

2. Consider the following setting: A good sample of n = 25 individual tax returns is selected from the 1000 individual tax returns completed by an accountant. For each return, the following information is recorded: earned wages, interest obtained, gender (for simplicity, M or F), race (for simplicity, White (W), Black (B), or Other (O)), and tax owed. (Note that tax owed may be negative, in which case the individual may actually get a refund; the refund is still recorded as a negative value of tax owed.)

a. Express the model that linearly predicts (tax) owed from (earned) wages.

What criteria and/or methods would be used to estimate the betas and the "standard error", i.e., the estimated standard deviation of the error?

What is the interpretation of the estimate of the slope associated with this linear relationship?

b. Expand the above model to also include race and (linear) interest, and the interaction between race and interest only. Express this model in the multiple linear regression framework, including the betas and their associated terms, etc.

What will be the degrees of freedom for each of the model's components?

How would one determine the R² for such a model?

What assumptions must be met for this model, and what things would one look for in order to determine if these assumptions are met?

c. Provide the details for a nested F test, where the full model is the model in part b and the reduced model is the model in part a. These details should include the hypothesis statements, the terms going into the F test statistic, the degrees of freedom of each component, and how one would determine whether to reject the null hypothesis.

3. Refer to the problem above on taxes, and now consider the one-way ANOVA involving the predictor variable race (which has 3 categories: B, W, O) as the only predictor for wages.

a. Write this model in the form of one-way ANOVA (i.e., without indicator variables, but in terms of functions of means and error).

What are the corresponding estimates for each term in the model?

b. How would one get the sums of squares of the model, error, and total, respectively?

What are the degrees of freedom of the model and the error, respectively, in this problem, and why?

From these, how would one compute the mean squares of the model and the error and the F test statistic, along with its degrees of freedom, and decide whether race is significant or not?

c. If one were to do a pre-planned contrast of Whites (W) versus Non-Whites (B or O), how would one compute a 95% confidence interval for such a (population) mean contrast?

4. Consider the two-way ANOVA (involving race and gender), with interaction, for predicting wages.

a. Write this model in terms of mu's, alpha's, beta's, gamma's, and errors.

b. How could one test whether there is at least some interaction effect?

5. Note also that, from the original problem on taxes, we can construct a random variable, labelled debt, that is yes (i.e., 1, or equivalently "success") if tax owed is positive; alternatively, debt is no (i.e., 0, or "failure") if tax owed is not positive.
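
As a notational aid only, the definition above can be written as an indicator variable. The subscripted symbols $\text{debt}_i$ and $\text{owed}_i$ (values for the $i$-th sampled return) are notation assumed here for illustration, not notation taken from the exam or the book:

```latex
\[
\text{debt}_i =
\begin{cases}
1 & \text{if } \text{owed}_i > 0 \quad \text{(yes, ``success'')},\\[2pt]
0 & \text{if } \text{owed}_i \le 0 \quad \text{(no, ``failure'')},
\end{cases}
\qquad i = 1, \dots, 25.
\]
```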

a. Now consider the (yes, no) random variable debt as the response variable, with (linear) wages as the predictor variable.

Give the details of the required modeling, being sure to define each term used (e.g., one such term is odds), and explain why this modeling is useful.

What is the precise interpretation of the estimated slope component in this setting? Be sure to define each term that has not already been defined.

b. How would one construct a 95% confidence interval for this slope component, and how would one use it to determine, at a significance level of 0.05, whether or not one can reject Ho: slope = 0?

c. What assumptions are required for the above model to be appropriate? As part of an answer, one can give a picture demonstrating one of the important assumptions.

d. Consider now that the categorical variable race (having 3 levels: B, W, O) is added as an additional predictor, and that the race-by-wages interaction is assumed not to exist. Write the new updated model. Then, being sure to define each new term, show how one could test whether race is a significant addition to the previously existing ("wages only") model.