# MST256 Section A, Final Exam, Fall 2018

SHOW YOUR WORK!!

JUSTIFY YOUR ANSWERS!!

(Please do NOT share this exam with anyone else; my other section’s exam will be different, but there will be enough similarities in topics that I do not want them to have an advantage over this section)

1. Discuss the 4-step procedure that is utilized throughout the book, including how the steps interrelate with one another.

2. Consider the following setting: A good sample of n=25 individual tax returns is selected from the 1000 individual tax returns completed by an accountant. For each return, the following information is recorded: earned **wages**, obtained **interest**, **gender** (for simplicity M or F), **race** (for simplicity Whites (W), Blacks (B), or Other (O)), and **tax owed** (note that if tax owed is negative, then the individual may actually get a refund; such a refund is recorded as a negative value of tax owed).

a. Express the model that linearly predicts (tax) **owed** by (earned) **wages**.

What criteria and/or methods would be used to estimate the **betas** and the “**standard error**”, i.e. the estimated standard deviation of the error?

What is the interpretation of the estimate of the slope associated with this linear relationship?
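For part a, the least-squares criterion can be sketched in Python as below: it computes $\hat\beta_0$ and $\hat\beta_1$ for owed $= \beta_0 + \beta_1 \cdot$ wages $+ \varepsilon$, together with the standard error $s = \sqrt{SSE/(n-2)}$. This is a minimal sketch; the function name `fit_simple_ols` and the five data points are hypothetical and are not part of the exam.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x + error; returns (b0, b1, s)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sum of squares of x and centered cross-products of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope minimizing the sum of squared errors
    b0 = y_bar - b1 * x_bar     # fitted line passes through (x_bar, y_bar)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # two betas estimated, so n - 2 error df
    return b0, b1, s

# Hypothetical toy data (wages and tax owed, both in $1000s)
wages = [20, 35, 50, 65, 80]
owed = [-1.0, 0.5, 3.0, 4.0, 7.5]
b0, b1, s = fit_simple_ols(wages, owed)
print(f"b0={b0:.3f}, b1={b1:.4f}, s={s:.3f}")
```

Here $\hat\beta_1$ estimates the change in mean tax owed per one-unit increase in wages; the divisor $n-2$ reflects the two estimated betas.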

b. Expand the above model to also include **race** and (linear) **interest**, and the interaction between race and interest only. Express such a model in the multiple linear regression framework, including the betas and their associated terms, etc.

What will be the degrees of freedom for the respective components of the model?

How would one determine the R^{2} for such a model?

What assumptions must be met for this model, and what things would one look for in order to determine if these assumptions are met?
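One way to set up part b is sketched below, assuming Whites (W) as the baseline race so that two indicators (B, O) and their products with interest enter the model alongside wages and interest. Under that coding there are 7 betas, so with n = 25 the model, error, and total degrees of freedom are 6, 18, and 24, and $R^2 = 1 - SSE/SST$. The helper names `design_row`, `anova_df`, and `r_squared` are hypothetical.

```python
def design_row(wages, interest, race):
    """One row of the design matrix, with Whites (W) as the assumed baseline."""
    i_b = 1.0 if race == "B" else 0.0   # indicator for Blacks
    i_o = 1.0 if race == "O" else 0.0   # indicator for Other
    # intercept, wages, interest, race indicators, race-by-interest products
    return [1.0, wages, interest, i_b, i_o, i_b * interest, i_o * interest]

def anova_df(n, num_betas):
    """(model, error, total) degrees of freedom for a model with an intercept."""
    k = num_betas - 1                   # predictors, excluding the intercept
    return k, n - num_betas, n - 1

def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST (equals SSR/SST for a least-squares fit with intercept)."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

row = design_row(wages=42.0, interest=1.5, race="O")
print(row)                                  # [1.0, 42.0, 1.5, 0.0, 1.0, 0.0, 1.5]
print(anova_df(n=25, num_betas=len(row)))   # (6, 18, 24)
```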

c. Provide the details for a nested F test, where the full model is the model in part b, while the reduced model is the model in part a. These details should include the hypothesis statements, the terms going into the F test statistic, the degrees of freedom of each component, and how one would determine whether to reject the null hypothesis.
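The nested F statistic in part c can be written as $F = \dfrac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F}$, where $df_R$ and $df_F$ are the error degrees of freedom of the reduced and full models. $H_0$ states that the betas present in the part b model but absent from the part a model are all zero (five of them, if race is coded with two indicators), versus $H_a$: at least one of them is nonzero; $H_0$ is rejected when $F$ exceeds the upper 5% point of $F(df_R - df_F,\ df_F)$. In this sketch the function name `nested_f` and the SSE values are hypothetical, chosen only to exercise the arithmetic.

```python
def nested_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial (nested) F statistic; df_* are the models' error df.

    Returns (F, numerator df, denominator df).
    """
    df_num = df_reduced - df_full   # number of betas set to zero under H0
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_full)
    return f_stat, df_num, df_full

# Hypothetical SSEs: reduced model has n - 2 = 23 error df, full has n - 7 = 18
f_stat, d1, d2 = nested_f(sse_reduced=120.0, df_reduced=23, sse_full=80.0, df_full=18)
print(f"F = {f_stat:.3f} on ({d1}, {d2}) df")  # F = 1.800 on (5, 18) df
```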

3. Refer to the problem above on taxes, and now consider the one-way ANOVA involving the predictor variable **race** (which has 3 categories: B, W, O) as the only predictor for **wages**.

a. Write this model in the form of a one-way ANOVA (i.e. without indicator variables, but in terms of functions of means and error).

What are the corresponding estimates for each term in the model?
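A sketch of the cell-means form for part a, with the effects form alongside it; the weighted constraint $\sum_i n_i \alpha_i = 0$ is one conventional choice, assumed here:

$$
y_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i \in \{B, W, O\},\ j = 1, \dots, n_i,\ \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
$$

$$
\hat\mu_i = \bar y_{i\cdot}, \qquad \hat\mu = \bar y_{\cdot\cdot}, \qquad \hat\alpha_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}, \qquad \hat\sigma^2 = MSE
$$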

b. How would one get the sums of squares of the model, error, and total, respectively?

What are the degrees of freedom of the model and the error, respectively, in this problem, and why?

From these, how would one compute the mean squares of the model and the error, and the F test statistic along with its degrees of freedom, and decide whether **race** is significant or not?
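The computations in part b can be sketched in pure Python as follows: group means estimate each $\mu_i$, $SST = SSM + SSE$, the model has $k - 1$ df and the error $n - k$ df (so 2 and 22 with three races and n = 25), and $F = MSM/MSE$ is compared with the upper 5% point of $F(k-1,\ n-k)$. The function name `one_way_anova` and the toy wages are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA from a dict mapping each level to its list of responses."""
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    grand_mean = sum(all_y) / n
    # Estimated cell means: mu_i hat = group sample mean
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_model = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_error = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    ss_total = sum((y - grand_mean) ** 2 for y in all_y)  # equals SSM + SSE
    df_model, df_error = k - 1, n - k
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {"means": means, "SSM": ss_model, "SSE": ss_error, "SST": ss_total,
            "df": (df_model, df_error), "MSM": ms_model, "MSE": ms_error,
            "F": ms_model / ms_error}

# Hypothetical wages (in $10k) for a few returns per race
table = one_way_anova({"B": [1.0, 2.0, 3.0], "W": [4.0, 5.0, 6.0], "O": [7.0, 8.0, 9.0]})
print(table["df"], table["F"])  # (2, 6) 27.0
```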

c. If one were to do a pre-planned contrast of Whites (W) versus Non-Whites (B or O), how would one compute a 95% confidence interval for such a (population) mean contrast?
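Assuming the Non-White group is represented by the equally weighted average of the B and O means, the contrast is $L = \mu_W - \tfrac12(\mu_B + \mu_O)$, estimated by $\hat L = \sum_i c_i \bar y_{i\cdot}$ with standard error $\sqrt{MSE \sum_i c_i^2 / n_i}$; the 95% interval is $\hat L \pm t_{0.025,\,n-k}\,SE$. In the sketch below the means, group sizes, and MSE are hypothetical, and `t_crit = 2.074` is the tabled $t_{0.025}$ value for 22 df, which must be changed if the error df differ.

```python
import math

def contrast_ci(means, sizes, coeffs, mse, t_crit):
    """CI for L = sum(c_i * mu_i) whose coefficients sum to zero."""
    if abs(sum(coeffs.values())) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * means[g] for g, c in coeffs.items())
    # Var(L hat) = MSE * sum(c_i^2 / n_i) for independent group means
    se = math.sqrt(mse * sum(c ** 2 / sizes[g] for g, c in coeffs.items()))
    return estimate - t_crit * se, estimate + t_crit * se

# Hypothetical summary statistics (n = 10 + 8 + 7 = 25, so error df = 22)
means = {"W": 10.0, "B": 6.0, "O": 8.0}
sizes = {"W": 10, "B": 8, "O": 7}
coeffs = {"W": 1.0, "B": -0.5, "O": -0.5}   # Whites vs. average of Non-Whites
lo, hi = contrast_ci(means, sizes, coeffs, mse=4.0, t_crit=2.074)
print(f"95% CI for L: ({lo:.3f}, {hi:.3f})")
```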

4. Consider a two-way ANOVA, with interaction, involving **race** and **gender** for predicting **wages**.

a. Write this model in terms of mu’s, alpha’s, beta’s, gamma’s, and errors.

b. How could one test whether there is at least some interaction effect?
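A sketch of the two-way model with interaction, using sum-to-zero constraints (one common convention, assumed here), with the interaction test written as a nested comparison of the additive and full models. With n = 25 and all six race-by-gender cells nonempty, the test has (2, 19) df; for balanced data the numerator equals the interaction mean square.

$$
y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i \in \{B, W, O\},\ j \in \{M, F\},\ k = 1, \dots, n_{ij},\ \varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2)
$$

$$
\sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0
$$

$$
H_0: \gamma_{ij} = 0 \text{ for all } i, j, \qquad F = \frac{(SSE_{\text{additive}} - SSE_{\text{full}})/\big((3-1)(2-1)\big)}{SSE_{\text{full}}/(n - 3 \cdot 2)} \sim F(2,\ n - 6) \text{ under } H_0
$$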

5. Note also from the original problem on taxes that we can construct a random variable, say labelled as **debt**, that is **yes** (i.e. 1, or equivalently “success”) if tax owed is positive; alternatively, **debt** is **no** (i.e. 0, or “failure”) if tax owed is not positive.

a. Now consider the (yes, no) random variable **debt** as the response variable, while (linear) **wages** is the predicting variable. Give the details of the required modeling, being sure to define each term utilized (e.g. one such term is odds). Why is this modeling useful?

What is the precise interpretation of the estimated slope component in this setting? Be sure to define each term that has not already been defined.
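The logistic model for part a is $\log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{wages}$, where $p = P(\text{debt} = \text{yes})$ and $p/(1-p)$ is the odds; $e^{\beta_1}$ is then the multiplicative change in the odds of debt per one-unit increase in wages. The sketch below fits it by maximum likelihood with Newton-Raphson (one standard method, not necessarily the one the course software uses); the function name `fit_logistic` and the eight data points are hypothetical.

```python
import math

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by maximum likelihood (Newton-Raphson).

    Returns (b0, b1, se_b1); se_b1 is the square root of the (2,2) entry of
    the inverse information matrix evaluated at the last iteration.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector: derivative of the log-likelihood in (b0, b1)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix [[i00, i01], [i01, i11]] with weights p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step = inverse information times score
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    se_b1 = math.sqrt(i00 / det)
    return b0, b1, se_b1

# Hypothetical data: wages (in $10k) and debt (1 = yes, 0 = no)
wages = [1, 2, 3, 4, 5, 6, 7, 8]
debt = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1, se_b1 = fit_logistic(wages, debt)
print(f"b1 = {b1:.3f}, odds ratio per unit of wages = {math.exp(b1):.3f}")
```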

b. How would one construct a 95% confidence interval for this slope component, and how would one use it to determine, at a significance level of 0.05, whether one can reject H_{o}: slope = 0?
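For part b, a common large-sample approach is the Wald interval $\hat\beta_1 \pm z_{0.025}\,SE(\hat\beta_1)$ with $z_{0.025} \approx 1.96$; $H_0$: slope = 0 is rejected at level 0.05 exactly when 0 falls outside that interval. Exponentiating the endpoints gives an interval for the odds ratio $e^{\beta_1}$. The helper names and the estimate and standard error in the usage line are hypothetical.

```python
import math

def wald_ci(estimate, se, z=1.959964):
    """Large-sample 95% Wald interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se

def reject_zero(ci):
    """At level 0.05, reject H0: slope = 0 iff 0 lies outside the 95% CI."""
    lo, hi = ci
    return not (lo <= 0.0 <= hi)

# Hypothetical slope estimate and standard error
ci = wald_ci(0.5, 0.2)
odds_ratio_ci = (math.exp(ci[0]), math.exp(ci[1]))  # CI for e^{beta_1}
print(ci, odds_ratio_ci, reject_zero(ci))
```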

c. What assumptions are required for the above model to be appropriate? As part of the answer, one can give a picture demonstrating one of the important assumptions.

d. Now suppose that the categorical variable **race** (having 3 levels: B, W, O) is added as an **additional** predicting ingredient, and that the race-by-wage interaction is assumed not to exist. Write the new updated model. Then, being sure to define each new term, show how one could test whether **race** is a significant ingredient to add to the previously existing (“wage only”) model.
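With Whites assumed as the baseline race, the updated model is $\log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{wages} + \beta_2 I_B + \beta_3 I_O$, where $I_B$ and $I_O$ are 0/1 indicators for Blacks and Other. Testing whether race adds anything is $H_0: \beta_2 = \beta_3 = 0$, which one can assess with the likelihood-ratio (drop-in-deviance) statistic $G = -2(\ell_{\text{reduced}} - \ell_{\text{full}})$, compared with $\chi^2_2$ (upper 5% point $\approx 5.991$). The function name `lrt_two_df` and the log-likelihood values are hypothetical; with 2 df the chi-square tail probability has the closed form $e^{-G/2}$, so no statistics library is needed.

```python
import math

def lrt_two_df(loglik_reduced, loglik_full):
    """Drop-in-deviance test for adding a 3-level factor (2 indicator betas).

    Returns (G, p_value); for chi-square with 2 df, P(X > G) = exp(-G / 2).
    """
    g = -2.0 * (loglik_reduced - loglik_full)
    return g, math.exp(-g / 2.0)

# Hypothetical maximized log-likelihoods: wage-only vs. wage + race
g, p_value = lrt_two_df(loglik_reduced=-15.0, loglik_full=-12.0)
print(f"G = {g:.2f}, p = {p_value:.4f}")  # G = 6.00, p = 0.0498
```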