Winter 2019

Assignment 3

It’s due on February 28 (midnight) by email. You can submit it individually or as a team (maximum 3 members).

• You can only use linear algebra. You cannot use any function or command from the relevant libraries of your software in your estimations (except for var(), mean(), etc.); for example, lm() in R is NOT allowed.

• You can only use R, Python, C++, or Java. Not Stata!

• What you submit must be your own work.

• I need ONLY your rmd file (or the source code in C++), nothing else!

• The solutions will be provided later only for R and C++.

This assignment has 3 parts:

A. Big Andy: OLS estimation, F-tests, hypothesis testing, linear restrictions, confidence intervals.

B. California Test Scores: Multicollinearity.

C. Simulation with correlated errors following an AR(1) scheme.

Sources: Your ECON 3303 notes and the relevant sections in the textbook. The data for Part A is on Brightspace. The following link will give you the information you need in Part B. The data file used in Part B and the instructions for it can be found on this webpage.

A. Big Andy’s Burger Chain

Big Andy is a burger chain. Management would like to change its current marketing policy, but before implementing the change they need to make sure that the new strategy would work. They usually spend more on advertising to increase revenue. Now they would like to use pricing instead. First, they want to know the price elasticity of their sales. For example, if they reduce the price by $0.40 per burger, how would their revenue be affected? If the price elasticity allows, they would also like to cut their advertising expenses by $0.8 thousand per chain. If they do this together with the price reduction, would their revenue still rise? Management has collected data (andy.csv – sales($000) and advert($000)), and you are in charge of analyzing it!

1. Estimate the model: $SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_3 ADVERT_i + e_i$ (again, do not use lm()).

2. Interpret the results.

3. Find the price and advertisement elasticities by transforming the model (do not calculate them manually). Why are these elasticities different from the ones obtained when sales is 71 and price is $5 (now you can calculate them manually)?

4. What is the optimal price for Andy Burger to maximize revenue? What are the total sales at this price? What is the price elasticity at this revenue-maximizing price?

5. What are the confidence intervals for both elasticities? Are the elasticities statistically significant? Test whether the elasticity of sales with respect to advertisement is less than 1.2.

6. Someone suggests running the following model:

$SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_3 PRICE_i^2 + \beta_4 ADVERT_i + \beta_5 ADVERT_i^2 + e_i$

What would be the reason? How would you test if this extended model fits the data better (F-test)?

Would you go with $SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_4 ADVERT_i + \beta_5 ADVERT_i^2 + e_i$ instead? Is it better than the restricted model in (1)? Explain.

7. Use the updated second model to suggest an optimal level of advertisement. (Each dollar spent on ads should create at least $1 of sales.)

8. Now we will test the policy suggested earlier. But first, calculate the total change in sales if price goes down by $0.40 and advert goes up by 0.8 thousand dollars. Use the estimates of the restricted model in (1).

9. Now test whether this total change would be larger than zero, that is, whether $\Delta(\widehat{SALES}_i) = \hat{\beta}_2 \Delta PRICE_i + \hat{\beta}_3 \Delta ADVERT_i > 0$. You have 2 ways to do that.

a. Use hypothesis testing: $H_0: -0.4\beta_2 + 0.8\beta_3 \ge 0$ vs. $H_1: -0.4\beta_2 + 0.8\beta_3 < 0$

b. Test directly by re-writing the model: $SALES_i = \beta_1 + (\beta_2 + 2\beta_3) PRICE_i + \beta_3 (ADVERT_i - 2 PRICE_i) + e_i$. Please make sure that the restriction above can be incorporated in the model in this way.

Hint: (9) is a one-tailed test and can be re-written as $H_0: \beta_2 + 2\beta_3 = 0$ vs. $H_1: \beta_2 + 2\beta_3 < 0$. Check the relevant sections in the book. You need to have VCM(betahat). In the t-test, make sure that you calculate $se(\beta_2 + 2\beta_3)$ correctly. In 9(b) you just need to run the model and apply a “significance” test to the coefficient of price.
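The mechanics behind questions 1, 6 and 9 can be sketched with nothing but matrix algebra. The Python/NumPy example below is only an illustration under stated assumptions: the data are randomly generated rather than read from andy.csv, the "true" coefficients are arbitrary, and the helper name `ols` is hypothetical. It estimates the model with $(X'X)^{-1}X'y$, builds VCM(betahat), forms the F-statistic for the extended model, and computes the t-statistic for $\beta_2 + 2\beta_3$ in two ways.

```python
import numpy as np

def ols(X, y):
    """OLS by linear algebra: returns betahat, VCM(betahat) and the residual sum of squares."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    rss = resid @ resid
    vcm = (rss / (n - k)) * XtX_inv   # sigma2_hat * (X'X)^{-1}
    return beta_hat, vcm, rss

# Made-up data standing in for andy.csv; the "true" coefficients are arbitrary.
rng = np.random.default_rng(0)
n = 75
price = rng.uniform(4.8, 6.5, n)
advert = rng.uniform(0.5, 3.0, n)
sales = 120.0 - 8.0 * price + 2.0 * advert + rng.normal(0.0, 5.0, n)
ones = np.ones(n)

# Restricted model (1): sales on a constant, price and advert.
X_r = np.column_stack([ones, price, advert])
b_r, vcm_r, rss_r = ols(X_r, sales)

# Extended model: adds price^2 and advert^2, so J = 2 restrictions are tested.
X_u = np.column_stack([ones, price, price**2, advert, advert**2])
b_u, vcm_u, rss_u = ols(X_u, sales)
J = X_u.shape[1] - X_r.shape[1]
f_stat = ((rss_r - rss_u) / J) / (rss_u / (n - X_u.shape[1]))

# t-statistic for theta = beta2 + 2*beta3, written as c'beta.
c = np.array([0.0, 1.0, 2.0])
theta_hat = c @ b_r
# c' VCM c = var(b2) + 4*var(b3) + 4*cov(b2, b3); the covariance term must not be dropped.
se_theta = np.sqrt(c @ vcm_r @ c)
t_stat = theta_hat / se_theta

# Same test from the re-written model: the coefficient on price is theta.
X_star = np.column_stack([ones, price, advert - 2.0 * price])
b_star, vcm_star, _ = ols(X_star, sales)
t_star = b_star[1] / np.sqrt(vcm_star[1, 1])

print(f"F = {f_stat:.3f}, theta_hat = {theta_hat:.3f}, t = {t_stat:.3f}, t* = {t_star:.3f}")
```

The two t-statistics coincide because the re-written model is only a reparametrization of model (1). The statistics would still have to be compared with the appropriate critical values, here from the t distribution with n − 3 and the F distribution with (2, n − 5) degrees of freedom.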

B. California Test Scores

Please read the “claiforniatestscores.docx” to understand the data. You will build and run your own model that predicts “educational achievements” and its determinants among 5th grade students. This educational outcome can be defined in many different ways, but for this dataset we can use test scores (pick one or use an average test score). A simpler model is always better. The model that you build should make sense: the inclusion of each explanatory variable should be justified.

1. Build your model and give a short explanation of how your model(s) is justified. In building a model, we usually start with a couple of alternative models. Similar to the models in Part A, these alternatives are defined by “extended” and “nested” models, or by the type of functions [1]. Explain your reasoning for having 2-3 models and for ranking them 1st, 2nd, and 3rd “best” (if you have that many). You need a mathematical expression of each model with proper notation. Search Google to see how to insert math equations into Markdown.

2. Estimate your 1st (unrestricted) and 2nd (restricted) models with linear algebra and lm() separately. Check if the results are the same. Interpret the results. Which model seems to have better explanatory power? (Look at $R^2$, apply an F-test.)

3. Now you are going to examine the most influential factors in achieving higher test scores. Normalize your variables and estimate “Beta” coefficients. Interpret the results.

4. Now we are going to test for multicollinearity in your regressions. Before running any diagnostics, do you see any sign of multicollinearity in your estimations?

5. Remember we talked about “simple (or gross) correlation” vs. “higher-order partial correlation coefficients”. Now read Section 6.5 (5th ed. Poor Data, Collinearity, and Insignificance) in your textbook (or other books). Steps:

a. We will first calculate VCM(X), which is a symmetric (k x k) matrix. How would you get VCM(X)? Compare it with the results obtained from cov(X) in R. (Note that VCM(X) reports only variance-covariances.)

b. Manually calculate a partial correlation between X2 and X3 by running an “auxiliary regression” (see the textbook and your notes) holding other Xs fixed.

c. Now use a package (search Google about “partial correlation in R”). Here is a simple instruction
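For steps 5(a) and 5(b), a minimal Python/NumPy sketch on made-up data may help fix ideas. Everything here is an assumption for illustration: the three regressors are simulated, not taken from the California data, and the helper name `residuals` is hypothetical. VCM(X) is built from the demeaned data matrix, and the partial correlation between X2 and X3 is obtained by correlating the residuals of two auxiliary regressions on the remaining regressor.

```python
import numpy as np

# Simulated regressors (NOT the California data); x2 and x3 both depend on x4.
rng = np.random.default_rng(1)
n = 200
x4 = rng.normal(size=n)
x2 = 0.8 * x4 + rng.normal(size=n)
x3 = -0.5 * x4 + 0.3 * x2 + rng.normal(size=n)
X = np.column_stack([x2, x3, x4])

# (a) VCM(X): demean each column, then Xc'Xc / (n - 1).
Xc = X - X.mean(axis=0)
vcm_X = Xc.T @ Xc / (n - 1)

def residuals(y, Z):
    """Residuals from an auxiliary OLS regression of y on a constant and Z."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    b = np.linalg.solve(Z1.T @ Z1, Z1.T @ y)
    return y - Z1 @ b

# (b) Partial correlation of x2 and x3 holding x4 fixed:
# purge x4 from both variables, then correlate what is left.
e2 = residuals(x2, x4)
e3 = residuals(x3, x4)
partial_r23 = (e2 @ e3) / np.sqrt((e2 @ e2) * (e3 @ e3))

# Simple (gross) correlation for comparison, read off VCM(X).
simple_r23 = vcm_X[0, 1] / np.sqrt(vcm_X[0, 0] * vcm_X[1, 1])

print(f"simple r23 = {simple_r23:.3f}, partial r23 = {partial_r23:.3f}")
```

In this sketch `np.cov(X, rowvar=False)` plays the role of cov(X) in R and should reproduce `vcm_X`. With more than one other regressor, `Z` would simply hold all of them as columns.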

[1] Y = f(X), Y = f(log(X)), log(Y) = f(X), log(Y) = f(log(X)). Remember from ECON 3303 that these are called lin-lin, lin-log, log-lin, and log-log. Moreover, the model could be nonlinear in variables, and you may need to add nonlinear extensions to the function ($x$ and $x^2$, for example).
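The functional forms above differ in how an elasticity is obtained. The Python/NumPy sketch below is a hedged illustration on simulated data. The constant-elasticity data-generating process, the value −1.2 and the helper name `ols_coef` are all assumptions. In the log-log form the slope is itself the elasticity, whereas in the lin-lin form the elasticity is the slope times x/y and therefore depends on the point at which it is evaluated.

```python
import numpy as np

# Simulated constant-elasticity data; -1.2 is an arbitrary illustrative value.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1.0, 10.0, n)
true_elasticity = -1.2
y = np.exp(3.0 + true_elasticity * np.log(x) + rng.normal(0.0, 0.1, n))

def ols_coef(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)

# log-log: the slope coefficient is the elasticity at every point.
b_loglog = ols_coef(np.column_stack([ones, np.log(x)]), np.log(y))

# lin-lin: the elasticity is slope * x0 / y0, so it changes with (x0, y0).
b_linlin = ols_coef(np.column_stack([ones, x]), y)
x0 = x.mean()
y0 = b_linlin[0] + b_linlin[1] * x0   # fitted value at the mean of x
elasticity_at_mean = b_linlin[1] * x0 / y0

print(f"log-log slope = {b_loglog[1]:.3f}, lin-lin elasticity at mean = {elasticity_at_mean:.3f}")
```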