It’s due on February 5 by email (by midnight). You can submit it individually or as a team (maximum 3 members). Section C is optional. If you choose to answer it, you’ll get some bonus points. Remember: the true reward is not the mark you receive on these assignments but what you learn.
• You can use R, Stata, Stata – Mata, Python, C++, or Java.
• What you submit must be your own work.
• Pay attention to the format restrictions stated in A1. I need ONLY rmd, smcl, or source code files, nothing else!
• The solutions will be provided later only for R, Stata, and C++.
This assignment has 3 parts:
A. Defining a data generating model (DGM).
B. OLS with linear algebra using one sample.
C. Monte Carlo simulations: sampling distributions with multiple samples. (BONUS)
Sources: See A1 and read the textbook. There are multiple sections in the book about MC simulations.
In this exercise we will create our own sampling distribution for the OLS estimators. Remember, since these estimators are linear functions of y (the L in BLUE), they are themselves random variables. Hence each of them has a sampling distribution. The population y is the product of a data generating model that is defined by a systematic part (the model that contains the beta’s and x’s) and an unknown stochastic part (the error) that follows a Gaussian distribution (i.i.d., for now). We will generate 5000 samples from this DGM. Each sample will have 500 observations. Thus, we will calculate 5000 beta_hat vectors. The objective is to see the distribution of these 5000 OLS estimates or, more specifically, whether beta_hat is the BLUE of beta. But before that we have to decide whether the x’s are stochastic or not. In this assignment, we will assume that the x’s are fixed in repeated samples. So, if we have 5000 samples with 500 observations each, we will have the same set of x’s in every sample. This assignment does not cover how to relax this assumption and let the x’s vary randomly from sample to sample.
A. Data Generating Model
1. Create a vector of 1000 1’s and assign it to x1.
2. Create a vector with 1000 random integers between 0 and 20 and assign it to x2.
3. Create a vector with 1000 random 0s and 1s and assign it to x3.
4. Create a vector with 1000 random numbers drawn from a normal distribution (mean = 5.2, sd = 1.25) and assign it to x5.
5. Create a coefficient vector with your choice of values. For example, beta = (12, -0.7, 34, -0.17, 5.4).
6. For the DGM, you need a vector of 1000 random “errors” drawn from a Gaussian distribution (call it u ~ N(0, 1)).
7. The model is y = Xbeta + u, where X is a 1000 x 5 matrix with 5 x’s. So don’t forget to create X!
8. Congratulations! You have defined your first DGM. This DGM will create our population in Section C.
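The steps above can be sketched in R roughly as follows. This is only one possible way to do it. Two things here are assumptions rather than part of the instructions: the seed (fixed only for reproducibility), and the fourth regressor, which the list above does not define and which is taken here to be the interaction x2 * x3 under the hypothetical name x2x3.

```r
set.seed(123)                          # assumption: any seed works
n <- 1000

x1 <- rep(1, n)                        # step 1: vector of 1's (intercept)
x2 <- sample(0:20, n, replace = TRUE)  # step 2: random integers between 0 and 20
x3 <- sample(0:1, n, replace = TRUE)   # step 3: random 0s and 1s
x2x3 <- x2 * x3                        # assumption: interaction as 4th regressor
x5 <- rnorm(n, mean = 5.2, sd = 1.25)  # step 4: normal draws

beta <- c(12, -0.7, 34, -0.17, 5.4)    # step 5: coefficient vector
u <- rnorm(n, mean = 0, sd = 1)        # step 6: Gaussian errors, u ~ N(0, 1)

X <- cbind(x1, x2, x3, x2x3, x5)       # step 7: 1000 x 5 design matrix
y <- X %*% beta + u                    # the DGM: y = X beta + u

dim(X)
dim(y)
```

Checking dim(X) and dim(y) before moving on is a cheap way to catch a missing column.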
B. OLS with linear algebra with one sample
For the following operations, you need to create only one sample with 50 observations randomly selected from the 1000 observations you generated in (A). In R, to sample 50 observations from x5, for example, the following code would be enough: sample(x5, 50, replace=TRUE). Notice that you have 2 separate “containers” to sample from: y and X. If you sampled them separately, the two draws would pick different rows, so each y would no longer be matched with its own x’s. Therefore, you have to put y and X together and then sample 50 rows. This can be done in R by using cbind on y and X to create a new matrix. Here is my code for this operation:
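# x2x3 is not defined in Section A; it is assumed here to be the interaction x2 * x3
x2x3 <- x2 * x3
X <- cbind(x1, x2, x3, x2x3, x5)
beta <- c(12, -0.7, 34, -0.17, 5.4)
y <- X %*% beta + u
# Bind y and X so that each sampled row keeps y paired with its own x's
data <- cbind(y, X)
samp <- data[sample(nrow(data), 50, replace = TRUE), ]
Xs <- samp[, 2:6]  # columns 2 to 6 hold the five regressors
ys <- samp[, 1]    # column 1 holds y
# OLS estimator: b = (X'X)^(-1) X'y
b <- solve(t(Xs) %*% Xs) %*% t(Xs) %*% ys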
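An equivalent way to keep y and X aligned is to draw one set of row indices and apply it to both containers. The sketch below is self-contained, so it regenerates a population first; the seed and the definition of x2x3 as x2 * x3 are assumptions made for illustration. It also cross-checks the linear-algebra estimate against R's built-in lm().

```r
set.seed(1)                                   # assumption: arbitrary seed
n <- 1000
x1 <- rep(1, n)
x2 <- sample(0:20, n, replace = TRUE)
x3 <- sample(0:1, n, replace = TRUE)
x2x3 <- x2 * x3                               # assumption: interaction term
x5 <- rnorm(n, mean = 5.2, sd = 1.25)
X <- cbind(x1, x2, x3, x2x3, x5)
beta <- c(12, -0.7, 34, -0.17, 5.4)
y <- as.vector(X %*% beta + rnorm(n))

# Draw the row indices once, then use them for both X and y
idx <- sample(nrow(X), 50, replace = TRUE)
Xs <- X[idx, ]
ys <- y[idx]

# OLS via the normal equations: solve (X'X) b = X'y
b <- as.vector(solve(crossprod(Xs), crossprod(Xs, ys)))

# Cross-check with lm(); "- 1" drops lm's own intercept because x1 is already in Xs
b_lm <- unname(coef(lm(ys ~ Xs - 1)))

max(abs(b - b_lm))                            # should be numerically close to zero
```

Using solve(A, B) instead of forming the inverse explicitly gives the same estimate and is usually a little more stable numerically.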
1. Now you have X (50×5) and y (50×1). You’ll find the estimators of beta. But first, is this problem solvable? Apply the rank test.
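One way to apply the rank test in R is through the QR decomposition: the problem is solvable when the rank of X equals its number of columns, so that X'X is invertible. The sketch below is self-contained; the seed, the x2x3 interaction, and the deliberately collinear matrix Xbad are all illustrative assumptions.

```r
set.seed(2)                                   # assumption: arbitrary seed
n <- 1000
x1 <- rep(1, n)
x2 <- sample(0:20, n, replace = TRUE)
x3 <- sample(0:1, n, replace = TRUE)
x2x3 <- x2 * x3                               # assumption: interaction term
x5 <- rnorm(n, mean = 5.2, sd = 1.25)
X <- cbind(x1, x2, x3, x2x3, x5)

Xs <- X[sample(nrow(X), 50, replace = TRUE), ]  # one sample of 50 rows

k <- ncol(Xs)                                 # number of regressors
r <- qr(Xs)$rank                              # numerical rank of the sample X
full_rank <- (r == k)                         # TRUE means X'X can be inverted
full_rank

# Counter-example (hypothetical): add a column that is an exact multiple of x2
Xbad <- cbind(Xs, 2 * Xs[, 2])
qr(Xbad)$rank < ncol(Xbad)                    # TRUE: perfect collinearity, not solvable
```

If full_rank is FALSE, solve(t(Xs) %*% Xs) fails or returns unreliable numbers, so it is worth checking before estimating.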
C. Monte Carlo Simulations for sampling distributions
We will create 5000 samples with our DGM defined in (A). Each of these samples will have 500 observations. There are multiple ways to do that. In this assignment we have fixed (non-stochastic) x’s. Therefore, we will create the x’s outside the loop to fix their values across repeated samples, and then use the DGM to create the y’s inside the loop for each sample. Here I show you how this can be done with a simple one-variable example in Stata with beta1 = 2.1 and beta2 = -0.7:
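For readers working in R, the same one-variable setup can be sketched as below. This is an illustrative analogue, not the Stata listing. The seed, the uniform distribution used for x, and the names n_obs, n_samples and b_hat are assumptions; only beta1 = 2.1, beta2 = -0.7, the 5000 samples and the 500 observations come from the text.

```r
set.seed(42)                         # assumption: arbitrary seed
n_obs <- 500                         # observations per sample
n_samples <- 5000                    # number of Monte Carlo samples
beta <- c(2.1, -0.7)                 # beta1 (intercept) and beta2 (slope)

# x is created OUTSIDE the loop, so it is fixed in repeated samples
x <- runif(n_obs, min = 0, max = 10) # assumption: the distribution of x is not specified
X <- cbind(1, x)

# Since X is fixed, (X'X)^(-1) X' is computed only once
A <- solve(crossprod(X)) %*% t(X)

b_hat <- matrix(NA_real_, nrow = n_samples, ncol = 2)
for (s in seq_len(n_samples)) {
  u <- rnorm(n_obs)                  # new errors in every sample
  y <- X %*% beta + u                # the DGM creates y inside the loop
  b_hat[s, ] <- A %*% y              # OLS estimates for sample s
}

colMeans(b_hat)                      # should be close to beta if OLS is unbiased
apply(b_hat, 2, var)                 # simulated sampling variances
diag(solve(crossprod(X)))            # theoretical variances when sigma^2 = 1
```

A histogram such as hist(b_hat[, 2]) then shows the simulated sampling distribution of the slope estimator.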
Here are the steps/questions for this section: