MFIN6210 Week 4

(Lab Sessions are in your regular classrooms, Mondays and Thursday Evening. Lab session: Data analysis I (manipulation and basic STATA commands). See Week 4 folder on Moodle for additional readings and data. The Computer Lab classes in Weeks 4 and 5 are in your regular classes using “myAccess” which gives you access to both MICTOSOFT EXCEL and STATA-SE 14 on your own laptop/computer. It also gives you access at home, as well as at UNSW.)

Objective: Manipulate an IPO dataset and estimating some basic STATA commands

The objective of the lab is to create a dataset to model IPO underpricing in STATA. The lab will require you to

(1) create dependent and independent variables in excel using several functions;

(2)  create a csv file for input  to STATA;

(3) estimate standard summary statistic commands;

(4) estimate multivariate regression models, including diagnostic tests; and

(5) write a STATA do file to automate the process.

Useful readings:

• Ritter, J and I. Welch, ‘A review of IPO activity, pricing and allocations, Journal of  Finance 57, 1795-  1828, 2002 (covered in the lecture).

•  Loughran, T, J. Ritter, ‘Why has IPO underpricing changed overtime, Financial Management, autumn, 5-

.37, 2004 (available from the lab folder).


1. Download the IPO lab dataset (IPO_lab) from Moodle and open in MS Excel. The first step is to think about  a theoretical framework or model to explain underpricing (Loughran and Ritter’s 2004 paper provides a useful model). For this exercise, we will simply define underpricing using the % change in stock price at close, which is given in the spreadsheet. The next step is to create independent variables that proxy underpricing theories. From the IPO lecture we know several ‘reputation’ or ‘quality’ metrics could capture offer quality (e.g., information asymmetry related to the lemons problem). I suggest you create an underwriter dummy and an auditor dummy that capture underwriter and auditor quality. We will also include other financial characteristics that might be correlated with quality, e.g., offer price or offer size, profitability prior to offer, and industry characteristics (e.g., high tech). You should think about the  expected correlation between your independent variables and your proxy for underpricing (your understanding of the theories should inform this).

2. In the STATA_input worksheet, fill in the columns (names are already provided) with the  requisite  variables. Use the vlookup function to find the name of the underwriter. I have provided a list of all underwriters in the ‘Underwriter worksheet’. Auditor name is already given. Create underwriter and  auditor dummies. For the underwriter, it will simply equal=1 if the issue was underwritten, and 0 otherwise (Hint: use ‘Non underwritten’ to create an IF command). Do the same for auditor, where ‘na’ denotes no auditor. The high-­­tech dummy can also be created using the IF function. The remaining columns contain some profitability data and issue expenses. You will need to create dummies to capture if ebitda is positive, negative, or no ebitda was reported. Copy and save the worksheet values as a new spreadsheet called STATA_input_IPOs. Note that your worksheet will contain many nonnumeric characters which STATA will not be able to read. We will deal with this within STATA.

3. Open STATA. Import your csv file and note the number of observations and variables (should be 357obs  and 15 vars). Run the summary statistics command (statistics-­­summary statistics). You will note that several columns have no summary stats due to nonnumeric data. For those variables (i.e., first-day-return, ebitda_before, expenses_percent) you will need to use the destring command (destring var, replace force).

4. Estimate a pair-wise correlation matrix. Are the correlations between underpricing and the independent variables consistent with expectations? Note any collinearity issues.

5. Generate a new variable called ln_book_value, defined as ln (offer price * shares outstanding). Use the  gen command (gen var=ln(offer_price*shares_outstanding)).

6. Estimate a regression model of underpricing on offer price, underwriter and auditor dummies, high-­­ tech dummy, positive ebitda dummy, book value and expenses. Interpret the results.

7. Estimate post regression diagnostics for heteroscekdascity, normality and multicollinearity.

8. Generate a new variable called underpricing_dum (=1 if first_day_return > 0, and 0 otherwise). Use this variable as the dependent variable in a logit model using the same independent regressors as your OLS model. Are the results consistent with those for OLS?

9. Create your own STATA do file to automate the above. Open a new do file (from menu: window-do-file- editor-new-do-file). See useful commands below.

Some very basic commands for STATA

Set more off (turns off the requirement to press the enter key) To

read in your excel file: insheet using “F:file name.csv”

To replace nonnumeric data: destring variable name, replace force

To generate a new variable, e.g., log book value

gen ln_book_value=ln(offer_price*shares_outstanding)

To generate summary statistics: summarize list variable names To generate a correlation matrix: pwcorr list variable names

To estimate an ols regression model:

regress dependent variable independent variables

To save your results:

save F:file name, replace (the replace command replaces any previous version with the same name)

Some useful commands in Excel for summary stats and data manipulation

=AVERAGE(number1, [number2],…)-­­ calculates the mean value

=STDEV(number1,number2,…) – calculates the standard deviation for a sample

=QUARTILE(array,quart) -­­ calculates the quartile value, where quart = 0 to 4. Note 1= 1st quartile (25th percentile), 2=2nd or median (50th percentile), 3= 3rd (75 percentile) and 0=min and 4= max values.

=INTERCEPT(known_y’s,known_x’s) – calculates the intercept (constant ) in an OLS regression

=SLOPE(known_y’s,known_x’s) – calculates the slope (beta) in an OLS regression

=PERCENTILE(array,k) – calculates the percentile value, where k = percentile (e.g., 0.9 = 90th percentile. Values are ranked largest to smallest)

=IF(logical_test, value_if_true, [value_if_false]) = use the IF function to create dummy variables

=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]) – Use Vlookup to find values for firms which you identify using a unique code (e.g., cusip, gvkey, ticker). You can use this command to combine two datasets into one spreadsheet or to search for financial data from a larger set (e.g., ATP data) for a list of firms that make-­­up your estimation sample.

=SUMIF(range, criteria, [sum_range]) – Use SUMIF to sum the values in a range that meet criteria that you specify. For example, suppose that in a column that contains numbers (e.g., ROA) you want to sum only the values for each industry in a given year, which you will identify using a unique code (e.g., cusip code plus year). This could be useful in calculating industry averages, where totals are divided by the number of firms in an industry to give a mean value. Naturally, this could also be achieved in STATA for those that feel more comfortable with using STATA code.