* Andy Philips
* Texas A&M
* 12/27/2013
*
* Bootstrapping, Jack-knifing, and Monte Carlo
*
* -----------------------------------------------------------------------
* This .do file contains code mostly for the canned stata procedures
* for bootstrap, jackknife, and monte carlos. Examples are given from
* generated data
* DGP:
clear
set seed 345
set obs 120
gen e1 = rnormal()
gen x1 = rnormal()
gen x2 = rnormal()
gen x3 = rnormal()
gen y = 2*x1 + 3*x2 + e1
* BOOTSTRAPPING --------------------------------------------------------------
/* bootstrapping standard errors from a statistic can be used by the following:
1. write program (if you have a custom statistic program)
2. load in data
3. drop missing values (STATA will not discern if you have missing
values)
4. drop unneeded variables (this speeds up a bootstrap)
5. set seed
6. run bootstrap
*/
* first drop any missing obs
foreach var in y x1 x2 {
drop if `var' == .
}
reg y x1 x2
bootstrap, reps(1000): regress y x1 x2
/* note how the t-scores are slightly lower in the bootstrap, but the coeffs.
remain the same. for speed we can drop unneeded vars.
*/
timer clear
timer on 1
bootstrap, reps(1000): regress y x1 x2
timer off 1
timer on 2
drop x3
bootstrap, reps(1000): regress y x1 x2
timer off 2
timer list
* we get moderate gains even with only 120 obs and 4 variables
* Jack-knifing ---------------------------------------------------------------
jackknife coef=_b[x1]: reg y x1 x2
/* bootstrap is more efficient than JK-ing, but jackknife is still good at
getting influential observations. first let's add an influential point
*/
replace x1 = 30 in 20
hist x1
* now we add a standard deviation jknife and skewness, keeping the variables
jackknife sd=r(sd) skew=r(skewness), rclass keep: summarize x1, det
summarize sd skew, detail
list x1 if sd > 10
* Monte Carlo Simulation -----------------------------------------------------
* MCs simulate finite data from a hypothesized model...usually we want to
* examine our confidence intervals to verify that they are the `true' intervals
* first, lets program a simple regression where we can set the observations
* and some c that changes the x vars
capture program drop myreg
program myreg
version 12
clear
args obs c
set obs `obs'
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1*`c' + 3*x2*`c' + rnormal()
regress y x1 x2
end
* now the monte carlo part. we specify 100 observations and a coefficient
* `changer' of 1
set seed 345
simulate _b _se, reps(1000) nodots: myreg 100 1
su
simulate _b _se, reps(1000) nodots: myreg 1000 1
su
* to see what's going on, another way to do so is to hand-write our program:
clear
set seed 2355
cd "/Users/andyphilips/Documents/Methods/Stata Help"
set obs 100
* we can set our values as before:
scalar constant = 0
scalar x1slope = 2
scalar x2slope = 3
scalar sigma = 1
scalar alpha = 0.05
* now we make up our equation
gen x1 = rnormal()
gen x2 = rnormal()
gen y_true = constant + x1slope*x1 + x2slope*x2
gen epsilon = .
gen y = .
tempname simulation
* how many reps you want
global reps = 1000
* now we can loop our program
postfile `simulation' b1 b2 se1 se2 coverage1 coverage2 using results, replace
quietly {
forv i = 1/$reps {
replace epsilon = rnormal(0,sigma)
replace y = y_true + epsilon
reg y x1 x2
scalar b1 = _b[x1]
scalar b2 = _b[x2]
scalar se1 = _se[x1]
scalar se2 = _se[x2]
scalar lb1 = b1 -se1*invttail(e(df_r),alpha/2)
scalar ub1 = b1 +se1*invttail(e(df_r),alpha/2)
scalar lb2 = b2 -se2*invttail(e(df_r),alpha/2)
scalar ub2 = b2 +se2*invttail(e(df_r),alpha/2)
scalar pv1 = x1slopelb1
scalar pv2 = x2slopelb2
post `simulation' (b1) (b2) (se1) (se2) (pv1) (pv2)
}
}
postclose `simulation'
use results, clear
summarize