------------------------------------------------------------------------------
      name:
       log:  \\cantus.ads.warwick.ac.uk\User45\u\u1874478\Documents\warwick\teaching\econometrics-1\exercises\term-1\stata\ex1_201920_log.log
  log type:  text
 opened on:   9 Sep 2019, 10:19:20

.
. *****************************************
. * Econometrics 1
. * University of Warwick
. * Exercise 1: Introduction
. ******************************************
. *NOTE: Ensure that the right directory is set before running the do files
. *Type help cd in the command window to know how to set your work directory
.
. *q2: load data into Stata
. use icfforworkbook, clear
.
. *q3: review data in data editor
. browse
.
. *q4: open a log file to record output
. *see line 1 of this DO file
.
. *q5: describe data
. describe

Contains data from icfforworkbook.dta
  obs:         5,144
 vars:            19                          01 JUL 2019 09:32
------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
casenew         int     %8.0g                 New random ID number
weighta         double  %12.0g                Annual weight
P550tpr         double  %12.0g                Total expenditure (top coded)
P344pr          double  %12.0g                Gross normal weekly household income (top coded)
P425r           byte    %8.0g      P425r      Main source of household income (recoded)
A172            byte    %8.0g      A172       Internet connection in household
A093r           byte    %8.0g      A093r      Economic position of Household Reference Person (recoded)
A094r           byte    %8.0g      A094r      NS-SEC 3 Class of Household Reference Person (recoded from NS-SEC 12)
A121r           byte    %8.0g      A121r      Tenure - type (recoded)
SexHRP          byte    %8.0g      SexHRP     Sex of Household Reference Person
A049r           byte    %8.0g      A049r      Household size; number of persons in HH (recoded)
G018r           byte    %8.0g      G018r      Number of adults (recoded)
G019r           byte    %8.0g      G019r      Number of children (recoded)
Gorx            byte    %8.0g      Gorx       Government Office Region modified
weightar        double  %12.0g                Weight (rescaled)
maininc         byte    %8.0g      maininc    Main source of household income (recoded, P425-1)
income          float   %9.0g                 Income
expenditure     float   %9.0g                 Total expenditure (top coded, formerly P550tpr)
hhsize          byte    %8.0g                 Household size, number of people in household (recoded)formerly A049r
------------------------------------------------------------------------------
Sorted by:

.
. *q6: data codebook to examine the dataset
. codebook, compact

Variable      Obs Unique      Mean       Min       Max  Label
------------------------------------------------------------------------------
casenew      5144   4362  7487.702         1     14999  New random ID number
weighta      5144   3283   5.21809      .001      24.5  Annual weight
P550tpr      5144   4887  479.7584    30.525      1175  Total expenditure (...
P344pr       5144   4244  620.4336         0   1184.99  Gross normal weekly...
P425r        5144      2  1.447706         1         2  Main source of hous...
A172         5144      2  1.177294         1         2  Internet connection...
A093r        5144      4  2.336703         1         4  Economic position o...
A094r        5144      5  2.916213         1         5  NS-SEC 3 Class of H...
A121r        5144      3  2.502722         1         3  Tenure - type (reco...
SexHRP       5144      2   1.38647         1         2  Sex of Household Re...
A049r        5144      5  2.328538         1         5  Household size; num...
G018r        5144      4  1.810848         1         4  Number of adults (r...
G019r        5144      3  1.471229         1         3  Number of children ...
Gorx         5144     12  6.140747         1        12  Government Office R...
weightar     5144   3283  .9999981  .0001916  4.695196  Weight (rescaled)
maininc      5144      2  .4477061         0         1  Main source of hous...
income       5144   4244  620.4336         0   1184.99  Income
expenditure  5144   4887  479.7584    30.525      1175  Total expenditure (...
hhsize       5144      5  2.328538         1         5  Household size, num...
------------------------------------------------------------------------------

.
. *q7: summarise expenditure
. su expenditure

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 expenditure |      5,144    479.7584    292.3652     30.525       1175

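In the describe output, `income` and `expenditure` are stored as float, while the codebook gives them summary statistics identical to those of the double variables `P344pr` and `P550tpr`. If they are float copies (an assumption; the code that built the dataset is not part of this log), a check such as `assert income == P344pr` would likely fail on rounding alone. The sketch below uses a hypothetical value instead of the dataset to show the effect and two safer comparisons. It runs `clear`, so use a fresh session.

```stata
* Hypothetical toy example: no dataset needed, but it clears memory
clear
set obs 1
generate double d = 620.4336   // stored as double, like P344pr
generate f = d                 // default storage type is float, like income
count if f == d                // 0: f keeps only about 7 significant digits
count if f == float(d)         // 1: round the double to float precision first
count if reldif(f, d) < 1e-6   // 1: compare within a relative tolerance
```

On the real data, the analogous checks would be `assert income == float(P344pr)` or `assert reldif(income, P344pr) < 1e-6`. Both still assume that `income` was generated from `P344pr`.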
. su expenditure, detail

               Total expenditure (top coded, formerly P550tpr)
-------------------------------------------------------------
      Percentiles      Smallest
 1%        64.05         30.525
 5%     111.0905         31.952
10%     153.5125       32.13598       Obs               5,144
25%     254.0006         32.355       Sum of Wgt.       5,144

50%     419.9034                      Mean           479.7584
                        Largest       Std. Dev.      292.3652
75%     645.0512           1175
90%     934.4652           1175       Variance       85477.43
95%     1171.163           1175       Skewness       .8262061
99%         1175           1175       Kurtosis       2.937132

. *summarise income
. su income

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      income |      5,144    620.4336    359.1557          0    1184.99

. su income, detail

                           Income
-------------------------------------------------------------
      Percentiles      Smallest
 1%           56              0
 5%        138.1              0
10%       180.03              0       Obs               5,144
25%     314.0237              0       Sum of Wgt.       5,144

50%       563.15                      Mean           620.4336
                        Largest       Std. Dev.      359.1557
75%      928.087        1184.99
90%      1184.99        1184.99       Variance       128992.8
95%      1184.99        1184.99       Skewness       .2958553
99%      1184.99        1184.99       Kurtosis       1.802326

. *histograms of expenditure and income
. histogram expenditure, percent title("Histogram of weekly expenditure")
(bin=37, start=30.525, width=30.931757)

. histogram income, percent title("Histogram of weekly income")
(bin=37, start=0, width=32.026756)

.
. *q8: create variables after trimming the top 1% of income and expenditure
. clonevar inc = income
. qui sum income, detail
. replace inc =. if income>=`r(p99)'
(785 real changes made, 785 to missing)

. la var inc "Income"
.
. clonevar exp = expenditure
. qui sum expenditure, detail
. replace exp =. if expenditure>=`r(p99)'
(256 real changes made, 256 to missing)

. la var exp "Total expenditure"
.
. *q9: histogram after trimming top 1% of expenditure
. histogram exp, ///
> percent title("Histogram of weekly expenditure, Top 1% trimmed")
(bin=36, start=30.525, width=31.717523)

. graph save fig1_exercise_sheet1.gph, replace
(file fig1_exercise_sheet1.gph saved)

.
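The q8 comment says the top 1% is trimmed, but the two `replace` steps set 785 incomes (about 15% of 5,144) and 256 expenditures (about 5%) to missing. The detail tables explain why. Income's 90th, 95th and 99th percentiles all equal the top code 1184.99, and expenditure's 99th percentile equals its top code 1175, so the condition `>= r(p99)` catches every observation tied at the top code. The hypothetical toy data below reproduces the effect on a small scale. It runs `clear`, so use a fresh session.

```stata
* Hypothetical toy data: 100 observations, values above 90 top-coded at 90
clear
set obs 100
generate x = _n
replace x = 90 if x > 90       // caps 10 values, so 11 observations equal 90
summarize x, detail
scalar p99 = r(p99)            // 90: the 99th percentile is the top code itself
count if x >= scalar(p99)      // 11 observations (11%), not 1%
```

On the real data, running `count if income >= r(p99)` right after `qui sum income, detail` shows how many observations such a cut will remove before you apply it.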
. histogram inc, ///
> percent title("Histogram of weekly income, Top 1% trimmed")
(bin=36, start=0, width=32.914235)

. graph save fig2_exercise_sheet1.gph, replace
(file fig2_exercise_sheet1.gph saved)

.
. *q10: summary of exp by main source of income
. bys maininc: su exp

------------------------------------------------------------------------------
-> maininc = earnings

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         exp |      2,637    540.3983    237.4025     38.945   1172.356

------------------------------------------------------------------------------
-> maininc = other so

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         exp |      2,251    329.6521    217.9593     30.525   1161.778

.
. *q11: histogram of exp by main income source
. histogram exp, percent by(maininc)
.
. *q12: mean of expenditure by main income source and internet connection
. *ren A172 internet
. gen internet = 0
. replace internet = 1 if A172 == 1
(4,232 real changes made)

.
. bys maininc internet: su exp

------------------------------------------------------------------------------
-> maininc = earnings, internet = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         exp |        158    321.5789    169.7532   64.01518   1034.258

------------------------------------------------------------------------------
-> maininc = earnings, internet = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         exp |      2,479    554.3448    234.2653     38.945   1172.356

------------------------------------------------------------------------------
-> maininc = other so, internet = 0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         exp |        749    209.0654    133.4172     30.525   982.8839

------------------------------------------------------------------------------
-> maininc = other so, internet = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         exp |      1,502    389.7849    226.8636      45.58   1161.778

.
. *q13: Testing equality of means by main income sources in weekly expenditure
. ttest exp, by(maininc) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
earnings |   2,637    540.3983    4.623068    237.4025    531.3331    549.4635
other so |   2,251    329.6521    4.593965    217.9593    320.6433     338.661
---------+--------------------------------------------------------------------
combined |   4,888    443.3464    3.598868    251.6121     436.291    450.4018
---------+--------------------------------------------------------------------
    diff |            210.7462    6.517459                 197.969    223.5233
------------------------------------------------------------------------------
    diff = mean(earnings) - mean(other so)                        t =  32.3356
Ho: diff = 0                     Satterthwaite's degrees of freedom =  4860.16

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

.
. *q14: Testing equality of means by access to internet in weekly expenditure
. ttest exp, by(internet) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     907    228.6654    4.870152    146.6717    219.1073    238.2234
       1 |   3,981    492.2577    3.880372    244.8327      484.65    499.8654
---------+--------------------------------------------------------------------
combined |   4,888    443.3464    3.598868    251.6121     436.291    450.4018
---------+--------------------------------------------------------------------
    diff |           -263.5923    6.227011               -275.8037   -251.3809
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t = -42.3305
Ho: diff = 0                     Satterthwaite's degrees of freedom =  2217.97

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

.
. *q15: Testing equality of means by main income sources and internet access in weekly expenditure
. anova exp maininc internet maininc#internet

                           Number of obs =      4,888    R-squared     =  0.2531
                           Root MSE      =    217.519    Adj R-squared =  0.2526

                  Source | Partial SS         df         MS        F    Prob>F
        -----------------+----------------------------------------------------
                   Model |   78305396          3    26101799    551.67  0.0000
                         |
                 maininc |  8790355.4          1   8790355.4    185.79  0.0000
                internet |   19576606          1    19576606    413.76  0.0000
        maininc#internet |  310169.89          1   310169.89      6.56  0.0105
                         |
                Residual |  2.311e+08      4,884   47314.485
        -----------------+----------------------------------------------------
                   Total |  3.094e+08      4,887   63308.644

.
. *q16: Plotting exp against income categories
. gen inc_cat = 1 if (income <250)
(4,224 missing values generated)

. replace inc_cat = 2 if (income >= 250 & income < 500)
(1,382 real changes made)

. replace inc_cat = 3 if (income >= 500 & income < 750)
(1,045 real changes made)

. replace inc_cat = 4 if (income >= 750 & income < 1000)
(675 real changes made)

. replace inc_cat = 5 if (income >= 1000)
(1,122 real changes made)

.
. label define inc_cat_l 1 "<250" 2 "250-499" 3 "500-749" 4 "750-999" 5 ">= 1000"
. label val inc_cat inc_cat_l
.
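The q13 test needs only each group's number of observations, mean and standard deviation, and q10 already printed those. A minimal sketch, assuming the rounded summaries are precise enough: the immediate command `ttesti` should reproduce q13 up to rounding, and the same t statistic and Satterthwaite degrees of freedom can be computed by hand. The scalar names are illustrative.

```stata
* Reproduce q13 from the q10 summaries: obs, mean, sd for earnings, then other sources
ttesti 2637 540.3983 237.4025 2251 329.6521 217.9593, unequal

* The same quantities by hand
scalar v_earn  = 237.4025^2 / 2637    // squared std. error of the earnings mean
scalar v_other = 217.9593^2 / 2251    // squared std. error of the other-sources mean
scalar se_diff = sqrt(v_earn + v_other)
scalar t_stat  = (540.3983 - 329.6521) / se_diff
scalar df_satt = (v_earn + v_other)^2 / (v_earn^2/(2637 - 1) + v_other^2/(2251 - 1))
display "se = " %8.4f se_diff "   t = " %8.4f t_stat "   df = " %8.2f df_satt
```

Expect small differences in the last digits, because the inputs are rounded to four decimals.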
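Both factors in q15 are binary and the model includes their interaction. As a result, the `maininc#internet` row tests a single contrast of the four cell means printed in q12: does the internet gap in mean expenditure differ between households living mainly on earnings and those with another main income source? The sketch below uses only numbers copied from the log. With the residual mean square as the pooled variance, the squared t statistic of that contrast should match the interaction F (6.56), and its sum of squares should match the partial SS (310169.89), up to rounding. The scalar names are illustrative.

```stata
* Cell means and counts of exp from q12 (maininc by internet)
scalar m_e0 = 321.5789    // earnings, no internet       n = 158
scalar m_e1 = 554.3448    // earnings, internet          n = 2,479
scalar m_o0 = 209.0654    // other sources, no internet  n = 749
scalar m_o1 = 389.7849    // other sources, internet     n = 1,502
scalar mse  = 47314.485   // residual mean square from the q15 anova table

* Difference between the two internet gaps (a difference of differences)
scalar contrast  = (m_o1 - m_o0) - (m_e1 - m_e0)
scalar sum_inv_n = 1/158 + 1/2479 + 1/749 + 1/1502
scalar t_int     = contrast / sqrt(mse * sum_inv_n)
display "contrast = " %9.4f contrast "   F = t^2 = " %6.3f (t_int^2) ///
    "   SS = " %10.2f (contrast^2 / sum_inv_n)
```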
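In the q16 chain of `gen` and `replace`, each income band has a closed lower bound and an open upper bound, so no value falls between bands. One caveat applies whenever the variable has missing values. Stata treats missing as larger than any number, so `income >= 1000` is true for a missing income and the last `replace` would put it in band 5. `su income` reports 5,144 observations, the full sample, so this does not affect the results here. A compact alternative counts the cut-offs each value reaches and leaves missing income missing. The sketch below uses hypothetical toy values, and the variable `inc_cat2` is illustrative. It runs `clear`, so use a fresh session.

```stata
* Hypothetical toy values chosen to hit each cut-off, plus one missing income
clear
input double income
   0
 249.99
 250
 499.99
 500
 750
 999.99
1000
1184.99
   .
end

* q16 approach (a missing income ends up in band 5)
generate inc_cat = 1 if income < 250
replace  inc_cat = 2 if income >= 250 & income < 500
replace  inc_cat = 3 if income >= 500 & income < 750
replace  inc_cat = 4 if income >= 750 & income < 1000
replace  inc_cat = 5 if income >= 1000

* Alternative: 1 plus the number of cut-offs reached; missing income stays missing
generate byte inc_cat2 = 1 + (income >= 250) + (income >= 500) ///
    + (income >= 750) + (income >= 1000) if !missing(income)

list income inc_cat inc_cat2, noobs
```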
. graph bar (mean) exp, over(inc_cat, label(angle(forty_five))) by(, title(Average expenditure over Income categories by income source)) by(maininc)
.
. *q17: Testing equality of means of exp over income categories
. oneway exp inc_cat

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      144218311      4   36054577.8   1065.89     0.0000
 Within groups      165171032   4883   33825.7284
------------------------------------------------------------------------
    Total           309389343   4887    63308.644

Bartlett's test for equal variances:  chi2(4) = 201.4349  Prob>chi2 = 0.000

. oneway exp inc_cat, tabulate

            |   Summary of Total expenditure
    inc_cat |        Mean   Std. Dev.       Freq.
------------+------------------------------------
       <250 |   203.15214   140.87343         918
    250-499 |   340.42794   169.71232       1,374
    500-749 |   476.85692    191.6519       1,023
    750-999 |    569.8331   199.32621         645
    >= 1000 |   708.47917   218.61116         928
------------+------------------------------------
      Total |   443.34642   251.61209       4,888

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      144218311      4   36054577.8   1065.89     0.0000
 Within groups      165171032   4883   33825.7284
------------------------------------------------------------------------
    Total           309389343   4887    63308.644

Bartlett's test for equal variances:  chi2(4) = 201.4349  Prob>chi2 = 0.000

.
. *q18: Plotting and testing equality of mean expenditure against income categories by main income source, for households with an internet connection
. graph bar (mean) exp if internet == 1, over(inc_cat, label(angle(forty_five))) by(, title(Average expenditure over Income categories by income source) subtitle("(For households with internet)")) by(maininc)
.
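The between-groups sum of squares in q17 can be rebuilt from the band means and frequencies that `oneway, tabulate` prints. SS_between is the sum over income bands of n_g times the squared distance between the band mean and the overall mean. F is its mean square divided by the within-groups mean square. The sketch below loads the five printed rows as a tiny dataset and copies SS_within from the log. The variable and scalar names are illustrative. It runs `clear`, so use a fresh session.

```stata
* Band means and frequencies of exp from -oneway exp inc_cat, tabulate- (q17)
clear
input double gmean long freq
203.15214   918
340.42794  1374
476.85692  1023
569.8331    645
708.47917   928
end

* Overall mean, weighting each band by its frequency
quietly summarize gmean [fweight = freq]
scalar grand = r(mean)

* SS_between = sum of freq * (band mean - overall mean)^2
generate double ss = freq * (gmean - grand)^2
quietly summarize ss
scalar ss_between = r(sum)

* F = (SS_between / (k - 1)) / (SS_within / (N - k)), SS_within copied from the log
scalar k_groups = _N
scalar n_obs    = 4888
scalar F17 = (ss_between / (k_groups - 1)) / (165171032 / (n_obs - k_groups))
display "overall mean = " %9.4f grand "   SS between = " %12.0f ss_between "   F = " %8.2f F17
```

The results should agree with the q17 table up to the rounding of the printed band means.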
. oneway exp inc_cat if (internet == 1)

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups     93899684.9      4   23474921.2    645.15     0.0000
 Within groups      144673693   3976   36386.7438
------------------------------------------------------------------------
    Total           238573378   3980   59943.0598

Bartlett's test for equal variances:  chi2(4) =  75.7364  Prob>chi2 = 0.000

.
. *save data
. save icfforworkbook_1, replace
file icfforworkbook_1.dta saved

.
. * close do file
. log close
      name:
       log:  \\cantus.ads.warwick.ac.uk\User45\u\u1874478\Documents\warwick\teaching\econometrics-1\exercises\term-1\stata\ex1_201920_log.log
  log type:  text
 closed on:   9 Sep 2019, 10:19:24
------------------------------------------------------------------------------
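The q18 table follows standard one-way ANOVA arithmetic. Each mean square is an SS divided by its degrees of freedom, F is the ratio of the two mean squares, and the p-value is the upper tail of the F(4, 3976) distribution. The between-groups share of the total sum of squares is the statistic that `anova` labels R-squared (as in q15), here for a model of `exp` on the income bands alone. A sketch using numbers copied from the log; the scalar names are illustrative:

```stata
* Sums of squares copied from the q18 table (households with internet)
scalar ssb = 93899684.9     // between groups, df = 4
scalar ssw = 144673693      // within groups,  df = 3976
scalar F18   = (ssb / 4) / (ssw / 3976)
scalar p18   = Ftail(4, 3976, F18)     // upper-tail p-value
scalar r2_18 = ssb / (ssb + ssw)       // share of the variation in exp between bands
display "F = " %8.2f F18 "   p = " %9.3g p18 "   R-squared = " %6.4f r2_18
```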