stata$in$the$everyday$life$of$a$ health$economist · blog •...
TRANSCRIPT
Stata in the everyday life of a health economist
Pedro Pita Barros
1
Stata and me: what I do?
• Research ar7cles • Books • Data management • Blog and other things
2
Blog • Need treatment of simple data, usually small samples
• Problem: they are in a variety of formats • Solu7on: import from Excel files has become the easier solu7on – most data sources now have this possibility – Sta7s7cs Portugal, Pordata, AMECO / Eurostat, OECD Health Data
• Problem: formaOng changes from 7me to 7me • Solu7on: import to “my” Excel format, which is read by Stata easily
3
4
5
• Next step: export outputs • Graph tools from Stata are great – high degree of flexibility; standard “schemes” are fine (though beYer in color than black/white)
• Export format is ok (png works well with wordpress)
6
7
8
500
1000
1500
2000
Jan-12 Jul-12 Jan-13 Jul-13 Jan-14 Jul-14 Jan-15 Jul-15Mês-Ano
Dívida Dez2011/Mai2012Jan/Out2013 Dez2013/Ago2014Set/Nov2014 Dez2014/Fev2015Fev/Jul2015
• Expor7ng tables of regressions – could be easier – user-‐made commands help, but are not perfect for html/blog
• Print screens look nice graphically but not easy to read
• Wish: have direct export to html (the copy table as html does not work well in my Mac)
9
10
• Version:0.9
• StartHTML:0000000105
• EndHTML:0000001132
• StartFragment:0000000136
• EndFragment:0000001099
• <HTML><BODY><!-‐-‐StartFragment-‐-‐><TABLE BORDER><tr><td>. esYab m1 m2, b not
• (1) (2) • divida_epe divida_epe
• tend1 80.42*** 80.42*** • tend2 -‐190.5*** -‐190.5*** • tend3 34.07*** • tend4 28.47* • tend5 -‐8.370 • tend6 41.56 • tend7 -‐1.611 • ct1 554.7 554.7 • ct2 5406.7*** 5406.7*** • ct3 -‐152.8 -‐96.77 • ct4 825.0*** 825.0*** • ct5 -‐430.7 -‐583.9* • ct6 1206.6 941.9 • ct7 -‐1472.3 -‐1004.6** • ct8 613.8 674.7 • tend346 32.20*** • tend57 -‐2.737
• N 44 44
• * p<0.05, ** p<0.01, *** p<0.001 • </td></tr></TABLE><!-‐-‐EndFragment-‐-‐></BODY></HTML>
11
Data management
• Management of large data sets – Volume of data – 3,3 million observa7ons per month, 24 months – using cycles in “do files”, size of joint database is very large (each file about 168-‐180 MB)
– Joining repeated cross-‐sec7ons – Ex: Standard Income and Living Condi7ons – Eurostat
– Merging different data sets (ex: individual hospital admissions with residen7al data)
12
13
14
Average price per day, each month from a different file All data in a single file makes it very slow (close to 2,5 GB)
Research papers
• Use of regression models available • Flexibility for new likelihood func7ons • New commands available • User-‐built commands
15
Example
16
-‐ Health – latent variable – 5 states -‐ Pharma consump7on (yes/no) -‐ Visits to doctor – number of visits – count variable
17
18
19
20
About 11,000 observa7ons; 2 itera7ons takes roughly 5 hours
Results
21
22
23
Our interest: role of subsystems – impact on number of visits, but not pharma or health directly
• Problems: a “do file” that runs in Stata 13 does not in Stata 14 – not easy to find out why…
• Look for more efficient coding to reduce 7me to solve the problem
24
New commands
• Example: – Treatment effects – matching es7mator (new command)
– Krls – user command for machine-‐learning technique to fit non-‐linear func7on
– Tradi7onal regression • Suicides during the crisis period
25
What happened
26
46
810
12suicidio_ine
1980 1990 2000 2010 2020ano
Aver 2004 (5 years each side)
27
46
810
12suicidio_ine
1980 1990 2000 2010 2020ano
28
29
89
1011
12
2004 2006 2008 2010 2012 2014ano
Linear prediction s2suicidio_ine
Non-‐significant differences – the three methods are roughly giving the same info
Go back 10 years: since 1994
30
46
810
12suicidio_ine
1980 1990 2000 2010 2020ano
31
Impact of crisis – there are differences but not too different
32
-10
010
2030
1995 2000 2005 2010 2015ano
Linear prediction s2_93suicidio_ine
We look at predicted effect – the krls is wildly non-‐linear (unreasonable so)
Stata in the life of a health economist
• Helpful in many ways • Could be improved to interact with new social media
• Works very well for management of large data sets (speed can be an issue)
• Flexible enough to accommodate new models / likelihood func7ons
• Great to test different technique using both official and community-‐based commands
33