Reproducible Research and Dynamic Documents in Stata

 
E. F. Haghish
University of Freiburg
 
Part 1: Reproducible Analyses
 
Why should data analysis be reproducible?
How can we communicate the results of the analysis effectively?
What kinds of errors and obstacles might make the procedure inefficient?
 
It Is The Statistics Era!
 
Faster and cheaper computers
Data is everywhere
Internet makes gathering data easy and cheap
Many jobs are available in data science
Quantitative studies are flourishing
 
 
Changes in Traditional Statistical Practice
 
More data analysis is done than in the past
Many exploratory analyses are run that never make it into the published work
Analyses are shared over the internet with colleagues
Writing scientific publications has become more cooperative than before
Statistical programming has become popular
Web-based and interactive statistical applications are emerging
 
Reproducible Analysis
 
Reproducible Research is a more general term with a broader scope
Unreproducible quantitative research = unreliable results
Unintentional errors can happen at any stage of research: design, procedure, assessments, data collection, data management and preparation, analysis procedure, writing, and publication
We can divide these errors into two groups: those made before the data are digitized and those made after
Reproducible Analysis focuses on the procedure from the time the data are digitized until the results are written up
 
Obstacles to Statistical Analysis
 
A major problem in the social sciences is that students are taught statistics by “mouse & click”, although most software supports writing syntax. An analysis done by clicking is:
Easy to forget
Not reproducible
Slow
Mistakes cannot be corrected
Cannot be supervised or checked
The procedure cannot be shared
Cannot be reused
 
Improving Reproducibility?
 
Writing syntax alone does not guarantee reproducibility
Make the code easy to rerun by connecting the whole procedure within a “main file” (the master file)
do and run cause Stata to execute the commands stored in filename just as if they were entered from the keyboard. do echoes the commands as it executes them, whereas run is silent.
Store different parts of the code in separate files and connect them in the master file. Running the master file then runs every file in the right order.
Always start from the raw data to avoid confusion.
Comment your code. Always assume you intend to share it. Write clearly and explain the code where needed.
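The difference between do and run can be tried with a small sketch. The file name demo_step.do and the two commands written into it are hypothetical and chosen only for illustration; the auto dataset ships with Stata.

```stata
* Write a hypothetical two-line do-file so the sketch is self-contained
file open fh using "demo_step.do", write text replace
file write fh "sysuse auto, clear" _n
file write fh "summarize price mpg" _n
file close fh

do demo_step.do     // echoes each command, then shows its output
run demo_step.do    // executes the same commands without displaying anything
```

Either way the commands are executed, so the auto data are in memory afterwards; only what is displayed differs.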
 
Example of Master file
 
use rawdata.dta, clear
 
do preparation.do
do descriptive.do
do analysis.do
do report.do
 
In the master file, the procedure becomes visible in a logical order. When the data analysis is complicated and the number of files grows, this makes re-reading the project much easier, faster, and more efficient.
 
Dynamic documents can be written within the do-files. This is especially useful with the Weaver package.
 
Dynamic Documents
 
Literate Programming
Producing analysis reports
Taking notes of complex statistics procedures
Teaching statistics
 
Markup Language
 
Markup, in a broad sense, is a “computer language” used for annotating, formatting, and styling a document using text tags.
Examples: HTML, RTF, XML, LaTeX
 
HTML Markup Example
 
Try it now: http://www.onlinehtmleditor.net/
 
LaTeX Markup Example
 
Very sophisticated: you can do almost anything with it and still keep your document light and fast.
Try it online at http://papeeria.com/
 
Markdown
 
Invented by John Gruber (2004), it is a lightweight markup language
Several versions of it exist, developed by other programmers
It is very popular
It has a very simple syntax for annotating documents
But it is limited and not as sophisticated as HTML or LaTeX
It is used for creating a “standard document” that has only the essentials
It supports headings, paragraphs, basic tables, images, links, bold and italic text, etc.
 
Markdown
 
In contrast to HTML and LaTeX, Markdown focuses only on the “content” of the document and provides nothing for changing its formatting.
 
The strength of Markdown is its simplicity.
 
After exporting to Microsoft Word docx, reduce the left and right margins of the document to 1 cm.
 
Markdown
 
Make text italic
 
*text*
 
_text_
 
Make text bold
 
**text**
 
__text__
 
Make text italic and bold
 
***text***
 
___text___
 
Markdown
 
Header 1
 
This is Header 1
============
 
Header 2
 
This is Header 2
---------------------
 
Markdown
 
Alternatively, headers can be specified by starting the line with hash signs
 
# This is header 1
 
## This is header 2
 
### This is header 3
 
#### This is header 4
 
##### This is header 5
 
###### This is header 6
 
Markdown
 
Adding a web link
[text](http://url.com/)
 
Adding an image
![explanation](./path.png)
 
Note that the image CANNOT be resized or aligned in the document. It is imported at its current dimensions and always placed at the left side of the document. A large image will ruin the document, especially in Microsoft Office, LibreOffice, and PDF formats.
 
Markdown
 
Creating an ordered list
 
1. Apple
2. Orange
3. Cherry
 
Markdown
 
Creating an unordered list, which can also be nested using tabs.
 
* Abacus
	* answer
* Bubbles
	1. bunk
	2. bupkis
		* BELITTLER
	3. burper
* Cunning
 
Markdown
 
To add a horizontal line
 
---
 
* * *
 
To begin a new paragraph, leave one empty line between the paragraphs.
 
To force a line break inside a paragraph, leave 2 or more spaces at the end of the line
 
Remember!
 
Write your document ONLY WITH ONE MARKUP LANGUAGE
 
Markdown’s simplicity can improve the readability of your document, so consider writing with Markdown unless you want to write a very sophisticated document in LaTeX or HTML…
 
Part 2: Lab Session
 
Three software packages are taught in the lab session
 
o MarkDoc
o Weaver
o CodeMap
 
ssc install markdoc
ssc install weaver
http://www.haghish.com/codemap/download.php
 
CodeMap only works on Mac
 
MarkDoc vs Weaver
 
Weaver only creates HTML and PDF
 
MarkDoc creates HTML, PDF, Microsoft Word DOCX, OpenOffice ODT, and LaTeX
 
Weaver is very robust and is programmed entirely in Stata
MarkDoc relies on third-party software named Pandoc, which is a document converter
MarkDoc is suitable for writing documents that include a lot of text, and for cases where the author intends to do further work on the generated DOCX, LaTeX, etc.
Weaver is suitable for briefly explaining the results of a data analysis and sharing the PDF. Weaver also provides a live preview of the document while weaving.
 
MarkDoc
 
Everything should be wrapped in a SMCL log file
 
qui log using example, replace
 
qui log c     //removes this command from the document
markdoc example, export(html) replace
 
MarkDoc
 
See markdoc-text.do
Text is written as comments inside the log file and can be written in three markup languages: Markdown, HTML, and LaTeX
This do-file includes three documents written in Markdown, HTML, and LaTeX. Which one is nicer?
 
/*
Writing text in MarkDoc
=======================
 
This is heading 2
-----------------
 
Text should be written as comment
*/
 
MarkDoc
 
Stata commands are written between the comment blocks as usual. MarkDoc automatically includes them in the document, regardless of the markup language you are writing in.
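Putting the pieces together, a complete do-file might look like the sketch below. The log and markdoc lines are the ones shown on the earlier MarkDoc slide; the heading text and the use of the auto dataset are illustrative assumptions. It assumes MarkDoc and Pandoc are already installed.

```stata
qui log using example, replace

/*
Price summary
=============

The table below summarizes car prices in the auto dataset.
*/

sysuse auto, clear
summarize price

qui log c
markdoc example, export(html) replace
```

The comment block carries the Markdown text, while the Stata commands and their output are picked up from the log.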
 
There are many ways to add an image or figure to the document. HTML, PDF, and LaTeX formats are very versatile, but to add an image to a Microsoft Word document only Markdown can be used.
 
Writing Dynamic Text
 
Use macros or returned values inside the text to refer to results. The txt command writes text to the document. This cannot be done inside comment signs because macros are not interpreted there.
 
The txt command can also contain markup signs.
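The idea of dynamic text can be sketched with Stata's built-in display command, which interprets macros and returned values in the same spirit. This is only an illustration: the exact syntax of txt is not shown on the slides, so it is not used here, and the wording of the sentence and the local macro names n and avg are assumptions.

```stata
sysuse auto, clear
quietly summarize price

* Store the returned values in local macros
local n = r(N)
local avg : display %9.2f r(mean)

* The macros are expanded before the text is printed
display "The sample has `n' cars with a mean price of `avg' dollars."
```

Because the numbers come from r(N) and r(mean), the sentence updates itself whenever the data or the analysis change.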
 
Hiding Commands
 
Use /**/ before a command to hide it. This DOES NOT hide the output. To hide the output, use Stata's “quietly” command
 
Using “qui log off” and “qui log on” you can exclude parts of the code and results from the document.
 
See markdoc_dynamic_text.do
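A minimal sketch of these hiding techniques, assuming the same log name (example) used on the earlier slide; the commands and the auto dataset are illustrative only.

```stata
qui log using example, replace

/**/ sysuse auto, clear      // the /**/ marker asks MarkDoc to hide this command; output is kept
quietly summarize price      // quietly suppresses the output of this command

qui log off
regress price mpg            // logging is off, so nothing here reaches the log
qui log on

qui log c
```

While the log is switched off Stata records nothing, so that part of the analysis never appears in the generated document.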
 
Dynamic Tables
 
MarkDoc can also create dynamic tables
 
Use the tble command
 
 
Stata Journal Publications
 
Use Markdown to create LaTeX files
Use style(stata)
Use the texmaster option
 
Weaver
 
Weaver has a set of commands for writing the document.
weave for starting a new document
div puts the commands and results in separate frames
img works the same as in MarkDoc
knit for writing dynamic text
report for printing a PDF while working on the document
weavend for closing the document
 
Weaver
 
codes only shows the commands
results only shows the results
 
CodeMap
 
For understanding the structure of a complicated statistical package or data analysis
It reveals the connections between code files and functions
Useful for advanced users who are interested in learning statistical programming.
Slide Note

----- Meeting Notes (01/09/15 12:14) -----

What's your experience with Stata?

You have your Laptops


This content discusses the importance of reproducible analyses in Stata, challenges in traditional statistical practices, and methods to improve reproducibility. The focus is on enhancing research integrity and efficiency through reproducible analysis techniques.


Uploaded on Mar 02, 2025 | 0 Views

