class: center, middle, inverse, title-slide .title[ # Reproducible reports with Markdown ] --- <!-- ## Literate programming Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, **let us concentrate rather on explaining to humans what we want the computer to do.** - Donald E. Knuth, Literate Programming, 1984 ## Writing reports - **HTML** - HyperText Markup Language, used to create web pages. Developed in 1993 - **LaTeX** - a typesetting system for production of technical/scientific documentation, PDF output. Developed in 1994 - **Sweave** - a tool that allows embedding of the R code in LaTeX documents, PDF output. Developed in 2002 - **Markdown** - a lightweight markup language for plain text formatting syntax. Easily converted to HTML, PDF, Word, and other formats ## HTML example - HTML files have `.htm` or `.html` extensions - Pairs of tags define content/formatting - `<h1> Header level 1 </h1>` - `<a href="http://www..."> Link </a>` - `<p> Paragraph </p>` .small[ ``` html <!DOCTYPE html> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> </head> <body> <h1>Markdown example</h1> <p>This is a simple example of a Markdown document.</p> You can emphasize code with <strong>bold</strong> or <em>italics</em>, or <code>monospace</code> font. </body> </html> ``` ] ## LaTeX example - LaTeX files usually have a `.tex` extension - LaTeX commands define appearance of text, and other formatting structures .small[ ``` latex \documentclass{article} \usepackage{graphicx} \begin{document} \title{Introduction to \LaTeX{}} \author{Author's Name} \maketitle \begin{abstract} This is abstract text: This simple document shows very basic features of \LaTeX{}. \end{abstract} \section{Introduction} ``` ] .small[ http://www.electronics.oulu.fi/latex/examples/example_1/ ] ## Sweave example - Sweave files typically have an `.Rnw` extension - LaTeX for text, `<<chunk_name>>= <code> @` syntax outlines code blocks .small[ ``` latex \documentclass{article} \usepackage{amsmath} \DeclareMathOperator{\logit}{logit} % \VignetteIndexEntry{Logit-Normal GLMM Examples} \begin{document} First we attach the data. <<gapminder>>= library(dslabs) data(gapminder) attach(gapminder) @ ``` ] --> ## Markdown - Lightweight markup language - Uses plain text - Simple, human-readable syntax - Used for formatting documents, including slides - Simplifies reproducible content creation and editing --- ## R Markdown - R Markdown is a Markdown markup language extended with the ability to add R code - The goal is to separate code and content, but also to prioritize human-readability - You can learn Markdown in about 5 minutes. If you can write an email, you can write Markdown. See "Help -> Markdown Quick Reference" (also, "Cheatsheets") - RStudio has "Visual" mode for RMarkdown editing <!-- - Can be converted in various document types, including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more --> <!-- - Fully supported in RStudio. Or, use a desktop Markdown editor like `MarkdownPad` (Windows) or `MacDown` (Mac). --> .small[ http://bioconnector.github.io/markdown - Markdown Reference ] <!-- ## Basic Markdown Syntax Regardless of your chosen output format, some basic syntax will be useful: - Section headers - Text emphasis - Lists - `R` code --> --- ## Section Headers To set up different sized header text in your document, use `#` for Header 1, `##` for Header 2, and `###` for Header 3. ``` rmarkdown # Table of content ## Chapter 1 ### Introduction ``` Renders as # Table of content ## Chapter 1 ### Introduction --- ## Text emphasis - *Italicize* text via `*Italicize*` or `_Italicize_` - **Bold** text via `**Bold**` or `__Bold__` --- ## Unordered Lists This code ``` text - Item 1 - Item 2 + Item 2a + Item 2b ``` Renders these bullets (sub-lists need 1 tab or 4 spaces!) - Item 1 - Item 2 + Item 2a + Item 2b --- ## Ordered Lists This code ``` text 1. Item 1 2. Item 2 + Item 2a + Item 2b ``` Renders this list (be advised - the bullets may not look great in all templates) 1. Item 1 2. Item 2 + Item 2a + Item 2b --- ## Markdown syntax ``` text superscript^2^ ~~strikethrough~~ Links http://example.com [linked phrase](http://example.com) Images   ``` --- ## Markdown syntax ``` text Blockquotes A friend once said: > It's always better to give > than to receive. Horizontal Rule / Page Break ****** ------ Tables First Header | Second Header ------------- | ------------- Content Cell | Content Cell Content Cell | Content Cell ``` .small[ https://www.tablesgenerator.com/markdown_tables ] --- ## Creating R markdown document .pull-left[ - Regular text file with `.Rmd` extension - Create manually, or use RStudio - "File -> New File" menu provides options for creating various R documents ] .pull-right[<img src="img/create_rmarkdown.png" height=450 > ] --- ## R Markdown formats from RStudio - **Documents**: HTML, PDF, Word, ODT, RTF - **Presentations**: ioslides, beamer, reveal.js, Slidy, Xaringan, PowerPoint - **Journals**: template packages for major journals/publishers - **Web sites**: bookdown, blogdown, pkgdown, flexdashboard, Shiny, GitHub document More .small[https://rmarkdown.rstudio.com/formats.html] --- ## YAML header (think settings) - YAML - YAML Ain't Markup Language - YAML is a simple text-based format for specifying data, like JSON ``` yaml --- title: "Untitled" author: ”Your Name" date: ”Current date" output: html_document --- ``` `output` is the critical part - it defines the output format. Can be `pdf_document` or `word_document`, other formats available --- ## YAML header for a PDF presentation ``` yaml --- title: "Reproducible reports with Markdown, knitr" author: "Mikhail Dozmorov" date: "2024-07-17" output: beamer_presentation: # colortheme: seahorse colortheme: dolphin fig_caption: no fig_height: 6 fig_width: 7 fonttheme: structurebold # theme: boxes theme: AnnArbor --- ``` --- ## YAML header for a Word document ``` yaml --- bibliography: [3D_refs.bib,brain.bib] csl: styles.ref/genomebiology.csl output: word_document: reference_docx: styles.doc/NIH_grant_style.docx pdf_document: default html_document: default --- ``` - The `reference_docx: styles.doc/NIH_grant_style.docx` is a Word document with the desired formatting. Change font styles by right-clicking on the font (e.g., "Normal") and select "Modify" --- ## Inline `R` Code - To use `R` within a line, use the syntax, wrapped in single forward ticks (can't be displayed) ``` ## `r dim(mtx)` ``` - This can be useful to refer to estimates, confidence intervals, p-values, etc. in the body of an article/homework without worrying about copy errors. --- ## Large code chunks Marked with triple backticks ```{r optionalChunkName, echo=TRUE, results='hide'} # R code here ``` - `Command+Option+I` (`Ctrl+Alt+I` on Windows) inserts R code chunk - `Insert` button in RStudio  --- ## Modifying the behavior of R code chunks Chunk options, comma-separated - `echo=FALSE` - hides the code, but not the results/output. Default: TRUE - `eval=FALSE` - disables code execution. Default: TRUE - `cache=TRUE` - turn on caching of calculation-intensive chunk. Default: FALSE - `fig.width=##`, `fig.height=##` - customize the size of a figure generated by the code chunk <!-- - `include`: (`TRUE` by default) if this is set to `FALSE` the R code is still evaluated, but neither the code nor the results are returned in the output document --> <!-- ## Modifying the behavior of R code chunks - `results="hide"` - hides the results/output. - `markup` (the default) takes the result of the R evaluation and turns it into markdown that is rendered as usual - `hold` - will hold all the output pieces and push them to the end of a chunk. Useful if you're running commands that result in lots of little pieces of output in the same chunk - `hide` will hide results - `asis` writes the raw results from R directly into the document. Only really useful for tables .small[ The full list of options: http://yihui.name/knitr/options/ ] ## Global chunk options - Some options you would like to set globally, instead of typing them for each chunk ``` r knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='img/’, cache.path='cache/', cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE) ``` - `warning=FALSE` and `message=FALSE` suppress any R warnings or messages from being included in the final document - `fig.path='img/'` - the figure files get placed in the `img` subdirectory. (Default: not saved at all) ## Caching - The `cache=` option is automatically set to `FALSE`. That is, every time you render the Rmd, all the R code is run again from scratch - If you use `cache=TRUE`, for this chunk, knitr will save the results of the evaluation into a directory that you specify, e.g., `cache.path='cache/'`. When you re-render the document, `knitr` will first check if there are previously cached results under the cache directory before really evaluating the chunk - if cached results exist and this code chunk has not been changed since last run (use MD5 sum to verify), the cached results will be (lazy-) loaded, otherwise new cache will be built - if a cached chunk depends on other chunks (see the `dependson` option) and any one of these chunks has changed, this chunk must be forcibly updated (old cache will be purged) .small[ [Documentation for caching](http://yihui.name/knitr/demo/cache/) ] ## An example of R Markdown document .center[ <img src="img/rmarkdown_doc_example.png" height=500> ] --> --- ## KnitR - `KnitR` - Elegant, flexible, and fast dynamic report generation written in R Markdown. PDF, HTML, DOCX output. Developed in 2012 by Yihui Xie ``` r install.packages('knitr', dependencies = TRUE) ``` To render a pdf from R Markdown, you need to have a version of TeX installed on your computer. Like R, TeX is open source software. The `tinytex` R package is the fastest way to have TeX on your computer .small[ https://github.com/yihui/knitr, https://yihui.org/tinytex/] --- ## Displaying data as tables - KnitR has built-in function to display a table ``` r data(mtcars) knitr::kable(head(mtcars)) ``` - `pander` package allows more customization ``` r pander::pander(head(mtcars)) ``` --- ## Displaying data as tables - `xtable` package has even more options ``` r xtable::xtable(head(mtcars)) ``` - DT package, an R interface to the DataTables library ``` r DT::datatable(mtcars) ``` .small[ http://bioconnector.github.io/markdown/#!rmarkdown.md#Printing_tables_nicely https://cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pdf https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html ] --- ## Including figures - Plots may be generated by R code and displayed in the output document - Existing image files like `*.jpg`, `*.png`, may be inserted like: ``` text   ``` - Alternatively, use `knitr` capabilities: ``` r {r, out.width = '300px', echo=FALSE} knitr::include_graphics('img/bandThree2.png') ``` - For PDF output, use LaTeX syntax: ``` latex \begin{center} \includegraphics[height=170px]{img/bioinfo3.png} \end{center} ``` --- ## Customizing Figures The `fig.cap` option allows you to specify the caption for the figure generated by a given chunk: ````r ```{r caption, fig.cap="I am the caption"} plot(pressure) ``` ```` The `fig.height` and `fig.width` options let you specify the dimensions of your plots: ````r ```{r caption, fig.height = 4, fig.width = 8} plot(pressure) ``` ```` <!-- ## Creating the final report - Markdown documents (`*.md` or `*.Rmd`) can be converted to HTML using `markdown::markdownToHTML('markdown_example.md', 'markdown_example.html')` - Another option is to use `rmarkdown::render('markdown_example.md’)`. At the backend it uses `pandoc` command line tool, installed with Rstudio - Rstudio - one button that invokes `knit2html()`, `knit2pdf()` functions in the background **Note**: `KnitR` compiles the document in an R environment separate from yours (think `Makefile`). Do not use `./Rprofile` file - it loads into your environment only. .small[ http://pandoc.org/ ] --> <!-- ## Things to include in your final report `set.seed(12345)` - initialize random number generator Include `session_info()` at the end - outputs all packages/versions used ```{r sessionInfo} diagnostics <- devtools::session_info() platform <- data.frame(diagnostics$platform %>% unlist, stringsAsFactors = FALSE) colnames(platform) <- c('description') pander(platform) packages <- as.data.frame(diagnostics$packages) pander(packages[ packages$`*` == '*', ]) ``` Alternatively ```{r sessionInfo} xfun::session_info() ``` ## Making default RMarkdown document on your own Altering the default Rmarkdown file each time you write a homework, report, or article would be a pain. - Fortunately, you don't have to! ## Templates You can create your own templates which set-up packages, fonts, default chunk options, etc. - https://bookdown.org/yihui/rmarkdown/document-templates.html - Some packages (e.g `rticles`) provide templates that meet journal requirements or provide other. - Journal of Statistical Software - The R Journal - Association for Computing Machinery - ACS publications (Journal of the American Chemical Society, Environmental Science & Technology) - Elsevier publications .small[https://github.com/mdozmorov/MDtemplate] ## Parameters You may also set parameters in your document's YAML header ``` yaml --- output: html_document params: date: "2017-11-02" --- ``` or pass new values with the `render` function. - This creates a read-only list `params` containing the values declared - e.g. `params$date` returns `2017-11-02` --> --- ## Quarto Quarto is an open-source scientific and technical publishing system built on Pandoc. - Python, R, Julia, and Observable support. - Author documents as Markdown or Jupyter notebooks. - Articles, reports, presentations, websites, blogs, and books in HTML, PDF, MS Word, ePub, and more. - Supports scientific markdown, including equations, citations, crossrefs, figure panels, callouts, advanced layout, and more. - Improved figure/table cross-referencing, labeling. https://quarto.org/docs/get-started/ [Quarto - J.J. Allaire](https://youtu.be/9jGc0TxoRco) 1h video