R Markdown Basics

Andy Lin, IDRE Statistical Consulting


How to use this seminar

This seminar aims to teach the user basic R Markdown syntax to make beautiful, reproducible reports.

First we will discuss what R Markdown is, how it is used, and how it works. The rest of the seminar focuses on R Markdown syntax, specifically:

This seminar does not attempt to explain all of the R code used in the example reports.

Text that appears with this typeface and background is usually code syntax you can use when authoring your R Markdown files. Buttons and menus in RStudio will also appear formatted this way.

Text that appears blockquoted like this is a set of instructions to alter an R Markdown file. Click the Knit button after finishing all instructions within a block to view the results of your modifications.

Go ahead and press the ‘k’ key to disable advancing with mouse click. This will make it easier to copy-and-paste code.

What is R Markdown?

“An authoring framework for data science” – R Markdown creators

R Markdown allows us to create reproducible documents that weave narrative text together with R code and the output it produces when executed.

For example, here is an R code block inserted into the R Markdown file that generates this slide show. Underneath the code is its output:

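# Assumed opening line: the start of this call was lost, and the built-in
# HairEyeColor data is inferred from the axis and legend labels below.
barplot(HairEyeColor[,,1],
        col=c("#4d4d4d", "#bf812d", "#f4a582", "#f6e8c3"),
        legend.text=TRUE, xlab="Eye Color", 
        args.legend=list(title="Hair Color"))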

These documents are dynamically generated – whenever we need to change code or data, we can simply update the R Markdown file, compile it, and the output will be automatically updated in the resulting document.

These documents can then be shared with an audience to provide the most up-to-date content.

Installing R Markdown and working in RStudio

We highly recommend working with R Markdown in RStudio, which has many features that facilitate R Markdown file editing, including:

Once R and RStudio are installed, you can install R Markdown with install.packages("rmarkdown") as usual.

Starting our first markdown file

We can open a new Markdown file template through the File menu in RStudio.

A. Choose File -> New File -> R Markdown...
B. Fill the Title field and Author fields with “Practice” and your name, respectively.
C. In the left menu, select Document, and for Default Output Format select option HTML (these are the defaults).
D. Click OK

R Markdown files typically use the extension .Rmd or .rmd

A file initiated through this method will have a skeleton of the elements of an R Markdown file:

Note: If you have not installed package rmarkdown and try to open a .rmd file through the File menu, RStudio may ask you to install rmarkdown immediately.

Elements of an R Markdown file - YAML header

At the top of our newly initiated R Markdown file, enclosed in --- tags, we see the first of the essential elements of an R Markdown file, the YAML header.

YAML stands for “YAML Ain’t Markup Language” or “Yet Another Markup Language”, and is a human-readable language, which we use here to communicate with Pandoc.

Pandoc converts between document formats and controls their overall appearance. Pandoc is installed with RStudio.

The YAML header is also used to control:

The YAML header may also contain the document’s metadata, information about the R Markdown file itself, such as title:, author:, and date:.
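
As a small illustration of these metadata fields, the sketch below writes a hypothetical header to a temporary file and reads it back into R with rmarkdown::yaml_front_matter(). The file name and field values are made up for the example, and the code assumes the rmarkdown package is installed.

```r
# Hypothetical header; the values are placeholders for illustration only.
rmd_path <- file.path(tempdir(), "header_demo.Rmd")
writeLines(c(
  "---",
  "title: \"Practice\"",
  "author: \"Your Name\"",
  "output: html_document",
  "---",
  "",
  "Body text goes here."
), rmd_path)

# Read the YAML header into a named R list (needs the rmarkdown package).
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  header <- rmarkdown::yaml_front_matter(rmd_path)
  str(header)
}
```

Each field of the header becomes one element of the list, which can be a convenient way to check that a header is parsed the way you expect.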

Elements of an R Markdown file - Markdown

Within the body of the document we find some examples of text with special characters that have been highlighted blue, including the following:

## and ** are Markdown tags which format the text enclosed within them.

Markdown is a markup language, a system of code shortcuts to annotate and format plain text – once the .rmd file is compiled (rendered), the text will be formatted.

Elements of an R Markdown file - R code chunks

The final elements of R Markdown files are the R code chunks, highlighted with gray backgrounds and enclosed within ```{r } and ```.

The R code chunks are actually processed by the package knitr, which is installed with rmarkdown.

When the R Markdown file is compiled and rendered, the output of the code chunk will be embedded in the document underneath the code.

rmarkdown (via knitr) provides a large array of options to control the appearance of both the R code and its output.

Compiling and rendering an R Markdown file

Once we are pleased with its contents, we can compile the R Markdown file and render it into its final output format in two ways:

The output document will be rendered and saved in the same directory as the .rmd file.

RStudio will also provide a preview of the output document.

Clicking on the Knit button simply calls render() on the current .rmd file.
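
The same step can be run from the R console. The sketch below is only an illustration: it builds a tiny, hypothetical .Rmd file in a temporary directory and renders it with rmarkdown::render(), skipping the rendering step if rmarkdown or Pandoc is not available.

```r
# Create a minimal, hypothetical .Rmd file in a temporary directory.
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_path <- file.path(tempdir(), "render_demo.Rmd")
writeLines(c(
  "---",
  "title: \"Render demo\"",
  "output: html_document",
  "---",
  "",
  "## Summary of the sleep data",
  "",
  paste0(fence, "{r}"),
  "summary(sleep$extra)",
  fence
), rmd_path)

# render() knits the file and passes the result to Pandoc, so it needs both
# the rmarkdown package and a Pandoc installation; check for them first.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_file)  # path of the rendered HTML file
}
```

render() returns the path of the output file, which should sit next to the .Rmd file unless another location is requested.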

Progress and messages produced by rendering the .rmd file will be displayed in the R Markdown console, which appears when you render.

Click the Knit button. Name the file whatever you want, but make sure to use the .rmd extension. Observe how elements of the .rmd file appear in the output.

How it all works

R Markdown is the unification of 3 frameworks:

When we render the document, the following happens:

First, knitr converts all of the R code chunks, code and output, into text and Markdown tags, resulting in a Markdown file (.md) of just text and Markdown. Images are saved to files and are included/embedded in the output via links.

Then, pandoc converts the .md file into the desired final output format, such as an HTML web page, a LaTeX pdf document, or an ioslides slide-show presentation, etc.
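
To make the two steps concrete, here is a rough sketch that performs them by hand: knitr::knit() produces the .md file, and rmarkdown::pandoc_convert() hands it to Pandoc. The file names and the temporary directory are hypothetical, and in normal use render() does all of this for you.

```r
# Hypothetical working directory and file names for the demonstration.
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
work_dir <- file.path(tempdir(), "two_step_demo")
dir.create(work_dir, showWarnings = FALSE)
rmd_path <- file.path(work_dir, "demo.Rmd")
md_path <- file.path(work_dir, "demo.md")
writeLines(c(
  "## Two-step demo",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
), rmd_path)

# Step 1: knitr runs the chunk and writes plain Markdown (code plus output).
knitr::knit(rmd_path, output = md_path, quiet = TRUE)
cat(readLines(md_path), sep = "\n")

# Step 2: Pandoc converts the Markdown file to the final format (HTML here).
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md_path, to = "html", output = "demo.html")
}
```

Opening the intermediate demo.md file shows that the chunk has been replaced by ordinary Markdown code blocks containing the code and its printed result.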

Learning R Markdown

Thus, to use R Markdown proficiently, we must learn a bit about each of the 3 coding frameworks, Markdown, knitr and YAML (pandoc).

Fortunately, the creators have done a lot of the work for you by setting many options to have smart defaults (like large font size for slide shows, smaller font sizes for .pdf documents).

Furthermore, none of the coding is particularly difficult, but very fine control over documents will require that you learn more advanced coding (such as HTML or LaTeX).

R Markdown keeps it simple for the user by requiring a single format, the .rmd file, to produce this wide variety of output.


First, what is a markup language and what is HTML?

A markup language is a system of tags (code) used to format documents. Tags are used to define sections, change the appearance of text, build tables, link images, and so on.

Hypertext Markup Language (HTML) is a markup language designed to be used for web pages. The placement and appearance of text, images, frames, tables, and other elements can all be specified in HTML.

HTML tags generally include both an opening and closing tag, and have this form <tag> and </tag>.

For example, enclosing text in <em> tags results in italicized text.

<em>hello</em> becomes hello when viewing the document in a web browser.

What is Markdown?

Markdown is a lightweight markup language, lightweight in the sense that its tags are simple and easy to type.

Markdown was originally designed to be a shorthand for HTML. For example, we just learned that to italicize text in HTML, you enclose it in <em> </em>. In Markdown, you just enclose it in * *.

However, with Pandoc involved, we can convert Markdown to many different output formats.

R Markdown uses Pandoc’s version of Markdown, which differs a bit from standard Markdown.

Because of its simplicity, Markdown is very easy to use, so let’s dive in!

In your currently open .rmd file, erase all content except for the YAML header.

Spacing and paragraphs

Newlines (carriage returns) are considered spaces in Markdown.

First, try adding the following text broken by a single new-line (and no additional spaces) after the first period:


In order to begin a new paragraph, insert a blank line (i.e. 2 newlines) before beginning the new paragraph. This will double-space paragraphs.

Now, try retyping the text with 2 newlines after the first period:



Or, you can add 2 spaces before the new-line to single-space the paragraphs.

Finally, try retyping the text with 2 spaces and a single newline after the period:



Section headers are often displayed in larger, bolder fonts.

To treat text as a header, place one to six # tags at the beginning of the header text. The number of # signs indicates the level of the header (a level 1 header, with a single #, is displayed in the largest font, and each additional # makes the header smaller).

Headers should be preceded by a blank line, and there should be a space between the # tags and the header text.

Add a level 1 header called “Big Header” and a level 3 header called “Small Header”

Bold, indent, underline, strikethrough

Markdown provides simple tags to format text for emphasis as well as super- and subscripting:

Do not insert spaces in between formatting tags and the text.

Any character preceded by a backslash will be treated as a literal character and not as a code tag:

\*italics\* produces *italics*

Try recreating this formatted text with Markdown syntax:

We multiplied x by z2 to create the interaction variable x*z2

Bulleted Lists

Markdown bulleted lists are much easier to specify than HTML lists.

To create a list, precede each list item with * (or + or -) and a space:

* item1
* item2

To add sublists, indent 4 spaces:

* item1
    + sub1.1
    + sub1.2
* item2

Use numbers with periods as tags for numbered lists:

1. item1
2. item2
3. item3

Add a bulleted list with a sublist to your .rmd file.

External images

Embedding external images (not created by R code within the document) in a document uses syntax very similar to linking:


Note: Do not put the image path/name in quotes!

So, assuming there is a file named “densities.png” in the same directory as the .rmd file:

![Fig 1 Densities by diet](densities.png) produces

Fig 1 Densities by diet

Tex Math

Text enclosed by $ symbols will be treated as TeX Math, another set of markup tags used to format mathematical expressions.

For example, $mean(X) = \frac{\sum_{i=1}^nX}{n}$ will be rendered as \(mean(X) = \frac{\sum_{i=1}^nX}{n}\)

A TeX expression enclosed by two $ symbols on each side will be displayed as an equation, generally centered in the document. So, $$mean(X) = \frac{\sum_{i=1}^n X}{n}$$ produces \[mean(X) = \frac{\sum_{i=1}^n X}{n}\]

Although TeX Math is often associated with LaTeX documents, it can be used in any document type supported by R Markdown.

R code chunks

R Markdown, knitr, and R code chunks

The R package knitr was conceived before R Markdown to weave text and R code output together into reports.

R code is placed in code chunks that can be interleaved with the text of the document. All of the code chunks execute sequentially in one session when the .rmd file is rendered, so objects created in one chunk are available to all subsequent chunks.
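
The shared session can be seen in a small experiment. This sketch passes two hypothetical chunks to knitr::knit() as text (rather than as a file) and prints the resulting Markdown; the chunk labels and the object x are invented for the example.

```r
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_lines <- c(
  paste0(fence, "{r create}"),
  "x <- 10",
  fence,
  "",
  "Some narrative text between the chunks.",
  "",
  paste0(fence, "{r reuse}"),
  "x * 2",   # x was created in the previous chunk
  fence
)

# knit() runs the chunks in order, in one session, and returns Markdown text.
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

The second chunk can use x only because the first chunk has already run; swapping the order of the two chunks would produce an error in the output instead.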

A large array of knitr options allows some control over the appearance of R code, text output and graphical output in the final document.

R Markdown builds upon knitr by allowing Markdown tags and using Pandoc to convert between document formats.

Please open the sleep_study.rmd file to practice using knitr code chunks. You may close the .rmd file we used to practice Markdown.

knitr R code chunk delimiters

R code chunks are delimited by ```{r chunk_label, options} at the beginning and ``` at the end. The chunk_label and options are indeed optional and are separated by commas (much more on this soon).

You can type out the tags, or use the shortcut Ctrl + Alt + I (Cmd + Option + I on Macs), or use the Insert button at the top right of the script editor (will appear if the file extension is .rmd).

Add an R code chunk using the keyboard shortcut or the Insert button to the sleep_study.rmd file after the text “Here are the data:”

Our first output

Data sets stored as data.frame objects in R can be printed to the document by simply specifying the object name.

Inside our new code chunk, add the following code:

Let’s look closely at the output:

We can control all of this!

Code chunk options

Much of the power of rmarkdown via knitr lies in its wide array of options to control the appearance of R code output.

See here for a full list of knitr chunk options.

To specify chunk options, after ```{r, specify a chunk label (name), a comma, and then a list of options separated by commas. This is known as the chunk header.

All of the chunk options must be specified on one line (no line breaks).

In chunk labels, avoid characters other than alphanumeric characters and -.

Change the first line of the R code chunk to ```{r, mydata, echo=FALSE}.

Here mydata is the chunk label, and echo=FALSE is an option. Notice the use of commas to separate.

Common chunk options to control text output

As we saw, echo=FALSE suppresses printing of the R code. By default, echo is set to TRUE, but often we do not want our audience to see the underlying R code.
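
As a quick check of what echo=FALSE does, the sketch below knits a single hypothetical chunk supplied as text and inspects the Markdown that knitr returns. The chunk content is made up for the example.

```r
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_lines <- c(
  paste0(fence, "{r mydata, echo=FALSE}"),
  "mean(c(3, 6))",
  fence
)

# Knit the chunk and look at the Markdown that would go into the document.
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md)

# Is the R code itself present in the output?
code_shown <- grepl("mean(c(3, 6))", md, fixed = TRUE)
code_shown
```

Only the result of the calculation should appear in the printed Markdown; changing the header to echo=TRUE and rerunning brings the code back.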

Here are some options to control our output (default of option specified in parentheses):

Change echo=FALSE to eval=FALSE.

Change eval=FALSE to results='hold'.

Suppressing warnings and messages

Many R functions will display warnings and messages to convey important information to the user.

knitr will print warnings and messages to the document by default, but they may be distracting to the reader.

We can use the chunk options:

A. Insert a new code chunk after the text “First, log-transforming the outcome **extra** was suggested:”.
B. Inside the code chunk, specify these 2 lines of code:

sleep$logextra <- log(sleep$extra)

Notice that a warning was printed to the document.
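
The warning arises because some values of extra are negative, and the logarithm of a negative number is undefined. The following sketch, which uses only base R and the built-in sleep data, catches the warning so its message can be inspected; the handler is just one illustrative way to do this.

```r
# Some values of extra are below zero.
range(sleep$extra)

# Catch the warning raised by log(), report its message, and carry on.
log_extra <- withCallingHandlers(
  log(sleep$extra),
  warning = function(w) {
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Count how many values became NaN.
sum(is.nan(log_extra))
```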

Add the chunk label log-transform and the option warning=FALSE to this second chunk, separated by commas.

Notice now that the warnings are printed to the R Markdown console.

Be warned, you may not want to suppress warnings and messages until you are sure everything is working correctly.

Global chunk options

If you know that you will need to set an option for multiple or all chunks, you can set them globally with a call to knitr::opts_chunk$set() in the first code chunk of the .rmd file.

A. Insert a new code chunk before the “# Purpose” header.
B. Give the chunk the label “setup” in the header.
C. Specify this code inside the chunk: knitr::opts_chunk$set(echo=FALSE).

The global option above sets echo=FALSE for all chunks, thus suppressing all R code.
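
Outside of a document, the same functions can be tried in the console. This sketch sets two global defaults, reads one back, and then restores the previous values; the restore step is only there so the demonstration does not affect later work in the session.

```r
# Remember the current defaults so they can be restored afterwards.
old_opts <- knitr::opts_chunk$get(c("echo", "warning"))

# Set global defaults; individual chunk headers can still override them.
knitr::opts_chunk$set(echo = FALSE, warning = FALSE)
echo_now <- knitr::opts_chunk$get("echo")
print(echo_now)

# Restore the previous defaults (not needed inside a real .rmd file).
knitr::opts_chunk$set(old_opts)
```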

If you’d like to see the R code in your document, delete this chunk or reset the option to echo=TRUE.

Usually we don’t want to see this setup chunk in the report. Suppress its printing.

Formatted tables with knitr::kable()

The function kable() from the knitr package turns the output of R code into pretty, formatted tables (rather than the default R output style).

The table input is usually a data.frame, a matrix, or a table and is the first argument to kable().

kable() includes arguments to control the number of digits printed, column names, column alignment, table caption, and other formatting options. See ?knitr::kable for details.
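
Here is a short sketch of several of these arguments used together on the built-in sleep data. The column labels and caption are invented for the example.

```r
# Format the first rows of the sleep data as a table.
tbl <- knitr::kable(
  head(sleep),
  digits = 1,                                    # decimal places to print
  col.names = c("Extra sleep", "Drug", "Patient ID"),
  align = "c",                                   # center every column
  caption = "First six rows of the sleep data"
)
print(tbl)
```

When run in the console, the table is printed as Markdown text; inside a rendered document, the same call produces a formatted table.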

Look into the package kableExtra to get many more formatting options for kable tables. See here for examples.

In the first code chunk, replace the first line of code sleep with knitr::kable(sleep, align='c').

Inline R code

We can also insert R code directly into text, which will be replaced by its output when rendered.

Enclose the inline R code with `r and `.

Inline R code itself will not be printed to the document.

Use Markdown tags to format the output.
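
A minimal sketch of inline code in action: the text below contains one hypothetical chunk that stores a value, followed by a sentence that inserts it inline and bolds it. Knitting the text with knitr::knit() shows the sentence as it would appear in the Markdown output.

```r
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_lines <- c(
  paste0(fence, "{r average, echo=FALSE}"),
  "avg_extra <- mean(sleep$extra)",
  fence,
  "",
  "Mean extra sleep: **`r round(avg_extra, 2)`** hours."
)

# The inline code is replaced by its value when the text is knitted.
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

Rounding inside the inline code keeps long decimal expansions out of the sentence.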

A. Replace the text “XXX” at the very end of the .Rmd file with the inline R code `r mean(sleep$extra)`.
B. Use Markdown tags ** to bold the result.

R code chunks and Figures

tidyverse package for this section

For this section of the seminar, we will be using the package tidyverse, a diverse collection of packages with many tools for data analysis. Specifically we will be using the following packages within tidyverse:

Please make sure you have tidyverse installed. You can check by issuing library(tidyverse) in the current environment. If it errors, please run install.packages("tidyverse") now.

Please open the mileage.rmd file to practice syntax for controlling R graphics.

rmarkdown, knitr and R graphics

knitr, and thus rmarkdown, makes including and formatting graphics in documents quite easy.

Graphics produced by R code are placed immediately after the generating code chunk.

Knit the mileage.rmd file and observe the placement of the R code and graphs.

Arranging multiple figures produced by the same code chunk

Notice that the three plots produced by the final code chunk of the mileage.rmd file are interleaved with the individual ggplot() commands that produced them.

The knitr chunk option fig.show determines how to place multiple plots:

Add the chunk option fig.show='hold' to the fourth chunk, mileage-graphs, of mileage.rmd. Leave this option at 'hold' for the remainder of the seminar.

Because the figures are large, knitr places them one after another.

Sizing and aligning figures

We can easily adjust the size of figures using the knitr chunk options:

If only one of fig.width or fig.height is specified, the other is not adjusted, unless fig.asp is also specified (it is NULL by default).
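
When fig.asp is set, knitr calculates the figure height as fig.width multiplied by fig.asp (the aspect ratio is height divided by width). A tiny worked example, with values chosen only for illustration:

```r
fig_width <- 6    # inches, the value given to fig.width
fig_asp <- 0.5    # height / width, the value given to fig.asp

# Height that results from this combination of options.
fig_height <- fig_width * fig_asp
fig_height
```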

Add the chunk option fig.width=3 to the code chunk mileage-graphs of mileage.rmd and observe what happens to the size and positioning of the graphs.

Now add the chunk option fig.asp=1 to this same chunk. Don’t forget commas!

How could you change the size of all figures in the document to this size?

We also have an option to adjust the alignment of figures in the document:

fig.align: ('default') 'default' makes no alignment adjustment; other possible values are 'left', 'center', and 'right'.

Place the option fig.align='center' inside of knitr::opts_chunk$set() in the very first chunk.

Figure captions

Though no captions are shown by default, knitr makes adding a figure caption easy with this option:

Add the option fig.cap='Fig 1' to the chunk sample (following the header ## The sample of cars)

Now add the option fig.cap='Fig 2' to the chunk mileage-graphs (following the header ## Mileage graphs) and observe an interesting result

Saving figures (Optional)

By default, knitr will embed the images into the final document as base64 strings, creating a single file with all content including images (rather than saving the images externally and linking them into the document).

If you would also like to save the R-produced images to external files, use:

R Notebooks

Where are my R objects?

You might have noticed that each time we render our document with the Knit button, none of the R objects appear in the current R session.

Using the Knit button actually starts a new (empty) R session to render the document. All of the R code is executed in that session, which is closed after rendering.

Rendering in a new session ensures that the document is reproducible (for instance on someone else’s computer), as it prevents any dependencies on objects in the current R session.

If you want to render in the current R session instead of a new session, use rmarkdown::render() on the .rmd file directly instead of the Knit button.

However, R Notebooks provide another means to have your objects at your fingertips while you work on your .rmd file.

R Notebooks

When editing an R Markdown document within RStudio, by default it will be edited as an R Notebook.

R Notebooks allow the user to execute each R code chunk interactively, which places the output immediately below the code chunk itself in the .rmd document.

R Notebooks are R Markdown files in every sense – they just provide an interactive mode for document editing.

You’ll know that RStudio is treating your .rmd file as an R Notebook if you see these buttons at the top right of each R code block:

Check for the buttons at the top right of your code chunks in the mileage.rmd file

Click the middle button (gray triangle and green bar) in the code chunk mileage-graphs of mileage.rmd.

Click the right button (green triangle) in the code chunk mileage-graphs of mileage.rmd.

YAML header

Purpose of the YAML header

In the YAML header we specify pandoc options that control the overall appearance of the output document.

Remember, pandoc converts the Markdown .md file into one of many output document formats (HTML, LaTeX pdf, Word doc, etc.), and thus the output document format itself is also specified in the YAML header

Which options are available depends on the output document format.

See the rmarkdown cheatsheet or reference guide to see a table of options by output format (click on the Help menu in RStudio, then click Cheatsheets).

Specifying a YAML header

The YAML header is located at the top of the .rmd file and is enclosed in 2 sets of 3 dashes, ---

If you open a new .rmd file template in RStudio, you will see option title: and option output:, which is set to html_document by default. If you specify an Author when you open the .rmd file, RStudio will also supply populated fields for author: and date:.

The YAML header is actually optional, and if omitted completely, an HTML document will be produced.

We will cover many more YAML (pandoc) options as we discuss specific output formats

YAML Indentation

Indentation is important when specifying suboptions (options for options) in YAML headers.

For example, let’s imagine we wanted to add a table of contents that floats on the left of the document created by mileage.rmd. Notice the indentation and newlines in this specification:

---
title: "Mileage of American Cars"
output:
  html_document:
    toc: TRUE
    toc_float: TRUE
---

Replace the YAML header of mileage.rmd with the header above. Make sure the indentation is copied faithfully.

Parameters in the YAML heading

R Markdown allows specification of parameters in the YAML heading that can be passed to R code anywhere in the document.

Parameters provide an easy mechanism to generate different customized reports depending on the inputs. For example, we can reproduce the document for each of many subgroups.

To declare parameters, include the params: field in the YAML header, and underneath add one parameter per line, each specified as param_name: value, and each indented by 2 spaces.

To access the parameter value in R code, use params$param_name, where param_name is the name of the parameter specified in the YAML header.
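
Parameters can also be supplied from R when rendering, which is one way to produce a report per subgroup. The sketch below is an illustration only: it writes a hypothetical parameterized .Rmd file based on the built-in sleep data, then renders it once per group through the params argument of rmarkdown::render(). The parameter name group and the output file names are invented for the example.

```r
# A hypothetical parameterized document with one parameter, group.
fence <- strrep("`", 3)  # three backticks, the chunk delimiter
rmd_path <- file.path(tempdir(), "params_demo.Rmd")
writeLines(c(
  "---",
  "title: \"Sleep summary\"",
  "output: html_document",
  "params:",
  "  group: 1",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(sleep$extra[sleep$group == params$group])",
  fence
), rmd_path)

# Rendering needs the rmarkdown package and Pandoc; check for them first.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

out_files <- character(0)
if (can_render) {
  for (g in 1:2) {
    # Values in params override the defaults declared in the YAML header.
    out_files[g] <- rmarkdown::render(
      rmd_path,
      params = list(group = g),
      output_file = paste0("sleep_group_", g, ".html"),
      envir = new.env(),  # fresh environment for each report
      quiet = TRUE
    )
  }
}
out_files
```

Each pass through the loop writes a separate HTML file, so the same source document yields one report per group.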

A. Change the eval option in the final R code chunk, subgroup-plot, to eval=TRUE.
B. Add the following parameter specification to the YAML header:
params:
  manufacturer: dodge

The first line is not indented, but the second line is indented 2 spaces.

Now try changing dodge to chevrolet, ford, jeep, lincoln, mercury, or pontiac and re-render the document

Notice how the parameter is accessed in the final chunk with params$manufacturer

Tour of Output Formats

Output Formats

R Markdown’s ability to produce a wide variety of document types using a single, unified coding framework is one of its biggest strengths.

The same .rmd file can produce an HTML document, a LaTeX .pdf document, a Word document, an ioslides slide show presentation, etc.

See here for a full list of available output formats. Formats are either documents or presentations.

Each output format has its own set of options that we can specify in the YAML header to control the document’s appearance.

For this section, please open the txhousing_sales.rmd file.

HTML documents

Markdown’s original purpose was to simplify HTML coding, so HTML documents naturally have the widest array of options available in rmarkdown.

We have already seen the use of toc: TRUE to add a table of contents.

Some more useful suboptions for HTML documents are:

Try adding a theme: suboption to txhousing_sales.rmd when the output format is html_document. Try a few of the theme_name specifications above.

Now add a highlight: suboption, using any of the style specifications above. (Make sure echo=TRUE in knitr::opts_chunk$set() in the first chunk to observe the results.)

Finally, add code_folding: hide, and try toggling the Code buttons throughout, and the master Code button at the top of the document.

A full list of options for HTML documents can be found at the R Markdown Definitive Guide

Raw HTML and CSS

If you want to have finer control over the appearance of your HTML documents (including some of the presentation formats we will discuss later), you will probably need to learn some HTML.

You can insert HTML directly into the document and in most cases it will render as expected. For example, <font color="red">ERROR:</font> produces ERROR:.

CSS (Cascading Style Sheets) is a language used to style markup languages like HTML. CSS code blocks that control the appearance of the document globally are often defined at the top of documents inside HTML <style> </style> tags.

Try adding the following CSS within HTML <style> tags to txhousing_sales.rmd immediately after the YAML header:
<style>
body {
  background-color: AliceBlue;
  font-family: Garamond, serif;
  font-size: 20px;
}
</style>

Remove this style section when you feel you understand how it functions.

W3Schools is an excellent, free online resource for beginners to learn HTML and CSS.


R Markdown supports several slide-show-style presentation output formats.

The HTML slideshows are opened and viewed in a browser just like any other HTML file, while a beamer_presentation is viewed in a PDF viewer (e.g. Adobe Acrobat, more on PDF files later).

HTML slideshows can be styled with raw HTML and CSS, while Beamer presentations can be styled with LaTeX.

We will not be covering Beamer presentations in this seminar.

Slideshows often look and behave better in the actual output file than in the RStudio previewer.

Markdown in slideshow presentations

Slideshows use section headers as indicators of new slides, where the header then becomes the header of the slide. For example ## Purpose will initiate a new slide with the header “Purpose”.

First-level section headers (i.e. # Header) will become title slides, and should not have any accompanying text underneath.

    Seriously, including anything in a title slide besides the header itself can mangle all of the subsequent slides.

Second-level section headers (i.e. ## Header) will initiate new slides and may have additional content underneath.

You can also start a new slide without a header at any point by entering the Markdown tag ---, with a blank line before and after (this tag produces a horizontal line in non-presentations).

To have bulleted items appear on click (when advancing the slide) use >- instead of *.

ioslides presentation

By default, ioslides presentations are slick-looking slides set against a dark background.

knitr::kable() tables have a particularly nice presentation in ioslides.

Change the YAML header in txhousing_sales.rmd to this:
---
title: "Texas housing sales, 2000-2015"
output:
  ioslides_presentation:
    smaller: true
    widescreen: true
params:
  spotlight: "Houston"
---

I find it challenging to fit content onto the compact ioslides slides with default settings, so I usually add these two options specific to ioslides_presentation (smaller and widescreen, both set in the header above):

Add --- before ### q-q plot of residuals and ### Influence plots towards the end of txhousing_sales.rmd to initiate new slides. Make sure to add a blank line before and after ---.

Viewing ioslides presentations

While viewing an ioslides presentation, the following keys will alter the display:

Slidy presentations

Slidy presentations by default have simple styling, but are highly customizable.

One huge advantage of slidy presentations is that the vertical size of slides is unlimited, as you can scroll down slides.

This seminar is a slidy_presentation.

Change the YAML header in txhousing_sales.rmd to this:
---
title: "Texas housing sales, 2000-2015"
output:
  slidy_presentation:
    font_adjustment: -1
    footer: Created in R
params:
  spotlight: "Houston"
---

A couple of options unique to slidy_presentations:

LaTeX PDF documents

Specifying output: pdf_document in the YAML header will produce a .pdf file formatted with LaTeX. Of course, there are suboptions available for pdf_documents.

Replace the entire YAML header in txhousing_sales.rmd with this header
---
title: "Texas housing sales, 2000-2015"
output: pdf_document
params:
  spotlight: "Houston"
---

The RStudio previewer for PDF documents is separate from the previewer for HTML documents.

Any raw HTML code in an R Markdown file that is destined for a pdf_document will be ignored. Similarly, any raw LaTeX markup (other than TeX Math) in a file destined for html_document or one of the HTML presentation formats will be ignored.

Technical Note: Outputting a pdf_document requires that you have some distribution of TeX installed on your computer (e.g. MiKTeX or TeX Live). You can install a small version of TeX on your computer directly through R by first running install.packages("tinytex") and then tinytex::install_tinytex(), which will install enough of TeX to output an R Markdown pdf_document.

What is LaTeX?

LaTeX (pronounced LAY-tech) is another document markup language, allowing the user to use tags to format plain text with very fine control. Compiling a LaTeX file into a readable PDF document requires that a TeX distribution (e.g. MiKTeX) be installed as well.

LaTeX is often used to produce scientific documents, as it is particularly well suited to produce beautiful mathematical equations.

LaTeX tags begin with a backslash, and usually have the syntax: \tag{value}{text}, where tag is the name of the markup tag, value is its assigned value, and text is the text to which the formatting will be applied.

Replace the heading towards the top, # Background, with # \color{red}{Background}

Overleaf is a good place for new users of LaTeX to learn.

Useful options for LaTeX

Some of the options for HTML documents that we have seen are also available for LaTeX PDF documents:

Another useful option for novice LaTeX users is to switch to the xelatex engine, with this option:

Somewhat confusingly, the mainfont option is actually a top-level option, passed directly to pandoc, so should not be indented.

Pandoc has many other top-level options (i.e. not indented) for LaTeX documents that can be specified in the YAML header of an R Markdown file. See here for more of these top-level options.

Set the YAML header to this specification (change Georgia to another font if it is not installed on