Generating Latex Tables in R

programming
Published

February 11, 2023

Introduction

Preparing Latex tables from the results of a simulation study or other program can be tedious and time-consuming when done by hand, especially when the process must be repeated many times as results are updated to correct mistakes or to investigate unexpected findings. Hours spent manually adjusting the formatting of table entries and typing ampersands between them could be better spent elsewhere. Fortunately, there are some excellent tools in R to help. Taking time to learn the tools and automate your table generation may be worth the investment. We will give a brief example in this post. This is based only on my experience; there are likely even better tools and methods that I have yet to learn.

Objective

Our objective will be to generate a Latex table for the first six rows of the airquality dataset.

R> head(airquality, n = 6)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

We would also like to meet the following criteria.

  1. Criteria for formatting of the rendered Latex table.
    1. Use booktabs and make all separators horizontal lines.
    2. Use multiline columns to render column names.
    3. Add several headers which group together columns in the display.
    4. Specify a caption and a label which can be used in Latex references.
    5. Request the table to be placed “here”.
  2. Criteria for table content.
    1. Include row labels (observation number) in the table as a column.
    2. Format numbers in a purposeful way.
    3. Display the Month and Day columns together as a date.
    4. Customize column names to be more descriptive.
  3. Ensure that the generated Latex code is legible.
    1. Pad strings in a given column so that they are nicely aligned in the code.

To do this, we will make use of the following tools.

  1. The Tidyverse framework to manipulate tables.
  2. The knitr package for reproducible research in R. In particular, we will use the kable function to generate Latex from data frames.
  3. The kableExtra package to handle some additional Latex work.
  4. The built-in sprintf function to format table entries.

Preparing the Data Frame

First let us load the packages we will use.

R> library(knitr)
R> library(dplyr)
R> library(tibble)
R> library(stringr)
R> library(kableExtra)

Before converting to Latex, let us format the entries and column names as we would like them to appear.

tbl = head(airquality, n = 6) %>%
    rownames_to_column(var = "Num") %>%
    mutate(Date = sprintf("%04d-%02d-%02d", 1973, Month, Day)) %>%
    mutate(Solar.R = sprintf("%0.2e", Solar.R)) %>%
    mutate(TempC = sprintf("%0.2f", 5/9 * (Temp - 32))) %>%
    select(Num, Date, Ozone, Solar.R, Wind, TempC)

We have done the following.

  1. Assemble Month and Day into Date, where 1973 is the year in which all observations were taken (according to the manual page for the dataset).
  2. Convert Solar.R to a string with the original value in scientific notation.
  3. Convert temperature Temp from Fahrenheit to Celsius, format it to two decimal places, and call the result TempC.
  4. Use tibble::rownames_to_column to include row labels in the table as the Num column.
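
The sprintf formats used above can be checked on single values before applying them to a whole column. The following is a small sketch using only base R; the input values are taken from the first row of airquality, and the variable names are only for illustration.

```r
# Zero-padded date pieces: %04d and %02d pad integers with leading zeros.
date_str <- sprintf("%04d-%02d-%02d", 1973, 5, 1)   # "1973-05-01"

# Scientific notation with two digits after the decimal point.
solar_str <- sprintf("%0.2e", 190)                  # "1.90e+02"

# Fahrenheit to Celsius, shown with two decimal places.
temp_str <- sprintf("%0.2f", 5/9 * (67 - 32))       # "19.44"

print(c(date_str, solar_str, temp_str))
```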

This produces the following table.

R> print(tbl)
  Num       Date Ozone  Solar.R Wind TempC
1   1 1973-05-01    41 1.90e+02  7.4 19.44
2   2 1973-05-02    36 1.18e+02  8.0 22.22
3   3 1973-05-03    12 1.49e+02 12.6 23.33
4   4 1973-05-04    18 3.13e+02 11.5 16.67
5   5 1973-05-05    NA       NA 14.3 13.33
6   6 1973-05-06    28       NA 14.9 18.89
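
In the table above, missing values come through as the string NA. If a different placeholder is preferred in the final table, one possible approach is to substitute it while formatting. The helper name fmt_or_dash and the "--" placeholder below are illustrative assumptions, not part of the workflow in this post.

```r
# Hypothetical helper: format non-missing values with sprintf and
# replace missing values with a placeholder string.
fmt_or_dash <- function(x, fmt, placeholder = "--") {
    ifelse(is.na(x), placeholder, sprintf(fmt, x))
}

solar <- c(190, 118, NA)
solar_fmt <- fmt_or_dash(solar, "%0.2e")
print(solar_fmt)
```

Inside the pipeline, this could replace the direct sprintf call, for example mutate(Solar.R = fmt_or_dash(Solar.R, "%0.2e")).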

Generating Latex

We can now generate Latex code to display our formatted table.

out = tbl %>%
    mutate(Solar.R = str_pad(Solar.R, width = 10, side = "left", pad = "\u00A0")) %>%
    mutate(Wind = str_pad(Wind, width = 6, side = "left", pad = "\u00A0")) %>%
    kable(format = "latex", booktabs = TRUE, linesep = "", align = c("rlrrrr"),
        caption = "My formatted airquality table.", label = "airquality",
        col.names = c("Number", "Date", "Ozone (PPB)", "Radiation (Ly)",
            "Wind (MPH)", "Temp (C)")) %>%
    kable_styling(latex_options = c("hold_position")) %>%
    add_header_above(c("Observation" = 2, "Solar" = 2, "Weather" = 2))

We have done the following.

  1. Left-pad the Solar.R and Wind fields with the Unicode non-breaking space (“nbsp”) character, which is kept as a space in the resulting code rather than being ignored, as a regular space would be.

  2. Use kable to generate a Latex table from our data frame. We have specified options such as cell alignments, the caption, and the label, which should be familiar to Latex users. We have also specified descriptive column names here.

  3. The hold_position option of kable_styling adds the placement specifier !h, which asks Latex to place the table “here”.

  4. The function add_header_above specifies a layer of headers above the column names. These respectively have text “Observation”, “Solar” and “Weather”, and are each two columns wide.
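
The linesep = "" argument in the call above deserves a brief illustration. To the best of my understanding of knitr's defaults, when booktabs = TRUE, kable inserts \addlinespace after every fifth row, and an empty linesep suppresses this. The sketch below compares the two on the same six rows; the variable names are illustrative.

```r
library(knitr)

six_rows <- head(airquality, n = 6)

# Default separators: with booktabs, extra space is expected after row five.
with_default <- kable(six_rows, format = "latex", booktabs = TRUE)

# Empty separator: no extra spacing between any rows.
with_empty <- kable(six_rows, format = "latex", booktabs = TRUE, linesep = "")

# Check each result for the \addlinespace command.
has_space_default <- grepl("\\addlinespace", as.character(with_default), fixed = TRUE)
has_space_empty <- grepl("\\addlinespace", as.character(with_empty), fixed = TRUE)
print(c(default = has_space_default, empty = has_space_empty))
```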

Printing the result yields the following Latex code.

R> print(out)
\begin{table}[!h]
\centering
\caption{\label{tab:airquality}My formatted airquality table.}
\centering
\begin{tabular}[t]{rlrrrr}
\toprule
\multicolumn{2}{c}{Observation} & \multicolumn{2}{c}{Solar} & \multicolumn{2}{c}{Weather} \\
\cmidrule(l{3pt}r{3pt}){1-2} \cmidrule(l{3pt}r{3pt}){3-4} \cmidrule(l{3pt}r{3pt}){5-6}
Number & Date & Ozone (PPB) & Radiation (Ly) & Wind (MPH) & Temp (C)\\
\midrule
1 & 1973-05-01 & 41 &   1.90e+02 &    7.4 & 19.44\\
2 & 1973-05-02 & 36 &   1.18e+02 &      8 & 22.22\\
3 & 1973-05-03 & 12 &   1.49e+02 &   12.6 & 23.33\\
4 & 1973-05-04 & 18 &   3.13e+02 &   11.5 & 16.67\\
5 & 1973-05-05 & NA &         NA &   14.3 & 13.33\\
6 & 1973-05-06 & 28 &         NA &   14.9 & 18.89\\
\bottomrule
\end{tabular}
\end{table}
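
Note that the Wind value for observation 2 appears as 8 rather than 8.0 in the output above, presumably because the numeric column is converted to a string when it is padded, and the trailing zero is dropped in that conversion. If a fixed number of decimals is wanted, one option is to format with sprintf before padding. The following base R sketch illustrates the idea; it uses a sprintf field width in place of str_pad, and the variable names are illustrative.

```r
wind <- c(7.4, 8.0, 12.6)

# Plain conversion to character drops the trailing zero in 8.0.
as_text <- as.character(wind)

# Fix one decimal place first, then right-align in a field of width 6.
fixed <- sprintf("%0.1f", wind)
padded <- sprintf("%6s", fixed)

# Swap regular spaces for non-breaking spaces, as in the pipeline above.
padded_nbsp <- gsub(" ", "\u00A0", padded, fixed = TRUE)

print(as_text)
print(fixed)
```

In the pipeline, the equivalent change would be to apply sprintf("%0.1f", Wind) before the str_pad call.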

We now have a fairly human-readable Latex table environment that can be copied and pasted into a Latex document, or even included programmatically using Sweave.
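
For the programmatic route, one simple possibility is to write the generated code to a .tex file and pull it into the main document with \input. The sketch below rebuilds a small table so that it is self-contained; the file name airquality-table.tex is a placeholder, and a temporary directory is used only for illustration.

```r
library(knitr)

# A small table standing in for the `out` object built earlier.
small <- kable(head(airquality, n = 6), format = "latex", booktabs = TRUE,
    linesep = "", caption = "My formatted airquality table.",
    label = "airquality")

# Write the Latex code to a file (placeholder name in a temporary directory).
tex_path <- file.path(tempdir(), "airquality-table.tex")
writeLines(as.character(small), tex_path)

# Read it back to confirm the table was written.
written <- readLines(tex_path)
print(head(written, n = 3))
```

The main document could then contain \input{airquality-table.tex}, with \usepackage{booktabs} in the preamble so that \toprule, \midrule and \bottomrule are defined.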