Making sexy tables using R with the gt package

R

Rida Azmi

Hello,
We all run into this problem: sometimes we need to produce nice tables for a report or a study. One of the qualities of a good data scientist is the ability to do dataviz, whether that means charts, maps or even sexy illustrative tables.
For this we have the super #R and its packages. In this post we present the gt package!
The gt package is a recent attempt to improve the accessibility, modifiability, and reproducibility of tables in R. It aspires to be the ggplot2 of tables (see the tweet below from Hadley Wickham), and given how much ggplot2 changed visualization in R, that is a lot of weight for one package to carry.
The goal of the gt package is to make creating nice-looking display tables as straightforward as possible. Display tables? Yes: we are distinguishing between data tables (tibbles, data.frames, and so on) and the tables you see on a web page, in a journal paper, or in a magazine. The latter are also known as presentation tables, summary tables, or simply tables.

gt = grammar of tables. It aims to do for tables what ggplot2 did for graphics. It’s still early days and tables are surprisingly complicated, but this is a very exciting package by a skilled developer! #rstats https://t.co/138FrCy5th

— Hadley Wickham (@hadleywickham) April 8, 2020

Here’s an illustration of the structure of a gt table, from the gt documentation:

The grammar of tables layout. Source: gt documentation, RStudio

Data presentation:
The data in this article comes from https://ourworldindata.org/carbon-footprint-flying
gt integrates well into the existing tidyverse, and creating a gt table is as simple as two lines of code.
In this case, we will show the worst emitters by sorting with the arrange() function, using desc() on the “Per.capita.domestic.aviation.CO2” column, as in the code below.

Code:
#Load the packages used throughout this post
library(dplyr)
library(gt)
#Load data and arrange in descending order of emissions
emissions_data <- read.csv("F:/WORDPRESS/Gt_Package/Data/per-capita-co2-domestic-aviation.csv") %>%
  arrange(desc(Per.capita.domestic.aviation.CO2))
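If you don't have the CSV at hand, the sorting step can be tried on a small stand-in data frame. This is only a sketch: the column names follow the ones used in this article, but `toy_emissions`, the entities and the values are invented for illustration and are not taken from the Our World in Data file.

```r
library(dplyr)

# Hypothetical stand-in for the CSV: entities and values are invented
toy_emissions <- data.frame(
  Entity = c("Country A", "Country B", "Country C"),
  Code   = c("AAA", "BBB", "CCC"),
  Per.capita.domestic.aviation.CO2 = c(0.02, 0.35, 0.11),
  stringsAsFactors = FALSE
)

# arrange() sorts the rows; desc() reverses the order so the largest value comes first
toy_sorted <- toy_emissions %>%
  arrange(desc(Per.capita.domestic.aviation.CO2))

# Entities in order, from the highest to the lowest value
toy_sorted$Entity
```

The same pattern applies to the real data: only the data frame changes.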


Calling gt() on the first few rows, as in the code below, does give us a table, but it’s not one you’re going to publish! To begin the process of table perfection, we must clean up the data and add a title and a data source, so that the reader can understand what they are looking at. While most of the cleanup could be done with dplyr before converting to a table, I’ll show how it can be done entirely inside gt.

Code:
#Generate a gt table from head of data
head(emissions_data) %>% 
  gt()

Table showing the worst emitters of CO₂, using data from https://ourworldindata.org/carbon-footprint-flying

We will clean up by hiding the unwanted columns and renaming the remaining ones. Then we start formatting the table, first adding a title and then a source note that indicates where the data comes from. This is the kind of formatting expected in a scientific article or a professional report.
Notice that the `md` function in the code below lets us write the title using Markdown syntax (which also allows inline HTML, such as the <sub> tag used here).

Code:
emissions_table <- head(emissions_data) %>% 
   gt() %>% 
   #Hide unwanted columns (gt releases before 0.3.0 wrote this as vars(Code))
   cols_hide(columns = Code) %>% 
   #Rename columns
   cols_label(Entity = "Country",
              Per.capita.domestic.aviation.CO2 = "Per capita emissions (tonnes)") %>% 
   #Add a table title
   #Notice the `md` function allows us to write the title using markdown syntax (which allows HTML)
   tab_header(title = md("Comparison of per capita CO<sub>2</sub> emissions from domestic aviation (2018)")) %>% 
   #Add a data source note below the table
   tab_source_note(source_note = "Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]")
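To see what md() changes, it can help to build the same header three ways. This is a rough sketch on an invented one-row data frame (`toy` is hypothetical): a plain string should be shown literally, md() interprets Markdown and inline HTML, and html() is gt's helper for text that is already HTML.

```r
library(gt)

# Invented one-row table, only there to carry a header
toy <- data.frame(Entity = "Country A", value = 0.35, stringsAsFactors = FALSE)

# Plain string: the asterisks and the <sub> tag are treated as ordinary text
plain_title <- gt(toy) %>%
  tab_header(title = "**Emissions** of CO<sub>2</sub>")

# md(): the text is rendered as Markdown, and inline HTML is passed through
md_title <- gt(toy) %>%
  tab_header(title = md("**Emissions** of CO<sub>2</sub>"))

# html(): the text is taken as HTML as-is
html_title <- gt(toy) %>%
  tab_header(title = html("<strong>Emissions</strong> of CO<sub>2</sub>"))
```

Printing each object in turn (for example in the RStudio viewer) shows the difference between the three titles.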

We can also change the unit used to report the emissions. In the raw data the unit is tonnes. While it’s technically correct to report emissions in tonnes, I feel the data would read much better in kilograms. For this, we’ll use fmt_number() with its scale_by argument, which multiplies the displayed values by a constant (here 1,000) without changing the underlying data, as shown in the code below.

Code:
emissions_table <- emissions_table %>% 
   #Format numeric column. Use `scale_by` to multiply by 1,000, i.e. tonnes to kg. (Note: we'll need to rename the column again)
   fmt_number(columns = Per.capita.domestic.aviation.CO2,
              scale_by = 1000) %>%
   #Our second call to cols_label overwrites our first
   cols_label(Per.capita.domestic.aviation.CO2 = "Per capita emissions (kg)")
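A quick way to convince yourself of the direction of the conversion is to do it in plain R next to the gt call. The sketch below uses an invented data frame (`toy` and its `tonnes` column are hypothetical) and sets `decimals = 0` only to keep the displayed numbers short.

```r
library(gt)

# Invented values in tonnes per capita
toy <- data.frame(
  Entity = c("Country A", "Country B"),
  tonnes = c(0.35, 0.02),
  stringsAsFactors = FALSE
)

# scale_by multiplies the values for display only; toy itself is not modified
kg_table <- gt(toy) %>%
  fmt_number(columns = tonnes, scale_by = 1000, decimals = 0) %>%
  cols_label(tonnes = "Per capita emissions (kg)")

# The same conversion by hand, to compare with what the table shows
kg_by_hand <- toy$tonnes * 1000
kg_by_hand
```

Here 0.35 tonnes becomes 350 kg and 0.02 tonnes becomes 20 kg, which is what the formatted column should display.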

The highest emitters, now in kilograms per capita

Now it’s time to add some colors and styles to make the table more attractive.
The next step is to update the cell styles so that readers can find the information they’re looking for more easily. We’ll use the tab_style() function for this. Some of the stylistic choices made here are similar to those made previously by Jon Schwabish.
To begin, we need a clearer distinction between the column headings and the table body (and, while we’re at it, the title!). We can get this by giving the column labels and the title their own styles.

Code:
(emissions_table <- emissions_table %>% 
   #Apply new style to all column headers
   tab_style(
     locations = cells_column_labels(columns = everything()),
     style     = list(
       #Give a thick border below
       cell_borders(sides = "bottom", weight = px(3)),
       #Make text bold
       cell_text(weight = "bold")
     )
   ) %>% 
   #Apply different style to the title
   tab_style(
     locations = cells_title(groups = "title"),
     style     = list(
       cell_text(weight = "bold", size = 24)
     )
   ))

New style of the table after using the function “tab_style”

The package lets you adapt the styling to the type of study and the taste of the editor. In this article we limit ourselves to basic styling; the other features are described in the package documentation.
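As one example of those other features, tab_style() can also target cells in the table body with the cells_body() location helper, optionally restricted to rows that meet a condition. The sketch below is not part of the original walkthrough: the data frame, the `kg` column and the threshold of 100 are all invented for illustration.

```r
library(gt)

# Invented values in kg per capita
toy <- data.frame(
  Entity = c("Country A", "Country B", "Country C"),
  kg     = c(20, 350, 110),
  stringsAsFactors = FALSE
)

# Highlight only the body cells of `kg` whose value exceeds an arbitrary threshold
highlighted <- gt(toy) %>%
  tab_style(
    style = list(
      cell_fill(color = "#FEF0D9"),   # light background
      cell_text(weight = "bold")      # bold text
    ),
    locations = cells_body(columns = kg, rows = kg > 100)
  )
```

With these invented values, the cells for Country B and Country C would be highlighted and Country A left untouched.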
If our reader is interested in comparing emissions between countries, we can add a heatmap to the cells to show the differences more clearly. This requires creating a color palette and using data_color() to perform the conditional coloring.
First, we define the palette explicitly over the full range of values in the dataset so that the top countries are colored correctly.

Code:
#Minimum and maximum over the full dataset, not only the rows shown in the table
min_CO2 <- min(emissions_data$Per.capita.domestic.aviation.CO2)
max_CO2 <- max(emissions_data$Per.capita.domestic.aviation.CO2)
#Palette function from the scales package: maps [min, max] onto a light-to-dark red ramp
emissions_palette <- scales::col_numeric(c("#FEF0D9", "#990000"), domain = c(min_CO2, max_CO2))

(emissions_table <- emissions_table %>% 
    #Apply the palette; `alpha` in data_color() sets the transparency of the fill
    #(recent gt releases prefer `fn =` over `colors =`)
    data_color(columns = Per.capita.domestic.aviation.CO2,
               colors = emissions_palette,
               alpha = 0.75))
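It may help to look at what the palette object actually is: col_numeric() from the scales package returns a function that turns numbers into hex color strings. The sketch below uses an invented domain of 0 to 400 rather than the real minimum and maximum of the dataset.

```r
library(scales)

# Same two end colors as in the article, over a made-up domain
pal <- col_numeric(palette = c("#FEF0D9", "#990000"), domain = c(0, 400))

# One color per input value: low end, midpoint, high end of the domain
cols <- pal(c(0, 200, 400))
cols
```

Because the domain is fixed when the palette is created, a country keeps the same color however many rows the table shows. This is why the article computes the minimum and maximum from the full `emissions_data` rather than from `head(emissions_data)`.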

And voilà !!



By RIDA AZMI
Postdoctoral researcher specializing in GIS and spatial Remote Sensing. Passionate about data science and data processing using #R
