Enhance ETL pipeline monitoring with text plots

R
Quick visualizations in the command line using {txtplot}
Author

Rahul

Published

July 19, 2021

I often run ETL pipeline scripts as scheduled jobs on text-only interfaces (Jenkins, or cron in a shell). I already print descriptive stats to keep tabs on ETL runs, but I’ve found that {txtplot}, Bjoern Bornkamp’s package, is a simple and effective way to add more detail to my logfiles.

Will I use them to visualize my data? No. Are they useful to keep an eye on your pipelines and quickly diagnose issues? Yes. I’ve diagnosed more than a few data quality spills very quickly because I had these rudimentary plots in my logfiles.
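
To make the logfile idea concrete, here is a minimal sketch of how a plot could be wrapped in a timestamped block so it is easy to find in a long log. The helper `log_block()` and the sample values are my own illustration, not part of {txtplot}; the sketch assumes the job's standard output is what ends up in the logfile.

```r
# Hypothetical helper (not part of {txtplot}): wrap console output in a
# timestamped, titled block so it is easy to find in a long logfile.
log_block <- function(title, plot_fn, con = stdout()) {
  # capture.output() collects everything plot_fn() prints to stdout;
  # invisible() keeps the function's return value out of the log.
  body <- utils::capture.output(invisible(plot_fn()))
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  header <- sprintf("[%s] %s", stamp, title)
  writeLines(c(header, body, ""), con = con)
  invisible(body)
}

# Made-up values standing in for a column produced by an ETL step.
heights <- c(172, 167, 96, 202, 150, 178, 165, 97, 183, 182)

# Plot only when {txtplot} is installed; otherwise fall back to summary
# statistics so that logging never breaks the job itself.
if (requireNamespace("txtplot", quietly = TRUE)) {
  log_block("height density", function() txtplot::txtdensity(heights))
} else {
  log_block("height summary", function() print(summary(heights)))
}
```

Passing the plot as a function keeps the helper generic: any of the {txtplot} calls in the examples that follow could be logged the same way.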

Some examples…

What’s the distribution of a variable?

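library(dplyr)    # %>%, mutate(), across(), pull()
library(txtplot)  # txtdensity(), txtbarchart(), txtboxplot(), txtplot(), txtacf()
dat <- dplyr::starwars %>% 
  mutate(across(where(is.character), forcats::fct_infreq))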

dat %>%
  tidyr::drop_na(height) %>% 
  pull(height) %>% 
  txtdensity(., xlab = "starwars: height distribution", width = 70, height = 12, pch = "o")
      +-----------+--------------+-------oo----+-------------+-------+
 0.02 +                                oooooo                        +
      |                               oo    oo                       |
0.015 +                              oo      o                       +
      |                             oo        o                      |
 0.01 +                             o         oo                     +
      |                            o           oo                    |
0.005 +         ooo              oo              ooo                 +
      |  oooooooo oooooooooooooooo                 ooooooooo   ooo   |
    0 +-----------+--------------+-------------+-----------ooooo-----+
                 100            150           200           250       
                       starwars: height distribution                  
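
The snippet above drops missing heights before plotting, presumably because a density estimate needs complete data. In an unattended job it can help to guard that step, so a column that arrives empty produces a log line rather than an error. A rough sketch follows, where `log_density()`, its `min_n` threshold and the sample vectors are all hypothetical:

```r
# Hypothetical guard (not part of {txtplot}): report how many values are
# missing, and plot the density only when there is enough data to do so.
log_density <- function(x, label, min_n = 2) {
  n_missing <- sum(is.na(x))
  x <- x[!is.na(x)]
  cat(sprintf("%s: n = %d, missing = %d\n", label, length(x), n_missing))
  if (length(x) < min_n) {
    cat("  too few values to plot\n")
    return(invisible(FALSE))
  }
  if (!requireNamespace("txtplot", quietly = TRUE)) {
    cat("  {txtplot} not installed, skipping plot\n")
    return(invisible(FALSE))
  }
  txtplot::txtdensity(x, xlab = label)
  invisible(TRUE)
}

# Made-up inputs: one usable column, one that arrived entirely missing.
log_density(c(172, NA, 96, 202, 150, NA, 165), "height")
log_density(c(NA_real_, NA_real_), "mass")
```

The count line is printed before any plotting is attempted, so the logfile still records how much data went missing even when the plot is skipped.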

Counts of a factor?

txtbarchart(dat$sex, width = 70, height = 12, pch = "x")
   +--x--------------+--------------+--------------+--------------+--+
60 +  x                                                              +
   |  x                                                              |
   |  x                                                              |
40 +  x                                                              +
   |  x                                                              |
20 +  x              x                                               +
   |  x              x                                               |
   |  x              x              x              x              x  |
 0 +--x--------------x--------------x--------------x--------------x--+
      1              2              3              4              5   
Legend: 1=male, 2=female, 3=none, 4=hermaphroditic, 5=NA's

Boxplots

txtboxplot(dat$height, dat$mass, width = 70, height = 12)
 0            50            100           150           200           
 |-------------+-------------+-------------+-------------+-----------|
                                                +---+--+              
1                                       --------|   |  |---------     
                                                +---+--+              
                 +-----+-+                                            
2     -----------|     | |----------                                  
                 +-----+-+                                            
     +                                                                
3    |                                                                
     +                                                                
Legend: 1=dat$height, 2=dat$mass, 3=12

A caveat on the call above: unlike the other functions here, txtboxplot() has no height argument, so height = 12 is read as a third data series. That is the stray row 3 (legend 3=12) in the output; drop the argument to plot only height and mass.

Time series plots

txtplot(y = LakeHuron, x = 1875:1972, width = 70, height = 12, xlab = "year", ylab = "level")
  582 +--*--+----------+-----------+----------+-----------+----------+
      |      ****                                                    |
  581 +  *****  *                       *            **              +
l 580 +    *     ** *     ***    **                  * *       ***   +
e 579 +           **  ****   * * ***     *      ****   *  *     *    +
v     |             ** *     **    **   *       *       *  *  **     |
e 578 +                       * *    * *     **     *   *  *  *      +
l 577 +                               *  ** ** *         ** **       +
      |                                    *                         |
  576 +-----+----------+-----------+----------+-----------+-*--------+
          1880       1900        1920       1940        1960          
                                   year                               
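
The same call works for pipeline metrics tracked across runs. The sketch below uses made-up row counts for a hypothetical daily job and an arbitrary threshold (20% below the median) to name suspicious runs next to the plot; neither the numbers nor the threshold come from a real pipeline.

```r
# Made-up row counts from the last 14 runs of a hypothetical daily ETL job.
rows_loaded <- c(1020, 998, 1013, 1005, 1031, 990, 1008,
                 1017, 1002, 640, 1011, 1025, 996, 1009)
run_id <- seq_along(rows_loaded)

# Flag runs that fall more than 20% below the median (arbitrary threshold).
low_runs <- run_id[rows_loaded < 0.8 * median(rows_loaded)]
cat("runs with unusually few rows:", low_runs, "\n")

# Plot the series only when {txtplot} is available.
if (requireNamespace("txtplot", quietly = TRUE)) {
  txtplot::txtplot(x = run_id, y = rows_loaded, width = 70, height = 12,
                   xlab = "run", ylab = "rows")
}
```

The printed run numbers give something to grep for, while the plot shows whether a drop is a one-off or part of a trend.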

ACF plots, though I’ve not used these in production yet.

txtacf(sunspot.year, width = 70, height = 12)
  1 +--*--------------+-------------+--------------+--------------+--+
    |  *                                                             |
    |  *  *                                                          |
    |  *  *                         *  *                             |
0.5 +  *  *  *                   *  *  *  *                          +
    |  *  *  *                   *  *  *  *                       *  |
    |  *  *  *                 * *  *  *  *  *                    *  |
  0 +  *  *  *  *  *  *  *  *  * *  *  *  *  *  *  *  *  *  *  *  *  +
    |              *  *  *  *                   *  *  *  *  *        |
    |              *  *  *                         *  *  *           |
    +--+--------------*-------------+--------------+--------------+--+
       0              5            10             15             20   
