Enhance ETL pipeline monitoring with text plots

I often run ETL pipeline scripts as cron jobs on text-only interfaces (Jenkins, or a plain shell). While I print descriptive stats to keep tabs on ETL runs, I’ve found that Bjoern Bornkamp’s {txtplot} package is a simple and effective way to add a higher level of fidelity to my logfiles.

Will I use them to visualize my data? No. Are they useful to keep an eye on my pipelines and quickly diagnose issues? Yes. I’ve been able to diagnose more than a few data quality spills very quickly because I had these rudimentary plots in my logfiles.
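One way to make these plots easier to find in a long logfile is to capture the printed plot and write it under a timestamped header. The following is a minimal sketch, assuming the plot function prints to standard output (which is what capture.output() intercepts). log_plot() is a hypothetical helper, not part of {txtplot}.

```r
# Hypothetical helper (not part of txtplot): run a zero-argument plotting
# function, capture what it prints, and write it under a timestamped header.
log_plot <- function(label, plot_fun, con = stdout()) {
  # invisible() stops capture.output() from also printing the return value
  lines <- utils::capture.output(invisible(plot_fun()))
  header <- sprintf("[%s] %s", format(Sys.time(), "%Y-%m-%d %H:%M:%S"), label)
  writeLines(c(header, lines), con = con)
  invisible(lines)
}

# Usage, only if txtplot is installed; LakeHuron ships with base R
if (requireNamespace("txtplot", quietly = TRUE)) {
  log_plot(
    "LakeHuron: level density",
    function() txtplot::txtdensity(as.numeric(LakeHuron), width = 70, height = 12)
  )
}
```

Passing the plot as a function rather than calling it directly keeps the printing inside the capture, and the header gives something to grep for when a run looks off.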

Some examples…

What’s the distribution of a variable?

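library(dplyr)    # %>%, mutate(), across(), pull()
library(txtplot)  # txtdensity(), txtbarchart(), txtboxplot(), txtplot(), txtacf()

# Convert character columns to factors ordered by frequency
dat <- dplyr::starwars %>%
  mutate(across(where(is.character), forcats::fct_infreq))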

dat %>%
  tidyr::drop_na(height) %>% 
  pull(height) %>% 
  txtdensity(., xlab = "starwars: height distribution", width = 70, height = 12, pch = "o")
##       +-----------+--------------+-------oo----+-------------+-------+
##  0.02 +                                oooooo                        +
##       |                               oo    oo                       |
## 0.015 +                              oo      o                       +
##       |                             oo        o                      |
##  0.01 +                             o         oo                     +
##       |                            o           oo                    |
## 0.005 +         ooo              oo              ooo                 +
##       |  oooooooo oooooooooooooooo                 ooooooooo   ooo   |
##     0 +-----------+--------------+-------------+-----------ooooo-----+
##                  100            150           200           250       
##                        starwars: height distribution
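In a pipeline it can be handy to run the same density check over every numeric column instead of naming each one. A rough sketch under some assumptions: plot_numeric_columns() is a hypothetical helper, the default plotting function is txtplot::txtdensity, and columns with fewer than two usable values are skipped because a density estimate needs at least two points.

```r
# Hypothetical helper (not part of txtplot): one density plot per numeric column.
# plot_fun defaults to txtplot::txtdensity; it is only looked up when first used.
plot_numeric_columns <- function(df, plot_fun = txtplot::txtdensity, min_n = 2, ...) {
  plotted <- character(0)
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.numeric(x)) next      # skip factors, characters, list columns
    x <- x[is.finite(x)]          # drop NA, NaN and Inf before plotting
    if (length(x) < min_n) {
      cat(sprintf("skipping %s: only %d usable value(s)\n", col, length(x)))
      next
    }
    plot_fun(x, xlab = col, ...)  # column name doubles as the x-axis label
    plotted <- c(plotted, col)
  }
  invisible(plotted)              # names of the columns that were plotted
}

# Usage, only if txtplot is installed; iris ships with base R
if (requireNamespace("txtplot", quietly = TRUE)) {
  plot_numeric_columns(datasets::iris, width = 70, height = 12)
}
```

The "skipping" lines are worth keeping in the log: a column that suddenly has no usable values is often the data quality issue you were looking for.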

Counts of a factor?

txtbarchart(dat$sex, width = 70, height = 12, pch = "x")
##    +--x--------------+--------------+--------------+--------------+--+
## 60 +  x                                                              +
##    |  x                                                              |
##    |  x                                                              |
## 40 +  x                                                              +
##    |  x                                                              |
## 20 +  x              x                                               +
##    |  x              x                                               |
##    |  x              x              x              x              x  |
##  0 +--x--------------x--------------x--------------x--------------x--+
##       1              2              3              4              5   
## Legend: 1=male, 2=female, 3=none, 4=hermaphroditic, 5=NA's

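Boxplots. Note that txtboxplot() has no height argument: anything it doesn’t recognize is treated as another data series, which is why height = 12 below shows up as a third box (3=12 in the legend). Drop it in real use.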

txtboxplot(dat$height, dat$mass, width = 70, height = 12)
##  0            50            100           150           200           
##  |-------------+-------------+-------------+-------------+-----------|
##                                                 +---+--+              
## 1                                       --------|   |  |---------     
##                                                 +---+--+              
##                  +-----+-+                                            
## 2     -----------|     | |----------                                  
##                  +-----+-+                                            
##      +                                                                
## 3    |                                                                
##      +                                                                
## Legend: 1=dat$height, 2=dat$mass, 3=12

Time series plots

txtplot(y = LakeHuron, x = 1875:1972, width = 70, height = 12, xlab = "year", ylab = "level")
##   582 +--*--+----------+-----------+----------+-----------+----------+
##       |      ****                                                    |
##   581 +  *****  *                       *            **              +
## l 580 +    *     ** *     ***    **                  * *       ***   +
## e 579 +           **  ****   * * ***     *      ****   *  *     *    +
## v     |             ** *     **    **   *       *       *  *  **     |
## e 578 +                       * *    * *     **     *   *  *  *      +
## l 577 +                               *  ** ** *         ** **       +
##       |                                    *                         |
##   576 +-----+----------+-----------+----------+-----------+-*--------+
##           1880       1900        1920       1940        1960          
##                                    year

ACF plots, though I’ve not used these in production yet.

txtacf(sunspot.year, width = 70, height = 12)
##   1 +--*--------------+-------------+--------------+--------------+--+
##     |  *                                                             |
##     |  *  *                                                          |
##     |  *  *                         *  *                             |
## 0.5 +  *  *  *                   *  *  *  *                          +
##     |  *  *  *                   *  *  *  *                       *  |
##     |  *  *  *                 * *  *  *  *  *                    *  |
##   0 +  *  *  *  *  *  *  *  *  * *  *  *  *  *  *  *  *  *  *  *  *  +
##     |              *  *  *  *                   *  *  *  *  *        |
##     |              *  *  *                         *  *  *           |
##     +--+--------------*-------------+--------------+--------------+--+
##        0              5            10             15             20