# Enhance ETL pipeline monitoring with text plots

I often run ETL pipeline scripts as scheduled jobs on text-only interfaces (Jenkins, or cron in a shell). I print descriptive stats to keep tabs on ETL runs, and I’ve found that Bjoern Bornkamp’s {txtplot} package is a simple and effective way to add a higher level of fidelity to my logfiles.

Will I use them to visualize my data? No. Are they useful to keep an eye on my pipelines and quickly diagnose issues? Yes. I’ve diagnosed more than a few data quality spills very quickly because I had these rudimentary plots in my logfiles.

Some examples…

### What’s the distribution of a variable?

library(dplyr)
library(txtplot)

dat <- dplyr::starwars %>%
  mutate(across(where(is.character), forcats::fct_infreq))

dat %>%
  tidyr::drop_na(height) %>%
  pull(height) %>%
  txtdensity(., xlab = "starwars: height distribution", width = 70, height = 12, pch = "o")
##       +-----------+--------------+-------oo----+-------------+-------+
##  0.02 +                                oooooo                        +
##       |                               oo    oo                       |
## 0.015 +                              oo      o                       +
##       |                             oo        o                      |
##  0.01 +                             o         oo                     +
##       |                            o           oo                    |
## 0.005 +         ooo              oo              ooo                 +
##       |  oooooooo oooooooooooooooo                 ooooooooo   ooo   |
##     0 +-----------+--------------+-------------+-----------ooooo-----+
##                  100            150           200           250
##                        starwars: height distribution
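
For logfiles, it can help to put a timestamp and basic counts right above a plot like this one. Here is a minimal sketch of that idea. `log_density()` is a hypothetical helper of my own, not part of {txtplot}, and it assumes the job's stdout ends up in the logfile. It relies on `capture.output()` to collect what `txtdensity()` prints.

```r
library(txtplot)

# Hypothetical helper (not part of txtplot): write a timestamped header and a
# text density plot of a numeric vector to a connection (stdout by default).
log_density <- function(x, label, con = stdout()) {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  n_missing <- sum(is.na(x))
  x <- x[!is.na(x)]
  header <- sprintf("[%s] %s: n = %d, missing = %d",
                    stamp, label, length(x), n_missing)

  if (length(x) < 2) {
    # A density estimate needs at least two points, so log a note instead
    lines <- c(header, "  (too few non-missing values to plot)")
  } else {
    # txtdensity() prints to the console, so capture its output as text
    plot_lines <- capture.output(
      txtdensity(x, xlab = label, width = 70, height = 12, pch = "o")
    )
    lines <- c(header, plot_lines)
  }

  writeLines(lines, con)
  invisible(lines)
}

# Example with a built-in dataset that contains missing values
log_density(airquality$Ozone, "airquality: ozone")
```

The header means a sudden jump in missing values shows up in the log even before you look at the shape of the plot.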

### Time series plots

txtplot(y = LakeHuron, x = 1875:1972, width = 70, height = 12, xlab = "year", ylab = "level")
##   582 +--*--+----------+-----------+----------+-----------+----------+
##       |      ****                                                    |
##   581 +  *****  *                       *            **              +
## l 580 +    *     ** *     ***    **                  * *       ***   +
## e 579 +           **  ****   * * ***     *      ****   *  *     *    +
## v     |             ** *     **    **   *       *       *  *  **     |
## e 578 +                       * *    * *     **     *   *  *  *      +
## l 577 +                               *  ** ** *         ** **       +
##       |                                    *                         |
##   576 +-----+----------+-----------+----------+-----------+-*--------+
##           1880       1900        1920       1940        1960
##                                    year
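
The call above hardcodes the years for the x axis. As a small sketch, and assuming the input is a regular `ts` object, the same values can be derived with `time()` so the plot stays correct if the series grows:

```r
library(txtplot)

# time() returns the observation times of a ts object; for the annual
# LakeHuron series these are the years 1875 to 1972.
years <- as.numeric(time(LakeHuron))

txtplot(x = years, y = as.numeric(LakeHuron),
        width = 70, height = 12, xlab = "year", ylab = "level")
```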

### ACF plots

{txtplot} can also draw autocorrelation (ACF) plots, though I’ve not used these in production yet.

txtacf(sunspot.year, width = 70, height = 12)
##   1 +--*--------------+-------------+--------------+--------------+--+
##     |  *                                                             |
##     |  *  *                                                          |
##     |  *  *                         *  *                             |
## 0.5 +  *  *  *                   *  *  *  *                          +
##     |  *  *  *                   *  *  *  *                       *  |
##     |  *  *  *                 * *  *  *  *  *                    *  |
##   0 +  *  *  *  *  *  *  *  *  * *  *  *  *  *  *  *  *  *  *  *  *  +
##     |              *  *  *  *                   *  *  *  *  *        |
##     |              *  *  *                         *  *  *           |
##     +--+--------------*-------------+--------------+--------------+--+
##        0              5            10             15             20