Using tryCatch for robust R scripts

Programming Practices
A quick introduction to tryCatch below, followed by three use-cases I use on a regular basis.
Author: Rahul

Published: December 20, 2018

Using tryCatch to write robust R code can be a bit confusing. I found the help file dry to read. There are some resources which explore tryCatch, linked below. Over the years, I have developed a few programming paradigms which I’ve repeatedly found useful. A quick introduction to tryCatch below, followed by three use-cases I use on a regular basis.

Syntax

tryCatch has a slightly complex syntax structure. However, once we understand the 4 parts which constitute a complete tryCatch call as shown below, it becomes easy to remember:

  • expr : [Required] R code to be evaluated
  • error : [Optional] What should run if an error occurred while evaluating the code in expr
  • warning : [Optional] What should run if a warning occurred while evaluating the code in expr
  • finally : [Optional] What should run just before quitting the tryCatch call, irrespective of whether expr ran successfully, with an error, or with a warning

tryCatch(
    expr = {
        # Your code...
        # goes here...
        # ...
    },
    error = function(e){ 
        # (Optional)
        # Do this if an error is caught...
    },
    warning = function(w){
        # (Optional)
        # Do this if a warning is caught...
    },
    finally = {
        # (Optional)
        # Do this at the end before quitting the tryCatch structure...
    }
)
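
To make the template concrete, here is a minimal sketch that records which parts run when expr raises an error. The events vector and the error message are made up for illustration. The point is only the order of evaluation, and that tryCatch returns the value of whichever branch ran.

```r
# Record which parts of the tryCatch call run, in order (illustrative only)
events <- character(0)

result <- tryCatch(
    expr = {
        events <- c(events, "expr")
        stop("something went wrong")  # deliberately raise an error
    },
    error = function(e) {
        # Handlers are functions, so <<- is needed to modify `events` outside
        events <<- c(events, "error")
        conditionMessage(e)           # becomes the value of tryCatch()
    },
    finally = {
        events <- c(events, "finally")
    }
)

print(events)
print(result)
```

Here events ends up as "expr", "error", "finally", and result holds the error message returned by the error handler.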

Hello World example

This is a toy example showing how a function can use tryCatch to handle execution.

log_calculator <- function(x){
    tryCatch(
        expr = {
            message(log(x))
            message("Successfully executed the log(x) call.")
        },
        error = function(e){
            message('Caught an error!')
            print(e)
        },
        warning = function(w){
            message('Caught a warning!')
            print(w)
        },
        finally = {
            message('All done, quitting.')
        }
    )
}

If x is a valid number, expr and finally are executed:

log_calculator(10)
## 2.30258509299405
## Successfully executed the log(x) call.
## All done, quitting.

If x is a negative number, log(x) raises a warning partway through expr, so the rest of expr is skipped and warning and finally are executed. (Zero and NA do not take this path: log(0) returns -Inf and log(NA) returns NA, both without a warning.)

log_calculator(-10)
## Caught a warning!
## <simpleWarning in log(x): NaNs produced>
## All done, quitting.

If x is an invalid entry which raises an error, expr is attempted, and error and finally are executed:

log_calculator("log_me")
## Caught an error!
## <simpleError in log(x): non-numeric argument to mathematical function>
## All done, quitting.
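
In practice, handlers often return a fallback value instead of printing. Here is a minimal sketch of that variation, where safe_log is a hypothetical helper (not part of the example above) that returns NA whenever log(x) warns or errors:

```r
# Hypothetical helper: return NA_real_ instead of warning or failing
safe_log <- function(x) {
    tryCatch(
        expr = log(x),
        warning = function(w) NA_real_,  # e.g. negative input
        error = function(e) NA_real_     # e.g. character input
    )
}

safe_log(10)        # log(10); no handler runs
safe_log(-10)       # NA: the "NaNs produced" warning is caught
safe_log("log_me")  # NA: the non-numeric error is caught
```

Because tryCatch returns the value of expr, or of the handler that ran, the result can be assigned or returned directly.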

More useful examples

Use tryCatch within loops

There are cases at work where I have quite large datasets to pre-process before model building can begin. The sources of these data vary, and so does their quality. While each dataset should conform to our data quality standards (datatypes, data dictionaries, other domain-specific constraints), very often this isn’t the case. As a result, common data preprocessing functions might fail on a few datasets. We can use tryCatch within the for loop to catch errors without breaking the loop.

Another toy example: say we have the mtcars data split into a list of dataframes by number of cylinders, and say a few character values have crept into mpg, which is our response variable.

# Example nested dataframe
df_nested <- split(mtcars, mtcars$cyl)

df_nested[[2]][c(4,5),"mpg"] <- "a"
df_nested
## $`4`
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## 
## $`6`
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4        21   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag    21   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Valiant           a   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Merc 280          a   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 
## $`8`
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

We wish to run a few custom preprocessors, including taking the log of mpg.

library(dplyr)  # provides %>%, mutate() and select()

convert_gear_to_factors <-
  function(df) {
    df %>%
      mutate(gear = factor(gear, levels = 1:5, labels = paste0("Gear_", 1:5)))
  }
transform_response_to_log <-
  function(df) {
    df %>% mutate(log_mpg = log(mpg)) %>% select(-mpg)
  }

How do we run our preprocessors over every dataframe in the list without erroring out?

for (indx in seq_along(df_nested)) {
    tryCatch(
        expr = {
            df_nested[[indx]] <- df_nested[[indx]] %>%
                convert_gear_to_factors() %>%
                transform_response_to_log()
            message("Iteration ", indx, " successful.")
        },
        error = function(e){
            message("* Caught an error on iteration ", indx)
            print(e)
        }
    )
}
## Iteration 1 successful.
## * Caught an error on iteration 2
## <error/dplyr:::mutate_error>
## Error in `mutate()`:
## ! Problem while computing `log_mpg = log(mpg)`.
## Caused by error in `log()`:
## ! non-numeric argument to mathematical function
## ---
## Backtrace:
##   1. base::tryCatch(...)
##  10. dplyr:::mutate.data.frame(., log_mpg = log(mpg))
##  11. dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
##  13. mask$eval_all_mutate(quo)
## Iteration 3 successful.

We’re able to handle the error on iteration 2, let the user know, and run the remaining iterations.
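
It is often handy to remember which iterations failed so they can be inspected afterwards. Here is a minimal base R sketch of that idea. The failures list is a hypothetical name, and only the log transform is applied here so that no packages are needed:

```r
# Rebuild the toy data: mtcars split by cylinder count, with bad mpg values
df_nested <- split(mtcars, mtcars$cyl)
df_nested[[2]][c(4, 5), "mpg"] <- "a"  # coerces that mpg column to character

failures <- list()  # error messages, keyed by list element name

for (nm in names(df_nested)) {
    tryCatch(
        expr = {
            df_nested[[nm]]$log_mpg <- log(df_nested[[nm]]$mpg)
        },
        error = function(e) {
            # The handler is a function, so <<- is needed to update `failures`
            failures[[nm]] <<- conditionMessage(e)
        }
    )
}

names(failures)  # names of the elements that could not be processed
```

After the loop, failures holds one entry for the 6-cylinder group, and the other two dataframes carry the new log_mpg column.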

Catch issues early, log progress often

An important component of preparing ‘development’ code to be ‘production’ ready is implementation of good defensive programming and logging practices. I won’t go into details of either here, except to showcase the style of programs I have been writing to prepare code before it goes to our production cluster.

preprocess_data <- function(df, x, b, ...){
    message("-- Within preprocessor")
    df %>% 
        assertive::assert_is_data.frame() %>% 
        assertive::assert_is_non_empty()
    x %>% 
        assertive::assert_is_numeric() %>% 
        assertive::assert_all_are_greater_than(3.14)
    b %>% 
        assertive::assert_is_a_bool()
    
    # Code here...
    # ....
    # ....
    
    return(df)
}
build_model <- function(...){message("-- Building model...")}
eval_model  <- function(...) {message("-- Evaluating model...")}
save_model  <- function(...) {message("-- Saving model...")}

main_executor <- function(df, x, b, ...){
    tryCatch(
        expr = {
            preprocess_data(df, x, b, ...) %>% 
                build_model() %>% 
                eval_model() %>% 
                save_model()
        },
        error = function(e){
            message('** ERR at ', Sys.time(), " **")
            print(e)
            write_to_log_file(e, logger_level = "ERR") #Custom logging function
        },
        warning = function(w){
            message('** WARN at ', Sys.time(), " **")
            print(w)
            write_to_log_file(w, logger_level = "WARN") #Custom logging function
        },
        finally = {
            message("--- Main Executor Complete ---")
        }
    )
}

Each utility function starts with checking arguments. There are plenty of packages which allow run-time testing. My favorite one is assertive. It’s easy to read the code, and it’s pipe-able. Errors and warnings are handled using tryCatch - they are printed to the console if running in interactive mode, and then written to log files as well. I have written my own custom logging functions, but there are packages like logging and log4r which work perfectly fine.
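
The custom logging function itself is not shown in this post. As a rough idea of what such a helper could look like, here is a base R stand-in. The log_file argument, its default, and the line format are assumptions for illustration, not the actual implementation:

```r
# One possible stand-in for a custom logger (signature partly assumed):
# appends a timestamped line with the level and the condition's message.
write_to_log_file <- function(cond, logger_level = "ERR",
                              log_file = "pipeline.log") {
    line <- sprintf(
        "%s [%s] %s",
        format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
        logger_level,
        conditionMessage(cond)
    )
    cat(line, "\n", sep = "", file = log_file, append = TRUE)
    invisible(line)
}

# Usage: log an error caught by tryCatch to a temporary file
log_path <- tempfile(fileext = ".log")
tryCatch(
    expr = stop("preprocessing failed"),
    error = function(e) write_to_log_file(e, "ERR", log_file = log_path)
)
readLines(log_path)
```

Passing the condition object rather than a string keeps the handler short, since conditionMessage() works for both errors and warnings.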

Use tryCatch while model building

tryCatch is invaluable during model building. This is an actual piece of code I wrote for a Kaggle competition as part of my midterm work at school. Github link here. The details of what’s going on aren’t important. At a high level, I was fitting stlf models using forecast for each shop, among 60 unique shop-ID numbers. For various reasons, an stlf model could not be fit for some shops, in which case a default seasonal naive model using snaive was to be used. tryCatch is a perfect way to handle such exceptions, as shown below. I used a similar approach while building models at an “item” level: the number of unique items was in the 1000s, so manually debugging one at a time is impossible. tryCatch allows us to programmatically handle such situations.

stlf_yhats <- vector(mode = 'list', length = length(unique_shops))
for (i in seq_along(unique_shops)) {
    cat('\nProcessing shop', unique_shops[i])
    tr_data <- c6_tr %>% filter(shop_id == unique_shops[i])
    tr_data_ts <-
        dcast(
          formula = yw ~ shop_id,
          data = tr_data,
          fun.aggregate = sum,
          value.var = 'total_sales',
          fill = 0
        )
    tr_data_ts <- ts(tr_data_ts[, -1], frequency = 52)

    ##################
    # <--Look here -->
    fit <- tryCatch(
      expr = {tr_data_ts %>% stlf(lambda = 'auto')},
      error = function(e) { tr_data_ts %>% snaive()}
      )
    ##################
  
    fc <- fit %>% forecast(h = h)
    stlf_yhats[[i]] <- as.numeric(fc$mean)
    stlf_yhats[[i]] <- ifelse(stlf_yhats[[i]] < 0, 0, stlf_yhats[[i]])
}
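
The same fallback pattern can be tried without the competition data. In this self-contained sketch, primary_model and fallback_model are toy, hypothetical stand-ins for stlf and snaive (they are not functions from the forecast package), and the two series are made up:

```r
# Hypothetical stand-ins for stlf() and snaive(); not the forecast package
primary_model <- function(y) {
    if (length(y) < 4) stop("series too short for the primary model")
    mean(tail(y, 4))   # toy "forecast": mean of the last four points
}
fallback_model <- function(y) {
    tail(y, 1)         # toy "naive forecast": repeat the last point
}

series <- list(
    shop_a = c(10, 12, 11, 13, 14),
    shop_b = c(5, 7)   # too short, so the primary model errors
)

# Try the primary model first; fall back only if it raises an error
forecasts <- vapply(
    series,
    function(y) tryCatch(
        expr = primary_model(y),
        error = function(e) fallback_model(y)
    ),
    numeric(1)
)
forecasts
```

shop_a gets the primary model's value and shop_b silently falls back, which is the behaviour the loop above relies on.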

Hope this is useful to others learning tryCatch. Cheers.

