Unit testing R scripts

We’re building a solution that uses some R scripts for data analysis and cleanup. The R scripts are tested during the integration phase, when a database is available. We would like to test the scripts whenever new versions are pushed to source control, without the need for a database. This is where unit tests come in.

The scripts all follow the same steps:

  • setup,
  • read data from database,
  • process data,
  • write data to database,
  • report

First we need to separate logic, flow and parameters. The easiest way to do this is to implement a function for each of the steps listed above and call those functions from a new script. Some code speaks a thousand words.

# script with logic and flow functions
library(iterators)
library(foreach)
library(data.table)

setup <- function() {  }                       # prepare the environment for a run
read_data <- function(db, param) {  }          # read the unprocessed data from the database
process_data <- function(data) { data.table(mean_price = mean(data$price))  }
write_data <- function(db, data) {  }          # write a processed row back to the database
report <- function(db, param) {  }             # report on the processed data
do_work <- function(db, param) {
   setup()
   unprocessed_data <- read_data(db, param)
   iterator <- iter(unprocessed_data$data)
   foreach (row = iterator) %do% { 
      processed_row <- process_data(row)
      write_data(db, processed_row)
   }
   report(db, param)
}

# script with parameters
library(RODBC)

source('script_with_functions.R')
connection <- odbcDriverConnect(connection = '...')
year <- 2016
do_work(connection, year)
odbcClose(connection)

When the script with parameters runs, the functions are sourced and the outcome is the same as with the original script. This refactoring enables us to write unit tests for the functions, where mock objects can be used to mimic the external dependencies. Again, some code to explain what we mean:

library(testthat)
library(mockery)
library(data.table)
library(tibble)

source('script_with_functions.R')
describe('process_data', {
   it('calculates_mean_price', {
      # data table with 4 rows with price 10
      four_rows_price_10 <- data.table(price = rep(10, 4))
      result <- process_data(four_rows_price_10)
      expect_equal(result$mean_price, 10)
   })
})
# other functions only read/write data: no unittest needed, since no logic
describe('do_work', {
   it('calls setup', {
      # create mock object for setup function
      fake_setup = mock()
      # replace setup function with mock for calls to do_work
      stub(do_work, 'setup', fake_setup)
      # call the 'flow' function
      fake_db = mock()
      do_work(fake_db, 2016)
      # verify setup was called
      expect_called(fake_setup, 1)
   })
   it('calls process_data 4 times', {
      # create mock object for the process_data function
      fake_process_data <- mock()
      # replace process_data function with mock for calls to do_work
      stub(do_work, 'process_data', fake_process_data)
      # stub read_data to return 4 sets of data to process (with 3 rows each)
      four_sets_of_data <- tibble(g = 1:4, data = list(data.table(price = 1:3)))
      stub(do_work, 'read_data', four_sets_of_data)
      # call the 'flow' function
      fake_db <- mock()
      do_work(fake_db, 2016)
      # verify process_data was called 4 times
      expect_called(fake_process_data, 4)
   })
})

By splitting the logic and flow into separate functions we’re able to write unit tests that check the logic in the process_data function and the flow in the do_work function. The functions that are not unit tested all need a database to work. We could use an in-memory SQLite database, but that is a database too, so we leave those tests for the integration tests.
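Running the tests takes a single testthat call, either locally or in the build that runs on every push. A minimal sketch, assuming the tests above are saved as 'test_script_with_functions.R' (that file name is just an example):

# run all unit tests in one file and report the results
library(testthat)
test_file('test_script_with_functions.R')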

Running the unit tests above will result in rainbows and a smiley. The complete working code is on my GitHub.

Test passed 🌈
Test passed 🌈
Test passed 😀

The Unicorn Project

During lockdown I’ve been reading The Unicorn Project, the follow-up to The Phoenix Project. Again a very entertaining read about the rocky road of getting the IT department to the next level.

I read this book after reading The Phoenix Project for the second time. It is set during the same timespan and the storylines sometimes overlap. This makes clear that it was not one team’s effort that saved the company, but multiple teams working towards the same end goal.

The story is written with enough technical background to feel true to life, and the lessons can be applied today. I can relate to a lot of the problems faced even though I don’t work at a commercial company. But keep in mind that the book is fiction.

After reading The Unicorn Project I’m more convinced that DevOps has its advantages, but it might not be easy to apply at every company. It requires a lot of agility across the entire company and a group of enthusiasts who are willing to go the extra mile.

Read more here: https://itrevolution.com/the-unicorn-project/


Renew certificate 2020 edition


It’s that time of the year to renew my certificate. In my Renew certificate 2017 edition post I wrote about switching to another provider, which has now reached the end of its 3 free certificates. Time to switch again 🙄

A quick Google search directed me to https://letsencrypt.org. They are funded by sponsoring companies and provide certificates for free. Using certbot (https://certbot.eff.org) is advised, and that is what I’ve used.

I’m using the Docker version of certbot in manual mode. In the terminal line below I start the container, mount ~/workspace/certbot and provide the command line options:

docker run -it --rm --name certbot -v "/Users/Eric/workspace/certbot:/etc/letsencrypt" -v "/Users/Eric/workspace/certbot/lib:/var/lib/letsencrypt" certbot/certbot certonly --manual --preferred-challenges dns

This starts a ‘conversation’ that creates a certificate in manual mode, with verification through an ‘_acme-challenge’. The _acme-challenge step provides a random string that I must publish in my domain’s DNS as a TXT record:

Type         Host              Value          TTL
TXT record   _acme-challenge   R4nD0m57R1n9   1 min

In ~/workspace/certbot a number of folders are created, one of which is called ‘live’. The live folder contains a folder named after the domain, with the certificate and private key inside. Synology accepts the privkey.pem and cert.pem; there is no need to provide the intermediate certificate.

Next time I should be able to use the command line option ‘renew’ and all certificates will be renewed. I will update this post when that happens 😉
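For reference, that renew run would look roughly like the line below. This is untested for now and assumes the same volume mounts as above; with a manual DNS challenge certbot may still ask for a fresh TXT record.

docker run -it --rm --name certbot -v "/Users/Eric/workspace/certbot:/etc/letsencrypt" -v "/Users/Eric/workspace/certbot/lib:/var/lib/letsencrypt" certbot/certbot renew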


Restore IFTTT applets

Since November, IFTTT limits free users to 3 custom applets. More info is on ifttt.com: https://help.ifttt.com/hc/en-us/articles/360055682873-What-happens-if-I-do-not-upgrade-to-IFTTT-Pro- Today I did a checkup on my applets and noticed that 3 very old applets were still active, while my most used applets are in the archive. Time to restore these.

First I archived the old custom applets to make room for the restore. To archive an applet, open it and click on its settings; at the bottom there is an Archive button. Now I have room for 3 custom applets.

Open your archive (https://ifttt.com/archive) and click the applet(s) to restore. A click is all that’s needed to restore an applet from the archive, as long as you’ve got the space. Remember, free users are limited to 3 custom applets.

After the restore you need to connect (activate) the applet. On the home page (https://ifttt.com/home) open the applet by clicking it, then click Connect and the switch animation will flip to connected. Done.

Image: custom Strava-to-webrequest applet restored

Tip for free users: browse the library (https://ifttt.com/explore) on IFTTT and activate published applets from there; they do not count towards your limit of 3. I’m using lots of different applets that do the same thing, sending RSS items to Pocket.


Lift and shift SSIS package

We use SQL Server Integration Services (SSIS) to import text files into our database. The dtsx files are deployed into MSDB on the SQL Server, and deployment with a PowerShell DSC script works well. But times change and we now have to deploy the packages to the SSIS catalog. Here is what we did.

Convert

The first step is to convert the project. We used the "Convert to Project Deployment Model" wizard, which is started from the Project menu or the context menu in the Solution Explorer. Because this is a lift and shift we don’t replace the configuration with parameters, so in the "create parameters" step we deselect everything. After the conversion we can still open the "Package Configuration" and see that the properties are set from there.

Build

The reason we converted the project is that we need an .ispac file to release to the catalog. During the build the ispac file is created, and we gather all dtsx and ispac files in an artifact.

Deploy

Deployment takes the new ispac files from the artifact. With the code sample from Microsoft (see references) we install the ispac into the catalog with DeployProject. This change lives in the PowerShell DSC script.
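As a rough sketch of what that PowerShell looks like (loosely based on the Microsoft sample; the server, folder, project name and ispac path below are placeholders, not our real values):

# load the SSIS management object model
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.IntegrationServices") | Out-Null

# connect to the server that hosts the SSISDB catalog
$connection = New-Object System.Data.SqlClient.SqlConnection "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"
$integrationServices = New-Object Microsoft.SqlServer.Management.IntegrationServices.IntegrationServices $connection

# SSISDB is the default catalog name; the target folder must already exist
$catalog = $integrationServices.Catalogs["SSISDB"]
$folder = $catalog.Folders["FOLDER"]

# read the ispac from the build artifact and deploy it as a project
[byte[]] $projectFile = [System.IO.File]::ReadAllBytes("C:\artifact\PROJECTNAME.ispac")
$folder.DeployProject("PROJECTNAME", $projectFile) | Out-Null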

Conclusion

Our SQL Agent jobs now have to load the packages from the catalog. This was a minor change to the job step command:

-- before: load the package from MSDB
SET @Command = N'/SQL "\"\FOLDER\PACKAGENAME\"" /SERVER "\"' + @Server + '\"" /CHECKPOINTING OFF /REPORTING E'
-- after: load the package from the SSIS catalog
SET @Command = N'/ISSERVER "\"\SSISDB\FOLDER\PROJECTNAME\PACKAGENAME\""  /SERVER "\"' + @Server + '\"" /CHECKPOINTING OFF /REPORTING E'

After these steps the SSIS packages are in the catalog. The impact on the system is minimal because the SQL Agent job stays in the same place and handles the location change. Happy dev, happy DBA.

References
