ActiveJob, GDPR and my contribution to Rails


The General Data Protection Regulation (GDPR) has been enforceable since 25 May 2018.

The GDPR aims primarily to give control to individuals over their personal data (…)
Source: https://en.wikipedia.org/wiki/General_Data_Protection_Regulation

The regulation consists of 99 articles which describe our rights (as individuals) with regard to the collection of our data, and the obligations of the organisations which collect it.

Because of it, we need to be extra cautious whenever we work with personal data. Violating GDPR may cost us up to €20 million or up to 4% of the annual worldwide turnover, whichever is higher.

Ruby on Rails logs versus GDPR

What does the ActiveJob module have to do with GDPR, you may ask. Let’s focus on application logs first.

Logging is a very useful debugging mechanism. Even though some people have different opinions, I do log things. I bet most of us do the same. The basic line of code to have something logged within a Ruby on Rails application is as follows:

logger.info "I am staying at home, and you?"

No magic here. Let’s try something more similar to a real-life example:

logger.tagged('UserCreation') { logger.info "User with e-mail address #{ user.email } has just been created." }

Now it becomes more interesting. An e-mail address is, among other things, personal data. Additionally, if the UserCreation service was invoked by a controller action, like UsersController#create, the same address could be logged automatically by Rails as part of the request parameters.

What does it mean for a developer from a legal point of view? Say 👋 to GDPR. The regulation states more or less:

(…) don’t collect any information about anyone unless there’s documented and informed consent for the collection, and don’t use that information for anything but the specified purposes.
Source: https://www.ctrl.blog/entry/gdpr-web-server-logs.html

First of all, you should not collect any personal details without asking users for permission first. Secondly, even if you have such consent, you can use the data only for the purposes that were specified when it was given.

There are exceptions though. GDPR allows storing personal data without consent in server logs for the limited and legitimate purpose of detecting and preventing fraud and unauthorized system access and ensuring the security of your systems. At the same time, access to such logs should be limited and secured (e.g. by using encryption). Logs which are no longer needed should be removed as soon as possible (retention).

💡 Think twice before you decide to store any type of personal data in application logs.

 

What can we do to avoid sensitive data in Ruby on Rails logs?

There are at least two easy-to-implement solutions to the problem.

1. Use unique identifiers

As simple as that. Instead of logging user.email or user.name, rely on a unique identifier. You will still be able to track a user, or any other entity, when needed.
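A minimal sketch of the idea, using only Ruby's standard library so it runs outside Rails. The user object and its values are made up for illustration, and the in-memory log stands in for a real Rails logger:

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for a user record; in a Rails app this would be a model.
user = Struct.new(:id, :email).new(42, "jane@example.com")

# Capture log output in memory so the example is self-contained.
log_output = StringIO.new
logger = Logger.new(log_output)
# Keep only the message so the captured output is easy to inspect.
logger.formatter = proc { |_severity, _time, _progname, message| "#{message}\n" }

# Log the identifier, never the e-mail address.
logger.info("User with ID #{user.id} has just been created.")

puts log_output.string
```

The log line still lets you find the record in the database, but the log file itself no longer contains personal data.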

2. Parameters filtering

Ruby on Rails has a built-in mechanism to prevent logging sensitive data. By default, the framework generates a config/initializers/filter_parameter_logging.rb file where you can define the names of parameters which may contain sensitive data:

Rails.application.config.filter_parameters += [
  :email,
  :email_confirmation,
  :password,
]

You just need to remember to keep the list up-to-date as it is easy to forget about adding recently introduced parameters.

ActiveJob versus personal data

Running business logic in the background with ActiveJob is quite common:

class ValidateEmailAddressJob < ApplicationJob
  queue_as :default

  def perform(email)
    # Do something with the email address
  end
end

# Enqueue a job to be performed as soon as possible
ValidateEmailAddressJob.perform_later(user.email)

Nothing fancy. Until we take a look at the source code of ActiveJob:

# https://github.com/rails/rails/blob/master/activejob/lib/active_job/log_subscriber.rb#L20-L22
info do
  "Enqueued #{job.class.name} (Job ID: #{job.job_id}) to #{queue_name(event)}" + args_info(job)
end

# https://github.com/rails/rails/blob/master/activejob/lib/active_job/log_subscriber.rb#L107-L114
def args_info(job)
  if job.class.log_arguments? && job.arguments.any?
    " with arguments: " +
      job.arguments.map { |arg| format(arg).inspect }.join(", ")
  else
    ""
  end
end

Simply put, every time a job is enqueued, all of its arguments are written to the application logs.

As soon as you pass personal data to any of your jobs, you have a potential GDPR compliance problem 👮. This is true even if the argument's name is listed in the config.filter_parameters array, because that filter is not applied to job arguments.

What can be done to avoid leaking sensitive data when enqueuing background jobs?

Fortunately, several people noticed that the default behaviour may cause issues. Rafael França, based on feedback received in issues, created a pull request which added a log_arguments class attribute to ActiveJob::Base, so ApplicationJob and all its descendants inherit it. The change is present in the master branch but has not been released yet (08.04.20).

After setting the attribute to false (the default value is true), none of the job’s arguments will be logged.
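In an application running a Rails version that ships the attribute, the change should be a single line in the job class: self.log_arguments = false. The sketch below shows the effect without requiring Rails. SketchJob is a hypothetical stand-in for ActiveJob::Base, ResizeImageJob is an invented second job, and args_info is a simplified copy of the helper quoted above (it skips the format step):

```ruby
# SketchJob is a hypothetical, framework-free stand-in for ActiveJob::Base.
class SketchJob
  class << self
    # Mimics the log_arguments class attribute.
    attr_writer :log_arguments

    # Falls back to the parent class, and finally to true (the default).
    def log_arguments?
      return @log_arguments unless @log_arguments.nil?

      superclass.respond_to?(:log_arguments?) ? superclass.log_arguments? : true
    end
  end

  attr_reader :arguments

  def initialize(*arguments)
    @arguments = arguments
  end
end

# Simplified version of the args_info helper from the log subscriber.
def args_info(job)
  if job.class.log_arguments? && job.arguments.any?
    " with arguments: " + job.arguments.map(&:inspect).join(", ")
  else
    ""
  end
end

class ValidateEmailAddressJob < SketchJob
  # Opt out of argument logging for this job only.
  self.log_arguments = false
end

# Invented job that keeps the default behaviour.
class ResizeImageJob < SketchJob; end

silent  = args_info(ValidateEmailAddressJob.new("jane@example.com"))
verbose = args_info(ResizeImageJob.new(7))

puts "Enqueued ValidateEmailAddressJob#{silent}"
puts "Enqueued ResizeImageJob#{verbose}"
```

Because the attribute is set per class, jobs that handle personal data can opt out while the others keep logging their arguments.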

What if we would like to filter only the sensitive arguments and keep the others in the logs? The config.filter_parameters array seems to be an ideal place to list them. I had even been sure that it worked like that until my co-workers (thanks Bartek & Kasia) made me aware that it does not.

My contribution

I spent a few evenings digging into the Rails source code. I thought that it might be a good time to try making the framework more GDPR-friendly by default. I even learned how to run Rails tests locally 🙂

Here it goes, my first PR to the Ruby on Rails source code. I would really appreciate your feedback and input to help get the feature merged. At the same time, as I wrote in the commit message:

There is one caveat though. My proposition works only with hashes as it is kinda hard to provide a bulletproof mechanism for all possible cases. On the other hand, based on my experience, hashes are commonly used as jobs’ arguments.
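To illustrate the caveat, here is a rough sketch of hash-only filtering. It is not the code from the pull request: filter_job_arguments, its parameters, and the "[FILTERED]" mask are names made up for this example.

```ruby
# Hypothetical helper: masks values of sensitive keys in hash arguments.
# Non-hash arguments are returned untouched, which is exactly the caveat.
def filter_job_arguments(arguments, sensitive_keys, mask: "[FILTERED]")
  keys = sensitive_keys.map(&:to_s)

  arguments.map do |argument|
    next argument unless argument.is_a?(Hash)

    # Build a new hash so the job's real arguments are not modified.
    argument.to_h do |key, value|
      [key, keys.include?(key.to_s) ? mask : value]
    end
  end
end

arguments = [{ email: "jane@example.com", locale: "en" }, "jane@example.com"]
filtered  = filter_job_arguments(arguments, [:email])

puts filtered.map(&:inspect).join(", ")
```

The hash value is masked because it has a key to match against. The bare string has no name attached, so there is nothing to compare with the filter list and it would still reach the logs.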

If you know how to make the proposition even more bulletproof please let me know in the pull request.

UPDATE (16.04.20): My pull request was closed as Rafael França does not agree with the proposed solution. It turned out that a similar approach had been proposed in 2018 and was rejected as well. Well, it looks like we need to find a more bulletproof solution to the problem. For now, you can:

  • wait until the log_arguments class attribute is available in a released Ruby on Rails version and skip logging all arguments,
  • monkey patch the source code to filter sensitive data in another way, e.g. the one proposed in my pull request.

Nevertheless, it was a pleasant first try to make Ruby on Rails better. I am sure that other opportunities will show up in the future. I hope that you learned something as well 🙂

📚 If you would like to dig a bit deeper into GDPR in the context of logs, I recommend the ctrl.blog article which I cited above.
It is a nice summary without legal jargon.
 

Igor Springer

I build web apps. From time to time I put my thoughts on paper. I hope that some of them will be valuable for you. To teach is to learn twice.