The General Data Protection Regulation (GDPR) became enforceable in May 2018.
The GDPR aims primarily to give control to individuals over their personal data (…)
Source: https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
The regulation consists of 99 articles which describe our rights as individuals regarding the collection of our data, and the obligations of organisations which collect that information.
Because of it, we need to be extra cautious whenever we work with personal data. Violating GDPR may cost up to €20 million or up to 4% of annual worldwide turnover, whichever is higher.
Ruby on Rails logs versus GDPR
What does the ActiveJob module have to do with GDPR, you may ask. Let’s focus on application logs first.
Logging is a very useful debugging mechanism. Even though opinions on it differ, I do log things, and I bet most of us do the same. The basic line of code to log something within a Ruby on Rails application is as follows:
logger.info "I am staying at home, and you?"
No magic here. Let’s try something more similar to a real-life example:
logger.tagged('UserCreation') { logger.info "User with e-mail address #{ user.email } has just been created." }
Now it becomes more interesting. An e-mail address is, among other things, personal data. Additionally, if the UserCreation service was invoked by a controller action such as UsersController#create, the same address could also be logged automatically by Rails as part of the request parameters.
What does it mean for a developer from a legal point of view? Say 👋 to GDPR. The regulation states more or less:
(…) don’t collect any information about anyone unless there’s documented and informed consent for the collection, and don’t use that information for anything but the specified purposes.
Source: https://www.ctrl.blog/entry/gdpr-web-server-logs.html
First of all, you should not collect any personal details without asking users for permission first. Secondly, even with such consent, you may use the data only for the specified purposes.
There are exceptions though. GDPR allows storing personal data without consent in server logs for the limited and legitimate purpose of detecting and preventing fraud and unauthorized system access and ensuring the security of your systems. At the same time, access to such logs should be limited and secured (e.g. by using encryption). Logs which are no longer needed should be removed as soon as possible (retention).
What can we do to avoid sensitive data in Ruby on Rails logs?
There are at least two easy-to-implement solutions to the problem.
1. Use unique identifiers
As simple as that. Instead of logging user.email or user.name, rely on a unique identifier. You will still be able to track a user, or any other entity, when needed.
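A minimal sketch of the idea, using Ruby's standard Logger instead of the Rails logger so that it runs on its own. The User struct, the log_user_created helper and the sample values are hypothetical and only serve the illustration.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for an ActiveRecord model.
User = Struct.new(:id, :email)

# Logs the creation event with the record's ID only; the e-mail
# address never reaches the log output.
def log_user_created(logger, user)
  logger.info("User with ID #{user.id} has just been created.")
end

output = StringIO.new
logger = Logger.new(output)
# Simplified formatter so the output is easy to read.
logger.formatter = proc { |severity, _time, _progname, message| "#{severity}: #{message}\n" }

log_user_created(logger, User.new(42, "jane@example.com"))
puts output.string
```

When the e-mail address is really needed during an investigation, it can be looked up by ID in the database, where access is easier to restrict than in log files.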
2. Parameters filtering
Ruby on Rails has a built-in mechanism to prevent logging sensitive data. By default, the framework creates the config/initializers/filter_parameter_logging.rb file, where you can define the names of parameters which may contain sensitive data:
Rails.application.config.filter_parameters += [ :email, :email_confirmation, :password, ]
You just need to remember to keep the list up to date, as it is easy to forget to add recently introduced parameters.
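To illustrate what the filter does, here is a rough, framework-free approximation: values whose key matches an entry on the list are replaced with [FILTERED] before being logged. The filter_params helper and the sample parameters are hypothetical; Rails ships its own, more capable implementation, so treat this only as a sketch of the behaviour.

```ruby
MASK = "[FILTERED]"

# Replaces the value of every key whose name contains one of the
# given filters (case-insensitive), descending into nested hashes.
def filter_params(params, filters)
  pattern = Regexp.union(
    filters.map { |f| Regexp.new(Regexp.escape(f.to_s), Regexp::IGNORECASE) }
  )

  params.each_with_object({}) do |(key, value), result|
    result[key] =
      if key.to_s.match?(pattern)
        MASK
      elsif value.is_a?(Hash)
        filter_params(value, filters)
      else
        value
      end
  end
end

params = { user: { locale: "en", email: "jane@example.com", password: "secret" } }
p filter_params(params, [:email, :password])
```

Because this sketch matches on substrings, an :email entry also covers a key such as email_confirmation.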
ActiveJob versus personal data
Running business logic in the background with ActiveJob is quite common:
class ValidateEmailAddressJob < ApplicationJob
  queue_as :default

  def perform(email)
    # Do something with the email address
  end
end

# Enqueue a job to be performed as soon as possible
ValidateEmailAddressJob.perform_later(user.email)
Nothing fancy, until we take a look at the source code of ActiveJob:
# https://github.com/rails/rails/blob/master/activejob/lib/active_job/log_subscriber.rb#L20-L22
info do
  "Enqueued #{job.class.name} (Job ID: #{job.job_id}) to #{queue_name(event)}" + args_info(job)
end

# https://github.com/rails/rails/blob/master/activejob/lib/active_job/log_subscriber.rb#L107-L114
def args_info(job)
  if job.class.log_arguments? && job.arguments.any?
    " with arguments: " +
      job.arguments.map { |arg| format(arg).inspect }.join(", ")
  else
    ""
  end
end
Simply speaking, every time a job is enqueued, all its arguments are available in application logs.
As soon as you pass personal data to any of your jobs, you have a potential GDPR compliance problem 👮, even if the argument is listed in the config.filter_parameters array.
What can be done to avoid leaking sensitive data when enqueuing background jobs?
Fortunately, several people noticed that the default behaviour may cause issues. Rafael França, based on feedback received in issues, created a pull request which added the log_arguments class attribute to the ApplicationJob class and all its descendants. The change is present in the master branch but has not been released yet (as of 08.04.20).
After setting the attribute to false (the default value is true), none of the job’s arguments will be logged.
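The effect of the flag can be sketched without Rails. The SketchJob base class below is a hypothetical stand-in that imitates an inheritable log_arguments setting, and args_info is a simplified copy of the method quoted earlier; in a real application the attribute would be set inside the job class itself, once a Rails release includes it.

```ruby
# Hypothetical stand-in for a job base class with an inheritable
# log_arguments flag (true unless a class overrides it).
class SketchJob
  class << self
    attr_writer :log_arguments

    def log_arguments?
      return @log_arguments if instance_variable_defined?(:@log_arguments)

      superclass.respond_to?(:log_arguments?) ? superclass.log_arguments? : true
    end
  end

  attr_reader :arguments

  def initialize(*arguments)
    @arguments = arguments
  end
end

# Simplified version of the log subscriber's args_info.
def args_info(job)
  if job.class.log_arguments? && job.arguments.any?
    " with arguments: " + job.arguments.map(&:inspect).join(", ")
  else
    ""
  end
end

# Keeps the default, so its arguments are logged.
class NewsletterJob < SketchJob; end

# Opts out of argument logging.
class ValidateEmailAddressJob < SketchJob
  self.log_arguments = false
end

puts "Enqueued NewsletterJob" + args_info(NewsletterJob.new(42))
puts "Enqueued ValidateEmailAddressJob" + args_info(ValidateEmailAddressJob.new("jane@example.com"))
```

The flag is all or nothing per job class: with it switched off, harmless arguments disappear from the log together with the sensitive ones.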
What if we would like to filter only the sensitive arguments and keep the others in the logs? The config.filter_parameters array seems to be an ideal place to list them. I had even been sure that it worked like that until, thanks to my co-workers (thanks Bartek & Kasia), I became aware that it does not.
My contribution
I spent a few evenings digging into the Rails source code. I thought that it might be a good time to try making the framework more GDPR-friendly by default. I even learned how to run the Rails tests locally 🙂
Here it goes, my first PR to the Ruby on Rails source code. I would really appreciate your feedback and input to bring the feature to life. At the same time, as I wrote in the commit message:
There is one caveat though. My proposition works only with hashes as it is kinda hard to provide a bulletproof mechanism for all possible cases. On the other hand, based on my experience, hashes are commonly used as jobs’ arguments.
If you know how to make the proposition even more bulletproof please let me know in the pull request.
In the meantime, there are two options:
- wait until the log_arguments class attribute is available in a released Ruby on Rails version and skip logging all arguments,
- monkey patch the source code to filter sensitive data in another way, e.g. the one proposed in my pull request.
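The second option could look roughly like the sketch below, which masks sensitive keys in hash arguments before they are formatted for the log. It is not the code from the pull request: the SENSITIVE_KEYS list and the filter_job_arguments helper are hypothetical, and the sketch also shows the caveat mentioned earlier, since a bare string argument passes through untouched.

```ruby
# Hypothetical list; a real application might reuse its parameter filter list.
SENSITIVE_KEYS = %i[email password].freeze

# Masks the values of sensitive keys in hash arguments. Non-hash
# arguments are returned unchanged, which is the known limitation.
def filter_job_arguments(arguments, sensitive_keys = SENSITIVE_KEYS)
  arguments.map do |argument|
    next argument unless argument.is_a?(Hash)

    argument.to_h do |key, value|
      sensitive = sensitive_keys.any? { |name| name.to_s == key.to_s }
      [key, sensitive ? "[FILTERED]" : value]
    end
  end
end

safe = filter_job_arguments([{ email: "jane@example.com", locale: "en" }, 42])
puts " with arguments: " + safe.map(&:inspect).join(", ")
```

For the masking to apply, the job would have to receive the address inside a hash, for example { email: user.email }, rather than as a plain string.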
Nevertheless, it was a pleasant first try to make Ruby on Rails better. I am sure that other opportunities will show up in the future. I hope that you learned something as well 🙂
A nice sum up without legal jargon.