Array
, Set
and SortedSet
reflect various data structures created for different purposes. It’s good to know their pros and cons and, based on them, be able to select the one which fulfils concrete requirements best. Sometimes there is almost no difference, but there are cases where Set
& SortedSet
are much faster (and more suitable) than Array
and vice versa.Array
class is one of the most commonly used Ruby class in day-to-day development. Enumerable
module adds a comprehensive set of methods which makes playing with arrays a pleasant activity. I have just written set deliberately.
Next to the Array
class there is also less known Set
, which to be honest, I have not used by myself until today. In this article, I would like to share with you my findings about the classes based on a short example 🙂
Basic requirements and implementation
Let’s assume that we are building a brand new Ruby application (without any specific web framework). One of the requirements defined as a user story is written below:
As a user I want to be able to select a language in which content is displayed so that my friends from other countries can understand it as well.
To fulfil it we need to store all supported languages1 somewhere. There are multiple approaches to this problem, but for the sake of simplicity let’s agree that we are going to define Language
class. It will be responsible for all the logic related to languages supported by the application.
class Language end
To keep a list of the supported languages we can define a public class method called all
. For the beginning, let’s assume that en
is the only language we have content in.
class Language def self.all [:en] # It could be string as well. end end
The method returns all supported languages:
Language.all => [:en]
My brain intuitively prompted me, that Array
is a suitable data structure for that case in Ruby world.
Language.all.class => Array
Over time, as the application has been growing, other developers have extended the list with new elements:
class Language def self.all [:en, :de, :fr, :pl, :ro, :it, :nl] end end
At this point I can see two potential problems:
- The array is defined in one line, what makes spotting a bug (e.g. a duplicated record) during code review process harder.
- The array is not sorted alphabetically. Therefore, it is easy to add the same language twice.
Let’s apply small changes:
class Language def self.all [ :de, :en, :fr, :it, :nl, :pl, :ro, ] end end
There are more lines of codes, but we are prepared better to avoid the potential issues. To gain even more confidence, we can take usage of sort
and uniq
methods defined in Enumerable
mixin:
class Language def self.all [ :de, :en, :fr, :it, :nl, :ro, :pl, # yiks, we added an unsorted element :de, # yiks, we added a duplicated element ].sort.uniq end end
Thanks to it, the method always returns a sorted array of supported languages without duplicates. Are there any downsides of the above code? There are:
sort
&uniq
methods return new array every time, so three objects are created instead of one.- We did not freeze the array, so new elements (languages) can be pushed to it. It may lead to bugs.
languages = Language.all => [:de, :en, :fr, :it, :nl, :pl, :ro] languages.push(:my_evil_language) => [:de, :en, :fr, :it, :nl, :pl, :ro, :my_evil_language]
Let’s try to address the above issues:
class Language def self.all [ :de, :en, :fr, :it, :nl, :pl, :ro, ].uniq.sort!.freeze end end
The code removes duplicates, sorts and freezes the array finally. Order of operations is important here. If we froze it first, we wouldn’t be able to modify it anymore:
[:de, :en, :pl].freeze.sort! FrozenError (can't modify frozen Array)
Let’s check if our refactored method works as expected:
languages = Language.all => [:de, :en, :pl] languages.push(:my_evil_language) FrozenError (can't modify frozen Array)
At this point, we could stop polishing the method and it would fulfil the defined requirements well enough. But that is not the aim of this article 🙂
Say hi! to Set
& SortedSet
classes
Time for some fancy definition:
In mathematics, a set is a collection of distinct objects, considered as an object in its own right.2
The above sentence is true also in Ruby. Simply speaking, a set is an unordered array without duplicates.
Set[:de, :en] => #<Set: {:de, :en}> Set[:de, :en, :de, :en] # duplicates are ignored => #<Set: {:de, :en}> [:de, :en].to_set # Array has to_set method defined => #<Set: {:de, :en}> Set.new([:de, :en]) => #<Set: {:de, :en}>
The above snippet shows different ways of creating new Set
objects. Furthermore, similar to Array
class, Enumerable
module is one of his ancestors:
Array.ancestors => [Array, Enumerable, Object, Kernel, BasicObject] Set.ancestors => [Set, Enumerable, Object, Kernel, BasicObject
It means that we can use all the handy collection methods on Set
objects:
set = Set.new([:en, :de]) => #<Set: {:en, :de}> sorted_set = set.sort => [:de, :en]
But wait, did you notice a subtle difference?
set.class => Set sorted_set.class => Array
You may ask Why? The answer can be found in Ruby documentation:
Set implements a collection of unordered values with no duplicates.3
Unique values without order. As simple as that. Calling any method that breaks this rule returns Array
object instead.
Applying Set
to Language
class
require 'set' class Language def self.all Set[ :de, :en, :fr, :it, :nl, :pl, :ro, ].sort.freeze end end
Note, that for a set we need to use sort
method, as sort!
one is defined inside Array
class only.
Does the above make the code any better? From one point we used a data structure created specially for storing unique values. From another, we ends with an array anyway and create two objects (one set and one array).
Can we do any better? SortedSet
to the rescue! I literally discovered it when I was writing this article.
SortedSet implements a Set that guarantees that its elements are yielded in sorted order when iterating over them.4
SortedSet.ancestors => [SortedSet, Set, Enumerable, Object, Kernel, BasicObject]
A subclass of Set
with elements in order, let’s try that.
require 'set' class Language def self.all SortedSet[ :en, :fr, :it, :nl, :pl, :ro, :de # an unordered element ].freeze end end
languages = Language.all => #<SortedSet: {:de, :en, :fr, :it, :nl, :pl, :ro}> languages.add(:my_evil_angiage) FrozenError (can't modify frozen SortedSet)
Thanks to SortedSet
we created only one object representing a class created for storying ordered elements with no duplicates. Voila 🙂
Summary
We can achieve (almost) the same result using Array
, Set
and SortedSet
class. They are different data structures created for various reasons, though. A concrete class should be selected based on well-defined requirements. To make the final choice we need to be aware of their strengths and weaknesses. Asking yourself some questions helps as well:
- How many elements will the data structure hold?
- Will all the elements be returned every time or rather only one/a few based on some criteria (like checking if a language is supported by the application)?
- and so on.
In the end, we might find that for one case it won’t make any difference (like holding only 10 static values), when for another a difference may be significant (like checking if an element is present in a data structure by using include?
method).
If you have doubts, it’s always a good idea to do some benchmarking veryfing your assumptions:
require 'benchmark/ips' require 'set' elements = (1..100_000).to_a array = [*elements].freeze set = Set[*elements].freeze Benchmark.ips do |x| random_element = elements.sample x.report("Set.include?") { 10000.times { set.include?(random_element) } } x.report("Array.include?") { 10000.times { array.include?(random_element) } } end
Warming up -------------------------------------- Set.include? 109.000 i/100ms Array.include? 1.000 i/100ms Calculating ------------------------------------- Set.include? 1.071k (±12.0%) i/s - 5.341k in 5.077456s Array.include? 0.214 (± 0.0%) i/s - 2.000 in 9.343202s
The above example clearly shows that searching for an element using include?
method is much more effective on Set
objects. Why? Because it uses Hash
behind the scene which is faster for such cases than Array
in Ruby. But that’s topic for another article 🙂
Treat Array
, Set
and SortedSet
classes like specialised tools in your toolset. You should be aware that all three exist, know their limitations, and use them to solve your problems effectively 🙂
Footnotes
- Let’s ignore differences between locales and languages.
- https://en.wikipedia.org/wiki/Set_(mathematics)
- http://ruby-doc.org/stdlib-2.5.3/libdoc/set/rdoc/Set.html
- https://ruby-doc.org/stdlib-2.5.3/libdoc/set/rdoc/SortedSet.html