Advanced Rails ActiveRecord & SQL tricks (Part 1)

Sun, Mar 22, 2020

Read in 6 minutes

Rails ActiveRecord is powerful but it can feel limiting at times. With a bit of coercing, you can get it to play well with SQL. In this part I'll illustrate an example using PARTITION BY clauses.

At NodeFlair we store each user’s specialisations in a table that looks like this:

id name user_id percentage
1 Backend 1 30
2 Frontend 1 70
3 Backend 2 20
4 Frontend 2 30
5 Mobile 2 50
6 Mobile 3 100

Recently we came across a situation where we needed to:

Naive solution

The naive way of solving the above problem is to use the group() method.

Specialisation.all.group(:profile_id)
  .select("
    count(*) as total_count, 
    profile_id,
    sum(percentage) as total_percentage,
    max(percentage) as max_percentage
  ").to_a

# Adding also virtual attributes to the model
attributes :total_count, :integer
attributes :total_percentage, :integer
attributes :max_percentage, :integer

which results in:

[
  #<Specialisation id: nil, profile_id: 1, total_count: 2, total_percentage: 100, max_percentage: 70>, 
  #<Specialisation id: nil, profile_id: 2, total_count: 3, total_percentage: 100, max_percentage: 50>, 
  #<Specialisation id: nil, profile_id: 3, total_count: 1, total_percentage: 100, max_percentage: 100>
] 

which can be an annoying because:

Luckily, there is a better way to do this, and that is the SQL PARTITION BY clause.

What is PARTITION BY?

SQL PARTITION BY allows you to define a partition over a set of rows, and perform aggregations and sorting within those partitions. The beauty of this is that your current table is never transformed, so you get to access all the data that is there.

The graphical way to describe how PARITION BY works is with the graphic below:

Jumping over to SQL, we can easily form our query like this:

SELECT 
  *,
  COUNT(profile_id) OVER (PARTITION BY profile_id) AS total_count,
  SUM(percentage) OVER (PARTITION BY profile_id) AS total_percentage,
  ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY percentage desc) AS rank
FROM 
  specialisations

to get the result that we want:

id name user_id percentage total_count total_percentage rank
1 Backend 1 30 2 100 2
2 Frontend 1 70 2 100 1
3 Backend 2 20 3 100 3
4 Frontend 2 30 3 100 2
5 Mobile 2 50 3 100 1
6 Mobile 3 100 1 100 1

Now this is convenient! Let’s go through what the sql statement does.

The line

... OVER ( PARTITION BY ... ) 

is what allows us to utilise the window function. The PARTITION BY keyword allows is to specify how we want to break up our records for processing. In our case, we wanted to group the windows (or “partitions”) by profile_id, so we enter PARTITION BY profile_id.

Then in order to get the count and sum, we just apply the aggregate function to each line to obtain

COUNT(profile_id) OVER (PARTITION BY profile_id) AS total_count,
SUM(percentage) OVER (PARTITION BY profile_id) AS total_percentage

Cool. Then how do we get the specialisation of each profile that has the highest percentage?

The true power of partitions is that you can define ordering within each partition. All we need to do now is to sort by the percentage within each partition, and then we rank them by using ROW_NUMBER() like this:

ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY percentage desc) AS rank

Now if the rank is 1, then we know it is the specialisation with the highest percentage.

Great! Now how do I translate this to Rails?

It turns out you can trick Rails to think that the new columns exist in your database, even though they don’t actually do. The way to do this is to first add some virtual attributes:

# model/specialisation.rb
...
attributes :total_count, :integer
attributes :total_percentage, :integer
attributes :rank, :integer
...

Next, define a scope and wrap the query within another helper function

# model/specialisation.rb
...
scope :with_total_count_and_rank, -> { append_count_and_rank }

def self.append_count_and_rank 
  from <<-SQL.strip_heredoc 
    (
      SELECT 
        *, 
        COUNT(profile_id) OVER (PARTITION BY profile_id) AS total_count,
        SUM(percentage) OVER (PARTITION BY profile_id) AS total_percentage,
        ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY percentage desc) AS rank
      FROM specialisations
    ) AS specialisations 
  SQL
end 
...

The magic body of code here is

from <<-SQL.strip_heredoc 
  (
    #...your SQL here...
  ) AS specialisations 
SQL

This allows you to interpret any SQL query within the brackets as the table that exists in the database. To Rails, this detail is completely transparent. As long as the attributes and columns match you can expect all the usual ActiveRecord features to work.

Then to use the query, we do

Specialisations.with_total_count_and_rank

gives the result

#<ActiveRecord::Relation [
  #<Specialisation id: 1, name: "Backend", percentage: 30, profile_id: 1, 
      total_count: 2, total_percentage: 100, rank: 2>, 
  #<Specialisation id: 2, name: "Frontend", percentage: 30, profile_id: 1, 
      total_count: 2, total_percentage: 100, rank: 1>, 
  #<Specialisation id: 3, name: "Backend", percentage: 20, profile_id: 2, 
      total_count: 3, total_percentage: 100, rank: 3>, 
  ...
]> 

Notice that it now returns an ActiveRecord::Relation? This means that the query retains all the scoping capabilities that comes with ActiveRecord. This means you can do query operations like:

# To get all the specialisations that have the highest percentage
Specialisations.with_total_count_and_rank.where("rank = 1")

# To get all the profile_ids where total percentage is not 100
Specialisations.with_total_count_and_rank.where("total_percentage is not 100").pluck(:profile_id)

# If you have a scope, it works as expected
# in model/specialisation.rb
scope :only_backend, -> { where(name: "Backend") }

# Anywhere else
Specialisations.with_total_count_and_rank.only_backend

# Even works in relations!
profile = Profile.find(1)
profile.specialisations.with_total_count_and_rank

# Gets you the result
# <ActiveRecord::Relation [
#  <Specialisation id: 1, name: "Backend", percentage: 30, profile_id: 1, 
#      total_count: 2, total_percentage: 100, rank: 2>, 
#  <Specialisation id: 2, name: "Frontend", percentage: 30, profile_id: 1, 
#      total_count: 2, total_percentage: 100, rank: 1>
# ]> 

Other applications

Some other applications where I have found PARTITION BY clauses useful are

Ranking all the sequence of messsages between 2 users

SELECT 
  *,
  ROW_NUMBER() OVER (PARTITION BY sender_id, receiver_id ORDER BY created_at desc) AS sequence
FROM 
  messages

Yes you can partition by compound keys!

Find the total sales by month

SELECT 
  *,
  SUM(amount) OVER (PARTITION BY to_char(timestamp, 'YYYY-MM')) AS month_total
FROM 
  sales

Yes you can even partition by calculated fields as well!

Conclusion

ActiveRecord is great and if you are dealing with data, try to work from the SQL layer first and then work backwards to the ActiveRecord layer for mapping. I prefer this approach as it is more declarative (state what you need) as opposed to imperative (calculate what you need).

In the next part, I will be covering another powerful SQL function: SQL window functions.

Learning to write software? Subscribe now to receive tips on software engineering.

Back to home page