Optimizing pg_search with precomputed tsvector columns in Postgres


TL;DR: If your pg_search queries are slow because of runtime joins and on-the-fly tsvector generation, precompute a weighted search_vector column and index it. Searches become near-instant, and you control when updates happen.

Optimizing search speed

I recently optimized full-text search in a Rails app using pg_search. Our original setup searched across multiple fields and associations, but performance degraded as the dataset grew.

Originally, our scope looked like this:

pg_search_scope :full_text_search,
  against: [:title, :body],
  associated_against: { tags: :name },
  using: {
    tsearch: { prefix: true },
    trigram: { threshold: 0.8, word_similarity: true }
  },
  ignoring: :accents

We replaced this with a faster approach: a precomputed, weighted tsvector column.

First, we added the column and index:

class AddSearchVectorToPosts < ActiveRecord::Migration[7.1]
  def change
    add_column :posts, :search_vector, :tsvector
    add_index :posts, :search_vector, using: :gin
  end
end


Then we simplified the search scope:

pg_search_scope :full_text_search,
  using: { tsearch: { tsvector_column: :search_vector, prefix: true } },
  ignoring: :accents

To populate search_vector, we weighted the content:

  • title → weight ‘A’ (most important)
  • body → weight ‘B’
  • tags → weight ‘C’
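
These weights matter at ranking time: for the same term, ts_rank scores an 'A' match higher than a 'B' or 'C' match, so title hits sort first (pg_search orders by a similar rank expression). A hypothetical raw query against the column might look like:

SELECT id, title, ts_rank(search_vector, query) AS rank
FROM posts, to_tsquery('simple', 'postgres:*') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Note the 'simple' dictionary and the prefix (:*) match what we use when building the vector below.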

Here’s a method to generate the weighted vector for a single post (the coalesce calls matter: to_tsvector(NULL) is NULL, and concatenating NULL would null out the whole vector):

# Build the weighted tsvector in SQL, so the weights and dictionary
# match the ones used at query time.
def generate_search_vector
  sql = self.class.sanitize_sql_array(
    [<<~SQL.squish,
      SELECT
        setweight(to_tsvector('simple', unaccent(coalesce(?, ''))), 'A') ||
        setweight(to_tsvector('simple', unaccent(coalesce(?, ''))), 'B') ||
        setweight(to_tsvector('simple', unaccent(coalesce(?, ''))), 'C')
    SQL
      title,
      body,
      tags.pluck(:name).join(' ')
    ]
  )

  self.class.connection.select_value(sql)
end


We updated posts explicitly (though a trigger or model callback would also work, if that fits your design):

post.update(search_vector: post.generate_search_vector)
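
If you would rather not manage updates by hand, a database trigger could keep the column fresh on every write instead. A sketch in plain SQL (an alternative we did not use; note that a row-level trigger on posts cannot see the associated tags, so tag changes would still need separate handling):

CREATE FUNCTION posts_search_vector_update() RETURNS trigger AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('simple', unaccent(coalesce(NEW.title, ''))), 'A') ||
    setweight(to_tsvector('simple', unaccent(coalesce(NEW.body, ''))), 'B');
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_search_vector_trigger
  BEFORE INSERT OR UPDATE OF title, body ON posts
  FOR EACH ROW EXECUTE FUNCTION posts_search_vector_update();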


For bulk updates (during migrations or maintenance), we batched it:

class PostSearchVectorUpdater
  def self.run
    batch_size = 5000
    min_id = Post.minimum(:id)
    max_id = Post.maximum(:id)
    return if min_id.nil? # table is empty, nothing to update

    (min_id..max_id).step(batch_size) do |first_id|
      last_id = [first_id + batch_size - 1, max_id].min

      sql = <<-SQL.squish
        WITH data AS (
          SELECT
            p.id,
            p.title AS title,
            p.body AS body,
            COALESCE(string_agg(t.name, ' '), '') AS tag_names
          FROM posts p
          LEFT JOIN tagged_resources tr ON tr.resource_id = p.id AND tr.resource_type = 'Post'
          LEFT JOIN tags t ON t.id = tr.tag_id
          WHERE p.id BETWEEN #{first_id} AND #{last_id}
          GROUP BY p.id, p.title, p.body
        )
        UPDATE posts
        SET search_vector =
          setweight(to_tsvector('simple', unaccent(coalesce(data.title, ''))), 'A') ||
          setweight(to_tsvector('simple', unaccent(coalesce(data.body, ''))), 'B') ||
          setweight(to_tsvector('simple', unaccent(data.tag_names)), 'C')
        FROM data
        WHERE posts.id = data.id
      SQL

      ActiveRecord::Base.connection.execute(sql)
      puts "Updated posts #{first_id} - #{last_id}"
    end
  end
end
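
The step loop above partitions the id range into inclusive batches. Extracted into plain Ruby (batch_ranges is a hypothetical helper, not part of the updater), the partitioning behaves like this:

```ruby
# Partition an inclusive id range into [first_id, last_id] pairs,
# mirroring the (min_id..max_id).step(batch_size) loop above.
def batch_ranges(min_id, max_id, batch_size)
  (min_id..max_id).step(batch_size).map do |first_id|
    [first_id, [first_id + batch_size - 1, max_id].min]
  end
end

batch_ranges(1, 12, 5) # => [[1, 5], [6, 10], [11, 12]]
```

The last batch is clamped to max_id, so gaps in the id sequence are harmless; each UPDATE simply touches whatever rows fall in the range.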

Results

This gave us a huge speedup — queries that took hundreds of milliseconds now complete in under 10 ms. This approach is ideal if:

  • Your data doesn’t change constantly
  • You’re fine with explicit updates to the search_vector
  • You want to avoid runtime joins and heavy scopes

If pg_search is slowing down your app, precomputing weighted vectors is a simple and powerful solution.

This approach also has its limits, and at some point you may want to move to Elasticsearch or a similar tool.

https://github.com/Casecommons/pg_search
