Full-text search

Searching prose with LIKE '%word%' feels like search, but it isn't. It matches raw characters, not meaning, so it can't tell "run" from "prune", misses "studies" when you type "study", trips over case, ranks nothing, and scans every row. Postgres has a real search engine built in: turn text into normalized lexemes, then match and rank against them.
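Here is a minimal sketch of that pipeline. It assumes a Postgres server with the built-in english configuration, and the three rows are hypothetical inline data standing in for a real table. to_tsvector turns each body into lexemes, to_tsquery normalizes the search term the same way, @@ tests for a match, and ts_rank scores it.

```sql
-- Hypothetical inline documents (not the tutorial's seed data).
WITH docs(id, body) AS (
  VALUES (1, 'Running shoes for long runs'),
         (2, 'Baking sourdough at home'),
         (3, 'A short history of marathons')
)
SELECT id,
       body,
       -- Score how well the document's lexemes match the query.
       ts_rank(to_tsvector('english', body), to_tsquery('english', 'run')) AS rank
FROM docs
-- @@ is true when the document's tsvector satisfies the tsquery.
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'run')
ORDER BY rank DESC;
```

Only row 1 comes back: "Running" and "runs" are both stored as the lexeme run, which is exactly what the query term normalizes to.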

The seed is a small blog: six articles, each with a title and a paragraph of body, on mixed topics.

```sql
SELECT id, title FROM articles ORDER BY id;
```

Why `LIKE` is not search

Ask ILIKE, the case-insensitive form of LIKE, for articles about running. You'll get exactly the rows whose body contains the literal string "run", in any case:

```sql
SELECT title FROM articles WHERE body ILIKE '%run%';
```

That match is dumb in both directions. It catches "running" and "runners" by accident (the substring "run" is in there), but it would happily match "runny" or "prune" too, and it misses related forms that don't share the literal substring: '%study%' never finds "studies". There's no ranking, so every hit is equal, and a pattern with a leading wildcard can't use an ordinary B-tree index, so on a big table each query is a full scan. We need to search words, normalized to their root, not characters.
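Both failure directions can be checked side by side on literal strings, with no table required. This is a sketch assuming the english configuration; the sample phrases are illustrative choices, and @@ with to_tsquery is the word-level matching that replaces ILIKE.

```sql
SELECT
  -- False positive: the letters "run" sit inside "Pruning".
  'Pruning the hedge' ILIKE '%run%'              AS like_prune,    -- true
  to_tsvector('english', 'Pruning the hedge')
    @@ to_tsquery('english', 'run')              AS fts_prune,     -- false
  -- False negative: "studies" does not contain the substring "study".
  'Two new studies' ILIKE '%study%'              AS like_studies,  -- false
  to_tsvector('english', 'Two new studies')
    @@ to_tsquery('english', 'study')            AS fts_studies;   -- true: both stem to studi
```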

Documents become lexemes: `to_tsvector`

to_tsvector('english', text) parses text into a tsvector: a sorted list of lexemes (normalized word roots) with the positions where each appears. Watch what the english configuration does to a sentence:

```sql
SELECT to_tsvector('english', 'The runners were running and had ran ten miles');
```

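Three things happened. "running" was cut down to the lexeme run and "miles" to mile: that's stemming, which strips suffixes so different forms of a word share one root. Common words like "the", "were", "and", "had" vanished: those are stop words, too frequent to be useful. And every surviving lexeme carries its position in the original text, stop words included (run:4), which powers phrase search and ranking later. The full result is 'mile':9 'ran':7 'run':4 'runner':2 'ten':8. Note what stemming does not do: it is rule-based suffix stripping, not a dictionary of word forms, so "runners" becomes runner rather than run, and the irregular "ran" stays ran.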
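To check what the stemmer does with one word at a time, ts_lexize runs a single token through a single dictionary. This sketch assumes the default english_stem Snowball dictionary, which also carries the english stop-word list; the words are chosen to mirror the sentence above.

```sql
SELECT
  ts_lexize('english_stem', 'running') AS running,   -- {run}
  ts_lexize('english_stem', 'runners') AS runners,   -- {runner}
  ts_lexize('english_stem', 'ran')     AS ran,       -- {ran}: irregular forms are not mapped
  ts_lexize('english_stem', 'the')     AS the_word;  -- {}: recognized as a stop word
```

An empty array is how ts_lexize reports that the dictionary knows the word but discards it as a stop word.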

That's why full-text search beats LIKE: it compares meaning-bearing roots, not letters.
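The query side gets the same treatment: to_tsquery normalizes its words with the configuration you name, so a search for "runs" and a document containing "running" meet at the lexeme run. A minimal sketch reusing the sentence from the to_tsvector example; the three query words are illustrative choices.

```sql
WITH doc(v) AS (
  SELECT to_tsvector('english', 'The runners were running and had ran ten miles')
)
SELECT
  v @@ to_tsquery('english', 'runs')     AS runs_query,     -- true: runs -> run, stored from "running"
  v @@ to_tsquery('english', 'mile')     AS mile_query,     -- true: "miles" was stored as mile
  v @@ to_tsquery('english', 'marathon') AS marathon_query  -- false: no such lexeme in the document
FROM doc;
```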
