Searching prose with LIKE '%word%' feels like search, but it isn't. It matches raw characters, not meaning — so it misses "running" when you type "run", trips over case, ranks nothing, and scans every row. Postgres has a real search engine built in: turn text into normalized lexemes, then match and rank against them.
The seed is a small blog: six articles, each with a title and a paragraph of body, on mixed topics.
sql
SELECT id, title FROM articles ORDER BY id;
Why LIKE is not search
Ask LIKE for articles about running. You'll get exactly the rows that contain the literal string "run":
sql
SELECT title FROM articles WHERE body ILIKE '%run%';
That match is dumb in both directions. It catches "running" and "runners" by accident (the substring "run" is in there), but it would happily match "runny" or "prune" too, and it has no idea that "ran" is the same verb. There's no ranking — every hit is equal — and on a big table each query is a full scan. We need to search words, normalized to their root, not characters.
Documents become lexemes: to_tsvector
to_tsvector('english', text) parses text into a tsvector: a sorted list of lexemes (normalized word roots) with the positions where each appears. Watch what the english configuration does to a sentence:
sql
SELECT to_tsvector('english', 'The runners were running and had ran ten miles');
Three things happened. "runners", "running", and "ran" all collapsed to the single lexeme run — that's stemming. Common words like "the", "were", "and", "had" vanished — those are stop words, too frequent to be useful. And every surviving lexeme carries its position (run:2,4,7), which powers phrase search and ranking later.
That's why full-text search beats LIKE: it compares meaning-bearing roots, not letters.
Matching with tsquery and
sandbox locked
Sign in to spin up your own Postgres sandbox and run the queries for this lesson.
@@
A search needs the other half: a tsquery, the parsed query. The match operator @@ returns true when the vector satisfies the query. There are four ways to build a tsquery, from strict to friendly.
to_tsquery is the raw form — you write the operators yourself, and lexemes must be single tokens:
sql
SELECT to_tsvector('english', 'I ran a marathon') @@ to_tsquery('english', 'run');
Because the document stems "ran" to run and the query stems too, run matches ran. LIKE '%run%' could never do that.
plainto_tsquery takes plain words and ANDs them together — no operator syntax to learn:
phraseto_tsquery keeps word order, requiring the lexemes to be adjacent (note the \<-> "followed-by" operator it produces):
sql
SELECT phraseto_tsquery('english', 'quick brown fox');
And websearch_to_tsquery understands the syntax people already type into search boxes — quoted phrases, or, and - for exclude. This is usually the one you want for user input:
SELECT title
FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ websearch_to_tsquery('english', 'running');
Every article that talks about running, ran, runners, or runs comes back — regardless of which surface form the author used.
tsquery operators
When you build a query with to_tsquery, you combine lexemes with operators: & and, | or, ! not, and \<-> followed-by (phrase). Compare "run and rain" against "run or rain":
sql
SELECT title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'run & rain');
sql
SELECT title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'run | rain');
! excludes — running articles that are not about treadmills:
sql
SELECT title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'run & !treadmill');
And \<-> demands the lexemes be adjacent, so "quick brown" only matches when those words sit side by side:
sql
SELECT title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'quick <-> brown');
Ranking by relevance: ts_rank
Search without ranking is a list, not results. ts_rank scores how well a document matches a query — more hits, rarer terms, and closer positions score higher. Order by it descending and the best matches float to the top:
sql
SELECT title,
ts_rank(to_tsvector('english', body), websearch_to_tsquery('english', 'run')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ websearch_to_tsquery('english', 'run')
ORDER BY rank DESC;
ts_rank_cd ("cover density") is a sibling that also rewards matched lexemes appearing close together, which is often what you want for phrase-like relevance:
sql
SELECT title,
ts_rank_cd(to_tsvector('english', body), websearch_to_tsquery('english', 'run rain')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ websearch_to_tsquery('english', 'run rain')
ORDER BY rank DESC;
Highlighting matches: ts_headline
To show why a row matched, ts_headline returns a snippet of the original text with the matching lexemes wrapped (by default in \<b> tags). It works on the raw text, not the tsvector:
sql
SELECT ts_headline('english', body, websearch_to_tsquery('english', 'run rain')) AS snippet
FROM articles
WHERE to_tsvector('english', body) @@ websearch_to_tsquery('english', 'run rain');
Weighting fields: setweight
A hit in the title should count more than a hit in the body. setweight tags every lexeme in a tsvector with a label — A, B, C, or D, highest to lowest — and you concatenate the tagged vectors with || into one. ts_rank then reads those labels and scores an A-hit above a B-hit:
sql
SELECT title,
ts_rank(
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B'),
websearch_to_tsquery('english', 'run')
) AS rank
FROM articles
WHERE (setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B'))
@@ websearch_to_tsquery('english', 'run')
ORDER BY rank DESC;
The article with "run" in its title now outranks the ones that only mention it in the body.
Making it fast: a generated column + GIN index
Recomputing to_tsvector(...) on every query means Postgres can't use an index — it's back to scanning. The fix is to store the vector once. A GENERATED ALWAYS AS (...) STORED column keeps a tsvector that Postgres maintains automatically on every insert and update, so it's always in sync:
Now searches read the precomputed column, weighting included:
sql
SELECT title, ts_rank(search, websearch_to_tsquery('english', 'run')) AS rank
FROM articles
WHERE search @@ websearch_to_tsquery('english', 'run')
ORDER BY rank DESC;
To make that @@ lookup fast at scale, add a GIN index on the column — a GIN index maps each lexeme to the rows that contain it, so a search touches only matching rows instead of scanning the table:
sql
CREATE INDEX articles_search_idx ON articles USING gin (search);
On six rows the planner won't bother with the index, but on a real corpus this is what turns full-text search from a scan into a lookup. (Index types get their own lesson — here just note that tsvector columns want gin.)
What you learned
LIKE '%word%' matches characters, not words: no stemming, no ranking, and a full scan every time — it is not search.
to_tsvector('english', text) turns a document into normalized lexemes with positions — stemming ("running"/"ran" → run) and dropping stop words along the way.
A tsquery matches a vector with @@. Build it with to_tsquery (raw operators), plainto_tsquery (words AND'd), phraseto_tsquery (ordered phrase), or websearch_to_tsquery (user-friendly search syntax).