chxo internets

A network of memes,
by Chris Snyder

Mon, Oct 29th

On Stemming

I recently added stemming to my search action and ran into a gotcha that is not immediately obvious: when performing full-text searches, you can’t just search for the stem; you also need to search for the original word.

Perhaps you learn this in CompSci? At any rate, searching for just the stem fails because a stem is not always a substring of the original word: therapy becomes therapi, for instance, and during becomes dure.
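
To make the gotcha concrete, here is a minimal sketch in Python. It is not the code behind my search action: the STEMS table is a hand-written, hypothetical stand-in for a real stemmer, and search_terms and matches are names invented for this example. The full-text search is approximated by a plain substring test.

```python
# Hypothetical stem table standing in for a real stemmer.
# The first two entries are the examples from this post.
STEMS = {
    "therapy": "therapi",
    "during": "dure",
    "searching": "search",
}


def search_terms(word):
    """Return the stem and the original word, without duplicates."""
    word = word.lower()
    stem = STEMS.get(word, word)
    # dict.fromkeys keeps insertion order and drops the repeat
    # when the stem is identical to the word.
    return list(dict.fromkeys([stem, word]))


def matches(text, word):
    """Substring search for the stem OR the original word."""
    text = text.lower()
    return any(term in text for term in search_terms(word))


text = "Physical therapy helps during recovery."
print("therapi" in text.lower())   # stem alone: no match
print(matches(text, "therapy"))    # stem or original word: match
```

Searching for the stem alone misses the text because "therapi" never appears in it; adding the original word as a second term gets the match back, while the stem still catches variants such as "searched" when the query is "searching".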

Obviously stemming works best when used with an index of stems; using it on full-text search is a hack. But it works!
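
For comparison, here is a rough sketch of what an index of stems could look like, again in Python and again under assumptions: the STEMS table is a hypothetical stand-in for a real stemmer, and build_index and lookup are names made up for this example. Because both the documents and the query go through the same stemmer, the stem never has to appear literally in the text.

```python
from collections import defaultdict

# Hypothetical stem table standing in for a real stemmer.
STEMS = {
    "therapy": "therapi",
    "therapies": "therapi",
    "during": "dure",
}


def stem(word):
    """Look up the stem; unknown words are returned unchanged."""
    return STEMS.get(word, word)


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[stem(token)].add(doc_id)
    return index


def lookup(index, word):
    """Stem the query the same way the documents were stemmed."""
    return index.get(stem(word.lower()), set())


docs = {
    1: "therapy works",
    2: "many therapies exist",
    3: "nothing here",
}
index = build_index(docs)
print(lookup(index, "therapy"))
```

With this layout a query for "therapy" finds both the document containing "therapy" and the one containing "therapies", with no need to search for the original word as well.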