spell correction

by David Weldon

Over the past couple of years I’ve done some NLP work in erlang including LSA, text segmentation and automatic spell correction. If you’ve ever thought about creating a spell checker, I’d recommend reading this article. Much can be improved in the implementation presented – chiefly, the brute force search of all conceivable variations of the input text. My own application required automatic correction on the order of a few milliseconds, so that approach was clearly not going to work.

I then found this article, which explains a terrific solution. In summary, if you keep your dictionary in a trie you can narrow your search to only variations which exist in the dictionary. This greatly improves the speed of the algorithm. I implemented the same idea in erlang, and ever since I’ve been doing ~1ms corrections with a dictionary of tens of thousands of words.

While most of my work is application-specific, the trie structure and search algorithm are generic so I decided to open source them here. Like all of my projects, please feel free to send suggestions and pull requests.

Advertisements