by David Weldon
Over the past couple of years I’ve done some NLP work in erlang including LSA, text segmentation and automatic spell correction. If you’ve ever thought about creating a spell checker, I’d recommend reading this article. Much can be improved in the implementation presented – chiefly, the brute force search of all conceivable variations of the input text. My own application required automatic correction on the order of a few milliseconds, so that approach was clearly not going to work.
I then found this article, which explains a terrific solution. In summary, if you keep your dictionary in a trie you can narrow your search to only variations which exist in the dictionary. This greatly improves the speed of the algorithm. I implemented the same idea in erlang, and ever since I’ve been doing ~1ms corrections with a dictionary of tens of thousands of words.
While most of my work is application-specific, the trie structure and search algorithm are generic so I decided to open source them here. Like all of my projects, please feel free to send suggestions and pull requests.