are correct but hardly ever used
and could be a misspelled often-used word.
- For making suggestions the speed is less important and requiring to install
another program or library would be acceptable. But the word lists probably
differ, the suggestions may be wrong words.
Spelling suggestions *develop-spell-suggestions*
For making suggestions there are two basic mechanisms:
1. Try changing the bad word a little bit and check for a match with a good
word. Or go through the list of good words, change them a little bit and
check for a match with the bad word. The changes are deleting a character,
inserting a character, swapping two characters, etc.
2. Perform soundfolding on both the bad word and the good words and then find
matches, possibly with a few changes like with the first mechanism.
The first is good for finding typing mistakes. After experimenting with
hashtables and looking at solutions from other spell checkers the conclusion
was that a trie (a kind of tree structure) is ideal for this. Both for
reducing memory use and being able to try sensible changes. For example, when
inserting a character only characters that lead to good words need to be
tried. Other mechanisms (with hashtables) need to try all possible letters at
every position in the word. Also, a hashtable has the requirement that word
boundaries are identified separately, while a trie does not require this.
That makes the mechanism a lot simpler.
Soundfolding is useful when someone knows how the words sounds but doesn't
know how it is spelled. For example, the word "dictionary" might be written
as "daktonerie". The number of changes that the first method would need to
try is very big, it's hard to find the good word that way. After soundfolding
the words become "tktnr" and "tkxnry", these differ by only two letters.
To find words by their soundfolded equivalent (soundalike word) we need a list
of all soundfolded words. A few experiments have been done to find out what
the best method is. Alternatives:
1. Do the sound folding on the fly when looking for suggestions. This means
walking through the trie of good words, soundfolding each word and
checking how different it is from the bad word. This is very efficient for
memory use, but takes a long time. On