Spell Check Algorithms

Implementing a spell-checking engine is more complicated than it may seem. Looking each word up in a vocabulary is not enough, even if the vocabulary is extensive and correct. The spell checker should also take the phonetic aspect of the language into account.

The key points of our spell-checking engine are:

Text Parser

While parsing the text, the spell checker should treat certain text elements specially. These elements include abbreviations, proper names, numbers, e-mail addresses, uniform resource locator (URL) strings (web addresses), and so on. Depending on the spell checker implementation and user options, they can be ignored or checked differently from other words in the text. The DXSpellChecker for WPF suite provides an OptionsSpelling class that lets the user skip e-mail and web addresses, words that contain numbers, mixed-case words, and upper-case words.
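
The filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the DXSpellChecker implementation: the option names in DEFAULT_OPTIONS, the regular expressions, and the functions should_skip and words_to_check are all hypothetical and are not members of the OptionsSpelling class.

```python
import re

# Hypothetical option flags that mirror the categories listed above.
# They are NOT the actual OptionsSpelling members.
DEFAULT_OPTIONS = {
    "ignore_emails": True,
    "ignore_urls": True,
    "ignore_words_with_numbers": True,
    "ignore_mixed_case": True,
    "ignore_upper_case": True,
}

# Deliberately simple patterns; a real parser would be stricter.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def should_skip(token, options=DEFAULT_OPTIONS):
    """Return True if the token belongs to a category the user chose to ignore."""
    if options["ignore_emails"] and EMAIL_RE.match(token):
        return True
    if options["ignore_urls"] and URL_RE.match(token):
        return True
    if options["ignore_words_with_numbers"] and any(ch.isdigit() for ch in token):
        return True
    if options["ignore_upper_case"] and len(token) > 1 and token.isupper():
        return True
    # Mixed case: an upper-case letter after the first character, e.g. "iPhone".
    if (options["ignore_mixed_case"]
            and any(ch.isupper() for ch in token[1:])
            and not token.isupper()):
        return True
    return False


def words_to_check(text, options=DEFAULT_OPTIONS):
    """Split text on whitespace, trim punctuation, and drop ignored tokens."""
    tokens = [t.strip(".,;:!?()\"'") for t in text.split()]
    return [t for t in tokens if t and not should_skip(t, options)]
```

With the default flags, a call such as words_to_check("Mail bob@example.com about NASA and iPhone at www.site.org in room b12 todayy.") passes only the ordinary words, including the mistyped "todayy", on to the dictionary lookup.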

Dictionary

An ideal dictionary should contain all the words in a given language. In practice, it can be much smaller and can be split into several parts, depending on the language. In several Indo-European languages, including English, words are derived from a base by adding affixes (prefixes or suffixes). The size of the dictionary can therefore be greatly reduced if the base words, the affixes, and the rules for adding affixes to base words are placed into separate files. The complete list of words can then be built on demand. This technique has proven to be effective, especially for synthetic languages (rich in verbal and inflective forms) such as Lithuanian or Russian.
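
The following Python sketch illustrates how a full word list could be generated from base words and affix rules. The rule table, the flag letters, and the function names are hypothetical and heavily simplified. They do not reproduce the actual ISpell or ASpell affix file format, and prefix and suffix rules are not combined with each other.

```python
# Hypothetical, simplified affix rules:
# flag -> list of (kind, text to strip, text to add, required word ending/beginning).
# For each flag, the first rule whose condition matches is applied.
AFFIX_RULES = {
    "S": [("suffix", "y", "ies", "y"), ("suffix", "", "s", "")],
    "D": [("suffix", "", "d", "e"), ("suffix", "", "ed", "")],
    "U": [("prefix", "", "un", "")],
}

# Base words with the flags of the affix rules that may be applied to them.
BASE_WORDS = {"try": "S", "walk": "SD", "tie": "SDU", "happy": "U"}


def apply_flag(word, flag):
    """Apply the first matching rule for the flag and return the derived word."""
    for kind, strip, add, condition in AFFIX_RULES[flag]:
        if kind == "suffix" and word.endswith(condition):
            stem = word[: len(word) - len(strip)] if strip else word
            return stem + add
        if kind == "prefix" and word.startswith(condition):
            return add + word[len(strip):]
    return None


def expand(base_words):
    """Build the complete word set from base words and their affix flags."""
    words = set()
    for word, flags in base_words.items():
        words.add(word)
        for flag in flags:
            derived = apply_flag(word, flag)
            if derived:
                words.add(derived)
    return words
```

In this toy example, four stored entries expand to eleven words (for instance, "tie" yields "ties", "tied", and "untie"), which shows why storing base words and rules separately keeps the dictionary files small.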

The ISpell and ASpell spell checker projects use this approach, which combines base words and affixes. Thanks to the Open Office project, the spell checker dictionaries of these projects may be freely used and distributed. The DXSpellChecker supports this format because those dictionaries are quite complete and correct, and are constantly amended by cooperative users. The current US-English variant includes more than 62,000 base words.

When a word is considered misspelled (that is, it is not found in the dictionary), the spell checker generates a list of suggestions: words that may replace the mistake. The final choice is always up to the user.

For more information, see Dictionaries.

Using Near-Miss Strategy to Find Suggestions

The first algorithm the DXSpellChecker uses to build a suggestion list is the near-miss strategy. It was developed by Geoff Kuenning for ISpell and assumes that the word is mistyped rather than misspelled. The misspelled word is altered by replacing a letter, deleting a letter, adding a letter, inserting a blank space, or interchanging two adjacent letters. If an alteration results in a word contained in the dictionary, the engine estimates how far that word is from the original. A modified Levenshtein distance is used to measure the proximity of words.
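
The five alterations listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ISpell or DXSpellChecker code: the function name near_misses is hypothetical, the alphabet is limited to lower-case ASCII letters, and the distance estimate is left out.

```python
import string


def near_misses(word, dictionary, alphabet=string.ascii_lowercase):
    """Return dictionary words that are one near-miss alteration away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    candidates = set()
    for left, right in splits:
        if right:
            # Delete one letter.
            candidates.add(left + right[1:])
            # Replace one letter.
            for ch in alphabet:
                candidates.add(left + ch + right[1:])
        if len(right) > 1:
            # Interchange two adjacent letters.
            candidates.add(left + right[1] + right[0] + right[2:])
        # Add one letter.
        for ch in alphabet:
            candidates.add(left + ch + right)
        # Insert a blank space: both halves must be dictionary words.
        if left and right and left in dictionary and right in dictionary:
            candidates.add(left + " " + right)
    candidates.discard(word)
    return sorted(c for c in candidates if c in dictionary or " " in c)
```

For example, with the dictionary {"the", "ten", "tea", "tech", "he"}, the input "teh" produces "the" (interchange), "ten" and "tea" (replacement), and "tech" (addition), while "he" is not suggested because it is more than one alteration away.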

Using Phonetic Comparison to Find Suggestions

The phonetic suggestion algorithm takes the pronunciation of a word into account. The DXSpellChecker uses an implementation of the Double Metaphone search algorithm. Two phonetic codes (primary and secondary) are calculated for each word. The calculation rules differ from language to language and are based on the set of pronunciation rules for that language.

The phonetic strategy then compares the phonetic code of the misspelled word to the phonetic codes of all the words in the word list. If the codes match, the word is added to the suggestion list.
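
The comparison step can be illustrated with a short Python sketch. Note that it substitutes the much simpler Soundex code for Double Metaphone, so each word gets a single code instead of a primary and a secondary one, and the rules are English-only. The function names are hypothetical, and the sketch shows only the matching idea, not the DXSpellChecker implementation.

```python
# Soundex digit for each consonant group (a stand-in for Double Metaphone).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        # Emit a digit only when it differs from the preceding code.
        if digit and digit != previous:
            code += digit
        # "h" and "w" do not separate letters that share a code; vowels do.
        if ch not in "hw":
            previous = digit
    return (code + "000")[:4]


def phonetic_suggestions(misspelled, word_list):
    """Return the words whose phonetic code matches that of the misspelled word."""
    target = soundex(misspelled)
    return [w for w in word_list if soundex(w) == target]
```

For the input "definately" and the word list ["definitely", "defiantly", "delicately", "finally"], the first two words share the code D153 with the misspelled word and are suggested. Soundex always keeps the first letter, so it misses pairs such as "fonetik" and "phonetic"; handling cases like that is one reason for using a richer algorithm such as Double Metaphone.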

Suggestion Ranking

After the list of suggestions is composed, it should be ordered so that the user doesn't have to scroll through it in search of the best match. The implemented solution uses the Levenshtein algorithm to calculate the distance between the misspelled word and each suggestion, and this distance serves as the sort key for the list. Additional assumptions about the nature of a spelling error may help modify the algorithm.

The user then chooses from the list of suggestions. The misspelled word can be replaced with a word from the suggestion list, ignored, or edited by the user. The last case indicates a spell checker miss, and the user is given the option to add the corrected word to an auxiliary user dictionary.
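
A minimal Python sketch of distance-based ordering is shown below. It uses the plain Levenshtein distance (unit cost for each insertion, deletion, and substitution) rather than the modified variant used by the DXSpellChecker, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits that turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_suggestions(misspelled, suggestions):
    """Order suggestions by distance to the misspelled word, then alphabetically."""
    return sorted(suggestions, key=lambda s: (levenshtein(misspelled, s), s))
```

For the misspelled word "definately", the suggestions "defiantly", "definitely", and "delicately" are at distances 3, 1, and 2 respectively, so "definitely" is listed first.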
