Spell File Format Details: Straight Word List and Affix/Dictionary Files

used for some languages where the words use only ASCII letters for most of the
words.

The default "spellfile.vim" plugin uses this autocommand.  If you define your
autocommand afterwards you may want to use ":au! SpellFileMissing" to overrule
it.  If you define your autocommand before the plugin is loaded it will notice
this and not do anything.
                                                        *E797*
Note that the SpellFileMissing autocommand must not change or destroy the
buffer the user was editing.

==============================================================================
4. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK, thus should not be highlighted.
Person and company names will not appear in a dictionary, but do appear in a
word list.  And some old words are rarely used while they are common
misspellings.  These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression.  The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org).  This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST                            *spell-wordlist-format*

The words must appear one per line.  That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

        # comment ~
- Lines starting with a # are ignored (comment lines).

        /encoding=utf-8 ~
- A line starting with "/encoding=", before any word, specifies the encoding
  of the file.  After the '=' comes an encoding name.  This tells Vim to set
  up conversion from the specified encoding to 'encoding'.  Thus you can use
  one word list for several target encodings.

        /regions=usca ~
- A line starting with "/regions=" specifies the region names that are
  supported.  Each region name must be two ASCII letters.  The first one is
  region 1.  Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an addition word list the region names should be equal to the main word
  list!

- Other lines starting with '/' are reserved for future use.  The ones that
  are not recognized are ignored.  You do get a warning message, so that you
  know something won't work.

- A "/" may follow the word with the following items:
    =           Case must match exactly.
    ?           Rare word.
    !           Bad (wrong) word.
    1 to 9      A region in which the word is valid.  If no regions are
                specified the word is valid in all regions.

Example:

        # This is an example word list          comment
        /encoding=latin1                        encoding of the file
        /regions=uscagb                         regions "us", "ca" and "gb"
        example                                 word for all regions
        blah/12                                 word for regions "us" and "ca"
        vim/!                                   bad word
        Campbell/?3                             rare word in region 3 "gb"
        's mornings/=                           keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted.  This is different from a word with mixed case that is automatically
marked as keep-case, those words may appear in all upper-case letters.


FORMAT WITH .AFF AND .DIC FILES                         *aff-dic-format*

There are two files: the basic word list and an affix file.  The affix file
specifies settings for the language and can contain affixes.  The affixes are
used to modify the basic words to get the full word list.  This significantly
reduces the number of words, especially for a language like Polish.  This is
called affix compression.

The basic word list and the affix file are combined with the ":mkspell"
command, which results in a binary spell file.  All the preprocessing has been
done, thus this file loads fast.

The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language

This section describes the format of spell files: a straight list of words, and `.aff` and `.dic` files that use affix compression, as employed by Myspell. For the straight word list it covers encoding specifications, region definitions, and the flags that mark a word as keep-case, rare, or bad. For affix compression it outlines how the `:mkspell` command combines the affix and dictionary files into a binary spell file, which stores the word list more efficiently, especially for languages with rich morphology. It also mentions the `SpellFileMissing` autocommand and how the `spellfile.vim` plugin uses it.
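
The rules for the straight word list can be made concrete with a small reader.
The following is a minimal sketch in Python, not Vim's own implementation: the
names `WordEntry` and `parse_word_list` are hypothetical, the input is assumed
to be already decoded into text lines (so "/encoding=" is only recorded, not
applied), and the split at the first "/" in a word line is an assumption.  The
rule that "/encoding=" must come before any word is not enforced here.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One word line; the class name and fields are illustrative only."""
    word: str
    keep_case: bool = False  # "=" flag: case must match exactly
    rare: bool = False       # "?" flag: rare word
    bad: bool = False        # "!" flag: bad (wrong) word
    regions: list = field(default_factory=list)  # empty: valid in all regions


def parse_word_list(lines):
    """Return (encoding, region_names, entries, warnings) for a word list."""
    encoding = None
    region_names = []
    entries = []
    warnings = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Empty lines, blank lines and "#" comment lines are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        if line.startswith("/"):
            if line.startswith("/encoding="):
                encoding = line[len("/encoding="):]
            elif line.startswith("/regions="):
                names = line[len("/regions="):]
                # Each region name is two ASCII letters; the first is region 1.
                region_names = [names[i:i + 2] for i in range(0, len(names), 2)]
            else:
                # Reserved for future use: ignored, but reported.
                warnings.append("unrecognized line ignored: " + line)
            continue
        # Assumption: the flags start after the first "/" in the line.
        word, _, flags = line.partition("/")
        entry = WordEntry(word)
        for ch in flags:
            if ch == "=":
                entry.keep_case = True
            elif ch == "?":
                entry.rare = True
            elif ch == "!":
                entry.bad = True
            elif ch in "123456789":
                number = int(ch)
                if number <= len(region_names):
                    entry.regions.append(region_names[number - 1])
                else:
                    warnings.append("unknown region %d in: %s" % (number, line))
            else:
                warnings.append("unknown flag %r in: %s" % (ch, line))
        entries.append(entry)
    return encoding, region_names, entries, warnings
```

Passing the lines of the example word list (without the explanatory column on
the right) to `parse_word_list` returns one entry per word, with region digits
resolved to the names given in the "/regions=" line.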