Unaccent Dictionary Configuration and Usage

   <listitem>
    <para>
     Actually, each <quote>character</quote> can be any string not
     containing whitespace, so <filename>unaccent</filename> dictionaries
     could be used for other sorts of substring substitutions besides
     diacritic removal.
    </para>
   </listitem>

   <listitem>
    <para>
     Some characters, like numeric symbols, may require whitespace in
     their translation rule. It is possible to use double quotes around
     the translated characters in this case. A double quote needs to be
     escaped with a second double quote when including one in the
     translated character. For example:
<programlisting>
¼      " 1/4"
½      " 1/2"
¾      " 3/4"
“      """"
”      """"
</programlisting>
    </para>
   </listitem>

   <listitem>
    <para>
     As with other <productname>PostgreSQL</productname> text search
     configuration files, the rules file must be stored in UTF-8 encoding.
     The data is automatically translated into the current database's
     encoding when loaded. Any lines containing untranslatable characters
     are silently ignored, so that rules files can contain rules that are
     not applicable in the current encoding.
    </para>
   </listitem>
  </itemizedlist>

  <para>
   A more complete example, which is directly useful for most European
   languages, can be found in <filename>unaccent.rules</filename>, which
   is installed in <filename>$SHAREDIR/tsearch_data/</filename> when the
   <filename>unaccent</filename> module is installed. This rules file
   translates characters with accents to the same characters without
   accents, and it also expands ligatures into the equivalent series of
   simple characters (for example, Æ to AE).
  </para>
 </sect2>

 <sect2 id="unaccent-usage">
  <title>Usage</title>

  <para>
   Installing the <literal>unaccent</literal> extension creates a text
   search template <literal>unaccent</literal> and a dictionary
   <literal>unaccent</literal> based on it.
The <literal>unaccent</literal> dictionary has the default parameter setting <literal>RULES='unaccent'</literal>, which makes it immediately usable with the standard <filename>unaccent.rules</filename> file. If you wish,

The unaccent dictionary can be used for substring substitutions beyond diacritic removal. Its rules file must be stored in UTF-8 encoding, and a pre-installed rules file suitable for most European languages is provided. With the default parameter setting, the dictionary is usable as soon as the extension is installed.
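
The quoting convention for translations that contain whitespace can be illustrated with a short Python sketch. This is not the parser PostgreSQL uses; it is a minimal approximation of the rule format as described here. The function name parse_rule is hypothetical. The sketch also assumes that a line holding only one character means that character is deleted.

```python
def parse_rule(line: str):
    """Split one rules-file line into (source, target).

    Illustrative sketch only, not PostgreSQL's actual parser.
    Returns None for blank lines.
    """
    line = line.strip()
    if not line:
        return None

    # The source "character" is any string without whitespace.
    parts = line.split(None, 1)
    source = parts[0]
    if len(parts) == 1:
        # Assumption: a lone source string means "delete this string".
        return source, ""

    rest = parts[1]
    if not rest.startswith('"'):
        # Unquoted target: it cannot contain whitespace.
        return source, rest.split()[0]

    # Quoted target: whitespace is kept, and "" stands for a literal quote.
    chars = []
    i = 1
    while i < len(rest):
        if rest[i] == '"':
            if i + 1 < len(rest) and rest[i + 1] == '"':
                chars.append('"')
                i += 2
                continue
            break  # closing quote reached
        chars.append(rest[i])
        i += 1
    return source, "".join(chars)


if __name__ == "__main__":
    for rule in ['¼ " 1/4"', '“ """"', 'À A']:
        print(parse_rule(rule))
```

In this sketch the quoted form keeps the leading space in " 1/4", and the four consecutive double quotes collapse to a single literal double quote, which matches the example rules shown earlier.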