
   <listitem>
    <para>
     Actually, each <quote>character</quote> can be any string not containing
     whitespace, so <filename>unaccent</filename> dictionaries could be used for
     other sorts of substring substitutions besides diacritic removal.
    </para>
   </listitem>

   <listitem>
    <para>
     Some characters, like numeric symbols, may require whitespace in their
     translation rule.  In this case, the translated characters can be enclosed
     in double quotes.  To include a literal double quote in the translated
     characters, write it as two double quotes.  For example:
<programlisting>
&frac14;      " 1/4"
&frac12;      " 1/2"
&frac34;      " 3/4"
&ldquo;       """"
&rdquo;       """"
</programlisting>
    </para>
   </listitem>

   <listitem>
    <para>
     As with other <productname>PostgreSQL</productname> text search configuration files,
     the rules file must be stored in UTF-8 encoding.  The data is
     automatically translated into the current database's encoding when
     loaded.  Any lines containing untranslatable characters are silently
     ignored, so that rules files can contain rules that are not applicable in
     the current encoding.
    </para>
   </listitem>
  </itemizedlist>
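  <para>
   To make the quoting and encoding rules above more concrete, here is a
   minimal Python sketch of how a rules file might be read. It is not the
   module's C parser. The functions parse_rule_line and load_rules are
   hypothetical. The sketch assumes that quoting applies only to the
   translated (second) field and skips any line that does not hold exactly a
   source string and a target. Python codec names such as latin-1 stand in
   for database encodings.
  </para>

```python
from typing import Dict, Optional, Tuple


def parse_rule_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one rules-file line into (source, target).

    Simplified sketch: the source is a run of non-whitespace characters and
    the target is either another such run or a double-quoted string in which
    a doubled quote character stands for one literal quote. Any other line
    shape (blank, source only, stray text) yields None in this sketch.
    """
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        return None  # blank line or no target: not handled by this sketch
    source, rest = parts
    if not rest.startswith('"'):
        # Unquoted target: must itself be free of whitespace.
        return None if any(c.isspace() for c in rest) else (source, rest)
    out = []
    i, n = 1, len(rest)
    while i != n:
        c = rest[i]
        if c != '"':
            out.append(c)
            i += 1
        elif rest[i + 1:i + 2] == '"':
            out.append('"')  # doubled quote -> one literal double quote
            i += 2
        elif rest[i + 1:].strip():
            return None  # unexpected text after the closing quote
        else:
            return source, "".join(out)  # closing quote ends the target
    return None  # unterminated quoted string


def load_rules(raw: bytes, db_encoding: str = "utf-8") -> Dict[str, str]:
    """Decode a UTF-8 rules file, keeping only rules representable in db_encoding.

    db_encoding is a Python codec name used as a stand-in for the database
    encoding (an assumption of this sketch, not a PostgreSQL parameter).
    """
    text = raw.decode("utf-8")  # the rules file itself must be UTF-8
    rules: Dict[str, str] = {}
    for line in text.splitlines():
        try:
            line.encode(db_encoding)
        except UnicodeEncodeError:
            continue  # untranslatable character: skip the line silently
        rule = parse_rule_line(line)
        if rule is not None:
            rules.setdefault(rule[0], rule[1])  # sketch keeps the first rule per source
    return rules
```

  <para>
   With a latin-1 target, a line mapping the ligature œ is dropped because
   that character has no latin-1 representation, while a rule for ½ with the
   quoted target " 1/2" is kept with its leading space.
  </para>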

  <para>
   A more complete example, directly useful for most European
   languages, is provided in <filename>unaccent.rules</filename>, which is installed
   in <filename>$SHAREDIR/tsearch_data/</filename> along with the <filename>unaccent</filename>
   module.  This rules file translates characters with accents
   to the same characters without accents, and also expands ligatures
   into the equivalent series of simple characters (for example, &AElig; to
   AE).
  </para>
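  <para>
   The effect of such a rules file on text can be illustrated with a short
   Python sketch. It does not reproduce the module's C implementation. The
   function unaccent_text and the sample rules are hypothetical. The sketch
   assumes that when more than one source string matches at a position, the
   longest one is used. Because sources may be longer than one character, the
   same scan also covers the substring substitutions mentioned above.
  </para>

```python
from typing import Dict


def unaccent_text(text: str, rules: Dict[str, str]) -> str:
    """Rewrite text left to right using source -> target rules.

    At each position the longest matching source string is replaced (an
    assumption of this sketch); characters with no matching rule are copied.
    """
    max_len = max((len(src) for src in rules), default=0)
    out = []
    i, n = 0, len(text)
    while i != n:
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, n - i), 0, -1):
            target = rules.get(text[i:i + size])
            if target is not None:
                out.append(target)
                i += size
                break
        else:
            out.append(text[i])  # no rule starts here: keep the character
            i += 1
    return "".join(out)
```

  <para>
   For example, unaccent_text("Æther café", {"Æ": "AE", "é": "e"}) returns
   "AEther cafe", and with the rules {"a": "1", "ab": "2"} the input "abc"
   becomes "2c" because the longer source string wins.
  </para>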
 </sect2>

 <sect2 id="unaccent-usage">
  <title>Usage</title>

  <para>
   Installing the <literal>unaccent</literal> extension creates a text
   search template <literal>unaccent</literal> and a dictionary <literal>unaccent</literal>
   based on it.  The <literal>unaccent</literal> dictionary has the default
   parameter setting <literal>RULES='unaccent'</literal>, which makes it immediately
   usable with the standard <filename>unaccent.rules</filename> file.
   If you wish,
