postgresql

18th chunk of `doc/src/sgml/charset.sgml`

CREATE COLLATION num_ignore_punct (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-kn');
SELECT 'id-45' < 'id-123' COLLATE num_ignore_punct; -- true
SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
</programlisting>

    Many of the available options are described in <xref
    linkend="icu-collation-settings"/>, or see <xref
    linkend="icu-external-references"/> for more details.
   </para>

   <sect3 id="icu-collation-comparison-levels">
    <title>ICU Comparison Levels</title>

    <para>
     Comparison of two strings (collation) in ICU is determined by a
     multi-level process, where textual features are grouped into
     "levels". Treatment of each level is controlled by the <link
     linkend="icu-collation-settings-table">collation settings</link>. Higher
     levels correspond to finer textual features.
    </para>

    <para>
     <xref linkend="icu-collation-levels"/> shows which textual feature
     differences are considered significant when determining equality at the
     given level. The Unicode character <literal>U+2063</literal> is an
     invisible separator and, as seen in the table, is ignored at all
     levels of comparison below <literal>identic</literal>.
    </para>
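
    <para>
     As a minimal sketch of how the level is selected, the following
     statements create two nondeterministic collations that stop comparing at
     <literal>level1</literal> and <literal>level2</literal>. The collation
     names <literal>level1_demo</literal> and <literal>level2_demo</literal>
     are hypothetical and chosen only for illustration; the expected results
     follow the corresponding rows of the table:
<programlisting>
-- hypothetical collation names, used only for this illustration
CREATE COLLATION level1_demo (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level1');
CREATE COLLATION level2_demo (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level2');

-- level1 considers only base characters, so case and accents are ignored
SELECT 'g' = 'G' COLLATE level1_demo; -- true
SELECT 'n' = 'ñ' COLLATE level1_demo; -- true
SELECT 'y' = 'z' COLLATE level1_demo; -- false

-- level2 also considers accents, but still ignores case
SELECT 'g' = 'G' COLLATE level2_demo; -- true
SELECT 'n' = 'ñ' COLLATE level2_demo; -- false
</programlisting>
    </para>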

     <table id="icu-collation-levels">
      <title>ICU Collation Levels</title>
      <tgroup cols="8">
       <colspec colname="col1" colwidth="1*"/>
       <colspec colname="col2" colwidth="1.25*"/>
       <colspec colname="col3" colwidth="1*"/>
       <colspec colname="col4" colwidth="1*"/>
       <colspec colname="col5" colwidth="1*"/>
       <colspec colname="col6" colwidth="1*"/>
       <colspec colname="col7" colwidth="1*"/>
       <colspec colname="col8" colwidth="1*"/>

       <thead>
        <row>
         <entry>Level</entry>
         <entry>Description</entry>
         <entry><literal>'f' = 'f'</literal></entry>
         <entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
         <entry><literal>'x-y' = 'x_y'</literal></entry>
         <entry><literal>'g' = 'G'</literal></entry>
         <entry><literal>'n' = 'ñ'</literal></entry>
         <entry><literal>'y' = 'z'</literal></entry>
        </row>
       </thead>

       <tbody>
        <row>
         <entry>level1</entry>
         <entry>Base Character</entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>false</literal></entry>
        </row>
        <row>
         <entry>level2</entry>
         <entry>Accents</entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>false</literal></entry>
         <entry><literal>false</literal></entry>
        </row>
        <row>
         <entry>level3</entry>
         <entry>Case/Variants</entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>false</literal></entry>
         <entry><literal>false</literal></entry>
         <entry><literal>false</literal></entry>
        </row>
        <row>
         <entry>level4</entry>
         <entry>Punctuation<footnote><para>only with
         <literal>ka-shifted</literal>; see <xref
         linkend="icu-collation-settings-table"/></para></footnote></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>true</literal></entry>
         <entry><literal>false</literal></entry>
         <entry><literal>false</literal></entry>
         <entry><literal>false</literal></entry>
         <entry><literal>false</literal></entry>
        </row>
        <row>
         <entry>identic</entry>
         <entry>All</entry>
         <entry><literal>true</literal></entry>

Title: ICU Collation Comparison Levels
Summary
ICU uses a multi-level process to compare strings, with higher levels corresponding to finer textual features, and each level considering different types of character differences, such as base characters, accents, case, and punctuation, as shown in the ICU Collation Levels table.