 The Virgo Cluster: Hot Plasma and Dark Matter |  1.90953
 Rafting for Solar Neutrinos                   |      1.9
 NGC 4650A: Strange Galaxy and Dark Matter     |  1.85774
 Hot Gas and Dark Matter                       |   1.6123
 Ice Fishing for Cosmic Neutrinos              |      1.6
 Weak Lensing Distorts the Universe            | 0.818218
</screen>

    This is the same example using normalized ranking:

<screen>
SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */ ) AS rank
FROM apod, to_tsquery('neutrino|(dark &amp; matter)') query
WHERE  query @@ textsearch
ORDER BY rank DESC
LIMIT 10;
                     title                     |        rank
-----------------------------------------------+-------------------
 Neutrinos in the Sun                          | 0.756097569485493
 The Sudbury Neutrino Detector                 | 0.705882361190954
 A MACHO View of Galactic Dark Matter          | 0.668123210574724
 Hot Gas and Dark Matter                       |  0.65655958650282
 The Virgo Cluster: Hot Plasma and Dark Matter | 0.656301290640973
 Rafting for Solar Neutrinos                   | 0.655172410958162
 NGC 4650A: Strange Galaxy and Dark Matter     | 0.650072921219637
 Hot Gas and Dark Matter                       | 0.617195790024749
 Ice Fishing for Cosmic Neutrinos              | 0.615384618911517
 Weak Lensing Distorts the Universe            | 0.450010798361481
</screen>
   </para>
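   <para>
    Normalization option 32 maps a raw rank r to r/(r+1). Because of this, each value in the table above can be recomputed from the matching unnormalized score in the previous output. The following Python sketch does this for a few rows. <function>normalize_rank_32</function> is a hypothetical helper written for illustration; it is not a PostgreSQL function. The inputs are the rounded display values, so the results agree with the table only to about six decimal places.
   </para>

```python
# Sketch: reproduce ts_rank_cd's normalization bit 32, which maps a raw
# rank r to r / (r + 1). The raw scores below are copied from the
# unnormalized query output. They are rounded for display, so the results
# agree with the normalized column only to about 6 decimal places.

def normalize_rank_32(rank: float) -> float:
    """Hypothetical helper mirroring normalization option 32."""
    return rank / (rank + 1.0)

raw_scores = {
    "Rafting for Solar Neutrinos": 1.9,
    "NGC 4650A: Strange Galaxy and Dark Matter": 1.85774,
    "Ice Fishing for Cosmic Neutrinos": 1.6,
    "Weak Lensing Distorts the Universe": 0.818218,
}

for title, raw in raw_scores.items():
    # Print in the same "title | rank" layout as psql output.
    print(f"{title:45} | {normalize_rank_32(raw):.6f}")
```

   <para>
    For nonnegative r, r/(r+1) increases monotonically. Normalization therefore rescales scores into the range 0 to 1 without changing the order of the results, which is why both queries list the titles in the same order.
   </para>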

   <para>
    Ranking can be expensive because it requires consulting the
    <type>tsvector</type> of each matching document, which can be I/O bound and
    therefore slow. Unfortunately, this cost is almost impossible to avoid,
    since practical queries often produce large numbers of matches.
   </para>
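   <para>
    Adding a LIMIT clause does not make this cost go away. The Python sketch below is a simplified model, not PostgreSQL's executor. It assumes that a rank must be computed for every document that passes the <literal>@@</literal> filter before the top ten can be chosen, as the <literal>ORDER BY rank DESC LIMIT 10</literal> queries above imply. The <function>score</function> function is a hypothetical stand-in for <function>ts_rank_cd</function> reading a document's <type>tsvector</type>.
   </para>

```python
# Simplified model of "ORDER BY rank DESC LIMIT 10": the top 10 cannot be
# known until every matching document has been scored. score() is a
# hypothetical placeholder for per-document ranking, not a PostgreSQL API.
import heapq

calls = 0

def score(doc_id: int) -> float:
    """Hypothetical per-document ranking that counts how often it runs."""
    global calls
    calls += 1
    return (doc_id * 7919) % 1000 / 1000.0  # arbitrary deterministic score

def top_n(matching_ids, n=10):
    # heapq.nlargest keeps only n candidates in memory, but it still calls
    # score() once for each matching document.
    return heapq.nlargest(n, matching_ids, key=score)

matches = range(50_000)  # pretend the query matched 50,000 documents
best = top_n(matches)
print(len(best), "results,", calls, "documents ranked")
```

   <para>
    Only ten rows are returned, yet <function>score</function> runs once per match. A bounded heap saves memory and sorting time, not ranking work.
   </para>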

  </sect2>

  <sect2 id="textsearch-headline">
   <title>Highlighting Results</title>

   <para>
    When presenting search results, it is ideal to show part of each document
    along with an indication of how it relates to the query. Search engines
    usually show fragments of the document with the search terms marked.
    <productname>PostgreSQL</productname> provides a function
    <function>ts_headline</function> that implements this functionality.
   </para>

   <indexterm>
    <primary>ts_headline</primary>
   </indexterm>

<synopsis>
ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type>regconfig</type>, </optional> <replaceable class="parameter">document</replaceable> <type>text</type>, <replaceable class="parameter">query</replaceable> <type>tsquery</type> <optional>, <replaceable class="parameter">options</replaceable> <type>text</type> </optional>) returns <type>text</type>
</synopsis>

   <para>
    <function>ts_headline</function> accepts a document along
    with a query, and returns an excerpt from
    the document in which terms from the query are highlighted.
    Specifically, the function will use the query to select relevant
    text fragments, and then highlight all words that appear in the query,
    even if those word positions do not match the query's restrictions.  The
    configuration to be used to parse the document can be specified by
    <replaceable>config</replaceable>; if <replaceable>config</replaceable>
    is omitted, the
    <varname>default_text_search_config</varname> configuration is used.
   </para>
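   <para>
    The highlighting rule described above wraps every document word that matches a query term, regardless of where it occurs. The Python sketch below is a toy model of only that rule. <function>toy_highlight</function> is hypothetical. It does not choose fragments or use a text search configuration; it just lowercases words and compares them exactly. Take a phrase query such as <literal>dark &lt;-&gt; matter</literal>: only the first pair of words in the sample sentence satisfies the phrase, but all four words are marked.
   </para>

```python
# Toy model of ts_headline's highlighting rule: every document word that
# matches a query term is wrapped, whether or not its position satisfies
# the query's operators. Real ts_headline parses with a text search
# configuration (stemming, stop words) and also selects fragments; this
# sketch only lowercases words and compares them exactly.
import re

def toy_highlight(document: str, query_terms: set[str],
                  start_sel: str = "<b>", stop_sel: str = "</b>") -> str:
    terms = {t.lower() for t in query_terms}

    def wrap(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in terms:
            return f"{start_sel}{word}{stop_sel}"
        return word

    # \w+ is a crude stand-in for the text search parser's word tokens.
    return re.sub(r"\w+", wrap, document)

doc = "Dark matter halos: is the matter really dark?"
print(toy_highlight(doc, {"dark", "matter"}))
```

   <para>
    The toy compares words exactly, so it would miss inflected forms such as <literal>neutrinos</literal> for the query term <literal>neutrino</literal>. The real function matches these through the configuration's stemming.
   </para>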

   <para>
    If an <replaceable>options</replaceable> string is specified, it must
    consist of a comma-separated list of one or more
    <replaceable>option</replaceable><literal>=</literal><replaceable>value</replaceable> pairs.
    The available options are:

    <itemizedlist  spacing="compact" mark="bullet">
     <listitem>
      <para>
       <literal>MaxWords</literal>, <literal>MinWords</literal> (integers):
       these numbers determine the longest and shortest headlines to output.
       The default values are 35 and 15.
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>ShortWord</literal> (integer): words of this length or less
       will be dropped at the start and end of a headline,
