


postgresql

18th chunk of `doc/src/sgml/textsearch.sgml`
 restrictions.  The
    configuration to be used to parse the document can be specified by
    <replaceable>config</replaceable>; if <replaceable>config</replaceable>
    is omitted, the
    <varname>default_text_search_config</varname> configuration is used.
   </para>

   <para>
    If an <replaceable>options</replaceable> string is specified it must
    consist of a comma-separated list of one or more
    <replaceable>option</replaceable><literal>=</literal><replaceable>value</replaceable> pairs.
    The available options are:

    <itemizedlist  spacing="compact" mark="bullet">
     <listitem>
      <para>
       <literal>MaxWords</literal>, <literal>MinWords</literal> (integers):
       these numbers determine the longest and shortest headlines to output.
       The default values are 35 and 15, respectively.
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>ShortWord</literal> (integer): words of this length or less
       will be dropped at the start and end of a headline, unless they are
       query terms.  The default value of three eliminates common English
       articles.
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>HighlightAll</literal> (boolean): if
       <literal>true</literal> the whole document will be used as the
       headline, ignoring the preceding three parameters.  The default
       is <literal>false</literal>.
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>MaxFragments</literal> (integer): maximum number of text
       fragments to display.  The default value of zero selects a
       non-fragment-based headline generation method.  A value greater
       than zero selects fragment-based headline generation (see below).
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>StartSel</literal>, <literal>StopSel</literal> (strings):
       the strings with which to delimit query words appearing in the
       document, to distinguish them from other excerpted words.  The
       default values are <quote><literal>&lt;b&gt;</literal></quote> and
       <quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
       for HTML output (but see the warning below).
      </para>
     </listitem>
     <listitem>
      <para>
       <literal>FragmentDelimiter</literal> (string): when more than one
       fragment is displayed, the fragments will be separated by this string.
       The default is <quote><literal> ... </literal></quote>.
      </para>
     </listitem>
    </itemizedlist>

    <warning>
     <title>Warning: Cross-site scripting (XSS) safety</title>
     <para>
      The output from <function>ts_headline</function> is not guaranteed to
      be safe for direct inclusion in web pages. When
      <literal>HighlightAll</literal> is <literal>false</literal> (the
      default), some simple XML tags are removed from the document, but this
      is not guaranteed to remove all HTML markup. This tag removal therefore
      does not provide an effective defense against attacks such as
      cross-site scripting (XSS) when working with untrusted input. To guard
      against such attacks, all HTML markup should be removed from the input
      document, or an HTML sanitizer should be used on the output.
     </para>
    </warning>

    These option names are recognized case-insensitively.
    You must double-quote string values if they contain spaces or commas.
   </para>
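
   <para>
    As a minimal sketch of how an <replaceable>options</replaceable> string
    might be put together, the call below combines several of the options
    listed above.  The document text and the query are invented for
    illustration only.  The <literal>FragmentDelimiter</literal> value is
    double-quoted because it contains spaces, and setting
    <literal>MaxFragments</literal> above zero requests fragment-based
    headline generation:
   </para>

<programlisting>
-- Illustrative only: the sample text and query are assumptions, not taken
-- from any real data set.
SELECT ts_headline('english',
  'Full text search parses each document into tokens, and a query
is then matched against those tokens to locate the documents
that are most relevant to the query.',
  to_tsquery('english', 'token &amp; query'),
  'MaxFragments=2, MaxWords=7, MinWords=3, StartSel=&lt;&lt;, StopSel=&gt;&gt;, FragmentDelimiter=" | "');
</programlisting>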

   <para>
    In non-fragment-based headline
    generation, <function>ts_headline</function> locates matches for the
    given <replaceable class="parameter">query</replaceable> and chooses a
    single one to display, preferring matches that have more query words
    within the allowed headline length.
    In fragment-based headline generation, <function>ts_headline</function>
    locates the query matches and splits each match
    into <quote>fragments</quote> of no more than <literal>MaxWords</literal>
    words each, preferring fragments

Title: ts_headline Options and Security Considerations
Summary
The ts_headline function offers several options to customize the headline output, including MaxWords, MinWords, ShortWord, HighlightAll, MaxFragments, StartSel, StopSel, and FragmentDelimiter. It is crucial to be aware that the output of ts_headline is not guaranteed to be safe for direct inclusion in web pages and may be vulnerable to cross-site scripting (XSS) attacks, especially when working with untrusted input. To mitigate these risks, all HTML markup should be removed from the input document, or an HTML sanitizer should be used on the output.