ts_headline Generation and Additional Text Search Features

provide an effective defense against attacks such as cross-site scripting (XSS) attacks, when working with untrusted input. To guard against such attacks, all HTML markup should be removed from the input document, or an HTML sanitizer should be used on the output.
</para>
</warning>
These option names are recognized case-insensitively. You must double-quote string values if they contain spaces or commas.
</para>

<para>
In non-fragment-based headline generation, <function>ts_headline</function> locates matches for the given <replaceable class="parameter">query</replaceable> and chooses a single one to display, preferring matches that have more query words within the allowed headline length. In fragment-based headline generation, <function>ts_headline</function> locates the query matches and splits each match into <quote>fragments</quote> of no more than <literal>MaxWords</literal> words each, preferring fragments with more query words, and when possible <quote>stretching</quote> fragments to include surrounding words. The fragment-based mode is thus more useful when the query matches span large sections of the document, or when it's desirable to display multiple matches. In either mode, if no query matches can be identified, then a single fragment of the first <literal>MinWords</literal> words in the document will be displayed.
</para>

<para>
For example:

<screen>
SELECT ts_headline('english',
  'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
  to_tsquery('english', 'query &amp; similarity'));
                        ts_headline
------------------------------------------------------------
 containing given &lt;b&gt;query&lt;/b&gt; terms                       +
 and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the+
 &lt;b&gt;query&lt;/b&gt;.

SELECT ts_headline('english',
  'Search terms may occur
many times in a document,
requiring ranking of the search matches to decide which
occurrences to display in the result.',
  to_tsquery('english', 'search &amp; term'),
  'MaxFragments=10, MaxWords=7, MinWords=3, StartSel=&lt;&lt;, StopSel=&gt;&gt;');
                        ts_headline
------------------------------------------------------------
 &lt;&lt;Search&gt;&gt; &lt;&lt;terms&gt;&gt; may occur                            +
 many times ... ranking of the &lt;&lt;search&gt;&gt; matches to decide
</screen>
</para>

<para>
<function>ts_headline</function> uses the original document, not a <type>tsvector</type> summary, so it can be slow and should be used with care.
</para>

</sect2>

</sect1>

<sect1 id="textsearch-features">
<title>Additional Features</title>

<para>
This section describes additional functions and operators that are useful in connection with text search.
</para>

<sect2 id="textsearch-manipulate-tsvector">
<title>Manipulating Documents</title>

<para>
<xref linkend="textsearch-parsing-documents"/> showed how raw textual documents can be converted into <type>tsvector</type> values. <productname>PostgreSQL</productname> also provides functions and operators that can be used to manipulate documents that are already in <type>tsvector</type> form.
</para>

<variablelist>

<varlistentry>
<term>
<indexterm>
<primary>tsvector concatenation</primary>
</indexterm>
<literal><type>tsvector</type> || <type>tsvector</type></literal>
</term>

<listitem>
<para>
The <type>tsvector</type> concatenation operator returns a vector which combines the lexemes and positional information of the two vectors given as arguments. Positions and weight labels are retained during the concatenation. Positions appearing in the right-hand vector are offset by the

The ts_headline function generates headlines in either non-fragment-based or fragment-based mode, in both cases preferring matches or fragments that contain more query words. The section includes two example calls to ts_headline. Additionally, PostgreSQL offers functions and operators for manipulating documents that are already in tsvector form, including tsvector concatenation, which combines the lexemes and positional information of two vectors.
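
The concatenation behavior summarized above can be tried with a short query. This is a minimal sketch: the literal vectors are illustrative values that do not come from the section, and the queries assume a PostgreSQL session with the built-in text search types available. The first statement shows how positions from the right-hand vector are shifted past those of the left-hand vector. The second carries weight labels through the operator and is given without an expected result.

```sql
-- Illustrative literals only; not taken from the section above.
-- Right-hand positions are shifted past the left-hand vector's positions,
-- so 'c':1, 'd':2, 'b':3 become 3, 4, 5.
SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector;
-- Result: 'a':1 'b':2,5 'c':3 'd':4

-- Weight labels (A, B) are attached to positions and are retained
-- by the concatenation.
SELECT 'a:1A b:2'::tsvector || 'c:1B'::tsvector;
```

Because positions are renumbered rather than merged, the order of the operands matters: swapping them yields different positions for the same lexemes.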