 <sect2 id="textsearch-manipulate-tsvector">
   <title>Manipulating Documents</title>

   <para>
    <xref linkend="textsearch-parsing-documents"/> showed how raw textual
    documents can be converted into <type>tsvector</type> values.
    <productname>PostgreSQL</productname> also provides functions and
    operators that can be used to manipulate documents that are already
    in <type>tsvector</type> form.
   </para>

   <variablelist>

    <varlistentry>

     <term>
     <indexterm>
      <primary>tsvector concatenation</primary>
     </indexterm>

      <literal><type>tsvector</type> || <type>tsvector</type></literal>
     </term>

     <listitem>
      <para>
       The <type>tsvector</type> concatenation operator
       returns a vector which combines the lexemes and positional information
       of the two vectors given as arguments.  Positions and weight labels
       are retained during the concatenation.
       Positions appearing in the right-hand vector are offset by the largest
       position mentioned in the left-hand vector, so that the result is
       nearly equivalent to the result of performing <function>to_tsvector</function>
       on the concatenation of the two original document strings.  (The
       equivalence is not exact: stop words removed from the end of the
       left-hand document leave no positions in its vector and so do not
       contribute to the offset, whereas with textual concatenation they
       would have shifted the positions of the lexemes from the right-hand
       document.)
      </para>
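      <para>
       As an illustration, here is a sketch using literal input. The commented
       results assume the built-in <literal>english</literal> configuration
       with its default stop-word list. The first query shows the right-hand
       positions being offset by 2, the largest position in the left-hand
       vector. The last two queries show why the equivalence is not exact.
       The trailing <literal>the</literal> is dropped from the left-hand
       vector, so <literal>rat</literal> lands at position 4 instead of the
       position 5 that textual concatenation produces.
      </para>
<programlisting>
SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector;
-- 'a':1 'b':2,5 'c':3 'd':4

-- assumes the default english configuration and stop-word list
SELECT to_tsvector('english', 'a fat cat the') || to_tsvector('english', 'rat');
-- 'cat':3 'fat':2 'rat':4

SELECT to_tsvector('english', 'a fat cat the' || ' ' || 'rat');
-- 'cat':3 'fat':2 'rat':5
</programlisting>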

      <para>
       One advantage of using concatenation in the vector form, rather than
       concatenating text before applying <function>to_tsvector</function>, is that
       you can use different configurations to parse different sections
       of the document.  Also, because the <function>setweight</function> function
       assigns the same weight to every position in the given vector, you must
       parse each part of the text and apply <function>setweight</function> to it
       before concatenating if you want to label different parts of the document
       with different weights.
      </para>
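      <para>
       For instance, the following sketch uses literal strings in place of
       real document columns. It parses one section with the
       <literal>english</literal> configuration and weights it
       <literal>A</literal>. It then appends a second section parsed with the
       <literal>simple</literal> configuration, which, in a default
       installation, neither stems words nor removes stop words. The
       commented result assumes both built-in configurations are unmodified.
      </para>
<programlisting>
-- assumes the built-in english and simple configurations, unmodified
SELECT setweight(to_tsvector('english', 'The Fat Rats'), 'A') ||
       to_tsvector('simple', 'the fat rats');
-- 'fat':2A,5 'rat':3A 'rats':6 'the':4
</programlisting>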
     </listitem>
    </varlistentry>

    <varlistentry>

     <term>
     <indexterm>
      <primary>setweight</primary>
     </indexterm>

      <literal>setweight(<replaceable class="parameter">vector</replaceable> <type>tsvector</type>, <replaceable class="parameter">weight</replaceable> <type>"char"</type>) returns <type>tsvector</type></literal>
     </term>

     <listitem>
      <para>
       <function>setweight</function> returns a copy of the input vector in which every
       position has been labeled with the given <replaceable>weight</replaceable>, either
       <literal>A</literal>, <literal>B</literal>, <literal>C</literal>, or
       <literal>D</literal>.  (<literal>D</literal> is the default for new
       vectors and as such is not displayed on output.)  These labels are
       retained when vectors are concatenated, allowing words from different
       parts of a document to be weighted differently by ranking functions.
      </para>
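      <para>
       A minimal example on a literal vector follows. Every position receives
       the new label, including the position of <literal>rat</literal> that
       was previously labeled <literal>B</literal>.
      </para>
<programlisting>
SELECT setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A');
-- 'cat':3A 'fat':2A,4A 'rat':5A
</programlisting>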

      <para>
       Note that weight labels apply to <emphasis>positions</emphasis>, not
       <emphasis>lexemes</emphasis>.  If the input vector has been stripped of
       positions then <function>setweight</function> does nothing.
      </para>
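      <para>
       For example, a vector written without any positions carries no weight
       labels. Applying <function>setweight</function> to it therefore returns
       the vector unchanged:
      </para>
<programlisting>
SELECT setweight('fat cat'::tsvector, 'A');
-- 'cat' 'fat'
</programlisting>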
     </listitem>
    </varlistentry>

    <varlistentry>
     <term>
     <indexterm>
      <primary>length(tsvector)</primary>
     </indexterm>

      <literal>length(<replaceable class="parameter">vector</replaceable> <type>tsvector</type>) returns <type>integer</type></literal>
     </term>

     <listitem>
      <para>
       Returns the number of lexemes stored in the vector.
      </para>
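      <para>
       Each distinct lexeme is counted once, however many positions it has.
       In the following literal vector, <literal>fat</literal> occurs at two
       positions but still contributes only 1 to the count:
      </para>
<programlisting>
SELECT length('fat:2,4 cat:3 rat:5A'::tsvector);
-- 3
</programlisting>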
     </listitem>
    </varlistentry>

    <varlistentry>

     <term>
     <indexterm>
      <primary>strip</primary>
     </indexterm>

      <literal>strip(<replaceable class="parameter">vector</replaceable> <type>tsvector</type>) returns <type>tsvector</type></literal>