Weighting tsvector Entries and Parsing Queries

linkend="textsearch-configuration"/>). It is possible to have many different configurations in the same database, and predefined configurations are available for various languages. In our example we used the default configuration <literal>english</literal> for the English language.
</para>

<para>
 The function <function>setweight</function> can be used to label the
 entries of a <type>tsvector</type> with a given <firstterm>weight</firstterm>,
 where a weight is one of the letters <literal>A</literal>, <literal>B</literal>,
 <literal>C</literal>, or <literal>D</literal>.
 This is typically used to mark entries coming from
 different parts of a document, such as title versus body. Later, this
 information can be used for ranking of search results.
</para>

<para>
 Because <function>to_tsvector</function>(<literal>NULL</literal>) will
 return <literal>NULL</literal>, it is recommended to use
 <function>coalesce</function> whenever a field might be null.
 Here is the recommended method for creating
 a <type>tsvector</type> from a structured document:

<programlisting>
UPDATE tt SET ti =
    setweight(to_tsvector(coalesce(title,'')), 'A')    ||
    setweight(to_tsvector(coalesce(keyword,'')), 'B')  ||
    setweight(to_tsvector(coalesce(abstract,'')), 'C') ||
    setweight(to_tsvector(coalesce(body,'')), 'D');
</programlisting>

 Here we have used <function>setweight</function> to label the source
 of each lexeme in the finished <type>tsvector</type>, and then merged
 the labeled <type>tsvector</type> values using the <type>tsvector</type>
 concatenation operator <literal>||</literal>.
 (<xref linkend="textsearch-manipulate-tsvector"/> gives details about these
 operations.)
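 As a rough illustration of how these weights are consumed later, the
 following sketch ranks matching rows with <function>ts_rank</function>,
 which by default scores <literal>A</literal>-weighted lexemes highest and
 <literal>D</literal>-weighted lexemes lowest. It assumes the
 <structname>tt</structname> table and its <structfield>ti</structfield> and
 <structfield>title</structfield> columns from the statement above; the
 search term <literal>rat</literal> is purely illustrative:

<programlisting>
-- Rank rows of the example table; matches in the title (weight A)
-- contribute more to the score than matches in the body (weight D).
SELECT title, ts_rank(ti, query) AS rank
FROM tt, to_tsquery('english', 'rat') AS query
WHERE ti @@ query
ORDER BY rank DESC
LIMIT 10;
</programlisting>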
</para>

</sect2>

<sect2 id="textsearch-parsing-queries">
 <title>Parsing Queries</title>

 <para>
  <productname>PostgreSQL</productname> provides the
  functions <function>to_tsquery</function>,
  <function>plainto_tsquery</function>,
  <function>phraseto_tsquery</function> and
  <function>websearch_to_tsquery</function>
  for converting a query to the <type>tsquery</type> data type.
  <function>to_tsquery</function> offers access to more features
  than either <function>plainto_tsquery</function> or
  <function>phraseto_tsquery</function>, but it is less forgiving about its
  input. <function>websearch_to_tsquery</function> is a simplified version
  of <function>to_tsquery</function> with an alternative syntax, similar
  to the one used by web search engines.
 </para>

 <indexterm>
  <primary>to_tsquery</primary>
 </indexterm>

<synopsis>
to_tsquery(<optional> <replaceable class="parameter">config</replaceable> <type>regconfig</type>, </optional> <replaceable class="parameter">querytext</replaceable> <type>text</type>) returns <type>tsquery</type>
</synopsis>

 <para>
  <function>to_tsquery</function> creates a <type>tsquery</type> value from
  <replaceable>querytext</replaceable>, which must consist of single tokens
  separated by the <type>tsquery</type> operators <literal>&amp;</literal> (AND),
  <literal>|</literal> (OR), <literal>!</literal> (NOT), and
  <literal>&lt;-&gt;</literal> (FOLLOWED BY), possibly grouped
  using parentheses. In other words, the input to
  <function>to_tsquery</function> must already follow the general rules for
  <type>tsquery</type> input, as described in <xref
  linkend="datatype-tsquery"/>. The difference is that while basic
  <type>tsquery</type> input takes the tokens at face value,
  <function>to_tsquery</function> normalizes each token into a lexeme using
  the specified or default configuration, and discards any tokens that are
  stop words according to the configuration.
  For example:

<screen>
SELECT to_tsquery('english', 'The &amp; Fat &amp; Rats');
  to_tsquery
---------------
 'fat' &amp; 'rat'
</screen>

  As in basic <type>tsquery</type> input, weight(s)

The `setweight` function labels `tsvector` entries with a weight (`A`, `B`, `C`, or `D`), typically to mark which part of a document each lexeme came from, so that the weights can later be used to rank search results. Because `to_tsvector(NULL)` returns `NULL`, wrap any field that might be null in `coalesce` when building a `tsvector` from a structured document. PostgreSQL provides `to_tsquery`, `plainto_tsquery`, `phraseto_tsquery`, and `websearch_to_tsquery` for converting query text to the `tsquery` data type. `to_tsquery` normalizes each token into a lexeme using the specified or default configuration, discards stop words, and accepts the operators `&` (AND), `|` (OR), `!` (NOT), and `<->` (FOLLOWED BY).
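
To make the differences between the four parsing functions more concrete, here is a minimal sketch that feeds similar input to each one. The sample strings are illustrative rather than taken from the text above, `websearch_to_tsquery` assumes PostgreSQL 11 or later, and the exact lexemes produced depend on the `english` configuration.

```sql
-- to_tsquery: the input must already use tsquery operators (&, |, !, <->).
SELECT to_tsquery('english', 'fat & (rat | cat)');

-- plainto_tsquery: plain text; the words that survive normalization are ANDed.
SELECT plainto_tsquery('english', 'The Fat Rats');

-- phraseto_tsquery: plain text; the surviving words must appear in sequence.
SELECT phraseto_tsquery('english', 'The Fat Rats');

-- websearch_to_tsquery: quoted phrases, "or", and a leading "-" for negation.
SELECT websearch_to_tsquery('english', '"fat rats" or cats -dogs');

-- A parsed query is matched against a tsvector with the @@ operator.
SELECT to_tsvector('english', 'The fat rats ate the cheese')
       @@ to_tsquery('english', 'fat & rat') AS matches;
```

Running each statement in `psql` and comparing the resulting `tsquery` values is a quick way to see which function suits a given kind of user input: `to_tsquery` for input that is already structured, the other three for free-form text.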