Text Search Components and Table Searching in PostgreSQL

objects. <productname>PostgreSQL</productname>'s text search facility provides four types of configuration-related database objects: </para> <itemizedlist spacing="compact" mark="bullet"> <listitem> <para> <firstterm>Text search parsers</firstterm> break documents into tokens and classify each token (for example, as words or numbers). </para> </listitem> <listitem> <para> <firstterm>Text search dictionaries</firstterm> convert tokens to normalized form and reject stop words. </para> </listitem> <listitem> <para> <firstterm>Text search templates</firstterm> provide the functions underlying dictionaries. (A dictionary simply specifies a template and a set of parameters for the template.) </para> </listitem> <listitem> <para> <firstterm>Text search configurations</firstterm> select a parser and a set of dictionaries to use to normalize the tokens produced by the parser. </para> </listitem> </itemizedlist> <para> Text search parsers and templates are built from low-level C functions; therefore it requires C programming ability to develop new ones, and superuser privileges to install one into a database. (There are examples of add-on parsers and templates in the <filename>contrib/</filename> area of the <productname>PostgreSQL</productname> distribution.) Since dictionaries and configurations just parameterize and connect together some underlying parsers and templates, no special privilege is needed to create a new dictionary or configuration. Examples of creating custom dictionaries and configurations appear later in this chapter. </para> </sect2> </sect1> <sect1 id="textsearch-tables"> <title>Tables and Indexes</title> <para> The examples in the previous section illustrated full text matching using simple constant strings. This section shows how to search table data, optionally using indexes. </para> <sect2 id="textsearch-tables-search"> <title>Searching a Table</title> <para> It is possible to do a full text search without an index. A simple query to print the <structname>title</structname> of each row that contains the word <literal>friend</literal> in its <structfield>body</structfield> field is: <programlisting> SELECT title FROM pgweb WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend'); </programlisting> This will also find related words such as <literal>friends</literal> and <literal>friendly</literal>, since all these are reduced to the same normalized lexeme. </para> <para> The query above specifies that the <literal>english</literal> configuration is to be used to parse and normalize the strings. Alternatively we could omit the configuration parameters: <programlisting> SELECT title FROM pgweb WHERE to_tsvector(body) @@ to_tsquery('friend'); </programlisting> This query will use the configuration set by <xref linkend="guc-default-text-search-config"/>. </para> <para> A more complex example is to select the ten most recent documents that contain <literal>create</literal> and <literal>table</literal> in the <structname>title</structname> or <structname>body</structname>: <programlisting> SELECT title FROM pgweb WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table') ORDER BY last_mod_date DESC LIMIT 10; </programlisting> For clarity we omitted the <function>coalesce</function> function calls which would be needed to find rows that contain <literal>NULL</literal> in one of the two fields. </para> <para> Although these queries will work without an index, most applications will find this approach too slow, except perhaps for occasional ad-hoc searches. Practical use of text searching usually requires creating an index. </para> </sect2> <sect2 id="textsearch-tables-index">

PostgreSQL's text search uses parsers, dictionaries, templates, and configurations. Parsers and templates require C programming and superuser privileges to create. Dictionaries and configurations can be created without special privileges. Searching tables involves using to_tsvector and to_tsquery functions with or without specifying a configuration. Indexing is necessary for practical use.