Home Explore Blog CI



postgresql

55th chunk of `doc/src/sgml/datatype.sgml`
810583e362b53e0cc944cb2e26a5c44a99583958804295250000000100000fa1
 function
    <function>xmlparse</function>:<indexterm><primary>xmlparse</primary></indexterm>
<synopsis>
XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
</synopsis>
    Examples:
<programlisting><![CDATA[
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
]]></programlisting>
    While this is the only way to convert character strings into XML
    values according to the SQL standard, the PostgreSQL-specific
    syntaxes:
<programlisting><![CDATA[
xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml
]]></programlisting>
    can also be used.
   </para>

   <para>
    The <type>xml</type> type does not validate input values
    against a document type declaration
    (DTD),<indexterm><primary>DTD</primary></indexterm>
    even when the input value specifies a DTD.
    There is also currently no built-in support for validating against
    other XML schema languages such as XML Schema.
   </para>

   <para>
    The inverse operation, producing a character string value from
    <type>xml</type>, uses the function
    <function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [ NO ] INDENT ] )
</synopsis>
    <replaceable>type</replaceable> can be
    <type>character</type>, <type>character varying</type>, or
    <type>text</type> (or an alias for one of those).  Again, according
    to the SQL standard, this is the only way to convert between type
    <type>xml</type> and character types, but PostgreSQL also allows
    you to simply cast the value.
   </para>

   <para>
    The <literal>INDENT</literal> option causes the result to be
    pretty-printed, while <literal>NO INDENT</literal> (which is the
    default) just emits the original input string.  Casting to a character
    type likewise produces the original string.
   </para>

   <para>
    When a character string value is cast to or from type
    <type>xml</type> without going through <type>XMLPARSE</type> or
    <type>XMLSERIALIZE</type>, respectively, the choice of
    <literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
    determined by the <quote>XML option</quote>
    <indexterm><primary>XML option</primary></indexterm>
    session configuration parameter, which can be set using the
    standard command:
<synopsis>
SET XML OPTION { DOCUMENT | CONTENT };
</synopsis>
    or the more PostgreSQL-like syntax
<synopsis>
SET xmloption TO { DOCUMENT | CONTENT };
</synopsis>
    The default is <literal>CONTENT</literal>, so all forms of XML
    data are allowed.
   </para>

   </sect2>

   <sect2 id="datatype-xml-encoding-handling">
    <title>Encoding Handling</title>
   <para>
    Care must be taken when dealing with multiple character encodings
    on the client, server, and in the XML data passed through them.
    When using the text mode to pass queries to the server and query
    results to the client (which is the normal mode), PostgreSQL
    converts all character data passed between the client and the
    server and vice versa to the character encoding of the respective
    end; see <xref linkend="multibyte"/>.  This includes string
    representations of XML values, such as in the above examples.
    This would ordinarily mean that encoding declarations contained in
    XML data can become invalid as the character data is converted
    to other encodings while traveling between client and server,
    because the embedded encoding declaration is not changed.  To cope
    with this behavior, encoding declarations contained in
    character strings presented for input to the <type>xml</type> type
    are <emphasis>ignored</emphasis>, and content is assumed
    to be in the current server encoding.  Consequently, for correct
    processing, character strings of XML data must be sent
    from the client

Title: XML Functions and Encoding Handling
Summary
The xmlparse function is used to convert character strings into XML values, while the xmlserialize function is used to convert XML values back into character strings. Additionally, PostgreSQL provides options for handling character encoding when working with XML data, including ignoring encoding declarations in input strings and assuming content is in the current server encoding.