SQL Syntax and Lexical Structure

<chapter id="sql-syntax">
 <title>SQL Syntax</title>

 <indexterm zone="sql-syntax">
  <primary>syntax</primary>
  <secondary>SQL</secondary>
 </indexterm>

 <para>
  This chapter describes the syntax of SQL.  It forms the foundation
  for understanding the following chapters which will go into detail
  about how SQL commands are applied to define and modify data.
 </para>

 <para>
  We also advise users who are already familiar with SQL to read this
  chapter carefully because it contains several rules and concepts
  that are implemented inconsistently among SQL databases or that are
  specific to <productname>PostgreSQL</productname>.
 </para>

 <sect1 id="sql-syntax-lexical">
  <title>Lexical Structure</title>

  <indexterm>
   <primary>token</primary>
  </indexterm>

  <para>
   SQL input consists of a sequence of
   <firstterm>commands</firstterm>.  A command is composed of a
   sequence of <firstterm>tokens</firstterm>, terminated by a
   semicolon (<quote>;</quote>).  The end of the input stream also
   terminates a command.  Which tokens are valid depends on the syntax
   of the particular command.
  </para>

  <para>
   A token can be a <firstterm>key word</firstterm>, an
   <firstterm>identifier</firstterm>, a <firstterm>quoted
   identifier</firstterm>, a <firstterm>literal</firstterm> (or
   constant), or a special character symbol.  Tokens are normally
   separated by whitespace (space, tab, newline), but need not be if
   there is no ambiguity (which is generally only the case if a
   special character is adjacent to some other token type).
  </para>

  <para>
   For example, the following is (syntactically) valid SQL input:
<programlisting>
SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');
</programlisting>
   This is a sequence of three commands, one per line (although this
   is not required; more than one command can be on a line, and
   commands can usefully be split across lines).
  </para>

  <para>
   Additionally, <firstterm>comments</firstterm> can occur in SQL
   input.
   They are not tokens; they are effectively equivalent to whitespace.
  </para>

  <para>
   The SQL syntax is not very consistent regarding which tokens
   identify commands and which are operands or parameters.  The first
   few tokens are generally the command name, so in the above example
   we would usually speak of a <quote>SELECT</quote>, an
   <quote>UPDATE</quote>, and an <quote>INSERT</quote> command.  But,
   for instance, the <command>UPDATE</command> command always requires
   a <token>SET</token> token to appear in a certain position, and
   this particular variation of <command>INSERT</command> also
   requires a <token>VALUES</token> token in order to be complete.
   The precise syntax rules for each command are described in
   <xref linkend="reference"/>.
  </para>

  <sect2 id="sql-syntax-identifiers">
   <title>Identifiers and Key Words</title>

   <indexterm zone="sql-syntax-identifiers">
    <primary>identifier</primary>
    <secondary>syntax of</secondary>
   </indexterm>

   <indexterm zone="sql-syntax-identifiers">
    <primary>name</primary>
    <secondary>syntax of</secondary>
   </indexterm>

   <indexterm zone="sql-syntax-identifiers">
    <primary>key word</primary>
    <secondary>syntax of</secondary>
   </indexterm>

   <para>
    Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
    <token>VALUES</token> in the example above are examples of
    <firstterm>key words</firstterm>, that is, words that have a fixed
    meaning in the SQL language.  The tokens <token>MY_TABLE</token>
    and <token>A</token> are examples of
    <firstterm>identifiers</firstterm>.  They identify names of
    tables, columns, or other database objects, depending on the
    command they are used in.  Therefore they are sometimes simply
    called <quote>names</quote>.  Key words and identifiers have the
    same lexical structure, meaning that

This chapter introduces the syntax of SQL in PostgreSQL, focusing on its lexical structure. SQL input consists of commands, each a sequence of tokens terminated by a semicolon or by the end of the input stream. Tokens are key words, identifiers, quoted identifiers, literals, and special character symbols; they are normally separated by whitespace, which can be omitted where no ambiguity results. Comments are not tokens and are treated as whitespace. The chapter also notes that SQL is inconsistent about which tokens identify a command and which are operands or parameters, as the required SET in UPDATE and VALUES in this form of INSERT illustrate. It then introduces identifiers and key words, which share the same lexical structure.
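
The lexical rules summarized above can be illustrated with a short sketch. It assumes a hypothetical table my_table with columns a and b, which the chapter's example does not define, and it uses the two standard SQL comment forms (-- to end of line, and /* ... */). It is meant as an illustration of token separation, not as a complete description of the lexer.

```sql
-- Hypothetical table for the example commands; the column names a and b are assumptions.
CREATE TABLE my_table (a integer, b text);

-- Symbols such as * and = may sit directly next to other tokens,
-- because no ambiguity arises without the whitespace.
SELECT*FROM my_table;
UPDATE my_table SET a=5;

-- Several commands can share one line; each one ends at its semicolon.
INSERT INTO my_table VALUES (3, 'hi there'); SELECT a, b FROM my_table;

-- A command can be split across lines, and a comment counts as whitespace.
SELECT a /* block comment between two tokens */
  FROM my_table
 WHERE a = 3;  -- trailing comment
```

Writing the commands with conventional spacing, as in the chapter's own example, is usually easier to read; the compact forms are shown only to make the separation rule visible.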