Filenodes, Free Space Maps, Visibility Maps, and Table Segmentation

<structname>pg_class</structname>.<structfield>relfilenode</structfield>. But for temporary relations, the file name is of the form <literal>t<replaceable>BBB</replaceable>_<replaceable>FFF</replaceable></literal>, where <replaceable>BBB</replaceable> is the process number of the backend which created the file, and <replaceable>FFF</replaceable> is the filenode number. In either case, in addition to the main file (also known as the main fork), each table and index has a <firstterm>free space map</firstterm> (see <xref linkend="storage-fsm"/>), which stores information about free space available in the relation. The free space map is stored in a file named with the filenode number plus the suffix <literal>_fsm</literal>. Tables also have a <firstterm>visibility map</firstterm>, stored in a fork with the suffix <literal>_vm</literal>, to track which pages are known to have no dead tuples. The visibility map is described further in <xref linkend="storage-vm"/>. Unlogged tables and indexes have a third fork, known as the initialization fork, which is stored in a file with the suffix <literal>_init</literal> (see <xref linkend="storage-init"/>).
</para>

<caution>
<para>
Note that while a table's filenode often matches its OID, this is <emphasis>not</emphasis> necessarily the case; some operations, like <command>TRUNCATE</command>, <command>REINDEX</command>, <command>CLUSTER</command> and some forms of <command>ALTER TABLE</command>, can change the filenode while preserving the OID. Avoid assuming that filenode and table OID are the same. Also, for certain system catalogs including <structname>pg_class</structname> itself, <structname>pg_class</structname>.<structfield>relfilenode</structfield> contains zero. The actual filenode number of these catalogs is stored in a lower-level data structure, and can be obtained using the <function>pg_relation_filenode()</function> function.
</para>
</caution>

<para>
When a table or index exceeds 1 GB, it is divided into gigabyte-sized <firstterm>segments</firstterm>. The first segment's file name is the same as the filenode; subsequent segments are named filenode.1, filenode.2, etc. This arrangement avoids problems on platforms that have file size limitations. (Actually, 1 GB is just the default segment size. The segment size can be adjusted using the configuration option <option>--with-segsize</option> when building <productname>PostgreSQL</productname>.) In principle, free space map and visibility map forks could require multiple segments as well, though this is unlikely to happen in practice.
</para>

<para>
A table that has columns with potentially large entries will have an associated <firstterm>TOAST</firstterm> table, which is used for out-of-line storage of field values that are too large to keep in the table rows proper. <structname>pg_class</structname>.<structfield>reltoastrelid</structfield> links from a table to its <acronym>TOAST</acronym> table, if any. See <xref linkend="storage-toast"/> for more information.
</para>

<para>
The contents of tables and indexes are discussed further in <xref linkend="storage-page-layout"/>.
</para>

<para>
Tablespaces make the scenario more complicated. Each user-defined tablespace has a symbolic link inside the <varname>PGDATA</varname><filename>/pg_tblspc</filename> directory, which points to the physical tablespace directory (i.e., the location specified in the tablespace's <command>CREATE TABLESPACE</command> command). This symbolic link is named after the tablespace's OID. Inside the physical tablespace directory there is a subdirectory with a name that depends on the <productname>PostgreSQL</productname> server version, such as <literal>PG_9.0_201008051</literal>. (The reason for using this subdirectory is so that successive versions of the database can use the same <command>CREATE TABLESPACE</command> location value without conflicts.)
Within the version-specific subdirectory, there is a subdirectory for each database that has elements in

This section describes how tables and indexes are stored on disk, including the use of filenodes, free space maps (FSM), visibility maps (VM), and initialization forks. It cautions that filenodes and OIDs are not always the same and explains how tables are segmented into gigabyte-sized chunks. It also introduces TOAST tables for out-of-line storage of large field values. Tablespaces add complexity, with symbolic links in pg_tblspc pointing to the physical tablespace directory.
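
The file naming rules described above can be sketched in a few lines of Python. This is only an illustration, not PostgreSQL code: the helper names relation_file_name and segment_count are hypothetical, the 1 GB constant is the default segment size mentioned above, and it is assumed here that a segment number is appended after the fork suffix (for example 16384_fsm.1) and that the segment count is a simple ceiling division of the fork size.

```python
# Default segment size of 1 GB; the text notes this can be changed at build time.
SEGMENT_SIZE = 1 << 30

# Fork suffixes named in the text; the main fork has no suffix.
FORK_SUFFIXES = {"main": "", "fsm": "_fsm", "vm": "_vm", "init": "_init"}


def relation_file_name(filenode, fork="main", segment=0, temp_backend=None):
    """Return the file name for one segment of one fork of a relation.

    temp_backend is the BBB part of the tBBB_FFF form used for temporary
    relations; leave it as None for ordinary relations.
    """
    if fork not in FORK_SUFFIXES:
        raise ValueError(f"unknown fork: {fork!r}")
    if segment < 0:
        raise ValueError("segment number must be non-negative")

    if temp_backend is None:
        base = str(filenode)
    else:
        base = f"t{temp_backend}_{filenode}"

    name = base + FORK_SUFFIXES[fork]
    # The first segment carries no numeric suffix; later ones get .1, .2, ...
    # Assumption: the segment number follows the fork suffix.
    if segment > 0:
        name += f".{segment}"
    return name


def segment_count(size_bytes, segment_size=SEGMENT_SIZE):
    """Estimate how many segment files a fork of the given size occupies.

    Assumption: plain ceiling division, with at least one file per fork.
    """
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return max(1, -(-size_bytes // segment_size))


if __name__ == "__main__":
    print(relation_file_name(16384))                  # 16384
    print(relation_file_name(16384, fork="fsm"))      # 16384_fsm
    print(relation_file_name(16384, segment=2))       # 16384.2
    print(relation_file_name(16384, temp_backend=3))  # t3_16384
    print(segment_count(3 * SEGMENT_SIZE + 1))        # 4
```

Because the filenode can differ from the table's OID, the filenode argument should come from pg_relation_filenode() rather than from the OID.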