Before you can do anything, you must initialize a database storage
area on disk. We call this a database cluster.
(SQL uses the term catalog cluster.) A
database cluster is a collection of databases that is managed by a
single instance of a running database server. After initialization, a
database cluster will contain a database named postgres
,
which is meant as a default database for use by utilities, users and third
party applications. The database server itself does not require the
postgres
database to exist, but many external utility
programs assume it exists. Another database created within each cluster
during initialization is called
template1
. As the name suggests, this will be used
as a template for subsequently created databases; it should not be
used for actual work. (See Chapter 21, Managing Databases for
information about creating new databases within a cluster.)
In file system terms, a database cluster will be a single directory
under which all data will be stored. We call this the data
directory or data area. It is
completely up to you where you choose to store your data. There is no
default, although locations such as
/usr/local/pgsql/data
or
/var/lib/pgsql/data
are popular. To initialize a
database cluster, use the command initdb(1), which is
installed with PostgreSQL™. The desired
file system location of your database cluster is indicated by the
-D
option, for example:
$
initdb -D /usr/local/pgsql/data
Note that you must execute this command while logged into the PostgreSQL™ user account, which is described in the previous section.
Alternatively, you can run initdb via the pg_ctl(1) program like so:
$
pg_ctl -D /usr/local/pgsql/data initdb
This may be more intuitive if you are using pg_ctl for starting and stopping the server (see the section called “Starting the Database Server”), so that pg_ctl would be the sole command you use for managing the database server instance.
initdb will attempt to create the directory you specify if it does not already exist. It is likely that it will not have the permission to do so (if you followed our advice and created an unprivileged account). In that case you should create the directory yourself (as root) and change the owner to be the PostgreSQL™ user. Here is how this might be done:
root#mkdir /usr/local/pgsql/data
root#chown postgres /usr/local/pgsql/data
root#su postgres
postgres$initdb -D /usr/local/pgsql/data
initdb will refuse to run if the data directory looks like it has already been initialized.
Because the data directory contains all the data stored in the database, it is essential that it be secured from unauthorized access. initdb therefore revokes access permissions from everyone but the PostgreSQL™ user.
However, while the directory contents are secure, the default
client authentication setup allows any local user to connect to the
database and even become the database superuser. If you do not
trust other local users, we recommend you use one of
initdb's -W
, --pwprompt
or --pwfile
options to assign a password to the
database superuser.
Also, specify -A md5
or
-A password
so that the default trust
authentication
mode is not used; or modify the generated pg_hba.conf
file after running initdb, but
before you start the server for the first time. (Other
reasonable approaches include using peer
authentication
or file system permissions to restrict connections. See Chapter 19, Client Authentication for more information.)
initdb also initializes the default
locale for the database cluster.
Normally, it will just take the locale settings in the environment
and apply them to the initialized database. It is possible to
specify a different locale for the database; more information about
that can be found in the section called “Locale Support”. The default sort order used
within the particular database cluster is set by
initdb, and while you can create new databases using
different sort order, the order used in the template databases that initdb
creates cannot be changed without dropping and recreating them.
There is also a performance impact for using locales
other than C
or POSIX
. Therefore, it is
important to make this choice correctly the first time.
initdb also sets the default character set encoding for the database cluster. Normally this should be chosen to match the locale setting. For details see the section called “Character Set Support”.
Many installations create database clusters on network file systems. Sometimes this is done directly via NFS, or by using a Network Attached Storage (NAS) device that uses NFS internally. PostgreSQL™ does nothing special for NFS file systems, meaning it assumes NFS behaves exactly like locally-connected drives (DAS, Direct Attached Storage). If client and server NFS implementations have non-standard semantics, this can cause reliability problems (see http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). Specifically, delayed (asynchronous) writes to the NFS server can cause reliability problems; if possible, mount NFS file systems synchronously (without caching) to avoid this. Also, soft-mounting NFS is not recommended. (Storage Area Networks (SAN) use a low-level communication protocol rather than NFS.)