Introduction
When dealing with hierarchical data, we usually use recursion for constructing
the tree structure. It would be a performance problem if you query the tree node
data from the database every time the recursion happens. Also, if you work with
languages which are heavily based on callback like Javascript, you will find it
is very difficult to deal with the recursion because you have to determine when
the callback has finished execution. Luckily, PostgreSQL supports recursive
query using the WITH RECURSIVE
keyword. All the data structure can be returns
within a single query.
WITH queries in PostgreSQL
According to PostgreSQL document, WITH
queries provide a method for dealing
with large queries by creating temporary tables that exist only for that query.
This is a very simple example from Postgres document to illustrate the usage of
WITH
queries.
In the above example, the WITH
clause creates 2 temporary tables
regional_sales and top_region that exist only for the last SELECT
command.
WITH RECURSIVE queries in PostgreSQL
WITH
can be followed by RECURSIVE
modifier for the query to refer to its
own. This simple query will list all number from 1 to 100 using WITH RECURSIVE
Let me explain a bit about the example. WITH RECURSIVE t(n)
creates a
temporary tables named t with one column n. A WITH RECURSIVE
query
usually includes 2 parts: non-recursive term and recursive term. All the two
SELECT
query should have exactly the same column as the t table.
PostgreSQL will first evaluate the non-recursive term (including discarding all
duplicate rows if you’re using just UNION not UNION ALL), put the result into a
temporary table and then evaluate the recursive term (which refers to itself)
and then append the result into the previous temporary table until it reaches
the stop condition or continues forever if there is no stop condition like the
above example. We need to use LIMIT
in the final SELECT
command to filter
the result.
WITH RECURSIVE for querying tree
Recursion is extremely useful for dealing with hierarchical data structure. Consider the pedigree tree example, I have 2 tables. One table stores all the family members information and the other table contains all the relations in the pedigree (i.e. the parent-children relation). The PedigreeRelations tables references to People table.
Here is the query to create these two tables
Insert some sample data
Now we come to the most chalenging part, querying the descendants with a given id. I have modified the search_graph query from PostfreSQL doc a bit to fit my data in 2 tables. Here is how the query look like
Remember this query is to find all the descendants of a given id, so you may
need to update the r."parentId" = 1
in the 7th line to the id that you want to
find.
The result of this query is all its descendants (not include itself). The
result also contains a column (array type) holding the path from the given node
to the node in the current row and a column carries the depth of the current
node compare to the node we passed in before.
If you have a huge amount of data
and you just want to query all descendant nodes within a specific depth level,
uncomment -- AND depth < 2
in the 17th line and change 2 to whatever depth
level that you want.