This post is the second part of the first post here. This post focuses on how to utilize RethinkDB Secondary Index in different use cases efficiently.
Some rules when using RethinkDB Indexes
RethinkDB Indexes, similar to Indexes in other database, are the trade-off between read and write performance. Therefore, the basic rules for RethinkDB Indexes are similar to other database.
- Don’t create index if not necessary.
- Don’t create index if the data set is small enough so that you can use the
filter
query. - Indexes require memory to process, be careful with tables that have a lot of indexes.
- Indexes can slow down write operations significantly.
RethinkDB Simple Secondary Index
Continue from last post, sometimes, what you need is not simply querying by primary key. More often,
you may want to get all the records belong to one specific category.
In case the number of records is too big for one simple filter
command, you will need to create
one secondary index for faster querying.
For example, you are organizing all the products in your warehouse into several categories. Each
category is stored as one record in the categories
table as below
Each product is stored as one record in the products
table with one prop categoryId
for
referencing back to the category
table.
If you want to query all products belong to one category, the simplest solution is to create a
simple secondary index on the field categoryId
of the product table
To query all the products by category, simply use RethinkDB getAll
command
Joining 2 tables with the help of secondary indexes is also very fast. In case you want to get all the products belong to one category with the category name, simply do
Or if you want to do the reverse way, find the category by id and all its related products
RethinkDB Compound Index
As your application grows bigger, your requirements may be extended to filtering products based on
more properties, not just categoryId
. That is where a compound
index would be useful. If you design it carefully, you can even use one compound index to serve many
different kinds of queries.
Back to the above example, if we change the requirements to find all the products belong to one category and has the rating is 5, the index would look like this
and the getAll
query will receive an array param like this
A compound index is also helpful when you want to do a range query, too. If you want to find all the
products belong to one category with rating from 1 to 5, use between
in combination with that
index
And hey, you can also take advantage of the new index to eliminate the previous one categoryId
. If
you want to search for all the products belong to one category, simply use between
with all the
possible rating
values
This would be a big win when you only to one compound index to serve multiple purposes.
Extending the Compound Index
To extend the above example, if your application requires querying all the products belong to one category, but also care about the rating and status of those products, the index will look like this
The above index be used for several use cases, from exact value to range querying.
- Get all products belong to one category
- Get all products belong to one category, which have specific rating score
- Get all products by 3 props
You can even order the results retrieved from that index
- Get all products belong to one category, order by rating first and then status
- Get all products belong to one category, with a specific rating score, order by the status
A Note on Compound Index
One important thing to notice for RethinkDB Compound Index is the order of the fields.
You may not need to care about the order when you just need to query by the exact values of the
compound index.
For example, you can change the above index to whatever order you want (rating_status_categoryId
,
status_categoryId_rating
) and these queries will always produce the same result with the same
speed
However, when it comes to range querying, it’s another story. If you take a closer look at the above queries, you will notice that the left-most values are always fixed/determined values while the right-most values can be vary. If you want to do a range query with the vary left-most values, it is similar to a whole table sequence scan.
For example, to get all the products belong to one category, this will work efficiently
This one also works but it’s similar to a sequence scan and is not efficient