Just a collection of tips to make working with Postgres (and other SQL-like databases) easier
Usually when you build a system that integrates with other 3rd party service, you will need to store integration information related to the entity, for example the id of the entity on the 3rd party system or some of its configuration on that system. Imagine that you are building an e-commerce related product, you may want to sync the Sales order information from Shopify to do the analytics on customer behavior. The first solution you can think of is to add a column like shopify_entity_id on that table.
- What will happens if you introduce another integration later? Does the name shopify_entity_id still make sense? You may consider renaming it to external_entity_id. How do you know where it comes from? Adding another source column? How do you store extra 3rd party information about the sales order? Keep adding columns like external_something? Do those columns actually belong to the sales_order table itself?
- What will happen if an single entity exists on multiple 3rd party system? For instance, the sales
order may be presented on both Shopify and on another Shipping service. How would you deal with
it? Keep adding more columns? What if we introduct another integration?
- A Json (Jsonb) column could solve the above issue but also creates a whole new problem. How
about schema enforcement and constraint? How do we make sure that nobody will accidentally
update it the an incorrect schema? How about
- A Json (Jsonb) column could solve the above issue but also creates a whole new problem. How about schema enforcement and constraint? How do we make sure that nobody will accidentally update it the an incorrect schema? How about
The solution, of course, is a SQL approach: make an entity integration table (sales_order_integration in this case). It’s a 1-N relationship, 1 sales order could have 0 or multiple integrations
|1||Ho Chi Minh city||10||20|
You may add another integration table (and sub-tables if needed) as a generalization to store other information like the access/refresh token, for example
and then the sales_order_integration table would become
Here are some query scenarios
- Find sales_order by id: of course, you always have index on primary key
- Find external ids by sales_order_id: simply add an index on sales_order_id column of the
sales_order_integration table and you’are good to go.
- You could use this when you want to refecth new data of the entity from a specific 3rd party system
- You could also use this when you want to write updated data to the other systems
- Find sales_order_id by external_entity_id: you can add an index on external_entity_id
column. Another option is a unique constraint on integration_id, external_entity_id (which makes
more sense since each entity in your system should correspond to only one entity in the other
system). A unique constraint also acts as an index. A simple JOIN like this will utilize both the
unique index and the primary key index on 2 tables
SELECT * FROM sales_order s INNER JOIN sales_order_integration i ON s.id = i.sales_order_id WHERE i.integration_id = $1 AND i.external_entity_id = $2
- This is usually seen in webhook handler, where you receive the update about each individual sales order. You can use this method to quickly look up the sales order using its external id.