Just a collection of tips to make working with Postgres (and other SQL-like databases) easier

Integration data

Usually when you build a system that integrates with other 3rd party service, you will need to store integration information related to the entity, for example the id of the entity on the 3rd party system or some of its configuration on that system. Imagine that you are building an e-commerce related product, you may want to sync the Sales order information from Shopify to do the analytics on customer behavior. The first solution you can think of is to add a column like shopify_entity_id on that table.

  • What will happens if you introduce another integration later? Does the name shopify_entity_id still make sense? You may consider renaming it to external_entity_id. How do you know where it comes from? Adding another source column? How do you store extra 3rd party information about the sales order? Keep adding columns like external_something? Do those columns actually belong to the sales_order table itself?
  • What will happen if an single entity exists on multiple 3rd party system? For instance, the sales order may be presented on both Shopify and on another Shipping service. How would you deal with it? Keep adding more columns? What if we introduct another integration?
    • A Json (Jsonb) column could solve the above issue but also creates a whole new problem. How about schema enforcement and constraint? How do we make sure that nobody will accidentally update it the an incorrect schema? How about null and undefined values (in case you are working with Javascript)? How about indexing the values for quick access? You can index inside the json but it just makes things more complicated due to those schema problems mentioned above.

The solution, of course, is a SQL approach: make an entity integration table (sales_order_integration in this case). It’s a 1-N relationship, 1 sales order could have 0 or multiple integrations

sales_order table

id shipping_address price weightMg
1 Ho Chi Minh city 10 20
2 Hanoi 20 30

sales_order_integration table

id sales_order_id external_entity_id source
1 1 external-id1 SHOPIFY
2 1 external-id2 WOOCOMMERCE
3 2 external-id3 SHOPIFY
Read more

Just a collection of tips to make working with Postgres (and other SQL-like databases) easier

Measurement Unit

Take this case for an example, you have a product table and you want to store product information like its size and weight by adding these columns to describe such properties: width, length, height and weight.

id name width length height weight
1 ipad 10 20 0.1 0.5
2 macbook 20 30 0.5 2

So what’s the problem with the above table? We are assuming that the size props (width, height and length) are measured in cm and the weight is measured in kg. Usually, we could put this logic in application layer to make sure we convert everything to cm and kg before inserting into the database. However, it could lead to even more problems

  • What will happen if a new dev join the team? How can you make sure that person will know when to convert and when not?
  • What will happen if a dev using pound join the team?
  • What will happen if a dev accidentally assume the value in weight is in mg?
  • Sometimes, you could do a double conversion, making thing worse.
  • You need to remember adding comment to every place in your code, just to remind people which measurement unit that function is using.
  • Which data type to choose? Integer, of course, is not a good choice. However, working with real, double, decimal or numeric is always harder compare to int. They could cause some problems with parsing and datatype for languages/libraries like Nodejs.
Read more

Leetcode: Squares of Sorted Array

Given an integer array nums sorted in non-decreasing order, return an array of the squares of each number sorted in non-decreasing order.

Example 1

Input: nums = [-4,-1,0,3,10]
Output: [0,1,9,16,100]
Explanation: After squaring, the array becomes [16,1,0,9,100].
After sorting, it becomes [0,1,9,16,100].

Example 2

Input: nums = [-7,-3,2,3,11]
Output: [4,9,9,49,121]

Constraints

1 <= nums.length <= 104
-104 <= nums[i] <= 104
nums is sorted in non-decreasing order.
Read more

Leetcode: Power of Two

Given an integer n, return true if it is a power of two. Otherwise, return false.

An integer n is a power of two, if there exists an integer x such that n == 2^x.

Example 1:

Input: n = 1
Output: true
Explanation: 20 = 1

Example 2:

Input: n = 16
Output: true
Explanation: 24 = 16

Example 3:

Input: n = 3
Output: false

Constraints:

-231 <= n <= 231 - 1

Follow up: Could you solve it without loops/recursion?

Read more

An example about configuring PubSub BigQuery Subscription with Pulumi

BigQuery Subscription

It’s hard to view the content of the messages that were published to a topic because the application has already processed and acknowledged them before you can do anything. Usually, you have to create another test subscription for the messages to be replicated to and then pull messages from that test subscription. However, the Google PubSub UI doesn’t provide any way to pull specific message by id. The GCloud Console UI is a frustrating UI itself, slow to load and had to pull several times to find the necessary messages.

Google offers BigQuery Subscription, a solution to that issue and also to provide a long term storage for your messages so you can troubleshoot and do complex query later. In this post, I’m going to show a sample BigQuery Subscription workflow with Pulumi.

Configure BigQuery Dataset and Table

First, you need to create a BigQuery Dataset and a BigQuery Table following the schema defined here. You can do it manually on the UI or via Pulumi

BigQuery Dataset

const pubsubDatasetId = `pubsub`;

export const pubsubDataset = new gcp.bigquery.Dataset(
  `my-dataset`,
  { datasetId: pubsubDatasetId }
);

BigQuery Table (a bit messy since the schema has to be defined in JSON string)

export const messageTable = new gcp.bigquery.Table(
  `my-table`,
  {
    datasetId: pubsubDatasetId,
    tableId: `message-values`,
    // if you don't want other people to accidentally delete is, set to true
    deletionProtection: true,
    schema: `
  [
    {
      "name": "data",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": "The message body"
    },
    {
      "name": "subscription_name",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": ""
    },
    {
      "name": "message_id",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": ""
    },
    {
      "name": "publish_time",
      "type": "TIMESTAMP",
      "mode": "NULLABLE",
      "description": ""
    },
    {
      "name": "attributes",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": "Message attributes as JSON string"
    }
  ]
  `,
  },
  {
    dependsOn: [pubsubDataset],
  }
);
Read more

Ok, the story is that, I’m really bad at css. I have never worked on building any frontend component and I was given a task to build the Custom Checkbox component with Reactjs from scratch. Here is how…

1. The basic HTML and CSS

Prepare the structure

Here is how you usually create a checkbox with pure html and css. To avoid any complicated event handler, I will simply wrap the <label> tag around, which allows clicking on any element inside to transfer the event to the corresponding <input> element without any Javascript needed.

<label class="mylabel">
  <input class="myinput" type="checkbox" name="checkbox" />
  <div class="mylabel">Checkbox label</div>
</label>
.mylabel {
  display: flex;
  gap: 5px;
  align-items: center;
  margin: 2px;
}

Try the live example in the below iframe (or direct link)

Read more

just a blog post for summarising my algorithm course

The Problem

Given a data structure organised as a set of N objects, is there a path connecting 2 objects?

// union: connect 2 objects
// connected: whether 2 objects are connected?
union(4, 3);
union(3, 8);
union(6, 5);
union(9, 4);
union(2, 1);

connected(0, 7) return false;
connected(8, 9) return true;

union(5, 0);
union(7, 2);
union(6, 1);
union(1, 0);

connected(0, 7) return true;

Can only answer the question with Yes or No. The Dynamic Connectivity implementation cannot answer the exact path between 2 objects. It can only answer whether there are any paths connecting 2 objects.

Read more

Leetcode: Binary Search

This is so trivial. I just put it here so I can look up faster.

Given an array of integers nums which is sorted in ascending order, and an integer target, write a function to search target in nums. If target exists, then return its index. Otherwise, return -1. You must write an algorithm with O(log n) runtime complexity.

Example 1

Input: nums = [-1,0,3,5,9,12], target = 9
Output: 4
Explanation: 9 exists in nums and its index is 4

Example 2

Input: nums = [-1,0,3,5,9,12], target = 2
Output: -1
Explanation: 2 does not exist in nums so return -1

Constraints

1 <= nums.length <= 104
-104 < nums[i], target < 104
All the integers in nums are unique.
nums is sorted in ascending order.
Read more

Just an interview question 🙃

The interview question: Re-implement Javascript Promise from scratch. The implementation should support chaining (asynchronously, of course).

Promise API

It has been too long since I started using async/await. I almost forgot these Promise APIs.

Let’s revisit the basic Javascript Promise APIs

const myPromise = new Promise((resolve, reject) => {
  setTimeout(() => {
    resolve('foo');
  }, 300);
});

// try catch each one separately
myPromise
  .then(handleResolvedA, handleRejectedA)
  .then(handleResolvedB, handleRejectedB)
  .then(handleResolvedC, handleRejectedC);

// or catch the first one if any
myPromise
  .then(handleResolvedA)
  .then(handleResolvedB)
  .then(handleResolvedC)
  .catch(handleRejectedAny);
Read more

Part 2: Refactor a legacy Worker Base - Part 2 - Scope Management

Rewrite the WorkerBase class

After fixing the Scope management problem, it’s time to rewrite the WorkerBase class in a way that the components are loosely coupling, composable and detachable. The solution turned out to be a very simple approach. It’s the middleware design that is very common in popular Web server frameworks (ASP.Net Core, Express.js, Koa.js,…).

In case you don’t know what a middleware is, read ASP.Net Core Middleware.

After analyzing the legacy WorkerBase class, I found that they could be organized into these middlewares

  • Exception handling middleware
  • Logging middleware
  • Message queue behaviors middleware

Worker middlewares

Read more