30 November 2024
algorithm

It’s not just about Sorting!

1. Merge Sort

Merge Sort is one of the most commonly taught sorting algorithm (beside Quick Sort) in any Computer Science and Information Technology courses. I will start by briefly reviewing the fundamental knowledge of Merge Sort. The basic idea behind Merge sort is a divide-and-conquer strategy, where you break the array into smaller halves, sort each half and then merge them back together. The recursive process continues until each half contains only 1 item (already sorted).

Merge Sort Explain

There are 2 variants of Merge sort implementation, using for and while loop. You can check my other post about Merge Sort here, which contains the for version from the algorithm course on Coursera. For the purpose of interview questions, I’ll demonstrate using the while version in this post.

As mentioned above, there are 2 essential steps in Merge Sort, the Divide step and the Merge step. Let’s begin with the Divide step

// WARNING: This is not working code and not the most efficient way to do
const mergeSort = (arr) => {
  // base case
  if (arr.length === 1)
    return arr;

  const mid = Math.floor(arr.length / 2);
  const firstHalf = mergeSort(arr.slice(0, mid));
  const secondHalf = mergeSort(arr.slice(mid, arr.length));
  return merge(firstHalf, secondHalf);
}

and here is how you merge 2 halves (2 sorted arrays)

// WARNING: This is not working code and not the most efficient way to do
const merge  = (arr1, arr2) => {
  const res = [];
  while (arr1.length && arr2.length) {
    const item1 = arr1[0];
    const item2 = arr2[0];
    if (item1 < item2) {
      res.push(item1);
      arr1.shift();
    } else {
      res.push(item2);
      arr2.shift();
    }
  }

  if (arr1.length) {
    res.push(...arr1);
  } else {
    res.push(...arr2);
  }

  return res;
}

Optimize a Recursive problem

13 October 2024
algorithm

Ok, you will probably find this problem during your coding interview 😩 Many companies don’t like giving the candidates tricky problems. Instead, they often ask you to write a recursive function to prove that you’re a real engineer, not some random guys applying for the job because it’s well paid 😆 After that, the next question is usually about how to optimize that function with very large datasets and avoid the “Maximum call stack size exceeded” error

Here is how you can prepare yourself for that type of interview question

1. Basic Recursion

Let’s start with this very basic recursion question

Given an array arr with N items, write a recursive function to calculate the sum of all items in the array

If you think about recursion in its simplest form, it’s like a divide and conquer strategy, where you break the problem into smaller ones, solve one of them first and then repeat with the other remaining ones (recursive case) until there is no item left (base case). Now, let’s take a look at the above problem, the solution can be described like this: The sum of an array is the sum of the first element (the head of the list) and all the remaining items (the tail of the array).

If you have worked in any lisp-like language (Emacs Lisp for example 😂), you will immediately see the pattern. They are the car and cdr function in Elisp.

From that, you can easily write a basic recursive function like this (in Javascript)

const sum = (arr) => {
  if (arr.length === 0) {
    return 0;
  }
  const [head, ...tail] = arr;
  return head + sum(tail);
};

A Unified Development Environment Journey - Part 2

06 September 2024
misc

Previous post: A Unified Development Environment Journey - Part 1

I won’t mention about setting up Codespaces or Devcontainer as it’s already on their documentation page. In this post, I’ll show you some of the problems and example workflows that I did with Codespaces/Devcontainer.

Dev credentials and Environment variables

Your organization may created some private packages that requires some layers of authentication in order to install (the npm packages hosted on GCP for example). One possible solution is to create an access token (or any other type of credential) and put set it in your development environment. For Codespaces, you can simply set it on your Github configuration page. However, for Devcontainer, it needs a bit more work. The idea is to mount the environment file from outside of the container and then load those variables to Devcontainer

Create an env file on your computer, for example /var/devcontainer-mount/workflow.env, which contains all the secrets that you want to inject

export GOOGLE_APPLICATION_CREDENTIALS="xxx"

Mount the folder to your Devcontainer by adding a mount volume to your devcontainer.json file. Read more here Add another local file mount

{
  //...other config
  "mounts": [
    {
      "source": "/var/devcontainer-mount",
      "target": "/workspaces/devcontainer-mount",
      "type": "bind"
    }
  ]
}

Who will win?

26 August 2024
javascript

My friend sent me this and asked me what the value printed at the end is, which promise would win the race.

const promise1 = new Promise((resolve) => {
  for (let i = 0; i < 10000; i++) {} // longer loop
  resolve('promise1');
});

const promise2 = new Promise((resolve) => {
  for (let i = 0; i < 100; i++) {} // shorter loop
  resolve('promise2');
});

Promise.race([promise1, promise2]).then((value) => console.log(value));

Surprisingly, most of their answers are the same, with promise2 to be printed at the end!

If you have been working with Nodejs long enough and understand its event loop, you can immediately see the problem in the above code. Even though the code is inside a Promise, which should be run concurrently (in normal mindset). However, the for loops contain nothing inside. They are called blocking code in js. Since Nodejs is single-threaded, the above functions will be executed sequentially, in the order of declaration.

A Unified Development Environment Journey - Part 1

26 August 2024
misc

I have been dreaming about a unified dev environment for everybody in the team, where there’s no more It works on my machine issue. The dev environment should be portable, automatically configured and can be destroyed/recreated as quickly as possible. It should also provide all the necessary tools for the developer as well as enforce everyone in the team to follow the correct convention consistently.

My first attempt in university

Here is my first effort when I was a student, a classic solution.

First attempt

In this method, you would run everything normally in your computer and use a Bash/Ansible script to install all the necessary tools as well as set up the development environment. Of course, this is the worst (but was the best with my knowledge at that time 😆).

This setup won’t be reused for multiple engineers. Each engineer has a very different setup on that machine. The script can mess up the other people’s computer easily.
Even if it’s just for me, that also won’t work. I install and set up new applications all the time. After a while, when I touch that project again, all the scripts were broken and I don’t even remember how to run my app again.

A better way

I began learning how to do it the professional way in my first job after graduated. The environment is configured using a Virtualbox VM (provisioned by Vagrant), which reflects the real setup on the production server. The server/VM runs everything, from the database server to the main application.

2014

Compose Higher Order Functions in Typescript

23 June 2024
javascript

Initial problem

It’s the classic logging issue again! In the Warehouse Management system that I’m working on, the team usually needs to add this logging pattern

const sendToteToPackingStation = (warehouseId: string, toteId: string): Promise<Result> => {
  logger.info('Sending tote to packing station', { warehouseId, toteId });

  const result = await someLogic(...);

  logger.info('Send tote result', { result });
  return result;
};

The purpose is simple. It’s what you have to do for production debugging in every system. You should write out some unique ids so you have a place to start querying your logging system. From that point, you will then trace the related entries using a correlation id that your system provides.

From time to time, when the system scaled up, there were new areas that could slow down the system. We then added more logging logic to the system, for example, execution time logging to help build some visualization dashboards to identify the root cause.

const sendToteToPackingStation = (warehouseId: string, toteId: string): Promise<Result> => {
  const startTime = performance.now();

  logger.info('Sending tote to packing station', { warehouseId, toteId });
  const result = await someLogic(...);
  logger.info('Send tote result', { result });

  logger.info('Execution time', { durationMs: elapsedTimeMs(startTime) });
  return result;
};

Of course, when this was repeated multiple times, we started thinking about making a higher order function (HOF) to reuse everywhere is the system

First implementation…

Let’s begin with the type definition. Here are the generic types how a HOF looks like. It’s a function that receives a function and return another function with the same signature with the input function

export type WrappedFunction<
  FunctionArgs extends ReadonlyArray<unknown>,
  FunctionReturn
> = (...args: FunctionArgs) => FunctionReturn;

export type HigherOrderFunction = <
  FunctionArgs extends ReadonlyArray<unknown>,
  FunctionReturn
>(
  func: WrappedFunction<FunctionArgs, FunctionReturn>
) => WrappedFunction<FunctionArgs, FunctionReturn>;

Let's build Product, not Software - Part 3

12 September 2023
misc

Let’s build Product, not Software - Part 2

In the previous post, I have shown you a real example about how to solve the problem as a Software Engineer. This time, we gonna do it by the Product Engineer approach.

Make Agile great again!

If you refer to the first post that I wrote, Building Product is about delivering user values, collecting feedbacks and constantly adapting to the change of business. Does it sound like Agile? Yes, in my opinion, Agile seems to be the best fit out there for Product company at a small and medium size.

Don’t do this

Let’s start with the non-Agile way by this picture-by-picture story

delivery-1

delivery-2

delivery-4

delivery-3

When you put this into a Software perspective, it’s pretty much the same with the first example that I showed in my previous post. The way most people would choose is to implement the whole feature, from backend to frontend before delivering to the customer. Again, do NOT do this.

Let's build Product, not Software - Part 2

03 September 2023
misc

Previous post: Let’s build Product, not Software - Part 1

In the first part of this series, I have walked you quickly through some differences between Building Product and Building Software and why Building Product is important. In this post, I’ll show you a simple example, analyze its problem and come up with a solution following the Product Engineer mindset.

A Software Engineer approach

Let’s take a look at this simple project

You are working for an Automation marketing platform for E-commerce merchants. The product needs the data about the sales orders of the merchant and your task is to build a 1 way Sales Order integration feature to sync data from Shopify to your system

After going through several discovery steps with your customers and the PO, you decide that it’s time to make the implementation plan. You now come up with this plan

Milestones	Details	Duration
Backend	Build the backend to handle auth & webhook requests from Shopify	1 month
Frontend	Build the frontend for the users	1 month
Beta	Release to some beta customers	0.5 month
Bug fixes	Handle issues reported from customers	0.5 month
Go live	Release to everybody	0.5 month

Sounds good? Yes, this is a perfect plan from a Software Engineer perspective, but…

Let's build Product, not Software - Part 1

29 August 2023
misc

This may not always be true. It really depends on the type of company that I will go through in this post. This also sometimes sounds strange from a Software Engineering perspective, but as we get used to, it really did help FMG (my old company) grow to the leading player that niche market

Software engineers love technology, for sure. We love building things, love applying the latest technology into our product. However, we usually forget one important thing: The technology that doesn’t fit in the product, cannot make a profitable business, is a useless one. That sounds obvious, right? Surprisingly, a lot of Software Engineers that I’ve met made this mistake, especially the talented ones.

Let me walk you through the 2 approaches, compare the differences between them and analyze some real examples. In latter posts, you can also find some techniques that I’ve applied in order to help build a better Product Engineer mindset.

This is converted from a presentation that I made at work

Tips when working with Postgres - Part 2

16 October 2022
misc

Just a collection of tips to make working with Postgres (and other SQL-like databases) easier

Integration data

Usually when you build a system that integrates with other 3rd party service, you will need to store integration information related to the entity, for example the id of the entity on the 3rd party system or some of its configuration on that system. Imagine that you are building an e-commerce related product, you may want to sync the Sales order information from Shopify to do the analytics on customer behavior. The first solution you can think of is to add a column like shopify_entity_id on that table.

What will happens if you introduce another integration later? Does the name shopify_entity_id still make sense? You may consider renaming it to external_entity_id. How do you know where it comes from? Adding another source column? How do you store extra 3rd party information about the sales order? Keep adding columns like external_something? Do those columns actually belong to the sales_order table itself?
What will happen if an single entity exists on multiple 3rd party system? For instance, the sales order may be presented on both Shopify and on another Shipping service. How would you deal with it? Keep adding more columns? What if we introduct another integration?
- A Json (Jsonb) column could solve the above issue but also creates a whole new problem. How about schema enforcement and constraint? How do we make sure that nobody will accidentally update it the an incorrect schema? How about null and undefined values (in case you are working with Javascript)? How about indexing the values for quick access? You can index inside the json but it just makes things more complicated due to those schema problems mentioned above.

The solution, of course, is a SQL approach: make an entity integration table (sales_order_integration in this case). It’s a 1-N relationship, 1 sales order could have 0 or multiple integrations

sales_order table

id	shipping_address	price	weightMg
1	Ho Chi Minh city	10	20
2	Hanoi	20	30

sales_order_integration table

id	sales_order_id	external_entity_id	source
1	1	external-id1	SHOPIFY
2	1	external-id2	WOOCOMMERCE
3	2	external-id3	SHOPIFY