Previous post: A Unified Development Environment Journey - Part 1

I won’t cover setting up Codespaces or a Devcontainer, as that’s already on their documentation pages. In this post, I’ll show you some of the problems and example workflows that I went through with Codespaces/Devcontainer.

Dev credentials and Environment variables

Your organization may have created some private packages that require a few layers of authentication to install (npm packages hosted on GCP, for example). One possible solution is to create an access token (or any other type of credential) and set it in your development environment. For Codespaces, you can simply set it on your Github configuration page. However, Devcontainer needs a bit more work. The idea is to mount an environment file from outside the container and then load those variables into the Devcontainer.

Create an env file on your computer, for example /var/devcontainer-mount/workflow.env, which contains all the secrets that you want to inject

export GOOGLE_APPLICATION_CREDENTIALS="xxx"

Mount the folder into your Devcontainer by adding a mount volume to your devcontainer.json file. Read more here: Add another local file mount

{
  //...other config
  "mounts": [
    {
      "source": "/var/devcontainer-mount",
      "target": "/workspaces/devcontainer-mount",
      "type": "bind"
    }
  ]
}
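
Then load those variables inside the container. One possible way (a sketch on my part, assuming a bash shell inside the container) is to source the mounted file from the shell profile using a postCreateCommand:

{
  //...other config
  "postCreateCommand": "echo 'source /workspaces/devcontainer-mount/workflow.env' >> ~/.bashrc"
}

Since workflow.env uses export statements, every new shell in the container will pick up the variables.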
Read more
Who will win?

My friend sent me this and asked me which value would be printed at the end, i.e. which promise would win the race.

const promise1 = new Promise((resolve) => {
  for (let i = 0; i < 10000; i++) {} // longer loop
  resolve('promise1');
});

const promise2 = new Promise((resolve) => {
  for (let i = 0; i < 100; i++) {} // shorter loop
  resolve('promise2');
});

Promise.race([promise1, promise2]).then((value) => console.log(value));

Surprisingly, most of the answers were the same: promise2 would be printed at the end!

If you have been working with Nodejs long enough and understand its event loop, you can immediately see the problem in the above code. Even though the code is inside a Promise, which you might expect to run concurrently, the executor function passed to the Promise constructor runs synchronously, and the empty for loops are blocking code in js. Since Nodejs is single-threaded, the two executors run sequentially, in declaration order, so promise1 actually resolves first and wins the race.
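You can verify that the executor runs synchronously with a tiny experiment (a minimal sketch, runnable as-is in Nodejs):

const p1 = new Promise((resolve) => {
  console.log('executor runs first'); // printed immediately, before the constructor returns
  resolve('p1');
});
console.log('constructor returned'); // printed second

Because the executor finishes (loops included) before the constructor even returns, both promises are already resolved by the time Promise.race sees them, and the first one declared wins.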

Read more

I have been dreaming about a unified dev environment for everybody on the team, where there’s no more “It works on my machine” issue. The dev environment should be portable, automatically configured, and able to be destroyed/recreated as quickly as possible. It should also provide all the necessary tools for the developer and ensure everyone on the team follows the correct conventions consistently.

My first attempt in university

Here is my first effort when I was a student, a classic solution.

First attempt

In this method, you run everything directly on your computer and use a Bash/Ansible script to install all the necessary tools and set up the development environment. Of course, this is the worst approach (but it was the best I could do with my knowledge at that time 😆).

  • This setup can’t be reused across multiple engineers. Each engineer has a very different setup on their machine, and the script can easily mess up other people’s computers.
  • Even if it’s just for me, it still doesn’t work. I install and set up new applications all the time, so after a while, when I touched that project again, all the scripts were broken and I didn’t even remember how to run my app anymore.

A better way

I began learning how to do it the professional way at my first job after graduating. The environment was configured using a Virtualbox VM (provisioned by Vagrant), which reflected the real setup on the production server. The server/VM ran everything, from the database server to the main application.

2014
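
For illustration (a minimal sketch, not my actual setup from back then; the box name, port, and setup.sh script are assumptions), a Vagrantfile for this kind of environment could look like this:

Vagrant.configure("2") do |config|
  # Base box that mirrors the production OS
  config.vm.box = "ubuntu/trusty64"
  # Forward the app port so you can hit it from the host browser
  config.vm.network "forwarded_port", guest: 3000, host: 3000
  # Provision everything (database, runtime, dependencies) with a script
  config.vm.provision "shell", path: "setup.sh"
end

The key difference from the first attempt: the script now runs inside a disposable VM, so it can never mess up anyone’s actual machine.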

Read more

Initial problem

It’s the classic logging issue again! In the Warehouse Management system that I’m working on, the team usually needs to add this logging pattern:

const sendToteToPackingStation = async (warehouseId: string, toteId: string): Promise<Result> => {
  logger.info('Sending tote to packing station', { warehouseId, toteId });

  const result = await someLogic(...);

  logger.info('Send tote result', { result });
  return result;
};

The purpose is simple, and it’s what you have to do for production debugging in every system: write out some unique ids so you have a place to start querying your logging system. From that point, you can trace the related entries using a correlation id that your system provides.

From time to time, as the system scaled up, new areas appeared that could slow it down. We then added more logging logic, for example execution time logging, to help build visualization dashboards to identify the root cause.

const sendToteToPackingStation = async (warehouseId: string, toteId: string): Promise<Result> => {
  const startTime = performance.now();

  logger.info('Sending tote to packing station', { warehouseId, toteId });
  const result = await someLogic(...);
  logger.info('Send tote result', { result });

  logger.info('Execution time', { durationMs: elapsedTimeMs(startTime) });
  return result;
};

Of course, after this was repeated multiple times, we started thinking about making a higher order function (HOF) to reuse everywhere in the system.

First implementation…

Let’s begin with the type definition. Here are the generic types showing what an HOF looks like: it’s a function that receives a function and returns another function with the same signature as the input function.

export type WrappedFunction<
  FunctionArgs extends ReadonlyArray<unknown>,
  FunctionReturn
> = (...args: FunctionArgs) => FunctionReturn;

export type HigherOrderFunction = <
  FunctionArgs extends ReadonlyArray<unknown>,
  FunctionReturn
>(
  func: WrappedFunction<FunctionArgs, FunctionReturn>
) => WrappedFunction<FunctionArgs, FunctionReturn>;
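
To make this concrete, here is a minimal sketch (not the post’s final implementation; console.log stands in for the logger to keep it self-contained) of an HOF matching these types that logs execution time around any function:

// Sketch: wrap any function and log how long it took
const withExecutionTime: HigherOrderFunction = (func) => (...args) => {
  const startTime = performance.now();
  const result = func(...args);
  console.log('Execution time', { durationMs: performance.now() - startTime });
  return result;
};

// Usage: same signature in, same signature out
const timedSend = withExecutionTime(sendToteToPackingStation);

Note that for an async function like sendToteToPackingStation, this only measures the time to create the promise, not the time until it resolves — one of the details the real implementation has to handle.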
Read more

Let’s build Product, not Software - Part 2

In the previous post, I showed you a real example of how to solve a problem as a Software Engineer. This time, we’re going to do it with the Product Engineer approach.

Make Agile great again!

If you refer to the first post that I wrote, Building Product is about delivering user value, collecting feedback, and constantly adapting to business changes. Does that sound like Agile? Yes, in my opinion, Agile seems to be the best fit out there for small and medium-sized product companies.

Don’t do this

Let’s start with the non-Agile way, told through this picture-by-picture story:

delivery-1

delivery-2

delivery-4

delivery-3

When you put this into a Software perspective, it’s pretty much the same as the first example that I showed in my previous post. The way most people would choose is to implement the whole feature, from backend to frontend, before delivering it to the customer. Again, do NOT do this.

Read more

Previous post: Let’s build Product, not Software - Part 1

In the first part of this series, I quickly walked you through some differences between Building Product and Building Software and why Building Product is important. In this post, I’ll show you a simple example, analyze its problems, and come up with a solution following the Product Engineer mindset.

A Software Engineer approach

Let’s take a look at this simple project

You are working on a marketing automation platform for E-commerce merchants. The product needs data about the merchant’s sales orders, and your task is to build a one-way Sales Order integration feature to sync data from Shopify to your system.

After going through several discovery steps with your customers and the PO, you decide that it’s time to make an implementation plan. You come up with this:

| Milestones | Details | Duration |
| --- | --- | --- |
| Backend | Build the backend to handle auth & webhook requests from Shopify | 1 month |
| Frontend | Build the frontend for the users | 1 month |
| Beta | Release to some beta customers | 0.5 month |
| Bug fixes | Handle issues reported from customers | 0.5 month |
| Go live | Release to everybody | 0.5 month |

Sounds good? Yes, this is a perfect plan from a Software Engineer perspective, but…

Read more

This may not always be true; it really depends on the type of company, which I will go through in this post. It may also sound strange from a Software Engineering perspective, but once we got used to it, it really did help FMG (my old company) grow into the leading player in that niche market.

Software engineers love technology, for sure. We love building things and applying the latest technology to our product. However, we usually forget one important thing: technology that doesn’t fit the product and cannot make a profitable business is useless. That sounds obvious, right? Surprisingly, a lot of Software Engineers that I’ve met have made this mistake, especially the talented ones.

Let me walk you through the 2 approaches, compare the differences between them, and analyze some real examples. In later posts, you will also find some techniques that I’ve applied to help build a better Product Engineer mindset.

img1

This is converted from a presentation that I made at work

Read more

Just a collection of tips to make working with Postgres (and other SQL-like databases) easier

Integration data

Usually, when you build a system that integrates with a 3rd party service, you will need to store integration information related to an entity, for example the id of the entity on the 3rd party system, or some of its configuration on that system. Imagine that you are building an e-commerce related product: you may want to sync Sales Order information from Shopify to do analytics on customer behavior. The first solution you might think of is to add a column like shopify_entity_id to that table.

  • What happens if you introduce another integration later? Does the name shopify_entity_id still make sense? You may consider renaming it to external_entity_id, but then how do you know where it comes from? By adding another source column? How do you store extra 3rd party information about the sales order? Keep adding columns like external_something? Do those columns actually belong to the sales_order table itself?
  • What happens if a single entity exists on multiple 3rd party systems? For instance, the sales order may be present on both Shopify and another shipping service. How would you deal with that? Keep adding more columns? What if we introduce yet another integration?
    • A Json (Jsonb) column could solve the above issue but also creates a whole new set of problems. What about schema enforcement and constraints? How do we make sure that nobody accidentally updates it with an incorrect schema? What about null and undefined values (in case you are working with Javascript)? What about indexing the values for quick access? You can index inside the json, but that just makes things more complicated due to the schema problems mentioned above.

The solution, of course, is a SQL approach: make an entity integration table (sales_order_integration in this case). It’s a 1-N relationship: one sales order can have zero or multiple integrations.

sales_order table

| id | shipping_address | price | weightMg |
| --- | --- | --- | --- |
| 1 | Ho Chi Minh city | 10 | 20 |
| 2 | Hanoi | 20 | 30 |

sales_order_integration table

| id | sales_order_id | external_entity_id | source |
| --- | --- | --- | --- |
| 1 | 1 | external-id1 | SHOPIFY |
| 2 | 1 | external-id2 | WOOCOMMERCE |
| 3 | 2 | external-id3 | SHOPIFY |
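
In DDL form, a minimal sketch of this table could look like the following (the column types and the uniqueness constraint are my assumptions, adapt them to your schema):

-- Sketch only: one sales order can have zero or more integration rows
CREATE TABLE sales_order_integration (
  id BIGSERIAL PRIMARY KEY,
  sales_order_id BIGINT NOT NULL REFERENCES sales_order (id),
  external_entity_id TEXT NOT NULL,
  source TEXT NOT NULL,
  -- assuming at most one row per source per sales order
  UNIQUE (sales_order_id, source)
);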
Read more

Just a collection of tips to make working with Postgres (and other SQL-like databases) easier

Measurement Unit

Take this case as an example: you have a product table and you want to store product information like its size and weight by adding these columns to describe such properties: width, length, height and weight.

| id | name | width | length | height | weight |
| --- | --- | --- | --- | --- | --- |
| 1 | ipad | 10 | 20 | 0.1 | 0.5 |
| 2 | macbook | 20 | 30 | 0.5 | 2 |

So what’s the problem with the above table? We are assuming that the size props (width, height and length) are measured in cm and the weight is measured in kg. Usually, we would put this logic in the application layer to make sure we convert everything to cm and kg before inserting into the database. However, it can lead to even more problems:

  • What happens when a new dev joins the team? How can you make sure that person knows when to convert and when not to?
  • What happens when a dev who is used to pounds joins the team?
  • What happens when a dev accidentally assumes the value in weight is in mg?
  • Sometimes, you could even do a double conversion, making things worse.
  • You need to remember to add a comment to every place in your code, just to remind people which measurement unit that function is using.
  • Which data type do you choose? Integer, of course, is not a good choice for cm/kg values. However, working with real, double, decimal or numeric is always harder compared to int; they can cause parsing and datatype problems for languages/libraries like Nodejs. (See the sketch after this list for the direction this points to.)
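
One direction these problems hint at (a sketch on my part, consistent with the weightMg column shown in the previous tip’s table): store values as integers in a small enough base unit and bake the unit into the column name, so there is nothing left to assume:

-- Sketch: integer columns with the unit in the name
CREATE TABLE product (
  id BIGSERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  width_mm INTEGER NOT NULL,   -- 100 mm instead of 10 cm
  length_mm INTEGER NOT NULL,
  height_mm INTEGER NOT NULL,
  weight_mg BIGINT NOT NULL    -- 500000 mg instead of 0.5 kg
);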
Read more

An example of configuring a PubSub BigQuery Subscription with Pulumi

BigQuery Subscription

It’s hard to view the content of messages that were published to a topic, because the application has already processed and acknowledged them before you can do anything. Usually, you have to create another test subscription for the messages to be replicated to, and then pull messages from that test subscription. However, the Google PubSub UI doesn’t provide any way to pull a specific message by id, and the GCloud Console UI is frustrating in itself: slow to load, and you have to pull several times to find the messages you need.

Google offers BigQuery Subscription as a solution to that issue; it also provides long-term storage for your messages so you can troubleshoot and run complex queries later. In this post, I’m going to show a sample BigQuery Subscription workflow with Pulumi.

Configure BigQuery Dataset and Table

First, you need to create a BigQuery Dataset and a BigQuery Table following the schema defined here. You can do it manually on the UI or via Pulumi

BigQuery Dataset

import * as gcp from '@pulumi/gcp';

const pubsubDatasetId = `pubsub`;

export const pubsubDataset = new gcp.bigquery.Dataset(
  `my-dataset`,
  { datasetId: pubsubDatasetId }
);

BigQuery Table (a bit messy since the schema has to be defined as a JSON string)

export const messageTable = new gcp.bigquery.Table(
  `my-table`,
  {
    datasetId: pubsubDatasetId,
    tableId: `message-values`,
    // set this to true if you don't want other people to accidentally delete it
    deletionProtection: true,
    schema: `
  [
    {
      "name": "data",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": "The message body"
    },
    {
      "name": "subscription_name",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": ""
    },
    {
      "name": "message_id",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": ""
    },
    {
      "name": "publish_time",
      "type": "TIMESTAMP",
      "mode": "NULLABLE",
      "description": ""
    },
    {
      "name": "attributes",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": "Message attributes as JSON string"
    }
  ]
  `,
  },
  {
    dependsOn: [pubsubDataset],
  }
);
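
To tie it together, here is a rough sketch of what the BigQuery subscription resource itself could look like in Pulumi (the topic name is a placeholder assumption; the table reference format and writeMetadata option come from the Pulumi GCP provider):

import * as pulumi from '@pulumi/pulumi';

// Sketch: a PubSub subscription that writes every message into the table above
export const bigquerySubscription = new gcp.pubsub.Subscription(
  `my-bigquery-subscription`,
  {
    topic: `my-topic`, // hypothetical topic name
    bigqueryConfig: {
      // expected format: {projectId}.{datasetId}.{tableId}
      table: pulumi.interpolate`${messageTable.project}.${messageTable.datasetId}.${messageTable.tableId}`,
      // fills subscription_name, message_id, publish_time and attributes
      writeMetadata: true,
    },
  },
  { dependsOn: [messageTable] }
);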
Read more