Nothing special here. It’s just a blog post for summarising my algorithm learning course.

1. The 3-sum problem

The 3-sum problem is described as below

Given N distinct integers, how many triples sum to exactly zero?

Read more

Overview

Long time ago, I worked one a project which allowed users to select images on a 2D canvas and then draw that image on a cylinder surface (the mug in this case).

I had to Google for the suitable libraries but I couldn’t find any. I also asked a question on stackoverflow but the answer did not satisfy me. The reason is that it demonstrates how to stretch the image, not how to bend the image (which is what shown in the picture above). Because of that, I decided to implement it by myself and turned out that it was not as hard as I thought before. Everything is just a loop of basic mathematic formula that I had been taught in (Vietnamese) high school.

Read more

Nothing special here. It’s just a blog post for summarising my algorithm learning course. Here are some Quick Union related interview questions and my answers

1. Social network connectivity

Question

Given a social network containing n members and a log file containing m timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend … of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be mlogn or better and use extra space proportional to n.

Answer

The earliest time at which all members are connected is when we union all into 1 connected component (1 tree). That means all the nodes in the tree have the same root. This is an improvement of weighted quick union algorithm. Every time we call the union, we will check the weight of the tree to see whether it is equal to the size of n.

Read more

First part here Basic Logging & Debugging in Microservices - Part 1

In previous post, I have talked about the advantages of building a custom Logger module that can group all the related log data into one single log entry. In this post, I will continue discussing about some basic ideas to integrate it into the Microservices architecture and organise all the log data for better investigation.

Integrate custom MyLogger into your application

Integrating the custom MyLogger module into the application is a quite straightforward task. Instead of manually initialising and flushing the logs, you will need to implement a wrapper or higher order function to do that automatically. For example, if you are using Koa.js to for your http service, simply wrap the requests inside a logger middleware like this

const MyLogger = require('./my-logger.js')

// initialise koa app
// ...

function* myLoggerMdw(next) {
// init logger and assign to the context
  const metadata = {
    appName: 'your-service-name',
    routeName: this.request.routeName
  };
  const logger = new MyLogger(metadata);
  this.logger = logger;

  // wrap logger around your request
  try {
    logger.push('info', 'Start request', this.request.headers);
    yield next;
    logger.push('info', 'Request success', this.status);
    logger.write();
  } catch(e) {
    if (e.status < 500) {
      logger.push('warn', 'Handled error', e.message);
    } else {
      logger.push('error', 'Unhandled error', e.message);
    }
    logger.write();

    throw e;
  }
}

// custom middleware for MyLogger
app.use(myLoggerMdw)
Read more

Yes it’s RethinkDB. Please don’t shout at me why you still write about the optimisations for a discontinued product like RethinkDB. I’m neither a fan of RethinkDB nor NoSQL. It is because I have to work with RethinkDB right now, deal with all the pains of RethinkDB and NoSQL and the team cannot move away from it since there are a lot of services currently depend on RethinkDB. But hey, most of the enhancements that we made are actually the basic philosophy in database scaling and optimisation. All those theory can be applied later in other database systems, not just RethinkDB.

So, you may have already known that, at the time of writing this post, I am working at Agency Revolution. We have been running the system which relies on RethinkDB for more than 3 years. We have built a great and highly scalable system with it. Beside that, we also have faced a lot of difficulties when the system grew too quickly, when the number requests peek during real life events (the agencies needed to send a lot of emails before holiday or after the disaster) or when large amount of data came in and out of the system. We have applied a lot of solutions in order to cope with the increase of work load so that our RethinkDB clusters can still serve the user within an acceptable time range. Some of those optimisations will be mentioned in this post.

Read more

One of the biggest difficulties when working with Microservices (or with other Distributed systems) is to debug if any problems occur. It is because the business logic is divided into several small places. The code bug in one service can result in a cascading series of issues in many related services. Tracing which service is the root cause of the issue is always a challenging mission. By implementing a good Logging solution, you can reduce the time it takes to discover the bug. It also helps you feel more confident about what happened in your code as well as makes the problem easier to reason about.

Let’s get your feet wet!

So you decided it’s time to build a logging solution for your Microservices system, here are some steps that you probably need to do in order to build that.

  • First, design and implement your logging module so that it works well in one microservice.
  • Apply it to all the services in the system.
  • Implement a method to link all the correlated logs in different services.
  • Set up centralised logging server for processing and querying the log data.
  • Define which data you need to put into the log entries for better investigation.

Design your Logging library

Before starting with a full Logging solution for the whole large application, it is important that you get your smallest building block to work properly. You will first need to build a logging solution that can work well in one service, and then apply to all other services. You have to define a logging standard that all the other services will follow so that you can store all the log entries into another logging backend storage for later investigation.

The simplest logging way is to write the log immediately whenever you want. For example, when you receive one API request, when the HTTP request is done processing or when the server finishes update one record in the database. However, you will soon end up with a bunch of messy log entries because the web server usually processes multiple requests at the same time and you don’t know which ones have the correlation with the others. This is quite common in the concurrent and parallel world where the system can handle different tasks at once. You need to design a logging backend that can associate all the related log entries into one.

Read more

In the first post, I discussed the overhead that you have to pay for when working with Microservices. This time, I’m going to talk about another problem with Microservices. It is the problem of the distributed systems that you have to face with from the very beginning.

You have to deal with the problem of Distributed systems very early

Working with a Distributed system has never been an easy job. For Microservices, you have to face it from very early.

Handling Data Inconsistency

A distributed system with a lot of small services followed by difference data storages means that there are no constraints between those data storages. In a traditional SQL database, this can be solved easily by adding foreign keys between tables and perform a cascading update/delete whenever you want to modify the data. Ensuring that constraint in a Microservices design is really challenging.

Read more

It has been nearly 2 years since I started working at Agency Revolution, a team working on a software platform that utilizes Microservices architecture to build a highly scalable system for Automation Marketing. It comes with both pros and cons when building a Microservices system from scratch and I’m not for nor against Microservices. There are many articles and books on the Internet talking about the advantages of Microservices so I’m not going to write another post about the benefits of using Microservices. This post is just a summary of my experience and the difficulties after 2 years working with it as well as how we deal with those issues to get the most value of Microservices.

First, let me introduce a bit about the tech stack that we are using. We have been running our application on our private server for about 2 years before migrating to Google Cloud Platform. There are 3 types of service in the system. They are

  • HTTP services: the services for handling synchronous requests, the requests that need the response immediately (e.g. requests from frontend to display for users)
  • Google PubSub workers: the services for handling asynchronous requests. They are queued for later processing in the background and ensured by Google PubSub
  • Timer workers: The services that run at intervals.

HTTP services are used for handling simple requests, which can be completed within milliseconds/seconds. For the long-running tasks, we published a message to Google PubSub and schedule it to be processed later by the Google PubSub workers. Each of them is deployed and scaled as a pod in Kubernetes.

Read more

This post is the second part of the first post here. This post focuses on how to utilize RethinkDB Secondary Index in different use cases efficiently.

Index

Some rules when using RethinkDB Indexes

RethinkDB Indexes, similar to Indexes in other database, are the trade-off between read and write performance. Therefore, the basic rules for RethinkDB Indexes are similar to other database.

  • Don’t create index if not necessary.
  • Don’t create index if the data set is small enough so that you can use the filter query.
  • Indexes require memory to process, be careful with tables that have a lot of indexes.
  • Indexes can slow down write operations significantly.
Read more

It has been more than one year since my last post. But yeah, I’m still here, not going anywhere. This time, I write about the database that I have been working over the last one year at Agency Revolution, RethinkDB.

Index

At Agency Revolution, we make heavy use of RethinkDB. Nearly everything is stored in RethinkDB. Probably at the time you are reading this blog post, that will not be true anymore and we have been utilizing other databases as well. However, as it’s still one of our main data storage, we used to have a lot of performance issues related to storing and retrieving data (and we still have until now). This blog post is to summarize how we use RethinkDB indexes to solve those problems as well as some use cases for different kind of indexes in RethinkDB.

Read more