Nothing special here. It’s just a blog post summarising my algorithm learning course.

Weighted Quick Union

  • Use Quick Union but avoid tall trees, so that lookups don’t have to traverse very long paths
  • Use an extra array to track the size of each tree (stored at the root node)
  • Link the root of the smaller tree to the root of the larger tree
  • The size of the new tree is the total size of both trees (see the sketch below)
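
A minimal JavaScript sketch of the idea (the class and method names are mine, not from the course):

class WeightedQuickUnion {
  constructor(n) {
    // parent[i] starts as i: every item is the root of its own tree
    this.parent = Array.from({ length: n }, (_, i) => i);
    // size[i] is only meaningful while i is a root
    this.size = new Array(n).fill(1);
  }

  root(i) {
    // follow parent links up to the node that points to itself
    while (i !== this.parent[i]) i = this.parent[i];
    return i;
  }

  connected(p, q) {
    return this.root(p) === this.root(q);
  }

  union(p, q) {
    const rootP = this.root(p);
    const rootQ = this.root(q);
    if (rootP === rootQ) return;
    // link the root of the smaller tree to the root of the larger one
    if (this.size[rootP] < this.size[rootQ]) {
      this.parent[rootP] = rootQ;
      this.size[rootQ] += this.size[rootP];
    } else {
      this.parent[rootQ] = rootP;
      this.size[rootP] += this.size[rootQ];
    }
  }
}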

Read more

Nothing special here. It’s just a blog post summarising my algorithm learning course.

Approach

The underlying data structure for Quick Union is similar to Quick Find. We need an array for all the items. The values are still ids, but each value is the id of the parent item. Each connected component is organised as a tree. The two operations (sketched after this list) work as follows

  • isConnected(p, q) checks whether the two items have the same root
  • union(p, q) sets the id of p’s root to the id of q’s root
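
A minimal JavaScript sketch of those two operations (names are illustrative):

class QuickUnion {
  constructor(n) {
    // parent[i] starts as i: every item is the root of its own tree
    this.parent = Array.from({ length: n }, (_, i) => i);
  }

  root(i) {
    // chase parent links until reaching a node that points to itself
    while (i !== this.parent[i]) i = this.parent[i];
    return i;
  }

  isConnected(p, q) {
    return this.root(p) === this.root(q);
  }

  union(p, q) {
    // set the id of p's root to the id of q's root
    this.parent[this.root(p)] = this.root(q);
  }
}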
Read more

Nothing special here. It’s just a blog post summarising my algorithm learning course.

Approach

The Quick Find idea is to assign all the items in one connected component the same id. p and q are connected if they have the same id. Every time we run the union command, we update those two items to the same id (together with all the other items in the same connected component).

The set is organised as an array, where each value is the id of that item’s connected component. At first, every item’s id is the same as its index (each item is connected only to itself). After each call to union, we relabel every item in the first component with the id of the second, as in the sketch below.
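
A minimal JavaScript sketch of Quick Find (the class name is mine):

class QuickFind {
  constructor(n) {
    // id[i] starts as i: every item is its own component
    this.id = Array.from({ length: n }, (_, i) => i);
  }

  connected(p, q) {
    // p and q are connected iff they carry the same id
    return this.id[p] === this.id[q];
  }

  union(p, q) {
    const pid = this.id[p];
    const qid = this.id[q];
    if (pid === qid) return;
    // relabel every item in p's component with q's id
    for (let i = 0; i < this.id.length; i++) {
      if (this.id[i] === pid) this.id[i] = qid;
    }
  }
}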

Read more

Nothing special here. It’s just a blog post summarising my algorithm learning course.

Dynamic Connectivity

The problem

Given a data structure organised as a set of N objects

  • Union command: connect 2 objects.
  • Find/connected command: is there a path connecting 2 objects?

union(4, 3);
union(3, 8);
union(6, 5);
union(9, 4);
union(2, 1);

connected(0, 7); // returns false
connected(8, 9); // returns true

union(5, 0);
union(7, 2);
union(6, 1);
union(1, 0);

connected(0, 7); // returns true

Note that the structure can only answer the question with yes or no. The Dynamic Connectivity implementation cannot produce the exact path between two objects; it can only tell whether any path connects them.

Read more

First part here: Basic Logging & Debugging in Microservices - Part 1

In the previous post, I talked about the advantages of building a custom Logger module that can group all the related log data into one single log entry. In this post, I will continue with some basic ideas for integrating it into a Microservices architecture and organising all the log data for better investigation.

Integrate custom MyLogger into your application

Integrating the custom MyLogger module into the application is quite a straightforward task. Instead of manually initialising and flushing the logs, you will need to implement a wrapper or higher-order function to do that automatically. For example, if you are using Koa.js for your HTTP service, simply wrap the requests inside a logger middleware like this

const MyLogger = require('./my-logger.js')

// initialise koa app
// ...

function* myLoggerMdw(next) {
  // init logger and assign to the context
  const metadata = {
    appName: 'your-service-name',
    routeName: this.request.routeName
  };
  const logger = new MyLogger(metadata);
  this.logger = logger;

  // wrap logger around your request
  try {
    logger.push('info', 'Start request', this.request.headers);
    yield next;
    logger.push('info', 'Request success', this.status);
    logger.write();
  } catch(e) {
    if (e.status < 500) {
      logger.push('warn', 'Handled error', e.message);
    } else {
      logger.push('error', 'Unhandled error', e.message);
    }
    logger.write();

    throw e;
  }
}

// custom middleware for MyLogger
app.use(myLoggerMdw)
Read more

Yes, it’s RethinkDB. Please don’t shout at me asking why I still write about optimisations for a discontinued product like RethinkDB. I’m neither a fan of RethinkDB nor of NoSQL. It is because I have to work with RethinkDB right now and deal with all the pains of RethinkDB and NoSQL, and the team cannot move away from it since a lot of services currently depend on RethinkDB. But hey, most of the enhancements we made actually reflect the basic philosophy of database scaling and optimisation. All that theory can be applied later to other database systems, not just RethinkDB.

So, as you may already know, at the time of writing this post I am working at Agency Revolution. We have been running a system that relies on RethinkDB for more than 3 years. We have built a great and highly scalable system with it. Besides that, we have also faced a lot of difficulties: when the system grew too quickly, when the number of requests peaked during real-life events (the agencies needed to send a lot of emails before a holiday or after a disaster) or when large amounts of data came in and out of the system. We have applied a lot of solutions to cope with the increased workload so that our RethinkDB clusters can still serve users within an acceptable time range. Some of those optimisations are mentioned in this post.

Read more

One of the biggest difficulties when working with Microservices (or with other distributed systems) is debugging when problems occur. Because the business logic is divided among several small services, a bug in one service can result in a cascading series of issues in many related services, and tracing which service is the root cause is always a challenging mission. By implementing a good logging solution, you can reduce the time it takes to discover a bug. It also helps you feel more confident about what happened in your code and makes the problem easier to reason about.

Let’s get your feet wet!

So you have decided it’s time to build a logging solution for your Microservices system. Here are the steps you will probably need to take:

  • First, design and implement your logging module so that it works well in one microservice.
  • Apply it to all the services in the system.
  • Implement a method to link all the correlated logs in different services.
  • Set up a centralised logging server for processing and querying the log data.
  • Define which data you need to put into the log entries for better investigation.

Design your Logging library

Before starting on a full logging solution for the whole application, it is important to get your smallest building block working properly. You will first need to build a logging solution that works well in one service, and then apply it to all the other services. You have to define a logging standard that all the other services will follow so that you can store all the log entries in a shared logging backend storage for later investigation.

The simplest way to log is to write each entry immediately, whenever you want: for example, when you receive an API request, when the HTTP request is done processing or when the server finishes updating a record in the database. However, you will soon end up with a bunch of messy log entries, because the web server usually processes multiple requests at the same time and you don’t know which entries are correlated with which. This is quite common in the concurrent and parallel world, where the system handles different tasks at once. You need to design your logging module so that it can associate all the related log entries into one.
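
A minimal sketch of that grouping idea (my illustration, not the exact MyLogger implementation; it only assumes the push/write API used in Part 2):

class MyLogger {
  constructor(metadata) {
    // metadata is attached once and shared by every entry of this request
    this.metadata = metadata;
    this.entries = [];
  }

  push(level, message, data) {
    // buffer the entry in memory instead of writing it immediately
    this.entries.push({ level, message, data, time: new Date().toISOString() });
  }

  write() {
    // flush everything belonging to this request as one single log entry;
    // console.log stands in for your real logging backend here
    console.log(JSON.stringify({ ...this.metadata, entries: this.entries }));
    this.entries = [];
  }
}

module.exports = MyLogger;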

Read more

In the first post, I discussed the overhead that you have to pay when working with Microservices. This time, I’m going to talk about another problem with Microservices: the problem of distributed systems, which you have to face from the very beginning.

You have to deal with the problem of Distributed systems very early

Working with a distributed system has never been an easy job. With Microservices, you have to face it very early.

Handling Data Inconsistency

A distributed system with a lot of small services, each backed by a different data storage, means that there are no constraints between those data storages. In a traditional SQL database, this can be solved easily by adding foreign keys between tables and performing a cascading update/delete whenever you want to modify the data. Ensuring that constraint in a Microservices design is really challenging.

Read more

It has been nearly 2 years since I started working at Agency Revolution, a team building a software platform that utilizes a Microservices architecture to create a highly scalable system for Marketing Automation. Building a Microservices system from scratch comes with both pros and cons, and I’m neither for nor against Microservices. There are many articles and books on the Internet about the advantages of Microservices, so I’m not going to write another post about the benefits of using them. This post is just a summary of my experience and the difficulties after 2 years working with Microservices, as well as how we dealt with those issues to get the most value out of them.

First, let me introduce a bit about the tech stack we are using. We had been running our application on our own private servers for about 2 years before migrating to Google Cloud Platform. There are 3 types of service in the system:

  • HTTP services: the services for handling synchronous requests, the requests that need the response immediately (e.g. requests from frontend to display for users)
  • Google PubSub workers: the services for handling asynchronous requests. These requests are queued for later processing in the background, with delivery ensured by Google PubSub
  • Timer workers: The services that run at intervals.

HTTP services are used for handling simple requests that can be completed within milliseconds or seconds. For the long-running tasks, we publish a message to Google PubSub and schedule it to be processed later by the Google PubSub workers, as in the sketch below. Each service is deployed and scaled as a pod in Kubernetes.
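
For example, handing a long-running task over to a PubSub worker might look roughly like this with the official Node.js client (the topic name and payload are made up, and the API shown is the current @google-cloud/pubsub one, not necessarily the version we used):

const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

// instead of doing the slow work inside the HTTP handler,
// publish a message and let a background PubSub worker pick it up
async function scheduleEmailCampaign(accountId) {
  await pubsub
    .topic('email-campaigns') // hypothetical topic name
    .publishMessage({ json: { accountId, task: 'send-campaign' } });
}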

Read more

This post is the follow-up to the first post here. It focuses on how to utilize RethinkDB Secondary Indexes efficiently in different use cases.

Index

Some rules when using RethinkDB Indexes

RethinkDB Indexes, similar to indexes in other databases, are a trade-off between read and write performance. Therefore, the basic rules for RethinkDB Indexes are similar to those for other databases.

  • Don’t create an index if it’s not necessary.
  • Don’t create an index if the data set is small enough that you can use a filter query instead (see the example after this list).
  • Indexes require memory to process, so be careful with tables that have a lot of indexes.
  • Indexes can slow down write operations significantly.
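
To make the filter-vs-index trade-off concrete, here is a rough example using the official JavaScript driver (the table, field and value are made up):

const r = require('rethinkdb');

async function example(conn) {
  // one-time setup: create a secondary index on the 'email' field
  await r.table('users').indexCreate('email').run(conn);
  await r.table('users').indexWait('email').run(conn);

  // without an index: filter scans the whole table
  await r.table('users').filter({ email: 'jane@example.com' }).run(conn);

  // with the index: getAll jumps straight to the matching rows,
  // at the cost of extra memory and slower writes on 'users'
  return r.table('users').getAll('jane@example.com', { index: 'email' }).run(conn);
}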
Read more