Basic Logging & Debugging in Microservices - Part 2

21 April 2018
misc

First part here Basic Logging & Debugging in Microservices - Part 1

In previous post, I have talked about the advantages of building a custom Logger module that can group all the related log data into one single log entry. In this post, I will continue discussing about some basic ideas to integrate it into the Microservices architecture and organise all the log data for better investigation.

Integrate custom MyLogger into your application

Integrating the custom MyLogger module into the application is a quite straightforward task. Instead of manually initialising and flushing the logs, you will need to implement a wrapper or higher order function to do that automatically. For example, if you are using Koa.js to for your http service, simply wrap the requests inside a logger middleware like this

const MyLogger = require('./my-logger.js')

// initialise koa app
// ...

function* myLoggerMdw(next) {
// init logger and assign to the context
  const metadata = {
    appName: 'your-service-name',
    routeName: this.request.routeName
  };
  const logger = new MyLogger(metadata);
  this.logger = logger;

  // wrap logger around your request
  try {
    logger.push('info', 'Start request', this.request.headers);
    yield next;
    logger.push('info', 'Request success', this.status);
    logger.write();
  } catch(e) {
    if (e.status < 500) {
      logger.push('warn', 'Handled error', e.message);
    } else {
      logger.push('error', 'Unhandled error', e.message);
    }
    logger.write();

    throw e;
  }
}

// custom middleware for MyLogger
app.use(myLoggerMdw)

Some optimizations in RethinkDB - Part 1

10 March 2018
rethinkdb

Yes it’s RethinkDB. Please don’t shout at me why you still write about the optimisations for a discontinued product like RethinkDB. I’m neither a fan of RethinkDB nor NoSQL. It is because I have to work with RethinkDB right now, deal with all the pains of RethinkDB and NoSQL and the team cannot move away from it since there are a lot of services currently depend on RethinkDB. But hey, most of the enhancements that we made are actually the basic philosophy in database scaling and optimisation. All those theory can be applied later in other database systems, not just RethinkDB.

So, you may have already known that, at the time of writing this post, I am working at Agency Revolution. We have been running the system which relies on RethinkDB for more than 3 years. We have built a great and highly scalable system with it. Beside that, we also have faced a lot of difficulties when the system grew too quickly, when the number requests peek during real life events (the agencies needed to send a lot of emails before holiday or after the disaster) or when large amount of data came in and out of the system. We have applied a lot of solutions in order to cope with the increase of work load so that our RethinkDB clusters can still serve the user within an acceptable time range. Some of those optimisations will be mentioned in this post.

Basic Logging & Debugging in Microservices - Part 1

26 February 2018
misc

One of the biggest difficulties when working with Microservices (or with other Distributed systems) is to debug if any problems occur. It is because the business logic is divided into several small places. The code bug in one service can result in a cascading series of issues in many related services. Tracing which service is the root cause of the issue is always a challenging mission. By implementing a good Logging solution, you can reduce the time it takes to discover the bug. It also helps you feel more confident about what happened in your code as well as makes the problem easier to reason about.

Let’s get your feet wet!

So you decided it’s time to build a logging solution for your Microservices system, here are some steps that you probably need to do in order to build that.

First, design and implement your logging module so that it works well in one microservice.
Apply it to all the services in the system.
Implement a method to link all the correlated logs in different services.
Set up centralised logging server for processing and querying the log data.
Define which data you need to put into the log entries for better investigation.

Design your Logging library

Before starting with a full Logging solution for the whole large application, it is important that you get your smallest building block to work properly. You will first need to build a logging solution that can work well in one service, and then apply to all other services. You have to define a logging standard that all the other services will follow so that you can store all the log entries into another logging backend storage for later investigation.

The simplest logging way is to write the log immediately whenever you want. For example, when you receive one API request, when the HTTP request is done processing or when the server finishes update one record in the database. However, you will soon end up with a bunch of messy log entries because the web server usually processes multiple requests at the same time and you don’t know which ones have the correlation with the others. This is quite common in the concurrent and parallel world where the system can handle different tasks at once. You need to design a logging backend that can associate all the related log entries into one.

The downsides of Microservices - Part 2 - Distributed system problems

06 January 2018
misc

In the first post, I discussed the overhead that you have to pay for when working with Microservices. This time, I’m going to talk about another problem with Microservices. It is the problem of the distributed systems that you have to face with from the very beginning.

You have to deal with the problem of Distributed systems very early

Working with a Distributed system has never been an easy job. For Microservices, you have to face it from very early.

Handling Data Inconsistency

A distributed system with a lot of small services followed by difference data storages means that there are no constraints between those data storages. In a traditional SQL database, this can be solved easily by adding foreign keys between tables and perform a cascading update/delete whenever you want to modify the data. Ensuring that constraint in a Microservices design is really challenging.

The downsides of Microservices - Part 1 - Overhead

25 December 2017
misc

It has been nearly 2 years since I started working at Agency Revolution, a team working on a software platform that utilizes Microservices architecture to build a highly scalable system for Automation Marketing. It comes with both pros and cons when building a Microservices system from scratch and I’m not for nor against Microservices. There are many articles and books on the Internet talking about the advantages of Microservices so I’m not going to write another post about the benefits of using Microservices. This post is just a summary of my experience and the difficulties after 2 years working with it as well as how we deal with those issues to get the most value of Microservices.

First, let me introduce a bit about the tech stack that we are using. We have been running our application on our private server for about 2 years before migrating to Google Cloud Platform. There are 3 types of service in the system. They are

HTTP services: the services for handling synchronous requests, the requests that need the response immediately (e.g. requests from frontend to display for users)
Google PubSub workers: the services for handling asynchronous requests. They are queued for later processing in the background and ensured by Google PubSub
Timer workers: The services that run at intervals.

HTTP services are used for handling simple requests, which can be completed within milliseconds/seconds. For the long-running tasks, we published a message to Google PubSub and schedule it to be processed later by the Google PubSub workers. Each of them is deployed and scaled as a pod in Kubernetes.

Utilize RethinkDB Index - Part 2 - Secondary Index

14 October 2017
rethinkdb

This post is the second part of the first post here. This post focuses on how to utilize RethinkDB Secondary Index in different use cases efficiently.

Some rules when using RethinkDB Indexes

RethinkDB Indexes, similar to Indexes in other database, are the trade-off between read and write performance. Therefore, the basic rules for RethinkDB Indexes are similar to other database.

Don’t create index if not necessary.
Don’t create index if the data set is small enough so that you can use the filter query.
Indexes require memory to process, be careful with tables that have a lot of indexes.
Indexes can slow down write operations significantly.

Utilize RethinkDB Index - Part 1 - Primary key Index

08 October 2017
rethinkdb

It has been more than one year since my last post. But yeah, I’m still here, not going anywhere. This time, I write about the database that I have been working over the last one year at Agency Revolution, RethinkDB.

At Agency Revolution, we make heavy use of RethinkDB. Nearly everything is stored in RethinkDB. Probably at the time you are reading this blog post, that will not be true anymore and we have been utilizing other databases as well. However, as it’s still one of our main data storage, we used to have a lot of performance issues related to storing and retrieving data (and we still have until now). This blog post is to summarize how we use RethinkDB indexes to solve those problems as well as some use cases for different kind of indexes in RethinkDB.

Implement a simple log trace in Clojure Ring

26 August 2016
misc

Why I need a Log trace

There are many logging libraries for Clojure and Ring out there which support basic logging per each request that Ring server handles. However, all of them produce multiple log entries per request, one when the request starts and one when the request ends. Also, they cannot log the steps happen inside the handler function’s execution. For example, with Ring-logger, the default setup logs:

an :info-level message when a request begins;
an :info level message when a response is generated without any server errors (i.e. its HTTP status is < 500);
an :error level message when a response’s HTTP status is >= 500;
an :error level message with a stack trace when an exception is thrown during response generation

If there are multiple requests that process at the same time, the log entries in the log file could be something like this

Starting request 1
Starting request 2
End request 2
End request 1

That’s hard for me unite all the logs into one place and search the all the related log information whenever debugging one specific request. There is also no way for me to track the flow of execution steps inside the handler function of that request. Although I can simply do (timbre/info "Start some database queries"), the problem than come back to the previous one

Starting request 1
Starting request 2
Start a query for request 1
Start a query for request 2
Write file for request 2
Write file for request 1
End request 2
End request 1

Hmmm. Something like this would be much better

[1] Starting request {id}
[2] Start query to database
[3] Found one record
[4] Processing data
[5] Finished request {id} in 20ms
[1] Starting request {id}
[2] Start query to database
[3] Exception: database down
[4] Finish request {id} in 10ms

What I want is one single log entry per request with the trace of its steps so I can easily find out how the code works as well as where it can break, where it doesn’t function normally.

Moving away from D3.js, I'm using React.js for DOM Manipulation now

20 August 2016
javascript

Why React in place of D3?

So, I’m migrating my web app to full client-side rendering using React recently. The reason is that I mixed both server-side rendering and client-side render too much. At first, all the pages use server rendering (something like jinja2 template). However, as a consequence of the increase in interaction with user on the web app, I started to add more and more js code, which leads to the logic duplication in both backend and frontend. I decided to move all the rendering to React.js and that makes my life much easier dealing with all the DOM manipulation and two-way binding.

The only thing I need to deal with is the diagram that I implemented using D3.js before. I have been researching on the internet for a good solution and I was very closed to follow one of those tutorials, which suggests hooking D3.js rendering in React’s componentDidMount event (actually, most of the tutorials suggest that). Suddenly, one of my frontend friend recommended me throwing away D3.js for all those tasks. He said that React.js is very good at DOM manipulation stuff, why I have to mixed D3 inside that to lose all the flexibility of two-way binding, virtual DOM update,… Yeah, that sounds logical and I decided to give it a try, threw away all my old code and started in a fresh new React way. Of course, I didn’t D3.js completely, I still use it because of its supporting functions for calculating the diagram position and coordination.

Implement the Tree diagram in React.js

Okay, the first thing I need to do is to convert this old piece of code from D3 to React. The requirement is to draw a family tree like this. In contrast to my imagination, rendering the tree diagram using React is an amazingly effortless task.

Jekyll on iOS - Be the geek on the go

13 May 2016
misc

This post is written entirely on iOS on my iPhone and iPad, from many places, at several times, in different situations.

So… You don’t blog very regularly recently.
Hmm, I don’t have enough time!
Too busy on work?
Nope, just enjoying the fun of the youth that I have missed for years :D

But…

During that, I waste a lot of time without actually doing any useful thing, mostly in waiting time, e.g waiting to my gf to be ready (oh the girl! 😅), waiting for my friends to come for a coffee or any other kind of waiting. I started to think about blogging on the go. However, the only thing that I have in thoses cases is my smart phone, an iOS powered one. And dealing with all those jekyll and git stuffs on a smartphone is really a big challenge.

Let’s make the impossible become real.

First obstacle: Git, of course

Coming from the terminal and Emacs world, I have never imagined how I would use git without them. But now I do :D

Working copy by Anders Borum is a quite good choice. You have the option to pay $14.99 in order to unlock the push feature. Actually, you have to pay. Who can use git without push feature :LOL:

For me, this’s quite adequate. All the steps to clone and push from Github are set up automatically, just input your credential and done. It just took me few minutes to get used to the UI. There are Git2Go at the same price, but I feel satisfied with this so I will leave Git2Go for the next chance.

working copy