Nothing special here. It’s just a blog post for summarising my algorithm learning course. Although this was already taught in the University, it’s still god to summarize here

1. Symbol Tables

Key-value pair abstraction.

  • Insert a value with specified key.
  • Given a key, search for the corresponding value.

Example

domain name IP address
www.cs.princeton.edu 128.112.136.11
www.princeton.edu 128.112.128.15
www.yale.edu 130.132.143.21
www.harvard.edu 128.103.060.55
www.simpsons.com 209.052.165.60

Symbol Table APIs

Symbol Tables act as an associative array, associate one value with each key.

public class ST<Key, Value> {
    void put(Key key, Value, val);
    Value get(Key key);
    void delete(Key key);
    boolean contains(Key key);
    boolean isEmpty();
    int size();
    Iterable<Key> keys();
}
Read more

At the time of this writing, I have been working at Agency Revolution (AR) for more than 2 years, on a product focusing mostly on automation email marketing for the Insurance Agencies. I have been working on this product since it was in beta, when it could only serve only a few clients, send thousands of emails each month and handle very little amount of integration data without downtime until it can deliver millions of emails each month, store and react to terabytes of data flow every day. The dev team has been working very hard and suffering a lot of problem to cope with the increasing number of customers that the sale team brought to us. Here the summary of some techniques and strategies that we have applied in order to deliver a better user experience.

The problem of On-demand computing

By On-demand, I mean the action of computing the required data only when it is needed.

One of the core value of our system is to deliver the right messages to the right people at the right time. Our product allows users to set up automated emails, which will be sent at a suitable time in the future. The emails are customised to each specific recipient based on their newest data at the time they receive the email, for example the current customer status, whether that customer is an active or lost customer at that time, how many policies he/she has or the total value that customer has spent until that time.

Read more

Part 1 here Some Optimizations in RethinkDB - Part 1

Yes, it’s RethinkDB, a discontinued product. Again, read my introduction in the previous post. It’s not only about RethinkDB but it also the basic idea for many other database systems. This post introduces other techniques that I and the team have applied at AR to maximize the workload that RethinkDB can handle but most of them can be applied for other database systems as well.

Increase the Memory with NVME SSD

Well, sound like a very straight forward solution, huh? More memory, better performance, sound quite obvious! Yes, the key thing is how to increase the memory without significant cost. The answer is to setup swap as the temporary space for storing RethinkDB cached data. RethinkDB, as well as other database systems, caches the query result data into memory so that it can be re-used next time the same query executes again. The problem is that swap is much slower than RAM, because we rely on the disk to store the data. However, since we are running on Google Cloud and Google Cloud offers the Local-SSDs solution, we have been exploiting this to place our swap data. Here is the Local-SSDs definition, according to Google

Local SSDs are physically attached to the server that hosts your virtual machine instance. Local SSDs have higher throughput and lower latency than standard persistent disks or SSD persistent disks. The data that you store on a local SSD persists only until the instance is stopped or deleted.

Read more

Feature Toggle is a very popular technique that enables you to test the new feature on real production environment before releasing it to your clients. It’s also helpful when you want to enable the feature for just some beta clients or just some clients who pay for the specific features. The technique requires both backend and frontend work involved. In this post, I’m going to talk about some simple solutions that I and the team at AR have applied as well as some other useful ways that we are still discussing and may apply one day in the future.

1. Backend Data Organization

Feature Flag table

Of course, the simplest solution is to create a specific table for storing the all the feature flags in the database. The table may looks like this

{
  featureName: <string>,
  released: <bool>,
  enabledList: <array>, // enabled clients list
  disabledList: <array> // disabled clients list
}

The above mentioned data structure may be suitable for the case your system has a lot of users. You can simply add some admin user to the enabledList and test the new feature on production before releasing it to your users.

Inline User feature data

If your product is to serve business clients, you can also store the enabled feature directly to the client object itself. This can save you extra queries to the database to get the feature information. If that’s the case, your Client object might look like this

{
  clientId: <string>,
  enabledFeatured: <array>
}

Unix Permission style

Read more

Nothing special here. It’s just a blog post for summarising my algorithm learning course. Probably this was taught in the University but I don’t remember anything, I have no idea about its definition and applications until I take this course. Part 1 here Binary Heap & Heapsort Summary - Part 1 - Binary Heap

The Idea

  • Start with array of keys in arbitrary order
    arbitrary order
  • Create max-heap with all N keys.
    create heap
  • Repeatedly remove the maximum key (in place) to create a sorted array.
    heap sort
Read more

Nothing special here. It’s just a blog post for summarising my algorithm learning course. Probably this was taught in the University but I don’t remember anything, I have no idea about its definition and applications until I take this course.

Heap-ordered Binary Tree

Heap-ordered Binary Tree

  • Each node represents a key
  • Parent’s key is not smaller than children’s keys

Array Representation

Array Representation

Read more

Nothing special here. It’s just a blog post for summarising my algorithm learning course. Although this was already taught in the University, I remember nothing about it because I haven’t touched it for the long time.

Quick Sort - Duplicate Keys Problem

  • Quick Sort goes quadratic unless partitioning stops on equal keys!
  • ~½N2 compares when all keys equal.
  • B A A B A B B B C C C
  • A A A A A A A A A A A
  • Solve by using 3-way partitioning

3-way Partitioning

Partition array into 3 parts so that:

  • Entries between lt and gt equal to partition item v
  • No larger entries to left of lt
  • No smaller entries to right of gt

3-way

  • Let v be partitioning item a[lo]
  • Scan i from left to right.
    • (a[i] < v): exchange a[lt] with a[i]; increment both lt and i
    • (a[i] > v): exchange a[gt] with a[i]; decrement gt
    • (a[i] == v): increment i
Read more

Nothing special here. It’s just a blog post for summarising my algorithm learning course. Although this was already taught in the University, I didn’t even know that it can be used for Selection Problem

Selection Problem

Given an array of N items, find a kth smallest item. For example, an array A = [5, 2, 9, 4, 10, 7], the 3rd smallest item is 5, the 2nd smallest item is 4 and the smallest item is 2

The idea

  • Based on Quick Sort
  • Partition the array so that
    • Entry a[j] is in place.
    • No larger entry to the left of j
    • No smaller entry to the right of j
  • Repeat in one sub-array, depending on j, finished when j equals k
Read more

Nothing special here. It’s just a blog post for summarising my algorithm learning course. Although this was already taught in the University, I remember nothing about it because I haven’t touched it for the long time.

Quick Sort

The Idea

  • Shuffle the array.
  • Select one item, can be the first item or last item as the pivot (the partitioned item).
  • Partition the array into 2 parts, so that
    • The pivot entry is in the right place
    • No larger entry to the left of the pivot item
    • No smaller entry to the right of pivot item
  • Sort each piece recursively.

Alt Text

Read more

Nothing special here. It’s just a blog post for summarising my algorithm learning course. Here are some questions related to merge sort.

Merging with smaller auxiliary array

Question

Suppose that the subarray a[0] to a[n-1] is sorted and the subarray a[n] to a[2*n-1] is sorted. How can you merge the two subarrays so that a[0] to a[2*n-1] is sorted using an auxiliary array of length n (instead of 2n)?

Answer

Instead of copying the whole array into the auxiliary array, copy only the left half to that auxiliary array (a[0] to a[n-1]). At this time, the first half of the original array is free to overwrite any value. Apply merge sort and write the result back to the original array, from the first position. You will always have enough space in the original array to do that.

Counting inversions

Question

An inversion in an array a[] is a pair of entries a[i] and a[j] such that i<j but a[i]>a[j]. Given an array, design a linearithmic algorithm to count the number of inversions.

Read more