Flake: A Decentralized, K-Ordered Unique ID Generator in Erlang

Posted by on January 12th, 2012

At Boundary we have developed a system for unique id generation. This started with two basic goals:

  • Id generation at a node should not require coordination with other nodes.
  • Ids should be roughly time-ordered when sorted lexicographically. In other words they should be k-ordered 1, 2.

All that is required to construct such an id is a monotonically increasing clock and a location 3. K-ordering dictates that the most-significant bits of the id be the timestamp. UUID-1 contains this information, but arranges the pieces in such a way that k-ordering is lost. Still other schemes offer k-ordering with either a questionable representation of ‘location’ or one that requires coordination among nodes.

With these goals in mind, we started with Twitter’s snowflake scheme 4 as a conceptual framework. The first thing we noticed is that, in Twitter’s scheme, ids are limited to 64-bits in width meaning some trickery is necessary to cram both the location and timestamp into a small space. First, a new epoch is defined to reduce timestamp size. In addition, workers coordinate with a zookeeper ensemble on initialization to atomically grab a worker id representing ‘location’.

While these are fine tradeoffs that allow ids to fit into a small space, we found that they didn’t align well with the goals I mentioned earlier. We decided to build a generator that widens the address space to 128-bits in order to accommodate millisecond-resolution unix timestamps and a larger worker id to eliminate coordination between nodes doing id generation. In doing so we’ve created our own id generation service that not only meets our original goals but also has no external dependencies, has incurred next to no operational cost, and has been running problem-free for months.

Anatomy

Boundary snowflake ids are 128-bits wide composed as follows:

flake

Notice that we reserve a full 64-bits for millisecond timestamp resolution, which means we don’t have to do any truncation or redefinition of the epoch in order to fit the timestamp into the high order bits of an id.

The 48-bit worker id is typically the MAC address of a particular hardware interface on the machine doing the id generation. A flake server pulls its worker id from a configurable hardware interface on startup. This means that one flake server can run per network device on a node in a distributed system without risk of collision.

The remaining 16 bits hold the sequence id, which is incremented each time an id is requested in the same millisecond. This means that each server can produce 2^16 – 1 unique ids per millisecond without overflow.

Notice that the constituent parts do not favor a particular implementation or runtime. Our offering is in Erlang, but implementing in your language of choice is straightforward.

Implementation

Our flake server is built as a standalone Erlang/OTP application which will generate ids on behalf of any application capable of speaking Erlang distribution protocol. At this time all clients in our infrastructure in need of id generation speak Erlang, but our roadmap includes an HTTP interface.

Ids can be given back as raw binary or encoded in a base of your choice.

The following example function will get a new flake id from a running server as a 128-bit Erlang binary.

flake() ->
    Node = {flake, flake@localhost},
    {ok, FlakeId} = gen_server:call(Node, get),
    {ok, FlakeIdBase62} = gen_server:call(Node, {get,62}),
    %% example id decomposition for demonstration only
    %% <<_Time:64/integer,_WorkerId:48/integer,_Sequence:16/integer>> = FlakeId,
    FlakeId.

Caveat emptor — while base-62 encoding ids in the scheme described here (without padding) will produce k-ordered ids for the next ~300 years, using a small base may require ids be given leading padding to ensure k-orderedness.

Code

https://github.com/boundary/flake

Credits

1 T. Altman and Y. Igarashi. 1989. Roughly sorting: sequential and parallel approach. J. Inf. Process. 12, 2 (January 1989), 154-158, ACM

2 Yoshihide Igarashi and Derick Wood. 1991. Roughly sorting: a generalization of sorting. J. Inf. Process. 14, 1 (January 1991), 36-42, ACM

3 Lamport Timestamps

4 Snowflake – scheme originally developed at twitter. It differs primarily in that we allow ourselves a wider address space in which to fit ids which means we don’t have to think about timestamp truncation or coordination of worker ids.

5 James Golick’s lexical_uuid – a scheme similar to our own also requiring no coordination. Worker ids are based on the fqdn of the machine generating ids and the lexical ordering behavior is slightly different.

  1. Hi,

    I’m having problems with flake. I’ve started it running in terminal (on mac) and it shows as running on flake@127.0.0.1. I then run this code in my app:

    createToken() ->
    Node = {flake, “flake@127.0.0.1″},
    {ok, FlakeId} = gen_server:call(Node, get),
    FlakeId.

    but this raises an error because the gen_server call can’t find the node. If I run net_adm:ping(‘flake@127.0.0.1′). in the terminal, I get pang.

    I’m not exactly an Erlang pro; so what exactly do I do to get the flake server to be discoverable?

    Thanks,
    Lee

    Comment by Lee Sylvester on April 1, 2013 at 9:53 am