I'm Dmitry Popov,
lead developer and director of Infognition.

Known in the interwebs as Dee Mon since 1997. You could see me as thedeemon on reddit or LiveJournal.

Static introspection for message passing
April 23, 2016

There is a paper on free theorems, famous in the PLT community, that praises generics and parametricity for letting you infer a lot about a generic function's behavior just from its type. Since such a function knows nothing about the type of the value it gets, it cannot do much with that value: mostly just pass it around, store it somewhere or throw it away. At the other end of the generics spectrum (call it "paid theorems" if you will) we have languages like D, where a generic function can ask questions about the type it gets as an argument and, depending on the answers, i.e. the properties of the passed type, do one thing or another. I think this end is more interesting and practical.
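To make the "paid theorems" idea concrete, here is a minimal sketch of a generic function that inspects its type argument at compile time. The function name describe and the struct Point are made up for this illustration; the traits come from std.traits.

```d
import std.conv : to;
import std.traits : isArray, isIntegral, isSomeString;

// Hypothetical helper: picks a different branch at compile time
// depending on the properties of the type T it was instantiated with.
string describe(T)(T value) {
    static if (isIntegral!T)
        return "integer " ~ value.to!string;
    else static if (isSomeString!T)
        return "string of length " ~ value.length.to!string;
    else static if (isArray!T)
        return "array of " ~ value.length.to!string ~ " elements";
    else
        return "something of type " ~ T.stringof;
}

struct Point { int x, y; }

void main() {
    assert(describe(42) == "integer 42");
    assert(describe("hello") == "string of length 5");
    assert(describe([1.5, 2.5]) == "array of 2 elements");
    assert(describe(Point(1, 2)) == "something of type Point");
}
```

Only one branch of the static if chain is compiled into each instantiation, so describe!int and describe!Point are effectively different functions.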

In my recent D project the app consisted of two different processes sending messages to each other through pipes. Types of those messages are defined as simple structs, sometimes empty, sometimes containing simple values, sometimes arrays of strings or other structs:

struct MsgMoveFilter {
    int curPos, newPos;
}

struct MsgStartFilterScan {}

struct VDFilterDesc { 
    string filename, name, desc, author; 
}

struct MsgFoundFilters {
    VDFilterDesc[] vdfs;
}

struct MsgCodecLists {
    string[] videoCodecs;
    string[] audioCodecs;
}

struct AudioFormat {
    int freq, nchan, kbps;
}

struct MsgAudioFormats {
    AudioFormat[] formats;
    int selected;
}

For serialization I'm using the Cerealed library, which uses compile-time introspection to convert pretty much any type to an array of bytes and back. This array of bytes gets sent down the pipe, preceded by a simple header of two words: the message type id and the length of the data. This is how it looks:

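void send(Msg)(File pipe, ref Msg msg) { 
    auto enc = Cerealiser();
    enc ~= msg;                       // serialize the message to bytes
    uint[2] header = [MsgTypeHash!(Msg), enc.bytes.length]; // type id, payload length
    pipe.rawWrite(header);            // the header goes first
    if (enc.bytes.length > 0)
        pipe.rawWrite(enc.bytes);     // then the payload, if there is any
    pipe.flush();
}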

Here Msg is a type argument, so we can send messages of many different types. MsgTypeHash is my compile-time function that maps a type to a number:

enum MsgTypeHash(Msg) = hashOf(Msg.stringof ~ Fields!Msg.stringof);

Here I use the Fields template from the std.traits module of D's standard library. It returns a list of types: the types of Msg's fields. This type list is converted to a string like "(AudioFormat[], int)" and appended to the name of Msg, giving a string like "MsgAudioFormats(AudioFormat[], int)". Then a hash of that string is calculated (thanks to D's compile-time function execution) and returned. With this approach I don't have to write out and maintain a list of ids for the different message types. They are generated automatically and change whenever the structure of a message changes, so nothing ever goes out of sync.
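The following sketch shows the id generation in isolation. MsgTypeName is a helper introduced here only to expose the intermediate string, and MsgStopFilterScan and the two nested Msg structs are hypothetical types added for the comparison.

```d
import std.stdio : writeln;
import std.traits : Fields;

struct AudioFormat { int freq, nchan, kbps; }
struct MsgAudioFormats { AudioFormat[] formats; int selected; }
struct MsgStartFilterScan {}
struct MsgStopFilterScan {}            // hypothetical second empty message

// Two versions of a message with the same name but a changed field type
// (hypothetical, wrapped in outer structs so both can coexist here).
struct V1 { struct Msg { int value; } }
struct V2 { struct Msg { long value; } }

// Helper for this sketch only: the string that gets hashed.
enum MsgTypeName(Msg) = Msg.stringof ~ Fields!Msg.stringof;
enum MsgTypeHash(Msg) = hashOf(MsgTypeName!Msg);

// Everything below is evaluated by the compiler, not at run time.
// Empty messages still get distinct ids because their names differ.
static assert(MsgTypeHash!MsgStartFilterScan != MsgTypeHash!MsgStopFilterScan);
// Changing a field type changes the id.
static assert(MsgTypeHash!(V1.Msg) != MsgTypeHash!(V2.Msg));

void main() {
    // Print the hashed string and the resulting id for two message types.
    writeln(MsgTypeName!MsgAudioFormats, " -> ", MsgTypeHash!MsgAudioFormats);
    writeln(MsgTypeName!MsgStartFilterScan, " -> ", MsgTypeHash!MsgStartFilterScan);
}
```

The exact numbers depend on the hashOf implementation of the compiler's runtime, so both processes should be built with the same toolchain.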

The receiving side reads the two-word header with the message id and length, then needs to deserialize the data into a struct of the appropriate type and dispatch it to the proper handler. For each message type I have a handler function that receives a struct with the message and performs some action. These handler functions, of course, live in a state monad and implicitly receive a reference to state data; in other words, they are methods of a class. And they all look alike:

    void react(MsgMoveFilter m) { ... }
    void react(MsgStartFilterScan m) { ... }
    void react(MsgFoundFilters m) { ... }

The function for receiving and dispatching messages looks really short and simple:

void receive(Reactor)(File pipe, Reactor r) {
    uint[2] headerBuf;
    auto header = pipe.rawRead(headerBuf);
    if (header.length < 2) throw new CommException("eof"); 
    switch(header[0]) {
        foreach(T; MessageTypes!Reactor) {
            case MsgTypeHash!T:
                auto data = header[1] > 0 ? pipe.rawRead(new ubyte[header[1]]) : null;
                return r.react(decerealise!T(data));
        }
        default: ... // handle the unknown message case
    }
}

This function doesn't know in advance which class will be reacting to messages. The class type is passed as the Reactor argument, but the function does presume a thing or two about it. It calls my compile-time function MessageTypes, which maps a type to the list of message types that the passed class can react to:

alias MessageTypes(C) = staticMap!(Parameters, MemberFunctionsTuple!(C, "react"));

Here some templates from the standard library are used to first get a list of all of C's methods called "react" and then map this list with a function that extracts the types of their arguments, producing a list of message types.
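As a small illustration of what this alias produces, here is a sketch with a made-up class DemoReactor that has two react overloads and one unrelated method. The order of the resulting list is not relied upon, so the checks only test membership.

```d
import std.meta : staticIndexOf, staticMap;
import std.stdio : writeln;
import std.traits : MemberFunctionsTuple, Parameters;

struct MsgMoveFilter { int curPos, newPos; }
struct MsgStartFilterScan {}

// Hypothetical reactor class for this sketch.
class DemoReactor {
    void react(MsgMoveFilter m) {}
    void react(MsgStartFilterScan m) {}
    void unrelated(int x) {}           // ignored: it is not called "react"
}

alias MessageTypes(C) = staticMap!(Parameters, MemberFunctionsTuple!(C, "react"));

// One entry per react overload, nothing for unrelated().
static assert(MessageTypes!DemoReactor.length == 2);
static assert(staticIndexOf!(MsgMoveFilter, MessageTypes!DemoReactor) != -1);
static assert(staticIndexOf!(MsgStartFilterScan, MessageTypes!DemoReactor) != -1);
static assert(staticIndexOf!(int, MessageTypes!DemoReactor) == -1);

void main() {
    // This foreach is unrolled at compile time: T is a type, not a value.
    foreach (T; MessageTypes!DemoReactor)
        writeln(T.stringof);
}
```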
Then my receive function walks this list of message types with foreach(T; MessageTypes!Reactor), so on each iteration step the type T is different. From this type T a hash is calculated (MsgTypeHash!T), giving the id for this type of message. This value is used in the case clause, so that line turns into something like case 264541:. In the following two lines we read the data from the pipe, call the deserialization function for this type, and pass the deserialized message, a struct of type T, to the proper overload of the react method.

This whole foreach loop sits inside a switch, and it gets unrolled into as many case clauses as there are "react" overloads in the Reactor type argument. Each clause deals with its own message type, calling different versions of the deserialization function and the "react" method. And that's all the source code! No need to list all the messages and their ids at the risk of forgetting to update the list. Empty messages are serialized to zero bytes, i.e. only the two-word header is sent. Because their types are named differently they have different message ids, so we can still tell them apart and call the appropriate react overloads. In case of a hash collision the compiler will report an error about duplicate case clauses, but so far there have been no collisions in practice.
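For readers who want to try the dispatch trick without pipes or Cerealed, here is a self-contained sketch that works on an in-memory byte array. Several things in it are assumptions made for the demo and not part of my project: toBytes and fromBytes are a naive stand-in for Cerealed that only handles structs without arrays or pointers, frame and dispatch replace send and receive, the hash is truncated to 32 bits with a cast, and DemoReactor with its log field exists only to record what was called.

```d
import std.conv : to;
import std.meta : staticMap;
import std.traits : Fields, MemberFunctionsTuple, Parameters;

struct MsgMoveFilter { int curPos, newPos; }
struct MsgStartFilterScan {}

// Assumption: the id is truncated to 32 bits so it fits one header word.
enum MsgTypeHash(Msg) = cast(uint) hashOf(Msg.stringof ~ Fields!Msg.stringof);
alias MessageTypes(C) = staticMap!(Parameters, MemberFunctionsTuple!(C, "react"));

// Naive stand-in for Cerealed: copies the raw bytes of a struct that has
// no arrays or pointers. Empty structs become zero bytes.
ubyte[] toBytes(Msg)(ref Msg msg) {
    static if (Fields!Msg.length == 0) return null;
    else return (cast(ubyte*) &msg)[0 .. Msg.sizeof].dup;
}

Msg fromBytes(Msg)(const(ubyte)[] data) {
    Msg msg;
    static if (Fields!Msg.length > 0)
        (cast(ubyte*) &msg)[0 .. Msg.sizeof] = data[0 .. Msg.sizeof];
    return msg;
}

// Builds a packet: two-word header (type id, payload length) plus payload.
ubyte[] frame(Msg)(Msg msg) {
    ubyte[] payload = toBytes(msg);
    uint[2] header = [MsgTypeHash!Msg, cast(uint) payload.length];
    auto headerBytes = cast(ubyte[]) header[].dup;
    return headerBytes ~ payload;
}

// Same shape as receive, but reading from a byte array.
void dispatch(Reactor)(const(ubyte)[] packet, Reactor r) {
    if (packet.length < 8) throw new Exception("short packet");
    const header = cast(const(uint)[]) packet[0 .. 8];
    const data = packet[8 .. 8 + header[1]];
    switch (header[0]) {
        foreach (T; MessageTypes!Reactor) {   // unrolled into one case per type
            case MsgTypeHash!T:
                return r.react(fromBytes!T(data));
        }
        default:
            throw new Exception("unknown message id");
    }
}

// Hypothetical reactor that just records which overload was called.
class DemoReactor {
    string[] log;
    void react(MsgMoveFilter m) {
        log ~= "move " ~ m.curPos.to!string ~ " -> " ~ m.newPos.to!string;
    }
    void react(MsgStartFilterScan m) { log ~= "scan"; }
}

void main() {
    auto r = new DemoReactor;
    dispatch(frame(MsgMoveFilter(3, 7)), r);
    dispatch(frame(MsgStartFilterScan()), r);
    assert(r.log == ["move 3 -> 7", "scan"]);
    // An empty message is just the 8-byte header.
    assert(frame(MsgStartFilterScan()).length == 8);
}
```

Adding a new message type to this sketch takes two steps: define the struct and add a react overload. The id and the matching case clause appear on their own.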

I find this compile-time introspection and metaprogramming power really amazing, and it makes the D language quite addictive.