Ceci n'est pas un pipe: outubro 2012

Fun with variable typing in C++

Warning. The following listing contains disturbing code. If you are the type of viewer who is easily disturbed by the sight of templates, lambda expressions and other diabolic concoctions do not read this post.

VariantValue factorial(vector<VariantValue> args)
{
    return expr_switch(args, {
        {pattern(0), []{ return 1; }},
        {pattern(Type<int>()), [&](){ return args[0].value<int>()
            * factorial({args[0].value<int>()-1}).value<int>(); }}
   });
}

Ok, I have to admit that this is the ugliest factorial function I've ever written. But that is not the point of this example. Let me first tell you how I came to this. I was reading the PhD thesis of Joe Armstrong [1], the author of the Erlang programmming language. As you may know, Erlang is a functional language and is dynamically typed. In Erlang you define functions as a sequence of head-body clauses where the head consists of a patterns that must be matched for the corresponding body to be executed. So, for example, you could write the factorial funtion like this:

factorial(0) -> 1;
factorial(N) -> N * factorial(N-1).

In this example the first clause is executed if the argument matches the value "0". If it doesn't the second clause is executed and the argument is bound to the identifier "N". If the arguments don't match any clause, an exception is raised. This mechanism has the following advantages:

When a body is executed, most preconditions have been checked, so there is no need to perform further tests.
If no clause is matched, processing stops immediately.

In other words, no defensive programming is necessary and errors are detected early. In his thesis Armstrong argues that this intentional programming together with fast failure detection is fundamental for building reliable and fault-tolerant systems. So this made me think how we could do the same thing in C++11.

As it happens I've been building a reflection system for C++11 (check out my
github page [2]) and it's most important building block is a class called VariantValue that enables dynamic typing. This class is similar to boost::any,
but it has some enhancements that make it much more usable (but this is for
another post). Anyway, I decided to see if I could emulate this mechanism using this VariantValue and C++11 lambdas. It turns out I can... sort of.

But first things first. Let me introduce you to the basic API of this VariantValue:

class VariantValue {
public:
    // set the content to type T passing a
    // variable number of arguments to its constructor
    template<class T, class... Args>
    void construct(Args&& as);

    // return true if the contained type is T
    template<class T>
    bool isA() const;

    // get the value as T, throws exception if type is wrong
    template<class T>
    T value() const;

    // same as above, but can try a several conversions
    template<class T>
    T convert() const;

};

These variants have value semantics and can be put inside containers. So a
function with dynamic arguments can have, for example, a vector<VariantValue> as argument. Writing code to test if each position contains a certain type and maybe a specific value is easy, but we want a generic solution. What we want is to pass a list of types and values to the matching function and let it figure out if a sequences of variants matches the pattern.

The problem is that values and types, like oil and water, don't mix in C++, at
least not without an emulgent. Because encoding a value as a type is much more difficult, we will represent a type as a value.

template<class T>
struct Type {
    typedef T type;
};

We can use temporaries of our "emulgent" class as placeholders for values of a specific type. Next on our list is the matching engine. As you may have
guessed, this involves templates, and worse, of the variadic kind. Because there is no specialization for fucntions, we will define a little helper template. First we declare the generic template:

template<class... A>
struct matcher_impl;

Then we need to handle the case of the empty argument list:

template<>
struct matcher_impl<> {
    static bool match(vector<VariantValue>& v,
                      vector<VariantValue>::iterator it)
    {
        return it == v.end();
    }
};

If there is no argument left, we expect the iterator to have reached the end of the argument vector. This is because we want to match the exact number of arguments.

Next we define the case for the argument list beginning with a value:

template<class A1, class... As>
struct matcher_impl<A1, As...> {
    static bool match(A1&& a1,
                      As&&... as, vector<VariantValue>& v,
                      vector<VariantValue>::iterator it)
    {
        if (it == v.end()) {
            return false;
        }
        if (!it->isA<A1>()) {
            return false;
        }
        if (it->value<A1>() != a1) {
            return false;
        }
        return matcher_impl<As...>::match(
            std::forward<As&&>(as)..., v, ++it);
    }
};

As you can see this is fairly straight-forward. We check if the iterator is
valid, check the type, check the value and incremente the iterator before we
"recurse". The case for the type placeholder is very similar, except that we
omit the value check.

template<class A1, class... As>
struct matcher_impl<Type<A1>, As...> {
    static bool match(Type<A1> t1, As&&... as,              
                      vector<VariantValue>& v,
                      vector<VariantValue>::iterator it)
    {
        if (it == v.end()) {
            return false;
        }
        if (!it->isA<A1>()) {
            return false;
        }
        return matcher_impl<As...>::match(
            std::forward<As&&>(as)..., v, ++it);
    }
};

And finally we can write a front-end function to hide all this template ugliness. For a reason that will be apparent soon, instead of returning
immediately a bool, we return a function object that takes a vector as
argument and checks if it matches the pattern:

template<class... As>
function<bool(vector<VariantValue>&)> pattern(As&&... as)
{
    return [&](std::vector<VariantValue>& v) {
        return pattern_impl<As...>::match(
            std::forward<As&&>(as)...,
            v, v.begin());
    };
}

And finally we want an expression that selects the first match in a list of
patterns and calls the corresponding body or fails. But we don't want to write it by hand every time we define a function. Fortunately with initializer_lists we can write a function that does all this without being too ugly:

VariantValue expr_switch(
    vector<VariantValue>& values, 
    vector<pair<function<bool(vector<VariantValue>&)>, 
    function<VariantValue()>>> pairs)
{
    for (auto pair: pairs) {
        if (pair.first(values)) {
            return pair.second();
        }
    }
    throw logic_error("unmatched arguments");
}

Voilà! We take a list of pairs where the first element is the pattern matching
lambda and the second is the function body. We call the pattern matching
lambdas in sequence and return on the first match. If no pattern is matched we raise an exception.

Now that you saw the mechanims behind the factorial function lets see how to call it:

std::cout <<
    "factorial of 4 = " <<
    factorial({4}).value<int>() <<
    std::endl;

It could be worse, right? Another interesting example is to define a function
to treat the dreaded Square and Rectangle example (is a square a special case of a rectangle or the other way around? [3]):

struct Square {
    int w;
};

struct Rectangle {
    int w;
    int h;
};

VariantValue area(vector<VariantValue> args)
{
    return expr_switch(args, {
       {pattern(Type<Square>()), [&]() -> VariantValue {
            return args[0].value<Square&>().w
                  *args[0].value<Square&>().w;
        }},
        {pattern(Type<Rectangle>()), [&]() -> VariantValue {
            return args[0].value<Rectangle&>().w
                  *args[0].value<Rectangle&>().h;
        }}
    });
}

Not bad. The only thing that bothers me is that in the function body I'm forced to refer to the generic vector of arguments. It would be nice if I could give a name to the type placeholder in the pattern. Then in the function body I could refer directly to this name and no further type conversions would be needed. I haven't figured out how to do this and I think it's not possible.

I have only written patterns for types and values, but it would be easy to
extend this basic framework to do more advanced pattern matching. Using the same idea of the type placeholder, we could have a checking type placeholder. For example, Between<int>(1, 10) would match an integer between 1 and 10.

References:

sábado, 20 de outubro de 2012

Fun with variable typing in C++