Monday, July 23, 2012

C++ name demangling


As you probably know, C++ compilers employ a scheme called name mangling to produce unique symbol names that can be handled by traditional C linkers. This is necessary because language features such as function overloading and templates allow you to create different functions with the same name. Because overloaded functions must have different signatures, the compiler takes the function signature and generates a symbol that encodes both the function name and the parameter types as a single string.

For example, g++ encodes
int foo(double d, float a);
as
_Z3foodf
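
To make the encoding less mysterious, here is a minimal sketch of how such a name is put together: the "_Z" prefix, the length of the function name, the name itself, and one letter per parameter type. The helper mangle_simple is hypothetical and exists only for illustration. It assumes a free function in the global namespace whose parameters are all taken from a small table of built-in types; real mangling also covers namespaces, classes, templates, pointers, references and much more.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of any compiler or library): builds the
// g++-style mangled name for a free function in the global namespace
// whose parameters are all built-in types listed in the table below.
std::string mangle_simple(const std::string& name,
                          const std::vector<std::string>& params)
{
    // One-letter codes used for some of the built-in types.
    static const std::map<std::string, char> codes = {
        {"void", 'v'}, {"bool", 'b'}, {"char", 'c'}, {"int", 'i'},
        {"long", 'l'}, {"float", 'f'}, {"double", 'd'},
    };

    // "_Z" prefix, then the name preceded by its length.
    std::string out = "_Z" + std::to_string(name.size()) + name;

    // A function without parameters is encoded as if it took void.
    if (params.empty()) {
        out += 'v';
        return out;
    }

    // Append one code per parameter; the return type is not encoded here.
    for (const std::string& p : params) {
        auto it = codes.find(p);
        if (it == codes.end())
            throw std::invalid_argument("unsupported type: " + p);
        out += it->second;
    }
    return out;
}
```

With this sketch, mangle_simple("foo", {"double", "float"}) produces _Z3foodf, matching the example above, and an overload taking a single int would become _Z3fooi.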

While the C linker is perfectly happy with the mangled names, most programmers find them hard to read. Fortunately, all the mangling schemes I'm aware of can be decoded back into C++ names.
Even better, some toolchains provide library functions that do this for you. Of course, these functions are not part of the C++ standard, so code that uses them is not portable.

GNU mangling

The g++ compiler adopts the mangling scheme of the Itanium C++ ABI. This scheme is recognisable by the "_Z" prefix at the start of the mangled name. g++ provides a demangling function in the header cxxabi.h:

namespace abi {
  char* __cxa_demangle(const char* mangled_name, char* output_buffer, size_t* length, int* status);
}

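This function takes a string containing the mangled name and returns the demangled name in a buffer. If you supply the buffer yourself, it must have been allocated with malloc and the third argument must point to its size. If the buffer is not big enough for the demangled name, the function tries to obtain more space by calling realloc, so always use the returned pointer rather than the one you passed in. You can also pass NULL for the second and third arguments, in which case the function allocates a buffer for you. Either way, you are responsible for releasing the result with free. The status argument reports the outcome: 0 on success, or a negative value describing the failure.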
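
Because the result must be released with free, it is easy to leak it on an early return. The following is a minimal sketch of a wrapper that hands the buffer to a std::unique_ptr and returns a std::string. The names demangle and FreeDeleter are my own and not part of cxxabi.h, and returning the input unchanged on failure is just one possible policy.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical deleter: releases a malloc'ed buffer with free.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Hypothetical convenience wrapper. Returns the demangled name, or the
// input unchanged when demangling fails.
std::string demangle(const char* mangled)
{
    int status = 0;
    // Null buffer and length: __cxa_demangle allocates the result itself.
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status == 0 && result)
        return std::string(result.get());
    return std::string(mangled);
}
```

Calling demangle("_Z3foodf") yields foo(double, float), and the buffer is freed automatically when the function returns.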

With this function in hand we can write a little command line utility that demangles a name passed as its first command line argument:

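#include <cstdlib>
#include <iostream>
#include <cxxabi.h>
using namespace std;

int main(int argc, char* argv[])
{
    // Without an argument, argv[1] is a null pointer and must not be used.
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " <mangled-name>" << endl;
        return 1;
    }

    int status = 0;
    // Null buffer and length: __cxa_demangle allocates the result with malloc.
    char* demangled = abi::__cxa_demangle(argv[1], nullptr, nullptr, &status);
    switch (status) {
        case 0:
            cout << demangled << endl;
            break;
        case -1:
            cerr << "Error: memory allocation failure" << endl;
            break;
        case -2:
            cerr << "Error: " << argv[1] << " is not a valid mangled name" << endl;
            break;
        case -3:
            cerr << "Error: invalid argument" << endl;
            break;
    }
    // free accepts a null pointer, so no check is needed on failure.
    free(demangled);
    return status == 0 ? 0 : 1;
}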

For our previous example, the output will be:

> ./demangler _Z3foodf
foo(double, float)

This function is useful even on Windows, because binaries compiled with MinGW use the same mangling scheme. It is also available to programs compiled with clang, at least when clang is using g++'s standard library.

Microsoft mangling


Microsoft's Visual C++ compiler uses its own proprietary mangling scheme. The names it generates always start with a "?". The Windows API provides the demangling function UnDecorateSymbolName in the imagehlp.h system header. Like __cxa_demangle, it takes the mangled name and an output buffer, but here you are responsible for providing a buffer that is long enough to hold the demangled name. The function returns the number of characters written, or zero on failure. The last argument is a set of flags that let you extract only the parts of the name you are interested in. A Windows version of our command line utility could look like this:

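#include <windows.h>
#include <imagehlp.h>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    // Without an argument, argv[1] is a null pointer and must not be used.
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " <mangled-name>" << endl;
        return 1;
    }

    // The caller owns the output buffer; the function never grows it.
    char buffer[1024];
    DWORD result = UnDecorateSymbolName(argv[1], buffer, sizeof(buffer), UNDNAME_COMPLETE);

    if (result) {
        cout << buffer << endl;
    } else {
        cerr << "Error: demangling failed. error code = " << GetLastError() << endl;
        return 1;
    }
    return 0;
}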
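
Since the two schemes described in this post start with different prefixes ("_Z" for the scheme used by g++, "?" for Visual C++), a tool that has to deal with both can use the prefix to decide which demangler to call. The sketch below shows that idea. The enum ManglingScheme and the function guess_scheme are hypothetical names, and the check is only a heuristic: a matching prefix does not prove that the rest of the string is a valid mangled name.

```cpp
#include <string>

enum class ManglingScheme { Itanium, Microsoft, Unknown };

// Hypothetical helper: guesses the mangling scheme from the symbol prefix.
ManglingScheme guess_scheme(const std::string& symbol)
{
    // Names mangled by g++ (and MinGW) start with "_Z".
    if (symbol.compare(0, 2, "_Z") == 0)
        return ManglingScheme::Itanium;
    // Names mangled by Visual C++ start with "?".
    if (!symbol.empty() && symbol[0] == '?')
        return ManglingScheme::Microsoft;
    // Anything else is probably an unmangled C symbol.
    return ManglingScheme::Unknown;
}
```

A dispatcher built on top of this would call __cxa_demangle for Itanium names and UnDecorateSymbolName for Microsoft names, and print anything else unchanged.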

References:

http://en.wikipedia.org/wiki/Name_mangling
http://www.kegel.com/mangle.html
MSDN UnDecorateSymbolName

Epilogue: What if I don't want a function name to get mangled?

If you need a symbol to be compiled unmangled, just declare it as extern "C":
extern "C" int foo(double d, float a);
or
extern "C" {
    int foo(double d, float a);
}

But remember: if you do this, you're back to C rules for that name. Two functions declared as extern "C" cannot share a name, so you cannot overload them.
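
One nuance is worth a short sketch: the restriction is about functions with C linkage sharing a name. As far as I know, a single extern "C" function can still coexist with ordinary C++ overloads of the same name, because those overloads keep their mangled symbols. The function bodies below are made up for illustration.

```cpp
// C linkage: the symbol is emitted unmangled.
extern "C" int foo(double d, float a)
{
    return static_cast<int>(d + a);
}

// C++ linkage: this overload is still mangled (g++ emits _Z3fooi).
int foo(int x)
{
    return x * 2;
}

// A second extern "C" function named foo would be rejected by the
// compiler, because both would need the same unmangled symbol:
// extern "C" int foo(char c);   // error: conflicting C declaration
```

Overload resolution works as usual on the C++ side: foo(1.5, 2.5f) calls the extern "C" function, while foo(21) calls the mangled overload.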
