Reflection in C++... could we nail it now?

Maybe you’ve seen the Ouroboros in movies and games: a fancy snake biting its own tail. Painful or not (let’s hope those serpents have good self-control while biting themselves!), the same kind of self-referential problem shows up as soon as we consider one of the most desired features in C++: how can we inspect information about the program itself at runtime?

Anybody who has written more than std::cout << "hello world!" for a while knows that C++ is a static, nominative, partially inferred programming language, designed to be written in a human-readable format and then compiled directly into a bunch of CPU-readable instructions. The only information we can reach at runtime is the values stored on the heap and the stack, because once an executable has been loaded into main memory and starts to execute, the visible scope of the program is bounded by its own virtual address space. This is fine for quite a lot of use cases: if we only expect C++ to drive the computer through some computational task, we write human-readable source code with our well-designed business logic and let the compiler translate it into whatever the CPU wants. But when you start to build a framework, a middleware, a tool for making tools, or anything complicated enough to scare people, there often comes a requirement that the program should know itself in a human-readable way and report that information back to the user at runtime. The simplest case: you want to give the user a magic logging function that automatically prints a variable’s name and type:

int IAmAnInt = 42;

void MagicLogger(int rhs)
{
	std::cout << "The variable ";
	//
	// Imagine a magic here
	//
	std::cout << " is ";
	//
	// Imagine another magic here
	//
}

The result we’d expect is:

The variable IAmAnInt is an Int

Very unfortunately, the version of the ISO C++ Standard draft I randomly grabbed is 1448 pages long. It describes all the necessary details of the language that helps us keep building the modern world. We can access a memory address, we can do a vector multiplication on the CPU with SIMD, all directly from within the language. But we still can’t get the one-line magic logger through the language itself; in this respect C++ is still way too low-level! Even with RTTI (run-time type information) features like typeid and std::type_info, or the compile-time queries in <type_traits>, all we get is some elementary type information with quite a lot of limitations. If we want something fast, we give up the indirection; if we want something newbie-friendly, we give up the speed. What a frustrating reality we are in!
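To make the limitation concrete, here is a minimal sketch of what the language gives us today. The NAME_OF macro and the two helper functions are made up for this illustration. The type name that typeid returns is implementation-defined (GCC and Clang, for example, return a mangled name), and the variable’s name can only be captured by the preprocessor’s stringification, that is, before the compiler proper ever sees the code:

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical helper macro: '#' turns the identifier into a string literal
// during preprocessing, the only stage at which the name still exists.
#define NAME_OF(variable) #variable

int IAmAnInt = 42;

// Returns whatever type name RTTI offers. The exact text is implementation-defined,
// so it is not guaranteed to be the readable "int" we were hoping for.
std::string RttiTypeName()
{
	const char* name = typeid(IAmAnInt).name();
	assert(name != nullptr);
	return name;
}

// Returns the identifier exactly as it was written in the source file.
std::string VariableName()
{
	return NAME_OF(IAmAnInt);
}
```

VariableName() works only because the macro is expanded right where the identifier is spelled out. Hand the variable to another function as a parameter and the name is gone, which is exactly the MagicLogger problem.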

All we need is just a little patience

The ability of a program to inspect itself is called introspection, and if it can furthermore modify itself, that is called intercession; the combination of these two abilities is named reflection. As we know, C++ is designed to be statically assembled but still able to behave dynamically. All the execution paths are predefined by the programmer, and we use conditional branches and polymorphism to get dynamic behavior at runtime. Virtual function lookup is typically implemented with a table of function pointers called the vtable, which is fully hidden by the compiler. Together with limited RTTI features like dynamic_cast<T>, we could say we’ve almost gotten intercession in C++.

But when we want some flexible introspection, the language itself doesn’t provide enough facilities. As in the magic logger example, we can’t get any human-readable information about the source code, since all executables and libraries are binary: CPU instructions, just zeros and ones. Unless we manually bury some strings inside the program, that information is not preserved. So let’s modify the previous example a little bit:

struct IAmAnInt
{
	const char* VarName = "IAmAnInt";
	const char* TypeName = "Int";
	int VarValue = 42;
};

void MagicLogger(IAmAnInt rhs)
{
	std::cout << "The variable ";
	std::cout << rhs.VarName;
	std::cout << " is ";
	std::cout << rhs.TypeName;
}

Congratulations! You just invented Java or C#. These higher-level managed languages like to model (almost) every type as an object: everything is some object, and all objects derive from, well, another mother-of-all object. Then why would we still play around with C++? And if we had to declare a dedicated type for every single variable, it would really be a waste of life. So you may come up with the idea of using a template:

template <typename T>
struct Var
{
	Var(T&& rhs, const char* varName, const char* typeName)
	{
		VarValue = rhs;
		VarName = varName;
		TypeName = typeName;
	};
	const char* VarName;
	const char* TypeName;
	T VarValue;
};

template <typename T>
void MagicLogger(const Var<T>& rhs)
{
	std::cout << "The variable ";
	std::cout << rhs.VarName;
	std::cout << " is ";
	std::cout << rhs.TypeName;
}

And the user code:

int main()
{
	auto v = Var(42, "v", "Int");
	MagicLogger(v);
	return 0;
}

It looks even more like Java or C# now. Even before you complain about the time wasted writing the type name each time, another question comes up immediately: what if it’s a user-defined type?

struct MyType
{
	Var<int> AInt;
	Var<float> AFloat;
};

And now you’d have to provide a boilerplate constructor to initialize the names, like:

MyType(int v1, float v2)
	: AInt(std::move(v1), "AInt", "Int")       // Var's constructor takes T&&, so hand it an rvalue (std::move needs <utility>)
	, AFloat(std::move(v2), "AFloat", "Float")
{
}

But you want flexibility, you want extensibility, so later you may go back to the declaration of Var or MyType and try to make it variadic, and then you may end up with:

template <typename... Ts> struct Var {};

template <typename T, typename... Ts>
struct Var<T, Ts...> : Var<Ts...>
{
  Var(T t, Ts... ts) : Var<Ts...>(ts...), tail(t) {}

  T tail;
};

This is basically a hand-crafted tuple template, and from here you’d dive head-first into the painful path of template specialization, workarounds for the name strings, and so on and on… Even if you give up and use std::tuple, the problem of how to expand it back into a parameter pack would keep you stuck for a while. Furthermore, every member of the custom type needs a variable name and a type name, so just imagine how much bookkeeping data would end up embedded inside your custom type. And this is only the case of data members. What if the user wants to inspect an array? What about inheritance? We haven’t even considered functions yet, oh gosh.

Godmode activated

When the serpent bites its tail, it can never fully swallow itself, and when we want to inspect C++ only with C++ itself, we can never free ourselves from the feature-dependency deadlock. But if we take a step back, what we want is just information about the source code. It isn’t even specific to C++: in any other language we’d want the same thing, and all of it is written exactly once into the source file by our own hands. So rather than expecting the language to have features for inspection, why don’t we generate the information ahead of the actual build?

Let’s take a look at how a text source file would be processed into a binary executable file:

…“Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the ‘#’ character…” (Computer Systems - A Programmer’s Perspective, Third edition)

The fully expanded version of a source file is the translation unit (abbr. TU, also called a compilation unit). It contains all the code we’ve written in the original source file, but with the inclusion directives (#include) replaced by the content of the included files, macros expanded, and conditional directives (e.g. #ifdef) resolved. You can imagine that it becomes a much longer file if we include any mega headers like the STL or Boost. It looks like a good stage at which to collect some additional information.

Now we need to find the manual “Emergency Stop” button of the build toolchain, so that we can get the actual TU in hand. If you are a GCC user, simply run gcc -E on a source file and you’ll get the preprocessed source code. With Clang it’s exactly the same: clang -E, or clang --preprocess. And if you are sticking with MSVC, it’s CL /P /C. Basically, the preprocessor is just a text processor that replaces directives and macros. Once we have a TU file, the only step left is to extract human-readable information from it. It doesn’t matter how we implement the extraction: we could write a Python script, we could build yet another C++ program. All choices are possible and flexible, because the limitations disappear once we change our goal from “Reflect C++ by C++” to “Extract some text from a file”!

So let’s start with the simplest case, a static primitive-type variable declared in the global scope:

int IAmAnInt;

We’d like to use the introspection info later in the same C++ project, so we can declare a type to describe the data about the data, or as we call it, metadata:

struct Metadata
{
	const char* VarName;
	const char* TypeName;
};

What we want is something like this:

Metadata IAmAnInt_Metadata = { "IAmAnInt", "Int" };

But this definition should be generated automatically, not handcrafted, and we should be able to query it at runtime. This means we either need to associate the actual variable with its metadata ahead of runtime, or provide a runtime mechanism that maps the memory address of IAmAnInt to IAmAnInt_Metadata. As discussed before, the latter design is impossible without intrusive modification of the original variable. So we still have to write some glue code, and I would use a macro to solve it:

#define GetMetadata(name) name##_Metadata

And the magic logger would be implemented just like:

void MagicLoggerImpl(const Metadata& rhs)
{
	std::cout << "The variable ";
	std::cout << rhs.VarName;
	std::cout << " is ";
	std::cout << rhs.TypeName;
}

#define MagicLogger(name) MagicLoggerImpl(GetMetadata(name))

And finally in user code things should be as simple as:

	MagicLogger(IAmAnInt);

And you’d get in the console:

The variable IAmAnInt is Int

Now the only thing left is how to generate the metadata! We’ve got the whole idea of how to fly to Mars; what we don’t have is just the 1 trillion euros to build the rocket!

Good morning, Mr. Matryoshka

Quite a lot of the reflection solutions I’ve seen (this nice post by Jeff Preshing, also this nice post by Konstantin Knizhnik, yet another nice post by Veselin Karaganev, the reflection lib rttr and countless other projects) chose a compromise: they use macros to generate the metadata, which requires the user to write additional marker code:

struct TestStructA
{
	char* testChar;
	bool testBool;
	int testInt;
};

REFLECT_BEGIN()
REFLECT_STRUCT(TestStructA)
REFLECT_FIELD(char_ptr, testChar)
REFLECT_FIELD(bool, testBool)
REFLECT_FIELD(int, testInt)
REFLECT_END()

This approach is totally fine and costs only a little extra work, since the actual reflection code is generated directly by the preprocessor. But the overall cons are also significant: it’s macros, and nothing is straightforward in a macro-rich environment. From another point of view, we’ve already provided this information once in the structure declaration, so why write it again in a different form just to get back some info we provided ourselves? This kind of duplication is quite a torture for lazy people like me. After all, my life is finite; I don’t have to tell a joke twice if a dude can’t keep up with it!
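For reference, this is roughly what such marker macros boil down to. The following is a simplified sketch with made-up macro names (REFLECT_STRUCT_BEGIN and friends are not the macros of any library mentioned above): every field is listed a second time, and the preprocessor turns that list into a table of names, type names and offsets:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct TestStructA
{
	char* testChar;
	bool testBool;
	int testInt;
};

// One reflected data member: its name, its type as written, and its byte offset.
struct FieldInfo
{
	const char* name;
	const char* typeName;
	std::size_t offset;
};

// Hypothetical marker macros. '#' stringifies the arguments, offsetof supplies
// the location of the member inside the struct.
#define REFLECT_STRUCT_BEGIN(type) \
	static const FieldInfo type##_fields[] = {
#define REFLECT_FIELD(type, fieldType, field) \
	{ #field, #fieldType, offsetof(type, field) },
#define REFLECT_STRUCT_END() \
	};

// The duplication in question: every member is spelled out once more.
REFLECT_STRUCT_BEGIN(TestStructA)
REFLECT_FIELD(TestStructA, char*, testChar)
REFLECT_FIELD(TestStructA, bool, testBool)
REFLECT_FIELD(TestStructA, int, testInt)
REFLECT_STRUCT_END()

// Looks a field up by name; returns nullptr when no such field was registered.
const FieldInfo* FindField(const char* name)
{
	assert(name != nullptr);
	for (const FieldInfo& field : TestStructA_fields)
	{
		if (std::strcmp(field.name, name) == 0)
		{
			return &field;
		}
	}
	return nullptr;
}
```

Nothing checks that the second listing agrees with the struct: write the wrong type name in a REFLECT_FIELD line and the table is silently wrong.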

So my criterion is crystal clear: no additional duplicated code for reflection in the original source file. My only hope now is crafting my own C++ syntax analyzer to interpret the code and extract the metadata. The fundamental idea is simple: implement string pattern matching with certain policies to cover the necessary C++ keywords. But wait, isn’t this the job a compiler does every day? Compilers analyze the TU and translate it into assembly, and if they can analyze a TU, they must already have a mechanism to decompose a textual C++ file into a logically structured syntax tree (it has a name: Abstract Syntax Tree, abbr. AST)! Now all the work left for me is to find a way to stop the compiler in the middle of its job and ask it to generate the metadata I want instead of the assembly code.

Let’s evaluate the 3 most commonly used C++ frontends today:

MSVC: Never mind. Since it’s a proprietary compiler, we can’t hijack it.
GCC: GCC is open source, so its C++ frontend is accessible. But the frontend isn’t designed as a reusable library, so custom features generally mean intrusively modifying the frontend itself (or going through GCC’s plugin interface). I’d recommend having a look at the awesome series of blog posts by Roger Ferrer Ibáñez if you want to deal with it.
Clang: Clang is a C-family frontend for LLVM, largely command-line compatible with GCC. It was initially developed at Apple but open-sourced (somehow it sounds like a Sith joining the Rebellion), and is now developed and maintained by the LLVM community. It was modeled with a different workflow from GCC: the modules of Clang are individual libraries, so it is flexible enough to be used as a general frontend regardless of the actual use case. Whether you just want to compile your C-family source to a binary or you want to build custom functionality like code refactoring or formatting, it is entirely possible to implement it atop Clang in a non-intrusive way.

It’s obvious that Clang is the most suitable frontend for our goal, and there are even multiple choices inside Clang. It has LibTooling and LibASTMatchers, with which we can just write matcher patterns and query the source files for the wanted AST nodes; the introduction and tutorial demonstrate well enough how to use them. Since I’m more eager to build the wheel by myself and too lazy to learn the details of Clang’s AST, I’d like to use another library called LibClang, which is described as a “stable high-level C interface to clang” on the page about choosing the right interface for your application. My goal is a standalone tool that generates reflected C++ code with full awareness of the original source code, so I think it’s better to get my hands dirtier than to take on some not-so-necessary dependencies.

Simple == Brutal

The actual implementation is surprisingly stupid: we just need to #include "clang-c/Index.h" and use the functions it provides to walk the TU as we wish with a visitor pattern. We need to define the visitor callback, which recursively visits all the children of a cursor like a typewriter:

CXChildVisitResult visitor(CXCursor cursor, CXCursor parent, CXClientData clientData)
{
	return CXChildVisitResult::CXChildVisit_Recurse;
};

For now we just return the visit result without any additional processing, but remember, this is the core gameplay mechanism. Then we can load the TU. Before that, we need to create an index object, which “consists of a set of translation units that would typically be linked together into an executable or library”, and indicate that we’d like to parse C++ rather than C or something else. I assume you already have the file name in a std::string:

const char* args[] = { "--language=c++" };
auto index = clang_createIndex(0, 0);
auto translationUnit = clang_parseTranslationUnit(index, fileName.c_str(), args, 1, nullptr, 0, 0);

And then we trigger the actual visit from the first cursor position of the TU:

auto cursor = clang_getTranslationUnitCursor(translationUnit);
clang_visitChildren(cursor, visitor, nullptr);

And finally we dispose of all the resources after the job is done:

clang_disposeTranslationUnit(translationUnit);
clang_disposeIndex(index);

That’s all the code you were promised you’d have to write when someone tried to sell you a “Learn C++ in 30 days”. The actual implementation of the visitor callback is definitely understated here: just imagine parsing a random C++ file written by one of your teammates, in which almost every language feature makes an appearance. But in practice, most scenarios that require reflection support involve custom type declarations. We rarely need to reflect every detail of a procedure or a complex folded variadic template, so we can start from some simple test cases and extend later. Let’s take a look at a previous example:

struct TestStructA
{
	char* testChar;
	bool testBool;
	int testInt;
};

A very simple POD type. If we just print out the cursor’s kind and name by changing the visitor callback to:

CXChildVisitResult visitor(CXCursor cursor, CXCursor parent, CXClientData clientData)
{
	// A CXString owns its buffer, so keep it alive while printing and dispose of it afterwards
	CXString cursorKindName = clang_getCursorKindSpelling(clang_getCursorKind(cursor));
	CXString cursorEntityName = clang_getCursorSpelling(cursor);
	std::cout << " CursorKindName: " << clang_getCString(cursorKindName) << " CursorEntityName: " << clang_getCString(cursorEntityName) << "\n";
	clang_disposeString(cursorKindName);
	clang_disposeString(cursorEntityName);
	return CXChildVisitResult::CXChildVisit_Recurse;
}

And what we’d get in the console is:

CursorKindName: StructDecl CursorEntityName: TestStructA
CursorKindName: FieldDecl CursorEntityName: testChar
CursorKindName: FieldDecl CursorEntityName: testBool
CursorKindName: FieldDecl CursorEntityName: testInt

Seems we’re on the right track! A cursor carries more detailed information that can be queried through different interfaces. I’ll introduce some of them by examining a few fundamental C++ constructs and how LibClang identifies them:

Record

Strictly speaking, a record in C++ is a struct, class or union, but here I also treat enum and namespace as records, since they all group named members that can be accessed directly. The first step in evaluating whether a cursor is a record declaration is to check that it is a declaration at all, by invoking:

unsigned clang_isDeclaration(enum CXCursorKind);

The cursor kind (CXCursorKind) is stored in the kind member of CXCursor, and the kinds for some of the fundamental declaration keywords in C++ are:

C++ keyword	In LibClang
struct	CXCursorKind::CXCursor_StructDecl
union	CXCursorKind::CXCursor_UnionDecl
class	CXCursorKind::CXCursor_ClassDecl
enum/enum class	CXCursorKind::CXCursor_EnumDecl
namespace	CXCursorKind::CXCursor_Namespace

But the result may surprise you a little bit, because an access specifier is also interpreted as a declaration. So I filter out the CXCursorKind::CXCursor_CXXAccessSpecifier case, since access specifiers are handled in a different way later.

Access Specifier

I’d evaluate the access specifier of some cursors separately by querying:

CX_CXXAccessSpecifier clang_getCXXAccessSpecifier(CXCursor);

And the result:

C++ keyword	In LibClang
(none)	CX_CXXAccessSpecifier::CX_CXXInvalidAccessSpecifier
public	CX_CXXAccessSpecifier::CX_CXXPublic
protected	CX_CXXAccessSpecifier::CX_CXXProtected
private	CX_CXXAccessSpecifier::CX_CXXPrivate

As the comment on the query interface says:

If the cursor refers to a C++ declaration, its access control level within its parent scope is returned. Otherwise, if the cursor refers to a base specifier or access specifier, the specifier itself is returned.

From my observation, for a POD type declaration, any type reference inside one of its fields reports public as its access specifier. Also, when a specifier occurs in an inheritance list, the cursor kind is CXCursor_CXXBaseSpecifier rather than CXCursor_CXXAccessSpecifier, although the keyword is literally the same in the source file.

Type

This is a somewhat ambiguous word in this context. Here “type” means the information that tells us which category an entity in the source file belongs to, for example int in the declaration int testInt. We can get this type info by:

CXType clang_getCursorType(CXCursor C);

And CXType is a structure that could be investigated furthermore by lots of functions, like:

CXType clang_getCanonicalType(CXType T);
unsigned clang_isConstQualifiedType(CXType T);
CXCursor clang_getTypeDeclaration(CXType T);

Or we could simply check the CXTypeKind member of it:

C++ keyword	In LibClang
bool	CXTypeKind::CXType_Bool
int	CXTypeKind::CXType_Int
unsigned long long	CXTypeKind::CXType_ULongLong
float	CXTypeKind::CXType_Float

The types that are not primitive types also have the corresponding CXTypeKind:

C++ keyword	In LibClang
lvalue reference ”&”	CXTypeKind::CXType_LValueReference
rvalue reference ”&&”	CXTypeKind::CXType_RValueReference
pointer ”*”	CXTypeKind::CXType_Pointer
fixed-size array ”[N]”	CXTypeKind::CXType_ConstantArray
custom type	CXTypeKind::CXType_Record

Field

For an enumeration type, the fields are the enum constants, reported as CXCursorKind::CXCursor_EnumConstantDecl; for a structure, class or union, the fields are the data members, reported as CXCursorKind::CXCursor_FieldDecl. A record declaration and its member declarations form a simple single-rooted subtree of the entire AST, so we can store that relationship and use it later when we need to evaluate the semantic relationships between metadata entries.

Method

A method in C++ could be a function or any callable object that accepts (optional) inputs and returns an (optional) result. The cursor kind of a regular member function is CXCursorKind::CXCursor_CXXMethod (a free function is reported as CXCursorKind::CXCursor_FunctionDecl). As we all know, a named function (in contrast to a lambda) must be unique: its signature or its enclosing scope has to differ from every other function’s, so that both we and the machine can tell overloaded or overridden functions apart. If you print out the display name (clang_getCursorDisplayName(cursor)) and the entity name (clang_getCursorSpelling(cursor)) of a cursor, you can see the difference when it hits a method: the display name includes the parameter types, while the entity name is just the function’s literal name. For example, the printed result of the function:

bool testFuncA(int testParm1, const char* testParm2);

Would be:

CursorDisplayName: testFuncA(int, const char *) CursorEntityName: testFuncA

When writing the exported metadata to the file, I use a simple string hash to generate unique but readable names for each function. Alternatively, you could add a GUID field to the metadata, so that overloads can share the same display name.
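A minimal sketch of such a naming scheme could look like the following. The choice of 64-bit FNV-1a and the helper names are assumptions made for this example, not necessarily what the actual tool uses; any deterministic string hash would do:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a, picked here only because it is short and deterministic.
std::uint64_t Fnv1a64(const std::string& text)
{
	std::uint64_t hash = 14695981039346656037ull; // FNV offset basis
	for (unsigned char c : text)
	{
		hash ^= c;
		hash *= 1099511628211ull; // FNV prime
	}
	return hash;
}

// Builds "<parent>_<entity>_<hash of display name>". The display name contains
// the parameter types, so overloads that share an entity name still end up
// with distinct, stable identifiers.
std::string MakeUniqueName(const std::string& parentName, const std::string& entityName, const std::string& displayName)
{
	assert(!entityName.empty());
	return parentName + "_" + entityName + "_" + std::to_string(Fnv1a64(displayName));
}
```

For the two testFuncA overloads, the display names testFuncA(int, const char *) and testFuncA(float, const TestUnionA &) hash to different values, so both generated names start with the same readable prefix and only differ in the numeric suffix.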

Inheritance

We can use CXCursorKind::CXCursor_CXXBaseSpecifier as the pivot to find base classes: each base specifier is followed by a CXCursorKind::CXCursor_TypeRef that refers to the base class, and the run of base specifiers ends at the first field or method declaration. For example, in a class inheritance hierarchy:

class TestClassA {};
class TestClassB : public TestClassA {};
class TestClassC {};
class TestClassD : public TestClassB, public TestClassC {};

The CXCursorKind of them should be:

ClassDecl // "class TestClassA"
ClassDecl // "class TestClassB"
C++ base class specifier // "public"
TypeRef // "TestClassA"
ClassDecl // "class TestClassC"
ClassDecl // "class TestClassD"
C++ base class specifier // "public"
TypeRef // "TestClassB"
C++ base class specifier // "public"
TypeRef // "TestClassC"

And we can assign the inheritance relationships between the metadata after parsing the entire TU (if we consider this relationship as an abstract data structure, it actually forms a Directed Acyclic Graph).
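One way to do that linking, sketched with simplified stand-in types instead of real LibClang cursors (Kind, Node and LinkBases are hypothetical names, and the TypeRef name is assumed to match the class name exactly): walk the flat list in visit order and attach every TypeRef that follows a base specifier to the most recent class declaration. This also handles multiple inheritance:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-ins for the cursor kinds printed above.
enum class Kind { ClassDecl, BaseSpecifier, TypeRef };

struct Node
{
	Kind kind;
	std::string name;               // entity name; for a TypeRef, the referenced class
	std::vector<std::size_t> bases; // indices of the base class declarations, filled in by LinkBases
};

// Walks the flat, visit-ordered list once. Every TypeRef that directly follows
// a base specifier is resolved by name and attached to the latest ClassDecl.
void LinkBases(std::vector<Node>& nodes)
{
	std::unordered_map<std::string, std::size_t> declarations;
	bool hasOwner = false;
	std::size_t owner = 0;
	for (std::size_t i = 0; i < nodes.size(); i++)
	{
		if (nodes[i].kind == Kind::ClassDecl)
		{
			declarations[nodes[i].name] = i;
			owner = i;
			hasOwner = true;
		}
		else if (nodes[i].kind == Kind::TypeRef && hasOwner && i > 0 && nodes[i - 1].kind == Kind::BaseSpecifier)
		{
			auto found = declarations.find(nodes[i].name);
			if (found != declarations.end())
			{
				assert(found->second < i); // a base class is always declared before its derived class
				nodes[owner].bases.push_back(found->second);
			}
		}
	}
}
```

Because each node stores a list of bases rather than a single pointer, a class like TestClassD ends up with two edges, which is what makes the result a graph rather than a tree.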

Inclusion directive

In a real codebase we would certainly use #include to establish relations between source files, but parsing all the code inside the included files every time is a waste of time and space. LibClang provides a few useful functions to distinguish a cursor’s origin:

int clang_Location_isInSystemHeader(CXSourceLocation location);
int clang_Location_isFromMainFile(CXSourceLocation location);

We could obtain the source location by:

auto range = clang_getCursorExtent(cursor);
CXSourceLocation location = clang_getRangeStart(range);

Additionally, there is another visitor callback at a higher level that provides information about the inclusion directives of a TU:

void inclusionVisitor(CXFile included_file, CXSourceLocation* inclusion_stack, unsigned include_len, CXClientData client_data);

We could dispatch the visit by:

void clang_getInclusions(CXTranslationUnit tu, CXInclusionVisitor visitor, CXClientData client_data);

Because we’d like to parse each file only once, it’s important to know whether an included file has already been parsed. Furthermore, with this information we can later rebuild exactly the same inclusion relationships among the generated metadata files.
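The bookkeeping can be as small as a set of file names. In this sketch ParsedFileRegistry and MakeReflectedInclude are hypothetical helpers; in the real tool the file name would come from LibClang (for example via clang_getFileName) and should be normalized to a full path first, and the .refl suffix simply mirrors the naming of the generated files shown later:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Remembers which files were already processed so each header is parsed once.
class ParsedFileRegistry
{
public:
	// Returns true the first time a file is seen, false on every later attempt.
	bool MarkAsParsed(const std::string& fileName)
	{
		return m_parsedFiles.insert(fileName).second;
	}

private:
	std::unordered_set<std::string> m_parsedFiles;
};

// Maps an included header to the include line of its generated counterpart,
// so the generated files mirror the inclusion graph of the originals.
std::string MakeReflectedInclude(const std::string& headerName)
{
	assert(!headerName.empty());
	const std::size_t dot = headerName.find_last_of('.');
	const std::string stem = headerName.substr(0, dot);
	const std::string extension = (dot == std::string::npos) ? std::string() : headerName.substr(dot);
	return "#include \"" + stem + ".refl" + extension + "\"";
}
```

The inclusion visitor would call MarkAsParsed for every included file and only schedule a parse when it returns true, while MakeReflectedInclude turns TestA.h into the include line for TestA.refl.h.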

No free lunch, at least not today

Some frameworks (e.g. Unreal Engine, Qt) give the user the freedom to choose whether or not to reflect a certain part of the source files, and they require users to mark those parts up with macros. The reflected code is then generated by their customized tools. Since we don’t want to go to such lengths, we can get a similar result with the custom annotations in C++ that most people ignore: attributes. Since most compilers should simply ignore unknown attributes when targeting C++17, we can use them to mark up the regions of a source file we want reflected. One such design would look like:

#define STRUCT() struct __attribute__((annotate("refl_record")))
#define FIELD() __attribute__((annotate("refl_field")))
#define METHOD() __attribute__((annotate("refl_method")))

STRUCT() TestStructA
{
	FIELD()
	char* testChar;

	FIELD()
	bool testBool;

	FIELD()
	int testInt;

	METHOD()
	float testFunc();
};

But this introduces a tight coupling to the choice of compiler: __attribute__ is GNU style while MSVC uses __declspec, so if we use __attribute__ for reflection, the source file won’t compile with MSVC. A possible workaround is to use yet another macro to distinguish between them, like:

#ifdef _MSC_VER
#define STRUCT() struct [[refl_struct]]
#define FIELD() [[refl_field]]
#define METHOD() [[refl_method]]
#else
#define STRUCT() struct __attribute__((annotate("refl_struct")))
#define FIELD() __attribute__((annotate("refl_field")))
#define METHOD() __attribute__((annotate("refl_method")))
#endif

Since the overall architecture of my project is rather data-driven, I choose to reflect at the granularity of whole source files, rather than marking up specific parts inside a type declaration.

And finally, we could implement the actual reflection modules, here is my approach:

Store the LibClang-generated metadata in an array, in a format like this:

struct ClangMetadata
{
	CXString displayName;
	CXString entityName;
	CXCursorKind cursorKind;
	CX_CXXAccessSpecifier accessSpecifier;
	CXTypeKind typeKind;
	CXString typeName;
	bool isPtr = false;
	bool isPOD = false;
	CXTypeKind returnTypeKind;
	CXString returnTypeName;
	size_t arraySize = 0;
	ClangMetadata* inheritanceBase = nullptr;
	ClangMetadata* semanticParent = nullptr;
	size_t totalChildrenCount = 0;
	size_t validChildrenCount = 0;
};

Assign the inheritance relationship (Let’s just consider the single inheritance case for the sake of simplicity):

auto l_clangMetadataCount = m_clangMetadata.size();
for (size_t i = 0; i < l_clangMetadataCount; i++)
{
	auto& l_clangMetadata = m_clangMetadata[i];

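	// A base specifier needs a class before it and a TypeRef after it, so guard both neighbors
	if (l_clangMetadata.cursorKind == CXCursorKind::CXCursor_CXXBaseSpecifier && i > 0 && i + 1 < l_clangMetadataCount)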
	{
		m_clangMetadata[i - 1].inheritanceBase = &m_clangMetadata[i + 1];
	}
}

Generate the custom metadata definitions by simply writing strings into a file (if you want, you could generate the serialization functions too). The runtime metadata has this structure:

struct Metadata
{
	const char* name;
	DeclType declType;
	AccessType accessType;
	TypeKind typeKind;
	const char* typeName;
	Metadata* typeRef;
	bool isPtr;
	Metadata* base;
};

Include your generated metadata file in the runtime source code.
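The generation step itself is plain string formatting. A minimal sketch, assuming the member information has already been converted to strings (MemberDesc and EmitMemberArray are hypothetical names, and the last column, the base pointer, is always written as nullptr for members):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Simplified, already-stringified description of one member. In the real tool
// these values would be derived from the ClangMetadata entries.
struct MemberDesc
{
	std::string name;
	std::string declType;   // e.g. "DeclType::Var"
	std::string accessType; // e.g. "AccessType::Public"
	std::string typeKind;   // e.g. "TypeKind::Bool"
	std::string typeName;   // e.g. "bool"
	std::string typeRef;    // e.g. "&refl_TestEnumA", or "nullptr"
	bool isPtr;
};

// Writes the member array of one record as C++ source text.
std::string EmitMemberArray(const std::string& recordName, const std::vector<MemberDesc>& members)
{
	assert(!members.empty()); // a zero-length array would not be valid C++
	std::ostringstream out;
	out << "Metadata refl_" << recordName << "_member[" << members.size() << "] =\n{\n";
	for (const MemberDesc& member : members)
	{
		out << "\t{ \"" << member.name << "\", " << member.declType << ", " << member.accessType << ", "
		    << member.typeKind << ", \"" << member.typeName << "\", " << member.typeRef << ", "
		    << (member.isPtr ? "true" : "false") << ", nullptr },\n";
	}
	out << "};\n";
	return out.str();
}
```

The returned text can be appended to the .refl.h file as is. Since the output is ordinary C++, any mistake in the generator shows up as a compile error in the runtime project rather than as a silent mismatch.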

Here are two header files I used as the test case:

//TestA.h
#pragma once

enum class TestEnumA
{
	case1, case2, case3
};

union TestUnionA
{
	int I;
	float F;
	char* C;
};

struct TestStructA
{
	TestEnumA testEnumA = TestEnumA::case2;
	char* testChar = 0;
	bool testBool = false;
	int testInt = 42;
};

class TestClassA
{
public:
	virtual bool testFuncA(uint64_t testParm1, const char* testParm2) = 0;
	bool testFuncA(float testParm1, const TestUnionA& testParm2);
	short testShort = 4;
	long testLong = 4;
	long long testLongLong = 4;
	float testFloat = 4.20f;
protected:
	double testDouble = 4.2;
private:
	TestStructA testStructA = {};
};

template <typename T>
struct TestTemplateStructA
{
	T testVar;

	template <typename U>
	bool testFuncA(T testParm1, const U& testParm2);
};

using TestTemplateStructAInt = TestTemplateStructA<int>;

And:

#pragma once

#include "TestA.h"

template <typename ...T>
struct TestTemplateStructB
{
};

template <class T, class... Ts>
struct TestTemplateStructB<T, Ts...> : TestTemplateStructB<Ts...>
{
	TestTemplateStructB(T t, Ts... ts) : TestTemplateStructB<Ts...>(ts...), tail(t) {}

	T tail;
};

using TestTemplateStructBICF = TestTemplateStructB<int, char*, float>;

struct TestStructB
{
	TestEnumA testEnumA = TestEnumA::case2;
	wchar_t* testWchar_t = L"42";
	TestClassA* testClassA = 0;
	int8_t testInt8t = 42;
};

namespace TestNamespace
{
	struct TestStructC : public TestStructB
	{
		int16_t testInt16t = 42;
		int32_t testInt32t = 42;
		int64_t testInt64t = 42;
	};
}

class TestClassB : public TestClassA
{
	bool testFuncA(uint64_t testParm1, const char* testParm2) override { return false; };
};

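And the generated metadata files (note that the fixed-width integer types such as uint64_t and int8_t show up as int; the test headers don’t include <cstdint>, so most likely Clang recovered from the unknown type names by assuming int):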

//TestA.refl.h
#pragma once

Metadata refl_TestEnumA = { "TestEnumA", DeclType::Enum, AccessType::Invalid, TypeKind::Enum, "TestEnumA", nullptr, false, nullptr };
Metadata refl_TestEnumA_member[3] =
{
	{ "case1", DeclType::EnumConstant, AccessType::Invalid, TypeKind::Enum, "TestEnumA", nullptr, false, nullptr },
	{ "case2", DeclType::EnumConstant, AccessType::Invalid, TypeKind::Enum, "TestEnumA", nullptr, false, nullptr },
	{ "case3", DeclType::EnumConstant, AccessType::Invalid, TypeKind::Enum, "TestEnumA", nullptr, false, nullptr },
};
Metadata refl_TestStructA = { "TestStructA", DeclType::Struct, AccessType::Invalid, TypeKind::Custom, "TestStructA", nullptr, false, nullptr };
Metadata refl_TestStructA_member[4] =
{
	{ "testEnumA", DeclType::Var, AccessType::Public, TypeKind::Enum, "TestEnumA", &refl_TestEnumA, false, nullptr },
	{ "testChar", DeclType::Var, AccessType::Public, TypeKind::SChar, "char", nullptr, true, nullptr },
	{ "testBool", DeclType::Var, AccessType::Public, TypeKind::Bool, "bool", nullptr, false, nullptr },
	{ "testInt", DeclType::Var, AccessType::Public, TypeKind::SInt, "int", nullptr, false, nullptr },
};
Metadata refl_TestClassA = { "TestClassA", DeclType::Class, AccessType::Invalid, TypeKind::Custom, "TestClassA", nullptr, false, nullptr };
Metadata refl_TestClassA_member[8] =
{
	{ "TestClassA_testFuncA_18297869138227429096", DeclType::Function, AccessType::Public, TypeKind::Invalid, "bool (int, const char *)", nullptr, false, nullptr },
	{ "TestClassA_testFuncA_11769634238821699591", DeclType::Function, AccessType::Public, TypeKind::Invalid, "bool (float, const TestUnionA &)", nullptr, false, nullptr },
	{ "testShort", DeclType::Var, AccessType::Public, TypeKind::SShort, "short", nullptr, false, nullptr },
	{ "testLong", DeclType::Var, AccessType::Public, TypeKind::SLong, "long", nullptr, false, nullptr },
};
Metadata refl_TestTemplateStructA = { "TestTemplateStructA", DeclType::ClassTemplate, AccessType::Invalid, TypeKind::Invalid, "", nullptr, false, nullptr };
Metadata refl_TestTemplateStructA_member[2] =
{
	{ "testVar", DeclType::Var, AccessType::Public, TypeKind::Invalid, "T", nullptr, false, nullptr },
	{ "TestTemplateStructA<T>_testFuncA_8483520174557475787", DeclType::FunctionTemplate, AccessType::Public, TypeKind::Invalid, "bool (T, const U &)", nullptr, false, nullptr },
};

And:

#pragma once

#include "TestA.refl.h"

Metadata refl_TestTemplateStructB = { "TestTemplateStructB", DeclType::ClassTemplate, AccessType::Invalid, TypeKind::Invalid, "", nullptr, false, nullptr };
Metadata refl_TestStructB = { "TestStructB", DeclType::Struct, AccessType::Public, TypeKind::Custom, "TestStructB", nullptr, false, nullptr };
Metadata refl_TestStructB_member[4] =
{
	{ "testEnumA", DeclType::Var, AccessType::Public, TypeKind::Enum, "TestEnumA", &refl_TestEnumA, false, nullptr },
	{ "testWchar_t", DeclType::Var, AccessType::Public, TypeKind::WChar, "wchar_t", nullptr, true, nullptr },
	{ "testClassA", DeclType::Var, AccessType::Public, TypeKind::Custom, "TestClassA", &refl_TestClassA, true, nullptr },
	{ "testInt8t", DeclType::Var, AccessType::Public, TypeKind::SInt, "int", nullptr, false, nullptr },
};
Metadata refl_TestStructC = { "TestStructC", DeclType::Struct, AccessType::Public, TypeKind::Custom, "TestNamespace::TestStructC", nullptr, false, &refl_TestStructB };
Metadata refl_TestStructC_member[3] =
{
	{ "testInt16t", DeclType::Var, AccessType::Public, TypeKind::SInt, "int", nullptr, false, nullptr },
	{ "testInt32t", DeclType::Var, AccessType::Public, TypeKind::SInt, "int", nullptr, false, nullptr },
	{ "testInt64t", DeclType::Var, AccessType::Public, TypeKind::SInt, "int", nullptr, false, nullptr },
};
Metadata refl_TestClassB = { "TestClassB", DeclType::Class, AccessType::Invalid, TypeKind::Custom, "TestClassB", nullptr, false, &refl_TestClassA };
Metadata refl_TestClassB_member[1] =
{
	{ "TestClassB_testFuncA_17046919256446738461", DeclType::Function, AccessType::Private, TypeKind::Invalid, "bool (int, const char *)", nullptr, false, nullptr },
};
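Consuming the generated arrays at runtime doesn’t take much. The sketch below assumes some enum declarations (the real ones aren’t shown here), uses a trimmed, hand-copied excerpt of the generated member array, and adds two hypothetical helpers, FindMember and Describe:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Assumed declarations: only the usage of these enums is visible in the generated files.
enum class DeclType { Enum, EnumConstant, Struct, Class, Var, Function };
enum class AccessType { Invalid, Public, Protected, Private };
enum class TypeKind { Invalid, Bool, SChar, SInt, Enum, Custom };

struct Metadata
{
	const char* name;
	DeclType declType;
	AccessType accessType;
	TypeKind typeKind;
	const char* typeName;
	Metadata* typeRef;
	bool isPtr;
	Metadata* base;
};

// Trimmed excerpt of the generated TestA.refl.h (two of the four members).
Metadata refl_TestStructA_member[2] =
{
	{ "testBool", DeclType::Var, AccessType::Public, TypeKind::Bool, "bool", nullptr, false, nullptr },
	{ "testInt", DeclType::Var, AccessType::Public, TypeKind::SInt, "int", nullptr, false, nullptr },
};

// Finds a member by name in a generated member array; returns nullptr if absent.
template <std::size_t N>
const Metadata* FindMember(const Metadata (&members)[N], const char* name)
{
	assert(name != nullptr);
	for (const Metadata& member : members)
	{
		if (std::strcmp(member.name, name) == 0)
		{
			return &member;
		}
	}
	return nullptr;
}

// The magic logger from the beginning, now fed by generated metadata.
std::string Describe(const Metadata& metadata)
{
	std::ostringstream out;
	out << "The variable " << metadata.name << " is " << metadata.typeName << (metadata.isPtr ? "*" : "");
	return out.str();
}
```

A linear search is enough for a handful of members; a larger project might prefer a sorted table or a hash map keyed by name.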

As you can see, my data structure design for the runtime metadata isn’t very optimized or intuitive, and there are tons of unfinished features: for example, adding syntactic sugar like getter and logger functions, or organizing all the metadata into a unified list for memory coherency. But as soon as you start to swim in the waters of LibClang, you are exposed to all the possibilities a compiler frontend can offer, and you can design your own reflection pipeline based on your specific usage. There are still too many (not so) edge cases I didn’t talk about, for example how to cover template specializations, since templates are only expanded at compile time, and how to resolve the cross-references between metadata in the presence of complex inheritance, and so on. I will keep enriching the reflection module in my project; you can take a look at the actual code to see how it is finally implemented. And since language support for reflection has been postponed to the next standard, C++23, until then I think I might keep focusing on crafting my own reflection mirror for C++. Or maybe it’s a good moment to break up with C++ and find a new one like Dlang or Rust? After all, it’s not easy to breathe if you have to craft the oxygen on your own!

Published Dec 25, 2019

Random randomness in randomized randomization. Hang Zhang on Twitter