Description
As we all know, macros pose a problem for the 3C rewriter because it works at the level of the expanded code and does not know how it can (or should!) change a macro definition to produce the desired expansion. IIUC, currently, when 3C sees a pointer inside a macro expansion, it does one of two things:
- Constrain the pointer's annotation to not change. Drawback: This reduces the extent to which 3C can achieve its goal of making the program checked.
- Inline the macro expansion and then rewrite as usual. Drawback: This defeats the purpose (whatever it is) for which the user defined the macro in the first place.
Below are a few ideas of other things 3C could do that might be more useful.
3. Insert proposed expansions and let the user update the macro later
Inline the rewritten macro expansion as in (2), but leave the original macro call inside a preprocessor macro that discards it. The user can later go back and update the macro definition to produce appropriate expansions.
Consider this example C program for manipulating vectors in three-dimensional space (contrived and I may be wrong on some details, but hopefully it makes the point):
#define VEC3_PARAM(v) double *v
int getX(VEC3_PARAM(vec)) { return a[0]; }
int getY(VEC3_PARAM(vec)) { return a[1]; }
int getZ(VEC3_PARAM(vec)) { return a[2]; }
int getCoordinate(double *vec, int axis) {
assert(0 <= axis && axis < 3);
return vec[axis];
}
3C would produce the following output:
#define _3C_REWRITTEN(orig,new) new
#define VEC3_PARAM(v) double *v
int getX(_3C_REWRITTEN( VEC3_PARAM(vec), _Array_ptr<double> vec : count(1) )) { return a[0]; }
int getY(_3C_REWRITTEN( VEC3_PARAM(vec), _Array_ptr<double> vec : count(2) )) { return a[1]; }
int getZ(_3C_REWRITTEN( VEC3_PARAM(vec), _Array_ptr<double> vec : count(3) )) { return a[2]; }
int getCoordinate(_Array_ptr<double> vec : count(3), int axis) {
_Dynamic_check(0 <= axis && axis < 3);
return vec[axis];
}
The code still passes the Checked C type checker since the _3C_REWRITTEN
macro uses the rewritten code. The user can later update the code to what they presumably intend:
#define VEC3_PARAM(v) _Array_ptr<double> v : count(3)
int getX(VEC3_PARAM(vec)) { return a[0]; }
int getY(VEC3_PARAM(vec)) { return a[1]; }
int getZ(VEC3_PARAM(vec)) { return a[2]; }
int getCoordinate(_Array_ptr<double> vec : count(3), int axis) {
_Dynamic_check(0 <= axis && axis < 3);
return vec[axis];
}
(A previous proposed design was to leave the code unmodified and print the proposed rewritten expansions in diagnostics. I think this design is better because it makes it easier for the user to see both the macro calls and the rewritten expansions while editing the code; it's a bit cluttered, but we may be able to improve that and anyway I think it's probably a lesser evil than having to look back and forth between code and diagnostics. This design also lets the code pass the type checker in the interim.)
Note that while 3C naively suggested different expansions for the VEC3_PARAM
calls in getX
, getY
, and getZ
, the user instead chose to update the macro in a way that maintained the intended uniformity among these functions. It's a more general problem that 3C's independent choice of the "best" solution for each pointer may not maintain intended uniformities in the code, but in the absence of macros, I think we typically consider it OK to let 3C introduce a non-uniformity and let the user fix it if and when they notice and care. With a macro, we need to make sure the user understands that they aren't required to make the macro produce the proposed non-uniformity.
In cases in which the expansions should actually be different (e.g., because the macro is used to generate several pointer types that should be different in Checked C but were all *
in C), the user may need to make the macro definition more sophisticated and potentially even add parameters or split the macro into several variants in order to produce the different expansions.
Of course, in this example, it would have been more sensible for the user to use a typedef rather than a macro, and then 3C would probably just do the right thing (or we could easily make it do so if it doesn't already). But conceivably, there could be a case in which a typedef doesn't work and the user has to use a macro.
4. Link pointers in macro definitions and expansions
If we can track the relationship between a pointer type in a macro body and all the places it appears in the preprocessor output, then we can use a single constraint variable for all of those occurrences. After solving the constraints, 3C can try to rewrite the macro body itself or it can issue a diagnostic for the user to do so.
One approach is to have the user mark the pointer type. For the example program of the previous section, the user might modify it as follows (the vec3
in _3C_Ptr_vec3
is an arbitrary name chosen by the user):
#define _3C_Ptr_vec3 *
#define VEC3_PARAM(v) double _3C_Ptr_vec3 v
int getX(VEC3_PARAM(vec)) { return a[0]; }
int getY(VEC3_PARAM(vec)) { return a[1]; }
int getZ(VEC3_PARAM(vec)) { return a[2]; }
int getCoordinate(double *vec, int axis) {
assert(0 <= axis && axis < 3);
return vec[axis];
}
When the user runs the Checked C compiler, VEC3_PARAM(v)
expands to double * v
as before. But when the user runs 3C, before preprocessing the input, 3C scans for macro definitions of the form #define _3C_Ptr_FOO *
, introduces a constraint variable for each one, and removes the macro definition. So the preprocessed program looks like this (note that _3C_Ptr_vec3
was not expanded because its definition was removed):
int getX(double _3C_Ptr_vec3 v) { return a[0]; }
int getY(double _3C_Ptr_vec3 v) { return a[1]; }
int getZ(double _3C_Ptr_vec3 v) { return a[2]; }
int getCoordinate(double *vec, int axis) {
assert(0 <= axis && axis < 3);
return vec[axis];
}
The Checked C parser is enhanced to recognize _3C_Ptr_FOO
as equivalent to *
but tag the pointer type with the information that it came from _3C_Ptr_FOO
. So in getX
, when 3C sees the type "pointer (from _3C_Ptr_vec3
) to double", it uses the existing constraint variable for _3C_Ptr_vec3
rather than generating a new one. Along with the output program rewritten in all other respects, 3C will output the information that _3C_Ptr_vec3
solved to _Array_ptr<double> : count(3)
. The body of VEC3_PARAM
can then be updated manually or (perhaps) automatically.
As this example begins to suggest, the syntactic differences between T *v
, _Ptr<T> v
, and _Array_ptr<T> v : bounds(...)
present a huge obstacle to fully automating this process. If we could add parser support for alternative notations for _Ptr
, etc. and bounds declarations that are drop-in replacements for *
in the syntax, then we could potentially automate the rewrite and let the user convert the program to normal Checked C syntax later. For example, if we define:
T _S_Ptr x <-> _Ptr<T> x
T _S_Array_ptr(bounds_exp) x <-> _Array_ptr<T> x : bounds_exp
Then given double **foo
, if we determine the outer pointer should be a _Ptr
and the inner one should be an _Array_ptr
with 3 elements, we can automatically rewrite the declaration to double _S_Array_ptr(count(3)) _S_Ptr foo
and let the user turn this into real Checked C syntax later. (Where would the bounds declaration go in this case? _Ptr<_Array_ptr<double> : count(3)> foo
? The user might have to introduce a typedef. IIRC, Checked C might currently prohibit a pointer to a pointer with bounds altogether (even via a typedef), but we could argue for finding a way to lift that limitation for the benefit of 3C.)
A more extreme approach would be for 3C to automatically replace every occurrence of *
in a macro body (that is not inside a character or string literal) with a _3C_Ptr_X
token for fresh X and then proceed as above. Some of these *
symbols might be used as multiplication or dereference operators rather than pointer types, so the parser would have to accept _3C_Ptr_X
in that position, and when 3C sees that, it would have to constrain the pointer to remain *
(just in case the user was crazy and used the same macro both to generate pointer types and to generate multiplication or dereference operators).
The main limitation of this "automatic tracking" approach is that it cannot handle scenarios in which a macro body had a single *
that was sufficient in C but the macro should be enhanced to be able to produce different Checked C pointer types at different call sites.