Writing robust, performant linters
The {lintr} codebase has a lot of accumulated knowledge about how to write robust and fast linters. This Wiki exists as a repository for tidbits on these topics.
It is a Wiki so that anyone can edit it with low overhead.
- When writing a test for logical constants like `TRUE` or `FALSE`, note that if you also want the condition to match the shorthands `T` and `F`, the former are `NUM_CONST` nodes while the latter are `SYMBOL` nodes (cf. `getParseData(parse(text = "TRUE; T"))`).
- Keep pipes (`%>%`, `|>`) in mind when writing lints based on positional logic. For example, if a lint checks that the 2nd argument meets some condition, that argument will usually become the 1st argument inside a pipe chain.
- The magrittr pipe `%>%` and the "native pipe" `|>` show up differently in the parse tree: as `SPECIAL` and `PIPE`, respectively. Note that all `%`-delimited infix operators (e.g. `%%`, `%in%`, `%*%`) show up as `SPECIAL`, so for magrittr pipes you'll need to test the `text()` as well.
- When writing conditions around named arguments, it's often better to anchor on `EQ_SUB` instead of `SYMBOL_SUB`. The latter need not be present in all cases. For example, in `foo("a" = 1)`, which is valid R code, the parse tree has a `STR_CONST` for `"a"`, not a `SYMBOL_SUB`.
- Be wary of `*` searches like `preceding-sibling::*[1]`. Are you sure every node type should count? One common mistake is to include `<COMMENT>` nodes here, so the XPath lands on a comment instead of the intended expression. Exclude such comments with e.g. `preceding-sibling::*[not(self::COMMENT)][1]`. Be careful! The parser allows comments to show up basically anywhere!
- Also be wary of `expr` searches like `preceding-sibling::expr[1]`. Are `=` assignment expressions excluded intentionally? Depending on the R version, the expression `a = 1` may not show up as `<expr><expr><SYMBOL>a</SYMBOL></expr><EQ_ASSIGN>=</EQ_ASSIGN><expr><NUM_CONST>1</NUM_CONST></expr></expr>`. (That is how `a <- 1` appears, with `EQ_ASSIGN` swapped for `LEFT_ASSIGN`.) Instead, the outermost `<expr>` may be `<equal_assign>` or `<expr_or_assign_or_help>`. `preceding-sibling::expr[1]` will thus skip such an assignment, which is often a mistake.
- `for` loops are a bit of a trap: they appear quite differently in the AST than similar constructs like `while()` and `if()`; see https://github.com/r-lib/lintr/issues/2564#issuecomment-2675831586. Specifically, the AST for a simple `for (x in 1:10) 1` looks like the following. Note that the parentheses live inside `<forcond>`, and the loop variable is a bare `<SYMBOL>` that is not wrapped in `<expr>`.

  ```xml
  <expr>
    <FOR>for</FOR>
    <forcond>
      <OP-LEFT-PAREN>(</OP-LEFT-PAREN>
      <SYMBOL>x</SYMBOL>
      <IN>in</IN>
      <expr>
        <expr><NUM_CONST>1</NUM_CONST></expr>
        <OP-COLON>:</OP-COLON>
        <expr><NUM_CONST>10</NUM_CONST></expr>
      </expr>
      <OP-RIGHT-PAREN>)</OP-RIGHT-PAREN>
    </forcond>
    <expr>
      <NUM_CONST>1</NUM_CONST>
    </expr>
  </expr>
  ```
- S4 slot extractions look a lot like dollar extractions in the parse tree (`x$y` vs. `x@y`). The difference is that the RHS of `@` is always a `SLOT` node, whereas the RHS of `$` is a `SYMBOL` or `SYMBOL_FUNCTION_CALL`. As a result, to distinguish a name from a call on the RHS of `@` (`x@y` vs. `x@y()`), we need to check whether a `(` follows. For `$`, we can just rely on the node name.
- Some linters work like "compare expression 1 and expression 2; lint if they match [perhaps with other conditions]". For example, `regex_subset_linter()` looks for `<expr1>[grep(pattern, <expr2>)]` and lints only if `<expr1>` and `<expr2>` match. XPath basically works here, since `=` applied to two nodes is evaluated based on the string values of the nodes (see the XPath standard). But beware comments! The string value includes the text of every descendant, so you might want to exclude any `<COMMENT>` nodes before comparing.
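
The tip above about `TRUE`/`FALSE` vs. `T`/`F` is easy to confirm with base R alone. The following minimal sketch parses both spellings and keeps only the leaf tokens. `keep.source = TRUE` is passed explicitly because `getParseData()` returns `NULL` when source references are not kept, which is the default in non-interactive sessions such as `Rscript`. The XPath in the comments is illustrative and is not taken from {lintr}.

```r
# Parse a logical constant and its shorthand; parse data needs kept source refs
pd <- getParseData(parse(text = "TRUE; T", keep.source = TRUE))

# Keep only the leaf (terminal) tokens together with their text
leaves <- pd[pd$terminal, c("token", "text")]
print(leaves)

# Matching both spellings needs two node types, as in an XPath such as
#   NUM_CONST[text() = 'TRUE'] | SYMBOL[text() = 'T']
is_true_literal <- (leaves$token == "NUM_CONST" & leaves$text == "TRUE") |
  (leaves$token == "SYMBOL" & leaves$text == "T")
print(leaves$text[is_true_literal])
```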
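
This rough sketch shows why positional logic shifts inside a pipe chain, again using base R's parse data. It assumes R >= 4.1.0 so that `|>` parses. The `%>%` snippet only needs to parse, so {magrittr} itself is not needed. `call_args()` is a hypothetical helper written for illustration: it returns the source text of the explicit arguments of the first function call in a snippet.

```r
# Hypothetical helper: source text of the explicit arguments of the first call
call_args <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  fun <- pd[pd$token == "SYMBOL_FUNCTION_CALL", ][1, ]
  fun_expr <- fun$parent                  # <expr> wrapping the function name
  call_id <- pd$parent[pd$id == fun_expr] # <expr> for the whole call
  args <- pd[pd$parent == call_id & pd$token == "expr" & pd$id != fun_expr, ]
  args <- args[order(args$line1, args$col1), ]
  vapply(args$id, function(id) getParseText(pd, id), character(1))
}

call_args("foo(x, y)")     # "x" "y": y is the 2nd argument
call_args("x |> foo(y)")   # "y": inside the pipe, y is the 1st explicit argument
call_args("x %>% foo(y)")  # "y": same for the magrittr pipe
```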
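
For the tip above on magrittr vs. native pipes, the following base R sketch tabulates the token type of several operators (R >= 4.1.0 assumed for `|>`). The predicate `is_pipe()` is a hypothetical name. It mirrors an XPath condition like `PIPE | SPECIAL[text() = '%>%']` and shows why checking `SPECIAL` alone is not enough.

```r
code <- "x %>% f(); x |> f(); x %in% y; x %% 2; x + 1"
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Token type of each operator of interest
ops <- unique(pd[pd$terminal & pd$text %in% c("%>%", "|>", "%in%", "%%", "+"),
                 c("text", "token")])

# Hypothetical predicate: SPECIAL alone would also match %in%, %% and any %op%
is_pipe <- function(token, text) token == "PIPE" | (token == "SPECIAL" & text == "%>%")
ops$is_pipe <- is_pipe(ops$token, ops$text)
print(ops)
```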
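
To check the `EQ_SUB` vs. `SYMBOL_SUB` tip above, this sketch compares the leaf tokens of a call with a bare argument name and one with a quoted argument name. `leaf_tokens()` is a hypothetical helper built on base R's `getParseData()`.

```r
# Hypothetical helper: leaf tokens of a snippet, in source order
leaf_tokens <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  pd <- pd[pd$terminal, ]
  pd$token[order(pd$line1, pd$col1)]
}

bare <- leaf_tokens("foo(a = 1)")
quoted <- leaf_tokens('foo("a" = 1)')
print(bare)
print(quoted)

# EQ_SUB appears in both calls; SYMBOL_SUB only when the name is unquoted
c(eq_sub_in_both = "EQ_SUB" %in% bare && "EQ_SUB" %in% quoted,
  symbol_sub_in_both = "SYMBOL_SUB" %in% bare && "SYMBOL_SUB" %in% quoted)
```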
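
The `preceding-sibling::*[1]` comment pitfall above can be reproduced with {xml2}, which is assumed to be installed. The fragment below is hand-written and simplified for illustration. It imitates the parse tree of a call `foo(x, y)` with a comment placed just before `y` and is not produced by {lintr} itself.

```r
library(xml2)

# Hand-written, simplified imitation of the parse tree for
#   foo(x, # note
#       y)
xml <- paste0(
  "<expr>",
  "<expr><SYMBOL_FUNCTION_CALL>foo</SYMBOL_FUNCTION_CALL></expr>",
  "<OP-LEFT-PAREN>(</OP-LEFT-PAREN>",
  "<expr><SYMBOL>x</SYMBOL></expr>",
  "<OP-COMMA>,</OP-COMMA>",
  "<COMMENT># note</COMMENT>",
  "<expr><SYMBOL>y</SYMBOL></expr>",
  "<OP-RIGHT-PAREN>)</OP-RIGHT-PAREN>",
  "</expr>"
)
doc <- read_xml(xml)
y_expr <- xml_find_first(doc, "//expr[SYMBOL[text() = 'y']]")

# Naive: the nearest preceding sibling turns out to be the comment
naive <- xml_name(xml_find_first(y_expr, "preceding-sibling::*[1]"))
# Robust: skip comments before taking the nearest preceding sibling
robust <- xml_name(xml_find_first(y_expr, "preceding-sibling::*[not(self::COMMENT)][1]"))
c(naive = naive, robust = robust)  # "COMMENT", "OP-COMMA"
```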
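
The following sketch shows which node wraps a top-level `=` assignment, per the `preceding-sibling::expr[1]` tip above. It uses base R's parse data, where top-level nodes have `parent == 0`. `top_token()` is a hypothetical helper. The exact token for `a = 1` depends on the R version, so the sketch only checks that it is one of the two names mentioned in the tip.

```r
# Hypothetical helper: token of the top-level node(s) of a snippet
top_token <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  pd$token[pd$parent == 0]
}

top_token("a <- 1")  # "expr"
top_token("a = 1")   # not "expr" on recent R versions; see the tip above
```

A condition written against `expr` alone would therefore not see the second assignment on such R versions.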
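
The `for` loop trap above can also be seen in base R's parse data. This sketch looks up the parent of the symbol `x` in a `for` loop and in a `while` loop. `parent_of_x()` is a hypothetical helper for illustration.

```r
# Hypothetical helper: token of the parent node of the first SYMBOL named x
parent_of_x <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  x_row <- pd[pd$token == "SYMBOL" & pd$text == "x", ][1, ]
  pd$token[pd$id == x_row$parent]
}

parent_of_x("for (x in 1:10) 1")  # "forcond": x is a bare SYMBOL, not wrapped in expr
parent_of_x("while (x) 1")        # "expr": the condition is an ordinary <expr>
```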
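
For the S4 slot tip above, this is one possible way to tell `x@y` from `x@y()` and to compare with `$`, based on base R's parse data. Both helpers, `slot_kind()` and `dollar_rhs_token()`, are hypothetical names for illustration. An XPath-based linter would express the same checks differently.

```r
# Hypothetical helper: classify the RHS of the first `@` as a name or a call
slot_kind <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  slot <- pd[pd$token == "SLOT", ][1, ]
  at_expr <- slot$parent                # <expr> for x@y
  outer <- pd$parent[pd$id == at_expr]  # node containing x@y
  sibs <- pd[pd$parent == outer, ]
  sibs <- sibs[order(sibs$line1, sibs$col1), ]
  pos <- which(sibs$id == at_expr)
  # Only a "(" right after the x@y expression signals a call
  if (pos < nrow(sibs) && sibs$token[pos + 1] == "'('") "call" else "name"
}

# Hypothetical helper: token of the node right after the first `$`
dollar_rhs_token <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  dollar <- pd[pd$token == "'$'", ][1, ]
  sibs <- pd[pd$parent == dollar$parent, ]
  sibs <- sibs[order(sibs$line1, sibs$col1), ]
  sibs$token[which(sibs$id == dollar$id) + 1]
}

slot_kind("x@y")           # "name"
slot_kind("x@y()")         # "call"
dollar_rhs_token("x$y")    # "SYMBOL"
dollar_rhs_token("x$y()")  # "SYMBOL_FUNCTION_CALL"
```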
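
The "compare two expressions" tip above can be illustrated with {xml2} (assumed installed) on hand-written, simplified fragments. Each `<pair>` element is an artificial wrapper around the two expressions being compared and does not exist in real parse trees. The sketch shows that XPath `=` compares concatenated text, that a comment breaks the match, and one possible workaround in R. `text_without_comments()` is a hypothetical helper, not a {lintr} function.

```r
library(xml2)

# Hand-written, simplified imitation of the parse tree of foo(a, b),
# optionally with a comment between the two arguments
call_xml <- function(comment = "") {
  paste0(
    "<expr><expr><SYMBOL_FUNCTION_CALL>foo</SYMBOL_FUNCTION_CALL></expr>",
    "<OP-LEFT-PAREN>(</OP-LEFT-PAREN><expr><SYMBOL>a</SYMBOL></expr>",
    "<OP-COMMA>,</OP-COMMA>", comment,
    "<expr><SYMBOL>b</SYMBOL></expr><OP-RIGHT-PAREN>)</OP-RIGHT-PAREN></expr>"
  )
}
doc <- read_xml(paste0(
  "<root>",
  "<pair>", call_xml(), call_xml(), "</pair>",
  "<pair>", call_xml(), call_xml("<COMMENT># note</COMMENT>"), "</pair>",
  "</root>"
))

# `=` on two node-sets compares string values (all descendant text joined)
matched <- xml_find_all(doc, "//pair[expr[1] = expr[2]]")
length(matched)  # 1: the comment text makes the second pair differ

# Workaround: rebuild each string value in R, skipping text inside COMMENT
text_without_comments <- function(node) {
  paste(xml_text(xml_find_all(node, ".//text()[not(parent::COMMENT)]")), collapse = "")
}
exprs <- xml_find_all(xml_find_all(doc, "//pair")[[2]], "./expr")
identical(text_without_comments(exprs[[1]]), text_without_comments(exprs[[2]]))  # TRUE
```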
- Avoid `//*` XPaths like the plague! At least with the current {xml2}, they are almost always slower than the alternatives. A good example is https://github.com/r-lib/lintr/pull/2025, which shows a 3x speed-up from avoiding `//*`. The replacement there is a long, inefficient-seeming chain of repetitive `//A[expr] | //B[expr]`-style expressions, yet it is still faster.
- Similarly, avoid `//expr` XPaths. See https://github.com/r-lib/lintr/issues/1358: more than 1/3 of all nodes are `<expr>`, so `//expr` eliminates only a relatively small portion of the parse tree. The more specific the node you can anchor on, the better. However, the differences among nodes other than `<expr>` matter less, so err on the side of readability and comprehensibility.
- If you use `//SYMBOL_FUNCTION_CALL` as an entry point, use the `xml_find_function_calls()` helper instead. It returns cached results, which is much faster, especially when testing for multiple options of `text() = 'foo'`.
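
When rewriting a `//*` XPath into a union as in the PR linked above, it is worth checking that both forms still select the same nodes. The sketch below does this with {xml2} (assumed installed) on a hand-written toy tree. `A`, `B` and `C` stand in for real parse-tree node names. The sketch only checks equivalence and says nothing about speed.

```r
library(xml2)

# Hand-written toy tree; A, B and C stand in for real parse-tree node names
doc <- read_xml(paste0(
  "<root>",
  "<A><expr/></A>",
  "<B><expr/></B>",
  "<C><expr/></C>",
  "<A><SYMBOL>x</SYMBOL></A>",
  "<B><A><expr/></A></B>",
  "</root>"
))

# The wildcard form and the equivalent union form
wildcard <- xml_find_all(doc, "//*[self::A or self::B][expr]")
via_union <- xml_find_all(doc, "//A[expr] | //B[expr]")

# Compare the selected nodes by their paths in the document
setequal(xml_path(wildcard), xml_path(via_union))  # TRUE
length(via_union)                                  # 3
```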
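
A cached lookup like `xml_find_function_calls()` is fast because the tree is scanned once and later queries are answered from an index. Here is a rough sketch of that general idea using base R's parse data. `index_function_calls()` is a hypothetical helper and is not how {lintr} implements its cache.

```r
# Hypothetical helper: scan the parse data once, index call ids by function name
index_function_calls <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  calls <- pd[pd$token == "SYMBOL_FUNCTION_CALL", ]
  split(calls$id, calls$text)
}

idx <- index_function_calls("x <- grepl('a', y); z <- sub('a', 'b', grepl('c', y)); f(x)")

# Later queries are cheap list lookups rather than new tree searches, even when
# asking for several names at once (like text() = 'grepl' or text() = 'sub')
ids <- unlist(idx[c("grepl", "sub")], use.names = FALSE)
length(ids)  # 3
```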