Map Dataflow captures steps in-between #235
-
Hello everyone, I'm trying to query if a map was populated with a specific key, however I get unwanted result from my query. I tried to figure it out for a quite a while already, and would be super grateful if somebody could please help me, as I'm stuck :) I made an exemplary GitHub repository. I hope this helps in understanding my question. (it contains a zipped CodeQL database, the source, and the query) The baseline of the query is mostly based on this answer, for the most part I only changed the "isSink" conditions. (See "MyDataFlowConfiguration" (Line 40) of my ql query within the repo) I'm trying to check if the key "secret" is added into the following "env" map. Map<String, Object> env = new HashMap<>();
env.put("test1", "test1");
env.put("test2", "test2");
env.put("secret", "secret");
env.put("test3", "test3"); // This is returned as CodeQL query result, even though the map has the expected key
new TestConstructor(null, env); // This is returned as CodeQL query result, even though the map has the expected key When running my query (also see the repo) it detects that a".put(secret)" has been called. However, even though I expect no results from my query, I still get two results for the map.put after the .put(secret). The first screenshot shows my example project without "secret" being set. As expected I get (5) results for my query. Map<String, Object> env = new HashMap<>();
env.put("test1", "test1");
env.put("test2", "test2");
// env.put("secret", "secret"); I'm not set and therefore vulnerable
env.put("test3", "test3");
new TestConstructor(null, env); This screenshot shows the query against the example project with "secret" in the map. I expect zero results, but I get two. Map<String, Object> env = new HashMap<>();
env.put("test1", "test1");
env.put("test2", "test2");
env.put("secret", "secret"); // I am set now, so this map should be considered safe
env.put("test3", "test3");
new TestConstructor(null, env); The query (also in the repo):
Now I wanted to ask if somebody has an answer for my problem. Basically I only want the initial variable within the path, but nothing in between. I tried limiting my query to only get the initial variable in a few ways (e.g. only variables which are initialised, the nth-0 variable, tried to change have additional not-conditions, terminating if the put is detected, and probably a few more) but I never really could get the query right. Maybe my approach is completely wrong and I should takle this issue in another way? Thank you very much in advance, any help is appreciated :) Greetings |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 1 reply
-
Well, as I wrote in #160 the query was untested... (and written late at night just like this answer :P) Looking back at it, I'm pretty sure the code as is would have never really worked. Anyway, I'm pinging @aschackmull and @aibaars . Maybe they can review the approach I'm suggesting now: The new approach only traces It is then checked in Note, that in theory the sanitizing /**
* @name TODO
* @kind problem
* @problem.severity warning
* @id java/example/TODO
*/
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.DataFlow2
import semmle.code.java.Maps
import DataFlow::PathGraph
/** Models flow from `new HashMap<>()` to a `map.put("secret", value)` call. */
class MapToPutSecretConfiguration extends DataFlow2::Configuration {
MapToPutSecretConfiguration() { this = "MapToPutSecretConfiguration" }
override predicate isSource(DataFlow::Node source) {
source.asExpr().(ClassInstanceExpr).getConstructedType() instanceof MapType
}
override predicate isSink(DataFlow2::Node sink) { putsSecretKey(sink.asExpr()) }
/** Holds if a `put` call on `qualifier` puts the compile time constant `"secret"` into the map. */
private predicate putsSecretKey(Expr qualifier) {
exists(MapPutCall put | put.getKey().(CompileTimeConstantExpr).getStringValue() = "secret" |
put.getQualifier() = qualifier and
put.getMethod().(MapMethod).getReceiverKeyType().getName() = "String" and
put.getMethod().(MapMethod).getReceiverValueType().getName() = "Object"
)
}
}
/** Models flow from `new HashMap<>()` to the argument of a `TestConstructor` call. */
class MapToDangerousFunctionConfiguration extends DataFlow::Configuration {
MapToDangerousFunctionConfiguration() { this = "MapToDangerousFunctionConfiguration" }
override predicate isSource(DataFlow::Node source) {
source.asExpr().(ClassInstanceExpr).getConstructedType() instanceof MapType
}
override predicate isSink(DataFlow::Node sink) {
exists(ConstructorCall ccall |
sink.asExpr() = ccall.getArgument(1) and
ccall.getConstructor().getName() = "TestConstructor"
)
}
}
from
MapToDangerousFunctionConfiguration dataflow, DataFlow::PathNode source, DataFlow::PathNode sink
where
// The map created by `new HashMap<>()` has to a) flow to the sink and b) there must not exist a (different) sink that would put `"secret"` into `source`.
dataflow.hasFlowPath(source, sink) and
not exists(MapToPutSecretConfiguration conf | conf.hasFlow(source.getNode(), _))
select source, sink |
Beta Was this translation helpful? Give feedback.
-
Wow! thanks a lot for the comprehensive and fast answer. Can't thank you enough for that :) After a quick check it seems to work just fine for the use case I had in mind, even in a somewhat bigger example. I'll have a closer look tomorrow, and I'll try to tackle some of the edge cases you described. If you don't mind, I have one more question. Where I used Thanks again, and have a nice night :) |
Beta Was this translation helpful? Give feedback.
Well, as I wrote in #160 the query was untested... (and written late at night just like this answer :P)
Looking back at it, I'm pretty sure the code as is would have never really worked.
At that time I assumed that the
isBarrier
method would somehow mark the barrier node (env
) and all subsequent nodes (and calls onenv
) after the barrier as sanitized.Anyway, I'm pinging @aschackmull and @aibaars . Maybe they can review the approach I'm suggesting now:
The new approach only traces
new HashMap<>()
.So it could be possible to miss cases where the hashmap is created by code that is not visible to CodeQL.
It is then checked in
MapToDangerousFunctionConfiguration
whether your dangerous functio…