Skip to content

Java: allow the definition of java/unsafe-deserialization sinks using data extensions #20067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 17, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: feature
---
* You can now add sinks for the query "Deserialization of user-controlled data" (`java/unsafe-deserialization`) using [data extensions](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-java-and-kotlin/#extensible-predicates-used-to-create-custom-models-in-java-and-kotlin) by extending `sinkModel` and using the kind "unsafe-deserialization". The existing sinks which do not require extra logic to determine if they are unsafe are now defined in this way.
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
category: deprecated
---
* The module `semmle.code.java.frameworks.Castor` has been deprecated and will be removed in a future release.
* The module `semmle.code.java.frameworks.JYaml` has been deprecated and will be removed in a future release.
* The classes `UnsafeHessianInputReadObjectMethod` and `BurlapInputReadObjectMethod` in the module `semmle.code.java.frameworks.HessianBurlap` have been deprecated and will be removed in a future release.
* The class `YamlBeansReaderReadMethod` in the module `semmle.code.java.frameworks.YamlBeans` has been deprecated and will be removed in a future release.
* The class `MethodApacheSerializationUtilsDeserialize` in the module `semmle.code.java.frameworks.apache.Lang` has been deprecated and will be removed in a future release.
7 changes: 7 additions & 0 deletions java/ql/lib/ext/com.alibaba.com.caucho.hessian.io.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["com.alibaba.com.caucho.hessian.io", "AbstractHessianInput", True, "readObject", "", "", "Argument[this]", "unsafe-deserialization", "manual"]
- ["com.alibaba.com.caucho.hessian.io", "Hessian2StreamingInput", True, "readObject", "", "", "Argument[this]", "unsafe-deserialization", "manual"]
6 changes: 6 additions & 0 deletions java/ql/lib/ext/com.caucho.burlap.io.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["com.caucho.burlap.io", "BurlapInput", True, "readObject", "", "", "Argument[this]", "unsafe-deserialization", "manual"]
7 changes: 7 additions & 0 deletions java/ql/lib/ext/com.caucho.hessian.io.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["com.caucho.hessian.io", "AbstractHessianInput", True, "readObject", "", "", "Argument[this]", "unsafe-deserialization", "manual"]
- ["com.caucho.hessian.io", "Hessian2StreamingInput", True, "readObject", "", "", "Argument[this]", "unsafe-deserialization", "manual"]
7 changes: 7 additions & 0 deletions java/ql/lib/ext/com.cedarsoftware.util.io.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["com.cedarsoftware.util.io", "JsonReader", False, "jsonToJava", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["com.cedarsoftware.util.io", "JsonReader", True, "readObject", "", "", "Argument[this]", "unsafe-deserialization", "manual"]
Copy link
Contributor

@michaelnebel michaelnebel Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to use "" as the selector for the signature of jsonToJava to capture all overloads, but for readObject there are only a single method with signature () - is this intentional (same question for other MaD sinks)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familar with what to put in this column because I'm most familiar with Go, which doesn't allow overloads, so this column is always left blank. My thinking was that if there is only one overload then it is fine to use "" as it will match the one that we want. Is it better to specify the signature when there is only one overload? I note that the QL model doesn't specify anything except the name and declaring type, so it would match any new overloads that were added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For C#, I think we try to only use "" if there are many overloads and where it is unlikely that another method will be introduced that behaves differently. However, I agree with you that the models as they are now more closely mirrors how the QL was defined - it makes sense to just keep it as it is.

6 changes: 6 additions & 0 deletions java/ql/lib/ext/com.esotericsoftware.yamlbeans.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["com.esotericsoftware.yamlbeans", "YamlReader", True, "read", "", "", "Argument[this]", "unsafe-deserialization", "manual"]
5 changes: 5 additions & 0 deletions java/ql/lib/ext/java.beans.model.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,8 @@ extensions:
- ["java.beans", "PropertyEditor", "getValue", "()", "summary", "df-manual"] # needs to be modeled by regular CodeQL matching the get and set keys to reduce FPs
- ["java.beans", "PropertyEditor", "setAsText", "()", "summary", "df-manual"] # needs to be modeled by regular CodeQL matching the get and set keys to reduce FPs
- ["java.beans", "PropertyEditor", "setValue", "()", "summary", "df-manual"] # needs to be modeled by regular CodeQL matching the get and set keys to reduce FPs
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["java.beans", "XMLDecoder", True, "readObject", "()", "", "Argument[this]", "unsafe-deserialization", "manual"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the logic behind when the models are also applied to overrides (some models are set to "True" and others are set to "False")? It looks like the QL code doesn't apply to overrides.

Copy link
Contributor Author

@owen-mc owen-mc Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tried to go for "True" except for static methods, where I put "False" as it doesn't make any sense to override a static method. (I prefer to put "True" for those for consistency, but when doing the equivalent for Go models I was outvoted.) I may have made mistakes though. In general I think we agreed that there isn't much point to having models with "False" in that column, and you even had a PR open to make that column meaningless for C#. Hence why I defaulted to "True" without really thinking about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I was just curious about the reasoning.

5 changes: 5 additions & 0 deletions java/ql/lib/ext/org.apache.commons.lang.model.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,8 @@ extensions:
data:
- ["org.apache.commons.lang", "StringEscapeUtils", true, "escapeHtml", "(String)", "", "Argument[0]", "ReturnValue", "taint", "manual"]
- ["org.apache.commons.lang", "StringEscapeUtils", true, "escapeHtml", "(Writer,String)", "", "Argument[1]", "Argument[0]", "taint", "manual"]
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.commons.lang", "SerializationUtils", False, "deserialize", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
1 change: 1 addition & 0 deletions java/ql/lib/ext/org.apache.commons.lang3.model.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ extensions:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.apache.commons.lang3", "SerializationUtils", False, "deserialize", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
# Note these sinks do not use the sink kind `regex-use[0]` because the regex injection query needs to select them separately from
# other `regex-use[0]` sinks in order to avoid FPs. As a result, these sinks are currently not used in the polynomial ReDoS query.
# TODO: refactor the `regex-use%` sink kind so that the polynomial ReDoS query can also use these sinks.
Expand Down
6 changes: 6 additions & 0 deletions java/ql/lib/ext/org.exolab.castor.xml.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.exolab.castor.xml", "Unmarshaller", True, "unmarshal", "", "", "Argument[0..1]", "unsafe-deserialization", "manual"]
Copy link
Contributor

@michaelnebel michaelnebel Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does there exist multiple overloads of unmarshal? AFAIR we might get an error if we try to apply a model with eg. argument selector Argument[1] to a method which only has one argument.

Copy link
Contributor Author

@owen-mc owen-mc Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, many overloads, some with only 1 argument and some with 2. I wasn't aware that there might be a problem - I think QL will just say "well, there isn't one of those" and move on. I feel like I have done this before, when there are overloads with different numbers of arguments, though I don't remember specifics. I ran MRVA on 1000 repos and didn't observe any problems. And, now that I look more carefully, there is a test with a call to a one-argument overload of unmarshal, and it still alerts fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then all is good! Thank you for checking.

13 changes: 13 additions & 0 deletions java/ql/lib/ext/org.ho.yaml.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.ho.yaml", "Yaml", False, "load", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["org.ho.yaml", "Yaml", False, "loadStream", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["org.ho.yaml", "Yaml", False, "loadStreamOfType", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["org.ho.yaml", "Yaml", False, "loadType", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["org.ho.yaml", "YamlConfig", False, "load", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["org.ho.yaml", "YamlConfig", False, "loadStream", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["org.ho.yaml", "YamlConfig", False, "loadStreamOfType", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
- ["org.ho.yaml", "YamlConfig", False, "loadType", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
6 changes: 6 additions & 0 deletions java/ql/lib/ext/org.jabsorb.model.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
extensions:
- addsTo:
pack: codeql/java-all
extensible: sinkModel
data:
- ["org.jabsorb", "JSONSerializer", True, "fromJSON", "", "", "Argument[0]", "unsafe-deserialization", "manual"]
8 changes: 6 additions & 2 deletions java/ql/lib/semmle/code/java/frameworks/Castor.qll
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
/**
* DEPRECATED: Now modeled using data extensions instead.
*
* Provides classes and predicates for working with the Castor framework.
*/
overlay[local?]
module;
deprecated module;

import java

Expand All @@ -13,7 +15,9 @@ class CastorUnmarshaller extends RefType {
CastorUnmarshaller() { this.hasQualifiedName("org.exolab.castor.xml", "Unmarshaller") }
}

/** A method with the name `unmarshal` declared in `org.exolab.castor.xml.Unmarshaller`. */
/**
* A method with the name `unmarshal` declared in `org.exolab.castor.xml.Unmarshaller`.
*/
class CastorUnmarshalMethod extends Method {
CastorUnmarshalMethod() {
this.getDeclaringType() instanceof CastorUnmarshaller and
Expand Down
12 changes: 9 additions & 3 deletions java/ql/lib/semmle/code/java/frameworks/HessianBurlap.qll
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,12 @@ class UnsafeHessianInput extends RefType {
}

/**
* DEPRECATED: Now modeled using data extensions instead.
*
* A AbstractHessianInput or Hessian2StreamingInput subclass readObject method.
* This is either `AbstractHessianInput.readObject` or `Hessian2StreamingInput.readObject`.
*/
class UnsafeHessianInputReadObjectMethod extends Method {
deprecated class UnsafeHessianInputReadObjectMethod extends Method {
UnsafeHessianInputReadObjectMethod() {
this.getDeclaringType().getAnAncestor() instanceof UnsafeHessianInput and
this.getName() = "readObject"
Expand All @@ -34,8 +36,12 @@ class BurlapInput extends RefType {
BurlapInput() { this.hasQualifiedName("com.caucho.burlap.io", "BurlapInput") }
}

/** A method with the name `readObject` declared in `com.caucho.burlap.io.BurlapInput`. */
class BurlapInputReadObjectMethod extends Method {
/**
* DEPRECATED: Now modeled using data extensions instead.
*
* A method with the name `readObject` declared in `com.caucho.burlap.io.BurlapInput`.
*/
deprecated class BurlapInputReadObjectMethod extends Method {
BurlapInputReadObjectMethod() {
this.getDeclaringType() instanceof BurlapInput and
this.getName() = "readObject"
Expand Down
4 changes: 3 additions & 1 deletion java/ql/lib/semmle/code/java/frameworks/JYaml.qll
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
/**
* DEPRECATED: Now modeled using data extensions instead.
*
* Provides classes and predicates for working with the JYaml framework.
*/
overlay[local?]
module;
deprecated module;

import java

Expand Down
8 changes: 6 additions & 2 deletions java/ql/lib/semmle/code/java/frameworks/Jabsorb.qll
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,12 @@ class JabsorbUnmarshallMethod extends Method {
}
}

/** The deserialization method `fromJSON`. */
class JabsorbFromJsonMethod extends Method {
/**
* DEPRECATED: Now modeled using data extensions instead.
*
* The deserialization method `fromJSON`.
*/
deprecated class JabsorbFromJsonMethod extends Method {
JabsorbFromJsonMethod() {
this.getDeclaringType().getAnAncestor() instanceof JabsorbSerializer and
this.getName() = "fromJSON"
Expand Down
8 changes: 6 additions & 2 deletions java/ql/lib/semmle/code/java/frameworks/YamlBeans.qll
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,12 @@ class YamlBeansReader extends RefType {
YamlBeansReader() { this.hasQualifiedName("com.esotericsoftware.yamlbeans", "YamlReader") }
}

/** A method with the name `read` declared in `com.esotericsoftware.yamlbeans.YamlReader`. */
class YamlBeansReaderReadMethod extends Method {
/**
* DEPRECATED: Now modeled using data extensions instead.
*
* A method with the name `read` declared in `com.esotericsoftware.yamlbeans.YamlReader`.
*/
deprecated class YamlBeansReaderReadMethod extends Method {
YamlBeansReaderReadMethod() {
this.getDeclaringType() instanceof YamlBeansReader and
this.getName() = "read"
Expand Down
4 changes: 3 additions & 1 deletion java/ql/lib/semmle/code/java/frameworks/apache/Lang.qll
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,12 @@ class TypeApacheRandomStringUtils extends Class {
}

/**
* DEPRECATED: Now modeled using data extensions instead.
*
* The method `deserialize` in either `org.apache.commons.lang.SerializationUtils`
* or `org.apache.commons.lang3.SerializationUtils`.
*/
class MethodApacheSerializationUtilsDeserialize extends Method {
deprecated class MethodApacheSerializationUtilsDeserialize extends Method {
MethodApacheSerializationUtilsDeserialize() {
this.getDeclaringType()
.hasQualifiedName(["org.apache.commons.lang", "org.apache.commons.lang3"],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,16 @@
*/

import semmle.code.java.dataflow.FlowSources
private import semmle.code.java.dataflow.ExternalFlow

Check warning

Code scanning / CodeQL

Redundant import Warning

Redundant import, the module is already imported inside
semmle.code.java.dataflow.FlowSinks
.
private import semmle.code.java.dataflow.FlowSinks
private import semmle.code.java.dispatch.VirtualDispatch
private import semmle.code.java.frameworks.Kryo
private import semmle.code.java.frameworks.XStream
private import semmle.code.java.frameworks.SnakeYaml
private import semmle.code.java.frameworks.FastJson
private import semmle.code.java.frameworks.JYaml
private import semmle.code.java.frameworks.JsonIo
private import semmle.code.java.frameworks.YamlBeans
private import semmle.code.java.frameworks.HessianBurlap
private import semmle.code.java.frameworks.Castor
private import semmle.code.java.frameworks.Jackson
private import semmle.code.java.frameworks.Jabsorb
private import semmle.code.java.frameworks.Jms
Expand Down Expand Up @@ -51,13 +50,6 @@
}
}

private class XmlDecoderReadObjectMethod extends Method {
XmlDecoderReadObjectMethod() {
this.getDeclaringType().hasQualifiedName("java.beans", "XMLDecoder") and
this.hasName("readObject")
}
}

private module SafeXStreamConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node src) {
any(XStreamEnableWhiteListing ma).getQualifier().(VarAccess).getVariable().getAnAccess() =
Expand Down Expand Up @@ -149,8 +141,15 @@

private module SafeKryoFlow = DataFlow::Global<SafeKryoConfig>;

private class DefaultUnsafeDeserializationSink extends DataFlow::Node {
DefaultUnsafeDeserializationSink() { sinkNode(this, "unsafe-deserialization") }
}

/**
* Holds if `ma` is a call that deserializes data from `sink`.
*
* Note that this does not include deserialization methods that have been
* specified using models-as-data.
*/
predicate unsafeDeserialization(MethodCall ma, Expr sink) {
exists(Method m | m = ma.getMethod() |
Expand All @@ -162,9 +161,6 @@
sink = ma.getQualifier() and
not DataFlow::exprNode(sink).getTypeBound() instanceof SafeObjectInputStreamType
or
m instanceof XmlDecoderReadObjectMethod and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class XmlDecoderReadObjectMethod can be deleted (it is private and not used anywhere else).

sink = ma.getQualifier()
or
m instanceof XStreamReadObjectMethod and
sink = ma.getAnArgument() and
not SafeXStreamFlow::flowToExpr(ma.getQualifier())
Expand All @@ -173,33 +169,13 @@
sink = ma.getAnArgument() and
not SafeKryoFlow::flowToExpr(ma.getQualifier())
or
m instanceof MethodApacheSerializationUtilsDeserialize and
sink = ma.getArgument(0)
or
ma instanceof UnsafeSnakeYamlParse and
sink = ma.getArgument(0)
or
ma.getMethod() instanceof FastJsonParseMethod and
not fastJsonLooksSafe() and
sink = ma.getArgument(0)
or
ma.getMethod() instanceof JYamlLoaderUnsafeLoadMethod and
sink = ma.getArgument(0)
or
ma.getMethod() instanceof JsonIoJsonToJavaMethod and
sink = ma.getArgument(0)
or
ma.getMethod() instanceof JsonIoReadObjectMethod and
sink = ma.getQualifier()
or
ma.getMethod() instanceof YamlBeansReaderReadMethod and sink = ma.getQualifier()
or
ma.getMethod() instanceof UnsafeHessianInputReadObjectMethod and sink = ma.getQualifier()
or
ma.getMethod() instanceof CastorUnmarshalMethod and sink = ma.getAnArgument()
or
ma.getMethod() instanceof BurlapInputReadObjectMethod and sink = ma.getQualifier()
or
ma.getMethod() instanceof ObjectMapperReadMethod and
sink = ma.getArgument(0) and
(
Expand All @@ -215,9 +191,6 @@
sink = ma.getArgument(2) and
UnsafeTypeFlow::flowToExpr(ma.getArgument(1))
or
m instanceof JabsorbFromJsonMethod and
sink = ma.getArgument(0)
or
m instanceof JoddJsonParseMethod and
sink = ma.getArgument(0) and
(
Expand All @@ -244,10 +217,17 @@

/** A sink for unsafe deserialization. */
class UnsafeDeserializationSink extends ApiSinkNode, DataFlow::ExprNode {
UnsafeDeserializationSink() { unsafeDeserialization(_, this.getExpr()) }
MethodCall mc;

UnsafeDeserializationSink() {
unsafeDeserialization(mc, this.getExpr())
or
this instanceof DefaultUnsafeDeserializationSink and
this.getExpr() = [mc.getQualifier(), mc.getAnArgument()]
Copy link
Contributor

@michaelnebel michaelnebel Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is because all the default sinks are are either Argument[this] (corresponds to the qualifier of a sink API) or Argument[n] (corresponds to an argument of the sinks API)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If you can think of other things that should be added then let me know. It is possible that a customer in future will try to use some other feature of MaD (Parameter?) and this won't work. But I can't think of any that are realistic, and we can deal with that if it happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this future proof enough 😄 .

}

/** Gets a call that triggers unsafe deserialization. */
MethodCall getMethodCall() { unsafeDeserialization(result, this.getExpr()) }
MethodCall getMethodCall() { result = mc }
}

/** Holds if `node` is a sanitizer for unsafe deserialization */
Expand Down
Loading
Loading