Commit 5e9dced

docs: add a tutorial for finding malicious artifact uploads (#474)
Signed-off-by: behnazh-w <[email protected]>
1 parent 1f5ed10 commit 5e9dced

8 files changed

+288
-8
lines changed


docs/source/index.rst

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99

1010
.. References/links
1111
.. _Witness: https://github.com/testifysec/witness
12+
.. _SLSA: https://slsa.dev
1213

1314
=====================
1415
Macaron documentation
@@ -40,6 +41,8 @@ To start with Macaron, see the :doc:`Installation </pages/installation>` and :do
4041

4142
For all services and technologies that Macaron supports, see the :doc:`Supported Technologies </pages/supported_technologies/index>` page.
4243

44+
.. _checks:
45+
4346
-------------------------
4447
Current checks in Macaron
4548
-------------------------
@@ -61,7 +64,7 @@ the requirements that are currently supported by Macaron.
6164
- Identify and validate build script(s).
6265
* - 1
6366
- **Provenance available** - Provenances are available.
64-
- Check for existence of provenances, which can be SLSA or `Witness`_ provenances. If there is no provenance, the repo can still be compliant to level 1 given the build script is available.
67+
- Check for existence of provenances, which can be `SLSA`_ or `Witness`_ provenances. If there is no provenance, the repo can still be compliant to level 1 given the build script is available.
6568
* - 1
6669
- **Witness provenance** - One or more `Witness`_ provenances are discovered.
6770
- Check for existence of `Witness`_ provenances, and whether artifact digests match those in the provenances.
@@ -106,7 +109,8 @@ intermediate representations as abstractions. Using such abstractions, Macaron i
106109

107110
pages/installation
108111
pages/using
109-
pages/output_files
110112
pages/cli_usage/index
113+
pages/tutorials/index
114+
pages/output_files
111115
pages/supported_technologies/index
112116
pages/developers_guide/index
Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
1+
.. Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
2+
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
3+
4+
.. References/links
5+
.. _Witness: https://github.com/testifysec/witness
6+
.. _SLSA: https://slsa.dev
7+
8+
9+
.. _tutorials:
10+
11+
=========
12+
Tutorials
13+
=========
14+
15+
On this page, you will find tutorials to get you started with Macaron. The tutorials show Macaron
16+
in action, analyzing a software component and its dependencies that are built using GitHub Actions
17+
or GitLab. Macaron supports artifacts published on GitHub Releases, `Maven Central <https://central.sonatype.com>`_,
18+
or privately hosted registries, such as `JFrog <https://jfrog.com/>`_.
19+
20+
---------------------------------------------------------------------
21+
Detect a malicious Java dependency uploaded manually to Maven Central
22+
---------------------------------------------------------------------
23+
24+
In this tutorial we show how Macaron can determine whether the dependencies of a Java project are built
25+
and published via transparent CI workflows or manually uploaded to Maven Central. When an artifact is
26+
manually uploaded, the artifact distributor can modify the artifact and potentially include malicious
27+
code without being detected.
28+
29+
The example project we analyze in this tutorial is `example-maven-app <https://github.com/behnazh-w/example-maven-app>`_,
30+
which is hosted on GitHub. This example application uses Maven to build and manage dependencies, and has two
31+
dependencies:
32+
33+
34+
.. list-table::
35+
:widths: 25 50
36+
:header-rows: 1
37+
38+
* - Artifact name
39+
- `Package URL (PURL) <https://github.com/package-url/purl-spec>`_
40+
* - `guava <https://central.sonatype.com/artifact/com.google.guava/guava>`_
41+
- ``pkg:maven/com.google.guava/guava@32.1.2-jre?type=jar``
42+
* - `jackson-databind <https://central.sonatype.com/artifact/io.github.behnazh-w.demo/jackson-databind>`_
43+
- ``pkg:maven/io.github.behnazh-w.demo/jackson-databind@1.0?type=jar``
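As a rough illustration of how a Maven PURL relates to the more familiar ``group:artifact:version`` coordinates, here is a minimal Python sketch. The helper name ``maven_purl`` is hypothetical and not part of Macaron, and the sketch skips the percent-encoding rules of the PURL specification, so it only suits simple coordinates like the two in this tutorial.

```python
def maven_purl(coordinate: str, artifact_type: str = "jar") -> str:
    """Turn a 'group:artifact:version' coordinate into a Maven PURL string.

    Hypothetical helper for illustration only; it does not percent-encode
    special characters as the PURL specification requires.
    """
    group_id, artifact_id, version = coordinate.split(":")
    return f"pkg:maven/{group_id}/{artifact_id}@{version}?type={artifact_type}"


for coordinate in (
    "com.google.guava:guava:32.1.2-jre",
    "io.github.behnazh-w.demo:jackson-databind:1.0",
):
    print(maven_purl(coordinate))
```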
44+
45+
While the ``guava`` dependency follows best practices to publish artifacts automatically with minimal human
46+
intervention, ``jackson-databind`` is a malicious dependency that pretends to provide data-binding functionalities
47+
like `the official jackson-databind <https://github.com/FasterXML/jackson-databind>`_ library (note that
48+
this artifact is created for demonstration purposes and is not actually malicious).
49+
50+
Now let's see how Macaron can help us with evaluating the supply chain security posture of
51+
``example-maven-app`` and its dependencies.
52+
53+
************
54+
Installation
55+
************
56+
57+
Please follow the instructions :ref:`here <installation-guide>`. In summary, you need:
58+
59+
* Docker
60+
* the ``run_macaron.sh`` script to run the Macaron image.
61+
62+
.. note:: At the moment, Docker alternatives (e.g. podman) are not supported.
63+
64+
*************
65+
Prerequisites
66+
*************
67+
68+
You need to provide Macaron with a GitHub token through the ``GITHUB_TOKEN`` environment variable.
69+
70+
To obtain a GitHub Token:
71+
72+
* Go to ``GitHub settings`` → ``Developer Settings`` (at the bottom of the left side pane) → ``Personal Access Tokens`` → ``Fine-grained personal access tokens`` → ``Generate new token``. Give your token a name and an expiry period.
73+
* Under ``"Repository access"``, choosing ``"Public Repositories (read-only)"`` should be good enough in most cases.
74+
75+
Now you should be good to run Macaron. For more details, see the documentation :ref:`here <prepare-github-token>`.
76+
77+
78+
***********************
79+
Run ``analyze`` command
80+
***********************
81+
82+
First, we need to run the ``analyze`` command of Macaron to run a number of :ref:`checks <checks>` and collect evidence for ``example-maven-app`` and its dependencies.
83+
84+
.. code-block:: shell
85+
86+
./run_macaron.sh analyze -rp https://github.com/behnazh-w/example-maven-app
87+
88+
.. note:: By default, Macaron clones the repositories and creates output files under the ``output`` directory. To understand the structure of this directory please see :ref:`Output Files Guide <output_files_guide>`.
89+
90+
By default, this command analyzes the latest commit of the default branch of the repository. You can also analyze the repository
91+
at a specific commit by providing the branch and commit digest. See the :ref:`CLI options<analyze-action-cli>` of the ``analyze`` command for more information.
92+
After running the ``analyze`` command, we can view the data that Macaron has gathered about the ``example-maven-app`` repository in an HTML report.
93+
94+
.. code-block:: shell
95+
96+
open output/reports/github_com/behnazh-w/example-maven-app/example-maven-app.html
97+
98+
.. _fig_example-maven-app:
99+
100+
.. figure:: ../../_static/images/tutorial_example_maven_app_report.png
101+
:alt: HTML report for ``example-maven-app``
102+
:align: center
103+
104+
|
105+
106+
The image above shows the results of the checks for the `example-maven-app <https://github.com/behnazh-w/example-maven-app>`_ repository itself.
107+
As you can see, some of the checks are passing and some are failing. In summary, this project
108+
109+
* is not producing any `SLSA`_ or `Witness`_ provenances (``mcn_provenance_available_1``)
110+
* is using GitHub Actions to build and test using ``mvnw`` (``mcn_build_service_1``)
111+
* but it is not deploying any artifacts automatically (``mcn_build_as_code_1``)
112+
* and no CI workflow runs are detected that automatically publish artifacts (``mcn_infer_artifact_pipeline_1``)
113+
114+
As you scroll down in the HTML report, you will see a section for the dependencies that were automatically identified:
115+
116+
.. _fig_example-maven-app-deps:
117+
118+
.. figure:: ../../_static/images/tutorial_example_maven_app_report_dependencies.png
119+
:alt: HTML report for dependencies of ``example-maven-app``
120+
:align: center
121+
122+
|
123+
| Macaron has found the two dependencies as expected:
124+
125+
* ``io.github.behnazh-w.demo:jackson-databind:1.0``
126+
* ``com.google.guava:guava:32.1.2-jre``
127+
128+
When we open the reports for each dependency, we see that ``mcn_infer_artifact_pipeline_1`` passes for ``com.google.guava:guava:32.1.2-jre``
129+
and a GitHub Actions workflow run is found for publishing version ``32.1.2-jre``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``.
130+
This means that ``io.github.behnazh-w.demo:jackson-databind:1.0`` could have been built and published manually to Maven Central
131+
and could potentially be malicious.
132+
133+
.. _fig_infer_artifact_pipeline_guava:
134+
135+
.. figure:: ../../_static/images/tutorial_guava_infer_pipeline.png
136+
:alt: mcn_infer_artifact_pipeline_1 for com.google.guava:guava:32.1.2-jre
137+
:align: center
138+
139+
``com.google.guava:guava:32.1.2-jre``
140+
141+
.. _fig_infer_artifact_pipeline_bh_jackson_databind:
142+
143+
.. figure:: ../../_static/images/tutorial_bh_jackson_databind_infer_pipeline.png
144+
:alt: mcn_infer_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0
145+
:align: center
146+
147+
``io.github.behnazh-w.demo:jackson-databind:1.0``
148+
149+
|
150+
151+
After running the ``analyze`` command, all the check results are stored in ``output/macaron.db``.
152+
Next, we show how to use the policy engine to detect if the dependencies of ``example-maven-app``
153+
are not published from a publicly available CI workflow run.
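If you want to peek at the stored results before writing a policy, the following sketch lists the tables in the database. It assumes that ``output/macaron.db`` is an SQLite file, which this tutorial does not state explicitly, and it makes no assumption about table names. The helper ``list_tables`` is hypothetical.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in an SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Under the SQLite assumption, calling ``list_tables(sqlite3.connect("output/macaron.db"))`` after an ``analyze`` run would show which tables are available to query.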
154+
155+
*****************************
156+
Run ``verify-policy`` command
157+
*****************************
158+
159+
While the ``analyze`` command shown in the previous section collects information,
160+
it does not automatically confirm whether a repository satisfies **your** security requirements.
161+
This is where the ``verify-policy`` command comes in. With Macaron, you can use `Soufflé Datalog <https://souffle-lang.github.io/index.html>`_
162+
to express the security requirements and let Macaron automatically validate them against the collected data.
163+
Datalog is a declarative query language, comparable to SQL, that lets you write queries over the
164+
results collected by the ``analyze`` command. We use such queries as policy rules as described next.
165+
166+
The security requirement in this tutorial is to mandate dependencies of our project to have a
167+
transparent artifact publishing CI workflow. To write a policy for this requirement, first we need to
168+
revisit the checks shown in the HTML report in the previous :ref:`step <fig_example-maven-app>`.
169+
The result of each of the checks can be queried by the check ID in the first column. For the policy in this tutorial,
170+
we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks:
171+
172+
.. code-block:: c++
173+
174+
#include "prelude.dl"
175+
176+
Policy("detect-malicious-upload", component_id, "") :-
177+
is_component(component_id, _),
178+
!violating_dependencies(component_id).
179+
180+
.decl violating_dependencies(parent: number)
181+
violating_dependencies(parent) :-
182+
transitive_dependency(parent, dependency),
183+
!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
184+
!check_passed(dependency, "mcn_provenance_level_three_1").
185+
186+
apply_policy_to("detect-malicious-upload", component_id) :-
187+
is_repo(_, "github.com/behnazh-w/example-maven-app", component_id).
188+
189+
190+
This policy requires that all the dependencies
191+
of repository ``github.com/behnazh-w/example-maven-app`` either pass the ``mcn_provenance_level_three_1`` (have non-forgeable
192+
`SLSA`_ provenances) or ``mcn_infer_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced
193+
by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_infer_artifact_pipeline_1`` check needs to pass
194+
only if ``mcn_provenance_level_three_1`` fails.
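The either/or condition described above can be sketched in Python as a plain predicate over the set of checks a dependency has passed. This is only an illustration of the logic, not how Macaron evaluates policies, and the function name ``dependency_is_acceptable`` is hypothetical.

```python
PROVENANCE_L3 = "mcn_provenance_level_three_1"
INFER_PIPELINE = "mcn_infer_artifact_pipeline_1"


def dependency_is_acceptable(passed_checks: set[str]) -> bool:
    """A dependency is acceptable if it passes at least one of the two checks."""
    return PROVENANCE_L3 in passed_checks or INFER_PIPELINE in passed_checks


print(dependency_is_acceptable({PROVENANCE_L3}))   # True: provenance alone is enough
print(dependency_is_acceptable({INFER_PIPELINE}))  # True: a publishing workflow run was found
print(dependency_is_acceptable(set()))             # False: fails both checks
```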
195+
196+
Let's take a closer look at this policy to understand what each line means.
197+
198+
.. code-block:: c++
199+
200+
#include "prelude.dl"
201+
202+
This line imports the predefined Datalog relations into your Datalog specification. These relations
203+
can be thought of as select statements specifically provided by Macaron to make it easier for you
204+
to write policies. In our example policy, the following relations are pre-defined:
205+
206+
* ``Policy(policy_id: symbol, target_id: number, message: symbol)``
207+
* ``is_component(component_id: number, purl: symbol)``
208+
* ``transitive_dependency(parent: number, dependency: number)``
209+
* ``check_passed(component_id: number, check_name: symbol)``
210+
* ``apply_policy_to(policy_id: symbol, component_id: number)``
211+
* ``is_repo(repo_id: number, repo_complete_name: symbol, component_id: number)``
212+
213+
And the following relation is declared in this policy:
214+
215+
* ``violating_dependencies(parent: number)``
216+
217+
Feel free to browse through the available
218+
relations `here <https://github.com/oracle/macaron/blob/main/src/macaron/policy_engine/prelude/>`_
219+
to see how they are constructed before moving on.
220+
221+
.. code-block:: c++
222+
223+
Policy("detect-malicious-upload", component_id, "") :-
224+
is_component(component_id, _),
225+
!violating_dependencies(component_id).
226+
227+
This rule populates the ``Policy`` relation if ``component_id`` exists in the database and
228+
the ``violating_dependencies`` relation for this component is empty.
229+
230+
.. code-block:: c++
231+
232+
.decl violating_dependencies(parent: number)
233+
violating_dependencies(parent) :-
234+
transitive_dependency(parent, dependency),
235+
!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
236+
!check_passed(dependency, "mcn_provenance_level_three_1").
237+
238+
This is the rule that the user needs to design to detect dependencies that violate a security requirement.
239+
Here we declare a relation called ``violating_dependencies`` and populate it if the dependencies in the
240+
``transitive_dependency`` relation pass neither the ``mcn_infer_artifact_pipeline_1`` nor the
241+
``mcn_provenance_level_three_1`` checks.
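To make the semantics of this rule concrete, the sketch below mimics it in Python: it first computes the transitive closure of direct ``(parent, dependency)`` pairs, then collects every parent that has at least one transitive dependency passing neither check. The component ids and function names are hypothetical, and the code is a model of the rule rather than Macaron's policy engine.

```python
REQUIRED_CHECKS = {"mcn_infer_artifact_pipeline_1", "mcn_provenance_level_three_1"}


def transitive_dependencies(direct: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Compute the transitive closure of direct (parent, dependency) pairs."""
    closure = set(direct)
    while True:
        # parent -> a and a -> dependency together imply parent -> dependency.
        derived = {(p, d) for (p, a) in closure for (b, d) in closure if a == b}
        if derived <= closure:
            return closure
        closure |= derived


def violating_parents(
    direct: set[tuple[int, int]], passed: dict[int, set[str]]
) -> set[int]:
    """Return parents with a transitive dependency that passes neither check."""
    return {
        parent
        for (parent, dependency) in transitive_dependencies(direct)
        if not (passed.get(dependency, set()) & REQUIRED_CHECKS)
    }


# Hypothetical ids: component 1 depends on 2 and 3; only 2 passes a required check.
direct_edges = {(1, 2), (1, 3)}
passed_checks = {2: {"mcn_infer_artifact_pipeline_1"}, 3: set()}
print(sorted(violating_parents(direct_edges, passed_checks)))  # [1]
```

In this model, the ``Policy`` rule holds for a component exactly when its id is absent from the returned set.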
242+
243+
.. code-block:: c++
244+
245+
apply_policy_to("detect-malicious-upload", component_id) :-
246+
is_repo(_, "github.com/behnazh-w/example-maven-app", component_id).
247+
248+
Finally, the ``apply_policy_to`` rule applies the policy ``detect-malicious-upload`` to the
249+
repository ``github.com/behnazh-w/example-maven-app``. Note that each run of Macaron analyzes a repository at a specific
250+
commit. So, the database can include more than one result for a repository and this policy will be
251+
validated on all commits available in the database.
252+
253+
Let's save this policy in a file named ``example-maven-app.dl``. To verify it, run:
254+
255+
.. code-block:: shell
256+
257+
./run_macaron.sh verify-policy --database ./output/macaron.db --file ./example-maven-app.dl
258+
259+
You can see the policy result both in the console and in ``output/policy_report.json``. The results
260+
printed to the console will look like the following:
261+
262+
.. code-block:: javascript
263+
264+
passed_policies
265+
component_satisfies_policy
266+
failed_policies
267+
['detect-malicious-upload']
268+
component_violates_policy
269+
['1', 'pkg:github.com/behnazh-w/example-maven-app@34c06e8ae3811885c57f8bd42db61f37ac57eb6c', 'detect-malicious-upload']
270+
271+
As you can see, the policy has failed because the ``io.github.behnazh-w.demo:jackson-databind:1.0``
272+
dependency is manually uploaded to Maven Central and does not meet the security requirement.
273+
274+
You can use this policy in your GitHub Actions workflows to prevent a deployment or fail a CI test during
275+
development. Alternatively, you can treat the result as a warning and manually investigate the
276+
dependencies to make sure they are secure and can be trusted.
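One way to turn the policy result into a CI gate is a small script that reads the report and returns a non-zero exit code when a policy fails. The sketch below assumes that ``output/policy_report.json`` is a JSON object with a ``failed_policies`` key shaped like the console output shown earlier. That layout is an assumption, not something this tutorial documents, so check your own report before relying on it. The function names are hypothetical.

```python
import json
import sys


def failed_policy_ids(report: dict) -> list[str]:
    """Collect policy ids listed under the (assumed) 'failed_policies' key."""
    ids = []
    for row in report.get("failed_policies", []):
        # Each row is assumed to be a list whose first element is the policy id.
        ids.append(row[0] if isinstance(row, list) else row)
    return ids


def gate(report_path: str) -> int:
    """Return a process exit code: 1 if any policy failed, otherwise 0."""
    with open(report_path, encoding="utf-8") as report_file:
        failed = failed_policy_ids(json.load(report_file))
    for policy_id in failed:
        print(f"policy failed: {policy_id}", file=sys.stderr)
    return 1 if failed else 0
```

A CI step could then end with ``sys.exit(gate("output/policy_report.json"))`` so that a failed policy stops the job, or print a warning instead if you prefer to investigate manually.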

src/macaron/policy_engine/prelude/helper_rules.dl

Lines changed: 5 additions & 5 deletions
@@ -23,12 +23,12 @@ not_self_hosted_git(repo_id, message) :- repository(repo_id, _, _, _, _, _, _, r
2323
match("^.*(github.com|gitlab.com).*$", remote), message=remote.
2424

2525
/**
26-
* This fact exists iff a repository with id dependency is a dependency of repository with id repo.
26+
* This fact exists iff a component with id dependency is a dependency of component with id parent.
2727
*/
28-
.decl transitive_dependency(repo_id: number, dependency: number)
29-
transitive_dependency(repo_id, dependency) :- dependency(repo_id, dependency).
30-
transitive_dependency(repo_id, dependency) :-
31-
transitive_dependency(repo_id, a), transitive_dependency(a, dependency).
28+
.decl transitive_dependency(parent: number, dependency: number)
29+
transitive_dependency(parent, dependency) :- dependency(parent, dependency).
30+
transitive_dependency(parent, dependency) :-
31+
transitive_dependency(parent, a), transitive_dependency(a, dependency).
3232

3333
/**
3434
* Extract the id and PURL from the component relation.

src/macaron/slsa_analyzer/checks/provenance_available_check.py

Lines changed: 1 addition & 1 deletion
@@ -490,7 +490,7 @@ def run_check(self, ctx: AnalyzeContext, check_result: CheckResult) -> CheckResu
490490
]
491491
return CheckResultType.PASSED
492492

493-
check_result["justification"].append("Could not find any SLSA provenances.")
493+
check_result["justification"].append("Could not find any SLSA or Witness provenances.")
494494
return CheckResultType.FAILED
495495

496496
