Commit 6c5ab5a

Architecture docs v0 (#225)
# Motivation <!-- Why is this change necessary? --> # Content <!-- Please include a summary of the change --> # Testing <!-- How was the change tested? --> # Please check the following before marking your PR as ready for review - [ ] I have added tests for my changes - [ ] I have updated the documentation or added new documentation as needed - [ ] I have read and agree to the [Contributor License Agreement](/codegen-sh/codegen-sdk/blob/develop/CLA.md)
1 parent a3e6016 commit 6c5ab5a

37 files changed

+754
-34
lines changed

.github/pull_request_template.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,15 @@
11
# Motivation
2+
23
<!-- Why is this change necessary? -->
34

45
# Content
6+
57
<!-- Please include a summary of the change -->
8+
69
# Testing
10+
711
<!-- How was the change tested? -->
12+
813
# Please check the following before marking your PR as ready for review
914

1015
- [ ] I have added tests for my changes

.pre-commit-config.yaml

Lines changed: 10 additions & 0 deletions
@@ -103,3 +103,13 @@ repos:
103103
language: system
104104
pass_filenames: false
105105
always_run: true
106+
- repo: https://github.com/hukkin/mdformat
107+
rev: 0.7.22 # Use the ref you want to point at
108+
hooks:
109+
- id: mdformat
110+
# Optionally add plugins
111+
additional_dependencies:
112+
- mdformat-gfm
113+
- mdformat-ruff
114+
- mdformat-config
115+
- mdformat-pyproject

CLA.md

Lines changed: 30 additions & 25 deletions
@@ -7,44 +7,49 @@
77
**Project Owner/Organization:** Codegen, Inc.
88

99
1. **Definitions**
10-
1. **“You”** or **“Contributor”** means the individual or entity (and its Affiliates) that Submits a Contribution.
11-
2. **“Contribution”** means any work of authorship (including any modifications or additions) that is intentionally Submitted by You for inclusion in the Project, in any form (including but not limited to source code, documentation, or other materials).
12-
3. **“Submit”** or **“Submitted”** means any act of transferring a Contribution to Codegen, Inc. via pull request, email, or any other method of communication for the purpose of inclusion in the Project.
13-
2. **Grant of Copyright License**
1410

15-
Subject to the terms and conditions of this CLA, You hereby grant to Codegen, Inc. and to recipients of software distributed by Codegen, Inc.:
11+
1. **“You”** or **“Contributor”** means the individual or entity (and its Affiliates) that Submits a Contribution.
12+
1. **“Contribution”** means any work of authorship (including any modifications or additions) that is intentionally Submitted by You for inclusion in the Project, in any form (including but not limited to source code, documentation, or other materials).
13+
1. **“Submit”** or **“Submitted”** means any act of transferring a Contribution to Codegen, Inc. via pull request, email, or any other method of communication for the purpose of inclusion in the Project.
1614

17-
- A perpetual, worldwide, non-exclusive, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.
18-
3. **Grant of Patent License**
15+
1. **Grant of Copyright License**
1916

20-
Subject to the terms and conditions of this CLA, You hereby grant to Codegen, Inc. and to recipients of software distributed by Codegen, Inc. a perpetual, worldwide, non-exclusive, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer Your Contribution, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution alone or by combination of Your Contribution with the Project to which You Submitted it.
17+
Subject to the terms and conditions of this CLA, You hereby grant to Codegen, Inc. and to recipients of software distributed by Codegen, Inc.:
2118

22-
If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that Your Contribution, or the Project to which You have contributed, directly or indirectly infringes any patent, then any patent licenses granted to that entity under this CLA for that Contribution or Project shall terminate as of the date such litigation is filed.
19+
- A perpetual, worldwide, non-exclusive, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.
2320

24-
4. **Representations and Warranties**
25-
1. **Original Work**. You represent that each of Your Contributions is an original work of authorship and that You have the necessary rights to grant the licenses under this CLA.
26-
2. **Third-Party Rights**. If Your employer(s) or any third party has rights to intellectual property that You create, You represent that You have received permission to make Contributions on behalf of that employer or third party (or that such employer or third party has waived those rights for Your Contributions).
27-
3. **No Other Agreements**. You represent that You are not aware of any other agreement or obligation that is inconsistent with the rights granted under this CLA.
28-
5. **Disclaimer of Warranty**
21+
1. **Grant of Patent License**
2922

30-
UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, YOU PROVIDE YOUR CONTRIBUTIONS ON AN **“AS IS”** BASIS, **WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND**, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
23+
Subject to the terms and conditions of this CLA, You hereby grant to Codegen, Inc. and to recipients of software distributed by Codegen, Inc. a perpetual, worldwide, non-exclusive, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer Your Contribution, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution alone or by combination of Your Contribution with the Project to which You Submitted it.
3124

32-
6. **Limitation of Liability**
25+
If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that Your Contribution, or the Project to which You have contributed, directly or indirectly infringes any patent, then any patent licenses granted to that entity under this CLA for that Contribution or Project shall terminate as of the date such litigation is filed.
3326

34-
IN NO EVENT SHALL CODEGEN, INC. OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE), ARISING IN ANY WAY OUT OF OR IN CONNECTION WITH THIS AGREEMENT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
27+
1. **Representations and Warranties**
3528

36-
7. **Subsequent Contributions and Updates**
29+
1. **Original Work**. You represent that each of Your Contributions is an original work of authorship and that You have the necessary rights to grant the licenses under this CLA.
30+
1. **Third-Party Rights**. If Your employer(s) or any third party has rights to intellectual property that You create, You represent that You have received permission to make Contributions on behalf of that employer or third party (or that such employer or third party has waived those rights for Your Contributions).
31+
1. **No Other Agreements**. You represent that You are not aware of any other agreement or obligation that is inconsistent with the rights granted under this CLA.
3732

38-
You agree that all current and future Contributions to the Project Submitted by You shall be subject to the terms of this CLA. Codegen, Inc. may publish updates to this CLA from time to time; in such case, You may need to agree to new terms before any subsequent Contributions.
33+
1. **Disclaimer of Warranty**
3934

40-
8. **License Modification Rights**
35+
UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, YOU PROVIDE YOUR CONTRIBUTIONS ON AN **“AS IS”** BASIS, **WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND**, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
4136

42-
You agree that Codegen, Inc. may change the license(s) applicable to the open source project(s) to which Your Contributions relate at Codegen, Inc.’s sole discretion, including without limitation by re-licensing the project(s) and Your Contributions under any other open source or “free” software license, or a commercial or proprietary license of Codegen, Inc.’s choosing.
37+
1. **Limitation of Liability**
4338

44-
9. **Governing Law**
39+
IN NO EVENT SHALL CODEGEN, INC. OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE), ARISING IN ANY WAY OUT OF OR IN CONNECTION WITH THIS AGREEMENT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
4540

46-
This CLA shall be governed by and construed in accordance with the laws of the State of Delaware, without regard to its conflicts of laws provisions.
41+
1. **Subsequent Contributions and Updates**
4742

48-
10. **Signature / Electronic Consent**
43+
You agree that all current and future Contributions to the Project Submitted by You shall be subject to the terms of this CLA. Codegen, Inc. may publish updates to this CLA from time to time; in such case, You may need to agree to new terms before any subsequent Contributions.
4944

50-
By signing or otherwise indicating Your acceptance of this CLA, You acknowledge that You have read and agree to be bound by its terms. If You are signing on behalf of an entity, You represent and warrant that You have the authority to do so.
45+
1. **License Modification Rights**
46+
47+
You agree that Codegen, Inc. may change the license(s) applicable to the open source project(s) to which Your Contributions relate at Codegen, Inc.’s sole discretion, including without limitation by re-licensing the project(s) and Your Contributions under any other open source or “free” software license, or a commercial or proprietary license of Codegen, Inc.’s choosing.
48+
49+
1. **Governing Law**
50+
51+
This CLA shall be governed by and construed in accordance with the laws of the State of Delaware, without regard to its conflicts of laws provisions.
52+
53+
1. **Signature / Electronic Consent**
54+
55+
By signing or otherwise indicating Your acceptance of this CLA, You acknowledge that You have read and agree to be bound by its terms. If You are signing on behalf of an entity, You represent and warrant that You have the authority to do so.

CONTRIBUTING.md

Lines changed: 9 additions & 6 deletions
@@ -7,8 +7,8 @@ Thank you for your interest in contributing to Codegen! This document outlines t
77
By contributing to Codegen, you agree that:
88

99
1. Your contributions will be licensed under the project's license.
10-
2. You have the right to license your contribution under the project's license.
11-
3. You grant Codegen a perpetual, worldwide, non-exclusive, royalty-free license to use your contribution.
10+
1. You have the right to license your contribution under the project's license.
11+
1. You grant Codegen a perpetual, worldwide, non-exclusive, royalty-free license to use your contribution.
1212

1313
See our [CLA](CLA.md) for more details.
1414

@@ -19,6 +19,7 @@ See our [CLA](CLA.md) for more details.
1919
UV is a fast Python package installer and resolver. To install:
2020

2121
**macOS**:
22+
2223
```bash
2324
brew install uv
2425
```
@@ -28,13 +29,15 @@ For other platforms, see the [UV installation docs](https://github.com/astral-sh
2829
### Setting Up the Development Environment
2930

3031
After installing UV, set up your development environment:
32+
3133
```bash
3234
uv venv
3335
source .venv/bin/activate
3436
uv sync --dev
3537
```
3638

3739
> [!TIP]
40+
>
3841
> - If sync fails with `missing field 'version'`, you may need to delete the lockfile and rerun: `rm uv.lock && uv sync --dev`.
3942
> - If sync fails with failed compilation, you may need to install clang and rerun `uv sync --dev`.
4043
@@ -51,10 +54,10 @@ uv run pytest tests/integration/codemod/test_codemods.py -n auto
5154
## Pull Request Process
5255

5356
1. Fork the repository and create your branch from `develop`.
54-
2. Ensure your code passes all tests.
55-
3. Update documentation as needed.
56-
4. Submit a pull request to the `develop` branch.
57-
5. Include a clear description of your changes in the PR.
57+
1. Ensure your code passes all tests.
58+
1. Update documentation as needed.
59+
1. Submit a pull request to the `develop` branch.
60+
1. Include a clear description of your changes in the PR.
5861

5962
## Release Process
6063

README.md

Lines changed: 3 additions & 3 deletions
@@ -24,7 +24,6 @@
2424

2525
[Codegen](https://docs.codegen.com) is a Python library for manipulating codebases.
2626

27-
2827
```python
2928
from codegen import Codebase
3029

@@ -37,11 +36,13 @@ for function in codebase.functions:
3736
# Comprehensive static analysis for references, dependencies, etc.
3837
if not function.usages:
3938
# Auto-handles references and imports to maintain correctness
40-
function.move_to_file('deprecated.py')
39+
function.move_to_file("deprecated.py")
4140
```
41+
4242
Write code that transforms code. Codegen combines the parsing power of [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) with the graph algorithms of [rustworkx](https://github.com/Qiskit/rustworkx) to enable scriptable, multi-language code manipulation at scale.
4343

4444
## Installation and Usage
45+
4546
We support
4647

4748
- Running Codegen in Python 3.12 – 3.13
@@ -51,7 +52,6 @@ We support
5152
- Windows is not supported
5253
- Python, TypeScript, JavaScript and React codebases
5354

54-
5555
```
5656
# Install inside existing project
5757
uv pip install codegen
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
# File Discovery
2+
3+
The file discovery process is responsible for identifying and organizing all relevant files in a project that need to be processed by the SDK.
4+
5+
## Initialization
6+
7+
- We take in either a list of projects or a path to a filesystem.
8+
- If we get a path, we detect the programming language, initialize the git client based on the path, and get a Project.
9+
10+
## File discovery
11+
12+
- We discover files using the git client so we can respect gitignored files
13+
- We then filter files based on the language and the project configuration
14+
- If specified, we filter by subdirectories
15+
- We also filter by file extensions
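
The discovery and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: it assumes the git CLI stands in for the git client, and `EXTENSIONS`, `list_tracked_files`, and `filter_files` are hypothetical names introduced only for this example.

```python
import subprocess
from pathlib import Path

# Hypothetical extension table; the SDK's real language configuration is not shown here.
EXTENSIONS = {
    "python": {".py"},
    "typescript": {".ts", ".tsx", ".js", ".jsx"},
}


def list_tracked_files(repo_root: str) -> list[str]:
    """Ask git for tracked files plus untracked files that are not gitignored."""
    result = subprocess.run(
        ["git", "ls-files", "--cached", "--others", "--exclude-standard"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def filter_files(paths, language, subdirectories=None):
    """Keep paths with a matching extension, optionally limited to subdirectories."""
    extensions = EXTENSIONS[language]
    selected = []
    for raw in paths:
        path = Path(raw)
        if path.suffix not in extensions:
            continue
        if subdirectories and not any(path.is_relative_to(d) for d in subdirectories):
            continue
        selected.append(raw)
    return selected
```

Listing through git means ignored files never reach the filtering step, so the language and subdirectory filters only have to reason about paths.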
16+
17+
## Next Step
18+
19+
After file discovery is complete, the files are passed to the [Tree-sitter Parsing](../parsing/tree-sitter.md) phase, where each file is parsed into a concrete syntax tree.
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
# Tree-sitter Parsing
2+
3+
Tree-sitter is the primary parsing engine for converting source code into concrete syntax trees. It supports two modes of operation, described after the following example:
4+
5+
```python
6+
def my_function():
7+
pass
8+
```
9+
10+
Tree-sitter parses this as follows:
11+
12+
```
13+
module [0, 0] - [3, 0]
14+
function_definition [0, 0] - [1, 8]
15+
name: identifier [0, 4] - [0, 15]
16+
parameters: parameters [0, 15] - [0, 17]
17+
body: block [1, 4] - [1, 8]
18+
pass_statement [1, 4] - [1, 8]
19+
```
20+
21+
- A CST mode, which includes syntax nodes (for example, the `def` keyword, spaces, or parentheses). The syntax nodes are "anonymous" and don't have any semantic meaning.
22+
- You don't see these nodes in the tree-sitter output, but they are there.
23+
- An AST mode, where we focus only on the semantic nodes (for example, the `my_function` identifier and the `pass` statement). These are "named nodes" and have semantic meaning.
24+
- This is different from field names (like `body`). Field names say nothing about the node itself; they indicate what role the child node (`block`) plays in the parent node (`function_definition`).
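
The difference between the two modes can be illustrated with a toy model. The `Node` class below is a hypothetical stand-in, hand-built for this example; it only mirrors the `is_named` and `named_children` attributes that the tree-sitter Python bindings expose, and does not call tree-sitter itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a tree-sitter node (an assumption for illustration)."""

    type: str
    is_named: bool
    children: list["Node"] = field(default_factory=list)

    @property
    def named_children(self) -> list["Node"]:
        # AST view: drop anonymous syntax nodes such as keywords and punctuation.
        return [child for child in self.children if child.is_named]


# Hand-built approximation of `def my_function(): pass`.
function_definition = Node(
    "function_definition",
    True,
    [
        Node("def", False),
        Node("identifier", True),
        Node("parameters", True, [Node("(", False), Node(")", False)]),
        Node(":", False),
        Node("block", True, [Node("pass_statement", True)]),
    ],
)

cst_view = [child.type for child in function_definition.children]
ast_view = [child.type for child in function_definition.named_children]
```

Here `cst_view` keeps `def` and `:` while `ast_view` contains only the three named children, which matches the printed tree earlier in this section.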
25+
26+
## Implementation Details
27+
28+
- We construct a mapping between file type and the tree-sitter grammar
29+
- For each file given to us (via git), we parse it using the appropriate grammar
30+
31+
## Next Step
32+
33+
Once the concrete syntax trees are built, they are transformed into our abstract syntax tree representation in the [AST Construction](./B.%20AST%20Construction.md) phase.
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
# AST Construction
2+
3+
The tree-sitter CST/AST is powerful but it focuses on syntax highlighting and not semantic meaning.
4+
For example, take decorators:
5+
6+
```python
7+
@decorator
8+
def my_function():
9+
pass
10+
```
11+
12+
```
13+
module [0, 0] - [3, 0]
14+
decorated_definition [0, 0] - [2, 8]
15+
decorator [0, 0] - [0, 10]
16+
identifier [0, 1] - [0, 10]
17+
definition: function_definition [1, 0] - [2, 8]
18+
name: identifier [1, 4] - [1, 15]
19+
parameters: parameters [1, 15] - [1, 17]
20+
body: block [2, 4] - [2, 8]
21+
pass_statement [2, 4] - [2, 8]
22+
23+
```
24+
25+
You can see that the `decorated_definition` node has a decorator and a definition. This makes sense for syntax highlighting - the decorator is highlighted separately from the function definition.
26+
27+
However, this is not useful for semantic analysis. We need to know that the decorator is decorating the function definition - there is a single function definition which may contain multiple decorators.
28+
The gap between the syntax tree and what semantic analysis needs becomes even more visible when we consider function call chains:
29+
30+
```python
31+
a().b().c().d()
32+
```
33+
34+
```
35+
module [0, 0] - [2, 0]
36+
expression_statement [0, 0] - [0, 15]
37+
call [0, 0] - [0, 15]
38+
function: attribute [0, 0] - [0, 13]
39+
object: call [0, 0] - [0, 11]
40+
function: attribute [0, 0] - [0, 9]
41+
object: call [0, 0] - [0, 7]
42+
function: attribute [0, 0] - [0, 5]
43+
object: call [0, 0] - [0, 3]
44+
function: identifier [0, 0] - [0, 1]
45+
arguments: argument_list [0, 1] - [0, 3]
46+
attribute: identifier [0, 4] - [0, 5]
47+
arguments: argument_list [0, 5] - [0, 7]
48+
attribute: identifier [0, 8] - [0, 9]
49+
arguments: argument_list [0, 9] - [0, 11]
50+
attribute: identifier [0, 12] - [0, 13]
51+
arguments: argument_list [0, 13] - [0, 15]
52+
```
53+
54+
You can see that the chain of calls is represented as a deeply nested structure. This is not useful for semantic analysis or performing edits on these nodes. Therefore, when parsing we need to build an AST that is more useful for semantic analysis.
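
To make the idea concrete, here is a minimal sketch that flattens such a nested chain into an ordered list of call names. It uses Python's built-in `ast` module rather than tree-sitter (whose Python call nodes nest the same way), and `flatten_call_chain` is a hypothetical helper, not part of the SDK.

```python
import ast


def flatten_call_chain(source: str) -> list[str]:
    """Return the names called in a chain like `a().b().c()`, in source order."""
    node = ast.parse(source, mode="eval").body
    names: list[str] = []
    # Walk from the outermost call inward: each call's receiver is the previous call.
    while isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Attribute):
            names.append(func.attr)
            node = func.value
        elif isinstance(func, ast.Name):
            names.append(func.id)
            break
        else:
            break
    names.reverse()  # Collected outermost-first, so reverse into source order.
    return names


chain = flatten_call_chain("a().b().c().d()")
```

A flat list like this is easier to inspect and edit than four levels of nested `call` and `attribute` nodes.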
55+
56+
## Implementation
57+
58+
- For each file, we parse a file-specific AST
59+
- We offer two modes of parsing:
60+
- Pattern based parsing: It maps a particular node type to a semantic node type. For example, we broadly map all identifiers to the `Name` node type.
61+
- Custom parsing: It takes a CST and builds a custom node type. For example, we can turn a decorated_definition node into a function_definition node with decorators. This involves careful arranging of the CST nodes into a new structure.
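
The two modes above can be sketched with toy types. Everything here (`CSTNode`, `Name`, `Function`, `PATTERNS`, `parse_pattern`, `parse_decorated_definition`) is hypothetical and exists only to illustrate the distinction; the SDK's real node classes and parser are not shown in this document.

```python
from dataclasses import dataclass, field


@dataclass
class CSTNode:
    """Toy concrete syntax node (an assumption for illustration)."""

    type: str
    text: str = ""
    children: list["CSTNode"] = field(default_factory=list)


@dataclass
class Name:
    value: str


@dataclass
class Function:
    name: str
    decorators: list[str] = field(default_factory=list)


# Pattern-based parsing: a node type maps directly to a semantic node type.
PATTERNS = {"identifier": lambda node: Name(node.text)}


def parse_pattern(node: CSTNode):
    return PATTERNS[node.type](node)


# Custom parsing: rearrange a decorated_definition into one function with decorators.
def parse_decorated_definition(node: CSTNode) -> Function:
    decorators = [c.children[0].text for c in node.children if c.type == "decorator"]
    definition = next(c for c in node.children if c.type == "function_definition")
    name = next(c for c in definition.children if c.type == "identifier")
    return Function(name=name.text, decorators=decorators)


cst = CSTNode(
    "decorated_definition",
    children=[
        CSTNode("decorator", children=[CSTNode("identifier", "decorator")]),
        CSTNode(
            "function_definition",
            children=[CSTNode("identifier", "my_function"), CSTNode("parameters"), CSTNode("block")],
        ),
    ],
)
function = parse_decorated_definition(cst)
```

The pattern table stays a one-line lookup per node type, while the custom parser has to know the shape of the surrounding CST, which is why custom parsing takes more care.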
62+
63+
## Pattern based parsing
64+
65+
To do this, we need to build a mapping between the tree-sitter node types and our semantic node types. These mappings are language specific and stored in `node_classes`. They are processed by `parser.py` at runtime. We can access these via many functions, such as `child_by_field_name` and `_parse_expression`. These methods both wrap the tree-sitter methods and parse the tree-sitter node into our semantic node.
66+
67+
## Custom parsing
68+
69+
These are more complex and require more work. Most symbols (classes, functions, etc), imports, exports, and other complex constructs are parsed using custom parsing.
70+
71+
## Statement parsing
72+
73+
Statements have another layer of complexity. They are essentially pattern based, but the mapping and logic are defined directly in the `parser.py` file.
74+
75+
## Next Step
76+
77+
After the AST is constructed, the system moves on to [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
# Import Resolution
2+
3+
TODO
4+
5+
## Next Step
6+
7+
After import resolution, the system performs [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. This is followed by comprehensive [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md).
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
# Export Analysis
2+
3+
TODO
4+
5+
## Next Step
6+
7+
After export analysis is complete, the system handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. It then moves on to [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md) to build a complete understanding of types and symbols.
