Commit 67d83cd

Add initial compiler plugin documentation
1 parent 77635ea commit 67d83cd

7 files changed (+342 lines, -2 lines)

data/participants.json

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
[
  {
    "id": "1",
    "participants": [
      {
        "name": {
          "firstName": "Alice",
          "lastName": "Cooper"
        },
        "age": 15,
        "city": "London"
      },
      {
        "name": {
          "firstName": "Bob",
          "lastName": "Dylan"
        },
        "age": 45,
        "city": "Dubai"
      }
    ]
  },
  {
    "id": "2",
    "participants": [
      {
        "name": {
          "firstName": "Charlie",
          "lastName": "Daniels"
        },
        "age": 20,
        "city": "Moscow"
      },
      {
        "name": {
          "firstName": "Charlie",
          "lastName": "Chaplin"
        },
        "age": 40,
        "city": "Milan"
      }
    ]
  }
]

docs/StardustDocs/d.tree

Lines changed: 3 additions & 1 deletion
@@ -43,6 +43,9 @@
         <toc-element topic="DataColumn.md"/>
         <toc-element topic="DataRow.md"/>
     </toc-element>
+    <toc-element topic="Compiler-Plugin.md">
+        <toc-element topic="dataSchema.md"/>
+    </toc-element>
     <toc-element topic="nanAndNa.md"/>
     <toc-element topic="numberUnification.md"/>
     <toc-element topic="operations.md">
@@ -195,7 +198,6 @@
         </toc-element>
     </toc-element>
     <toc-element topic="gradleReference.md"/>
-    <toc-element topic="Compiler-Plugin.md"/>
     <toc-element topic="DataSchema-Data-Classes-Generation.md"/>
     <toc-element topic="_shadow_resources.md" hidden="true"/>
 </instance-profile>
Binary file (19.2 MB) not shown.

Binary file (80.7 KB) not shown.

Binary file (77.4 KB) not shown.
Lines changed: 109 additions & 1 deletion
@@ -1,3 +1,111 @@
# Kotlin DataFrame Compiler Plugin

The Kotlin DataFrame compiler plugin is available in Gradle projects and is coming to Kotlin Notebook and Maven projects soon.

Check out this video that shows how expressions update the schema of a dataframe:
<video src="compiler_plugin.mp4" controls/>

## Setup

Install [IntelliJ IDEA EAP](https://www.jetbrains.com/idea/nextversion/).
Going forward, compiler plugin updates will be released with Kotlin plugin updates.
Next release: 2025.2.

Add both plugins to the `plugins {}` block in `build.gradle.kts`:

```kotlin
kotlin("jvm") version "2.2.20-dev-3524"
```

```kotlin
kotlin("plugin.dataframe") version "2.2.20-dev-3524"
```

Add the library dependency to the `dependencies {}` block:
```kotlin
implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
```

The plugin is released as a dev version and is available in this Maven repository:

```kotlin
maven("https://packages.jetbrains.team/maven/p/kt/dev/")
```

Set up the repositories for dependencies in `build.gradle.kts`:
```kotlin
repositories {
    maven("https://packages.jetbrains.team/maven/p/kt/dev/")
    mavenCentral()
}
```

Set up the repositories for plugins in `settings.gradle.kts`:
```kotlin
pluginManagement {
    repositories {
        maven("https://packages.jetbrains.team/maven/p/kt/dev/")
        mavenCentral()
        gradlePluginPortal()
    }
}
```

Add this line to `gradle.properties`:
```properties
kotlin.incremental=false
```

Disabling incremental compilation will no longer be necessary
once [KT-66735](https://youtrack.jetbrains.com/issue/KT-66735) is resolved.
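
Taken together, the `build.gradle.kts` snippets above might look like the following minimal sketch. It only combines what this section already lists; the `settings.gradle.kts` repositories and the `gradle.properties` flag are still required, and a real project will likely need more configuration than shown here.

```kotlin
// build.gradle.kts: a minimal sketch combining the setup snippets above
plugins {
    kotlin("jvm") version "2.2.20-dev-3524"
    // Compiler plugin that derives dataframe schemas at compile time
    kotlin("plugin.dataframe") version "2.2.20-dev-3524"
}

repositories {
    // Dev repository where the plugin builds are published
    maven("https://packages.jetbrains.team/maven/p/kt/dev/")
    mavenCentral()
}

dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta2")
}
```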

## Features overview

### Static interpretation of DataFrame API

The plugin evaluates DataFrame operations whose arguments are known at compile time, such as constant strings, resolved types, and property access calls.
It then updates the return type of each call so that the result provides properties matching the column names and types.
The goal is to reflect the result of the operations you apply to a dataframe in its type, so that you get a convenient typed API.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    val weatherData = dataFrameOf(
        "time" to columnOf(0, 1, 2, 4, 5, 7, 8, 9),
        "temperature" to columnOf(12.0, 14.2, 15.1, 15.9, 17.9, 15.6, 14.2, 24.3),
        "humidity" to columnOf(0.5, 0.32, 0.11, 0.89, 0.68, 0.57, 0.56, 0.5)
    )

    // `temperature` is a typed Double property derived from the column name
    weatherData.filter { temperature > 15.0 }.print()
}
```

The dataframe schema, as the compiler plugin sees it,
is displayed when you hover over an expression or variable:

![Dataframe schema shown on hover](schema_info.png)
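
To illustrate how a schema change carries through a chain of calls, here is a minimal sketch assuming the plugin setup above: `add` introduces a derived column, and the next call should be able to refer to it as a typed property. The column name `temperatureF` and the helper `warmAndDryRows` are hypothetical, chosen only for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: counts warm, dry measurements in the sample data
fun warmAndDryRows(): Int {
    val weatherData = dataFrameOf(
        "time" to columnOf(0, 1, 2, 4, 5, 7, 8, 9),
        "temperature" to columnOf(12.0, 14.2, 15.1, 15.9, 17.9, 15.6, 14.2, 24.3),
        "humidity" to columnOf(0.5, 0.32, 0.11, 0.89, 0.68, 0.57, 0.56, 0.5),
    )

    // `add` extends the schema, so `temperatureF` becomes a typed Double property
    val withFahrenheit = weatherData.add("temperatureF") { temperature * 9 / 5 + 32 }

    // Both the original `humidity` column and the new column are typed here
    val warmAndDry = withFahrenheit.filter { temperatureF > 59.0 && humidity < 0.6 }
    warmAndDry.print()
    return warmAndDry.rowsCount()
}

fun main() {
    println(warmAndDryRows())
}
```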

### @DataSchema declarations

An untyped `DataFrame` can be assigned a data schema: a top-level interface or class that describes the names and types of the columns in the dataframe.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.ColumnName
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.readCsv

@DataSchema
data class Repositories(
    @ColumnName("full_name")
    val fullName: String,
    @ColumnName("html_url")
    val htmlUrl: java.net.URL,
    @ColumnName("stargazers_count")
    val stargazersCount: Int,
    val topics: String,
    val watchers: Int
)

fun main() {
    // Read the raw CSV and convert its columns to the declared schema
    val df = DataFrame
        .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
        .convertTo<Repositories>()

    df.filter { stargazersCount > 50 }.print()
}
```

[Learn more](dataSchema.md) about data schema declarations.
Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
# @DataSchema declarations

A `@DataSchema` declaration can be used as the type argument of the `cast` and `convertTo` functions.
It provides typed data access for raw dataframes that you read from I/O sources, and it serves as the starting point from which the compiler plugin derives schema changes.
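
The two functions differ in how much they trust the data. In this minimal sketch, assuming the compiler plugin setup from the main page, `cast` would only change the static type, while `convertTo` also converts column values to the declared types where it can, here parsing `String` ages into `Int`. The `age` property, the sample values, and the `totalAge` helper are illustrative and are not part of Example 1.

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
interface Person {
    val firstName: String
    val age: Int
}

// Hypothetical helper: `age` arrives as text, as it might from an untyped source
fun totalAge(): Int {
    val raw = dataFrameOf(
        "firstName" to columnOf("Alice", "Bob"),
        "age" to columnOf("15", "45"),
    )

    // convertTo parses the String values so that `age` really holds Int values.
    // raw.cast<Person>() would also compile, but it only changes the static type,
    // so using the String values as Int would fail at runtime.
    val people = raw.convertTo<Person>()
    return people.age.toList().sum()
}

fun main() {
    println(totalAge())
}
```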

Example 1:
```kotlin
@DataSchema
interface Person {
    val firstName: String
}
```

Generated code:
```kotlin
val DataRow<Person>.firstName: String get() = this["firstName"] as String
val ColumnsScope<Person>.firstName: DataColumn<String> get() = this["firstName"] as DataColumn<String>
```

Example 2:
```kotlin
@DataSchema
interface Person {
    @ColumnName("first_name")
    val firstName: String
}
```

The `@ColumnName` annotation changes the column from which the generated extension properties read their data, while the property name stays `firstName`:

Generated code:
```kotlin
val DataRow<Person>.firstName: String get() = this["first_name"] as String
val ColumnsScope<Person>.firstName: DataColumn<String> get() = this["first_name"] as DataColumn<String>
```

Generated extension properties are used to access values in a `DataRow` and to access columns in a `ColumnsScope`, which is either a `DataFrame` or a `ColumnSelectionDsl`.

`DataRow`:
```kotlin
val row = df[0]
row.firstName
```

Row expressions, such as the ones in `filter` and `add`, also receive a `DataRow`:
```kotlin
df.filter { firstName.startsWith("L") }
df.add("newCol") { firstName }
```

`DataFrame`:
```kotlin
val col = df.firstName
val value = col[0]
```

`ColumnSelectionDsl`:

```kotlin
df.convert { firstName }.with { it.uppercase() }
df.select { firstName }
df.rename { firstName }.into("name")
```
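
The snippets above assume an existing `df`. The following minimal sketch builds one from a `first_name` column and casts it to the Example 2 schema, so that `firstName` can be used in each context. The sample names and the `namesStartingWithL` helper are hypothetical, and the sketch assumes the compiler plugin setup from the main page.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.annotations.ColumnName
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
interface Person {
    @ColumnName("first_name")
    val firstName: String
}

// Hypothetical helper that touches the three access contexts
fun namesStartingWithL(): List<String> {
    val df = dataFrameOf("first_name" to columnOf("Lena", "Max", "Lisa")).cast<Person>()

    // DataRow: reads the "first_name" value of a single row
    val firstRow: String = df[0].firstName

    // DataFrame: returns the whole "first_name" column
    val column: DataColumn<String> = df.firstName

    // ColumnSelectionDsl: selects the column through its property
    df.select { firstName }.print()

    println("Row 0: $firstRow, column size: ${column.toList().size}")

    // Row expression: `firstName` is read from each DataRow passed to filter
    return df.filter { firstName.startsWith("L") }.firstName.toList()
}

fun main() {
    println(namesStartingWithL())
}
```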

## Data Class

A data schema can also be a top-level data class, in which case two additional APIs become available:

```kotlin
@DataSchema
data class WikiData(val name: String, val paradigms: List<String>)
```

1. A `dataFrameOf` overload that creates a dataframe instance from objects:

```kotlin
val languages = dataFrameOf(
    WikiData("Kotlin", listOf("object-oriented", "functional", "imperative")),
    WikiData("Haskell", listOf("Purely functional")),
    WikiData("C", listOf("imperative")),
    WikiData("Pascal", listOf("imperative")),
    WikiData("Idris", listOf("functional")),
)
```

2. An `append` overload that takes an object and appends it as a row:

```kotlin
val ocaml = WikiData("OCaml", listOf("functional", "imperative", "modular", "object-oriented"))
val languages1 = languages.append(ocaml)
```
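
Because the rows keep the `WikiData` schema, the resulting dataframe can be filtered with typed properties and turned back into objects. This is a minimal sketch, assuming the plugin setup and the two overloads described above; `functionalLanguages` is a hypothetical helper, and `toListOf` is the DataFrame function that converts rows into instances of a class.

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
data class WikiData(val name: String, val paradigms: List<String>)

// Hypothetical helper: languages whose paradigm list contains exactly "functional"
fun functionalLanguages(): List<String> {
    val languages = dataFrameOf(
        WikiData("Kotlin", listOf("object-oriented", "functional", "imperative")),
        WikiData("Haskell", listOf("Purely functional")),
        WikiData("C", listOf("imperative")),
        WikiData("Pascal", listOf("imperative")),
        WikiData("Idris", listOf("functional")),
    )
    val languages1 = languages.append(
        WikiData("OCaml", listOf("functional", "imperative", "modular", "object-oriented")),
    )

    // `paradigms` is a typed List<String> inside the row filter;
    // "Purely functional" is a different string, so Haskell is excluded
    val functional = languages1.filter { "functional" in paradigms }

    // Turn the remaining rows back into WikiData objects
    return functional.toListOf<WikiData>().map { it.name }
}

fun main() {
    println(functionalLanguages())
}
```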

## Schemas for nested structures

A nested structure can be, for example, JSON that you read from a file:

```json
[
  {
    "id": "1",
    "participants": [
      {
        "name": {
          "firstName": "Alice",
          "lastName": "Cooper"
        },
        "age": 15,
        "city": "London"
      },
      {
        "name": {
          "firstName": "Bob",
          "lastName": "Dylan"
        },
        "age": 45,
        "city": "Dubai"
      }
    ]
  },
  {
    "id": "2",
    "participants": [
      {
        "name": {
          "firstName": "Charlie",
          "lastName": "Daniels"
        },
        "age": 20,
        "city": "Moscow"
      },
      {
        "name": {
          "firstName": "Charlie",
          "lastName": "Chaplin"
        },
        "age": 40,
        "city": "Milan"
      }
    ]
  }
]
```

Reading it produces a dataframe with this schema:

```text
id: String
participants: *
    name:
        firstName: String
        lastName: String
    age: Int
    city: String
```

- `participants` is a `FrameColumn`
- `name` is a `ColumnGroup`

Here's the data schema that matches it:

```kotlin
@DataSchema
data class Group(
    val id: String,
    val participants: List<Person>
)

@DataSchema
data class Person(
    val name: Name,
    val age: Int,
    val city: String?
)

@DataSchema
data class Name(
    val firstName: String,
    val lastName: String,
)
```

```kotlin
val url = "https://raw.githubusercontent.com/Kotlin/dataframe/refs/heads/master/data/participants.json"
val df = DataFrame.readJson(url).cast<Group>()
```
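
Once the dataframe is cast to `Group`, the nested levels are typed as well: in a row expression `participants` is a `DataFrame<Person>`, and `name` is a column group with its own properties. This minimal sketch assumes the plugin setup; to avoid a network call, it parses an inline copy of the sample data with `readJsonStr` instead of reading the URL. The `participantsJson` value, the `adultCounts` helper, and the `adults` column name are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.readJsonStr

@DataSchema
data class Name(val firstName: String, val lastName: String)

@DataSchema
data class Person(val name: Name, val age: Int, val city: String?)

@DataSchema
data class Group(val id: String, val participants: List<Person>)

// Inline copy of data/participants.json so the sketch runs offline
val participantsJson = """
[
  {"id": "1", "participants": [
    {"name": {"firstName": "Alice", "lastName": "Cooper"}, "age": 15, "city": "London"},
    {"name": {"firstName": "Bob", "lastName": "Dylan"}, "age": 45, "city": "Dubai"}
  ]},
  {"id": "2", "participants": [
    {"name": {"firstName": "Charlie", "lastName": "Daniels"}, "age": 20, "city": "Moscow"},
    {"name": {"firstName": "Charlie", "lastName": "Chaplin"}, "age": 40, "city": "Milan"}
  ]}
]
"""

// Hypothetical helper: number of participants aged 18 or older per group
fun adultCounts(): List<Int> {
    val df = DataFrame.readJsonStr(participantsJson).cast<Group>()

    // In the row expression, `participants` is a DataFrame<Person>,
    // so the nested `count` predicate can use the typed `age` property
    val counted = df.add("adults") { participants.count { age >= 18 } }
    counted.print()

    // `name` is a column group, so its `firstName` column is typed as well
    println(df[0].participants.name.firstName.toList())

    return counted.adults.toList()
}

fun main() {
    println(adultCounts())
}
```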
