Skip to content

Commit 6c735a2

Browse files
authored
Using recursive CTEs to explore hierarchical Twitter threads
1 parent 9c3caae commit 6c735a2

File tree

1 file changed

+57
-0
lines changed

1 file changed

+57
-0
lines changed

sql/recursive-cte-twitter-threads.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
# Using recursive CTEs to explore hierarchical Twitter threads
2+
3+
This TIL adapted from [a Gist](https://gist.github.com/simonw/656a8c6e4688f720773c474080abe1b0) I put together in 2019, before I started tracking TILs here.
4+
5+
My [twitter-to-sqlite](https://datasette.io/tools/twitter-to-sqlite) tool produced a SQLite table with an `in_reply_to_status` column that referenced another tweet ID, for recording reply-to conversations.
6+
7+
I wanted to find the "deepest" tweets in my database - the tweets at the end of the longest reply-to thread.
8+
9+
I started by adapting [this recipe](https://gist.github.com/robinhouston/f689a4b833dc027a3fd97e3de855927b) by [Robin Houston](https://mobile.twitter.com/robinhouston/status/1180893134265430016). Here's the query I came up with:
10+
11+
```sql
12+
with recursive thread as (
13+
select id, in_reply_to_status_id, 0 as depth
14+
from tweets
15+
where in_reply_to_status_id is null
16+
union
17+
select tweets.id, tweets.in_reply_to_status_id, 1 + thread.depth as depth
18+
from thread join tweets on tweets.in_reply_to_status_id = thread.id)
19+
select * from thread order by depth desc
20+
```
21+
This uses [a recursive CTE](https://www.sqlite.org/lang_with.html#recursive_query_examples) to sythensize a `thread` table.
22+
23+
The result I got looked like this (truncated):
24+
25+
| id | in_reply_to_status_id | depth |
26+
| --- | --- | --- |
27+
| 1576674019239407616 | 1576673163487821824 | 63 |
28+
| 1576673163487821824 | 1576672866770178048 | 62 |
29+
| 1574621292988440580 | 1574379782573531136 | 61 |
30+
| 1574845776152432649 | 1574845672087375873 | 61 |
31+
| 1574846026602713104 | 1574845672087375873 | 61 |
32+
| 1574847148218322954 | 1574845672087375873 | 61 |
33+
| 1574848163017547777 | 1574845672087375873 | 61 |
34+
| 1576672866770178048 | 1574845672087375873 | 61 |
35+
| 1574338300923777024 | 1574268713213210624 | 60 |
36+
37+
Sure enough, [tweet 1576674019239407616](https://twitter.com/wattmaller1/status/1576674019239407616) is a reply to a VERY long Twitter thread I had created about Stable Diffusion.
38+
39+
Matthew Somerville suggested the following improvement, which returns the full path of tweet IDs leading to that tweet:
40+
41+
```sql
42+
with recursive thread as (
43+
select id, in_reply_to_status_id, 0 as depth, id as ids
44+
from tweets
45+
where in_reply_to_status_id is null
46+
union
47+
select tweets.id, tweets.in_reply_to_status_id, 1 + thread.depth as depth, thread.ids || ',' || tweets.id as ids
48+
from thread join tweets on tweets.in_reply_to_status_id = thread.id)
49+
select * from thread where depth > 1 order by depth asc
50+
```
51+
The results look like this:
52+
53+
| id | in_reply_to_status_id | depth | ids |
54+
| --- | --- | --- | --- |
55+
| 4609905293 | 4608871398 | 2 | 4608471362,4608871398,4609905293 |
56+
| 27566142087 | 27564750598 | 2 | 27563022963,27564750598,27566142087 |
57+
| 28392727498 | 28062128369 | 2 | 28048800241,28062128369,28392727498 |

0 commit comments

Comments
 (0)