If you want to know what the database schema looks like, you can refer to the diagram about gitbase entities and relations, or just use regular SHOW
or DESCRIBE
queries.
In gitbase repository, you will find more SQL examples of queries.
Show me the repositories I am analyzing:
SELECT * FROM repositories;
Last commit messages for HEAD for every repository
SELECT commit_message
FROM refs
NATURAL JOIN commits
WHERE ref_name = 'HEAD';
Top 10 repositories by commit count from HEAD:
SELECT repository_id,commit_count
FROM (
SELECT
repository_id,
COUNT(*) AS commit_count
FROM ref_commits
WHERE ref_name = 'HEAD'
GROUP BY repository_id
) AS q
ORDER BY commit_count DESC
LIMIT 10;
10 top repos by file count at HEAD
SELECT repository_id, num_files FROM (
SELECT COUNT(*) num_files, repository_id
FROM refs
NATURAL JOIN commit_files
WHERE ref_name = 'HEAD'
GROUP BY repository_id
) AS t
ORDER BY num_files DESC
LIMIT 10;
Query for all LICENSE & README files across history:
SELECT file_path, repository_id, blob_size
FROM files
WHERE
file_path = 'LICENSE'
OR file_path = 'README.md';
Query all files at HEAD:
SELECT cf.file_path, f.blob_size
FROM ref_commits rc
NATURAL JOIN commit_files cf
NATURAL JOIN files f
WHERE
rc.ref_name = 'HEAD'
AND rc.history_index = 0;
Note: UAST values are returned as binary blobs; they're best visualized in the web UI interface rather than the CLI where are seen as binary data.
Retrieve the UAST for all files at HEAD:
SELECT * FROM (
SELECT cf.file_path,
UAST(f.blob_content, LANGUAGE(f.file_path, f.blob_content)) as uast
FROM ref_commits r
NATURAL JOIN commit_files cf
NATURAL JOIN files f
WHERE
r.ref_name = 'HEAD'
AND r.history_index = 0
) t WHERE uast != '';
Top committers per repository
SELECT * FROM (
SELECT
commit_author_email as author,
repository_id as id,
count(*) as num_commits
FROM commits
GROUP BY commit_author_email, repository_id
) AS t
ORDER BY num_commits DESC;