Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Sketch of SCIP-based GraphQL API #61217

Closed
wants to merge 20 commits into from
Closed

Sketch of SCIP-based GraphQL API #61217

wants to merge 20 commits into from

Conversation

varungandhi-src
Copy link
Contributor

@varungandhi-src varungandhi-src commented Mar 18, 2024

Part of #62148

TODO:

  • Sketch out ref panel like functionality
  • Describe how fuzzy navigation might be accommodated.
  • Design new API for hover docs
  • Describe how file navigation might be accommodated.
  • Describe how build identifiers/filter might be accommodated.
  • Customizable grouping? Let client handle this for now.
  • Add implementation notes on how to accommodate search-based code intel in this framework.

Test plan

n/a

@cla-bot cla-bot bot added the cla-signed label Mar 18, 2024
Comment on lines 282 to 312
type SymbolUsagesFilter {
# TODO: There needs to be a way to say 'current repo only' or any repos.
# Allow file filter
# Allow kind filter
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Document connections with Chromium code search, IntelliJ, and Linear's filtering API.

Comment on lines 249 to 260
"""
Either one of 'symbol' or 'range' must be provided, but this isn't enforced at the GraphQL
layer due to the lack of support for input unions.
https://github.com/graphql/graphql-wg/blob/main/rfcs/InputUnion.md
"""
usages(
symbol: LookupSCIPSymbol,
range: LookupRange,
filter: UsagesFilter,
first: Int,
after: String,
): UsageConnection!
Copy link
Contributor Author

@varungandhi-src varungandhi-src Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anton's Q: Why not split this into RangeUsageConnection and SymbolUsageConnection? This will cause more complexity in the outputs because of input dependence.

Varun's A: The output needs to have some complexity anyways because of needing to support search-based/syntactic/precise and some information will not be present depending on the data source. (Anton's idea: In GraphQL, you'd typically use union types for results so that caller can check the type and cast.) Also, depending on whether the symbol is file-local, repo-local or cross-repo, we do actually need different kinds of range information.

@@ -47,6 +47,64 @@ extend type GitBlob {
Experimental: This API is likely to change in the future.
"""
symbolInfo(line: Int!, character: Int!): SymbolInfo

codeGraphData: CodeGraphData
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pass in a ProvenanceComparator as a parameter here? Should we pass anything else?

Comment on lines 54 to 89
"""
Invariants:
1. forall sym in symbols.
(sym.provenance, sym.dataSource) == (scipDocument.provenance, scipDocument.dataSource)
"""
type CodeGraphData {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This invariant is a bit annoying; should we remove the duplication the (provenance, dataSource) pair of fields? Right now, we have Provenance on SymbolInformation so that you can check it after invoking usagesBySymbol


input OccurrencesFilter {
rangeFilter: RangeFilter
rolesFilter: RolesFilter
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add documentation here describing how this can be used for the symbols side bar so that the symbols sidebar is consistent with the navigation in the file. E.g. you can have a precise symbols sidebar if you have precise code intel instead of having to use type:symbol search in the file.

Comment on lines -121 to +227
line: Int!
line: Int

"""
The character (not byte) of the start line on which the symbol occurs (zero-based, inclusive).
"""
character: Int!
character: Int
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Revert these changes, we don't care about this API anymore

Comment on lines +470 to +515
input RepositoryFilter {
name: StringComparator!
}

input PathFilter {
name: StringComparator!
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Should we remove the ! here?

Comment on lines 489 to 540
type UsageConnection {
nodes: [Usage!]!
pageInfo: PageInfo!
}

type UsageRange {
repo: String!
revision: String!
range: Range!
}

type Usage {
symbol: SymbolInformation!
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Think through the non-null cases more carefully, especially for UsageConnection

manually generating bindings using
https://github.com/sourcegraph/scip/blob/main/scip.proto
"""
data: String!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder: would it be more natural to use proto's canonical JSON encoding for a GraphQL API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the downside is then there is temptation to not use the SCIP bindings and just use the deserialized JSON object directly. Also it doesn't look like the bindings support JSON serialization. So nevermind, ignore me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We don't want to update our GraphQL API for every change to SCIP, because most of our indexers may not adopt the new features right away. Some features also add a significant amount of complexity (e.g. this one being proposed which exposes full type structures Add support for SymbolInformation.signature scip#231), and I don't think we should expose that in our GraphQL API. Exposing the raw SCIP allows a customer to upload updated indexes using the new SCIP feature and implement custom functionality.
  2. Protobuf allows a client to parse the data in a streaming manner, which can allow for optimizations like interning-while-parsing (e.g. for symbol names), as well as just skipping needless input, without allocating a lot of memory.

Some of the important Document data is already exposed by the symbols and occurrences fields in the API today. So 'light' clients can make use of it directly. 'Heavy' clients can take on a bit more complexity by parsing the Protobuf for full access to all the details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By shoving this into a string field, you lose the ability to do a subselect, which kind of defeats GraphQL. That said, if we really need a proto representation here, I don't think that is blocking.

However, from an APIs perspective, it feels a little cleaner to return JSON.

If having to support streaming decoding here, I think I have another concern:
Is this gonna be a huge string sent over? Base64 encoding adds another bit of overhead to the size of the proto, and you need to unmarshal the whole JSON GQL response to then grab this string to then streaming decode it.
Our backend has to build up the whole response in memory, and the clients load it into memory entirely. Is that a concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing here is that we want clients to use the SCIP proto-bindings for interacting with this field

cmd/frontend/graphqlbackend/schema.graphql Outdated Show resolved Hide resolved
cmd/frontend/graphqlbackend/schema.graphql Outdated Show resolved Hide resolved
manually generating bindings using
https://github.com/sourcegraph/scip/blob/main/scip.proto
"""
data: String!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By shoving this into a string field, you lose the ability to do a subselect, which kind of defeats GraphQL. That said, if we really need a proto representation here, I don't think that is blocking.

However, from an APIs perspective, it feels a little cleaner to return JSON.

If having to support streaming decoding here, I think I have another concern:
Is this gonna be a huge string sent over? Base64 encoding adds another bit of overhead to the size of the proto, and you need to unmarshal the whole JSON GQL response to then grab this string to then streaming decode it.
Our backend has to build up the whole response in memory, and the clients load it into memory entirely. Is that a concern?

cmd/frontend/graphqlbackend/codeintel.codenav.graphql Outdated Show resolved Hide resolved
cmd/frontend/graphqlbackend/codeintel.codenav.graphql Outdated Show resolved Hide resolved
cmd/frontend/graphqlbackend/codeintel.codenav.graphql Outdated Show resolved Hide resolved
or linesAfter is negative or exceeds the number of available lines,
the value is interpreted as until the start/end of the file.
"""
surroundingBlobContent(surroundingLines: SurroundingLines = {linesBefore: 0, linesAfter: 0}): String!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 meaning "no value given" is a very go-ey thing, I think you don't need a default value here, since surroundingLines: SurroundingLines already indicates nullable and the backend can then decide how to represent that in the given programming language

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, 0 doesn't mean "no value given", it literally means 0 because that makes sense as a default -- only the single line from the blob will be returned in the default case. I wrote the 0 explicitly here as it serves as documentation for an API client for what the behavior will be if they omit this parameter. Another potential default would be INT32_MAX (i.e. get the full blob contents), but we're deliberately picking 0 here, not INT32_MAX.

cmd/frontend/graphqlbackend/codeintel.codenav.graphql Outdated Show resolved Hide resolved
@varungandhi-src
Copy link
Contributor Author

varungandhi-src commented May 2, 2024

Thanks everyone for the feedback and questions @eseliger, @mmanela, @camdencheek and @keynmol. I think I've addressed most of other people's comments. Overall, API shape-wise, I think we're close to where we want to be.

The main TODO left is the API for the hover doc. If we think about it broadly, I think we really want to think about "get me information about this symbol" (or if a symbol is not available, such as in the case of search-based, then a range), which may include:

  1. Markdown formatted documentation
  2. A signature, maybe with highlighting information
  3. Relationships to other symbols etc.

The first field is already part of SymbolInformation. The second and third would also make sense on SymbolInformation. The API for getting a SymbolInformation based on a symbol/range would be an additive change, so I don't think that's too pressing to discuss right away.

I'm going to close this PR for now. I will tag people individually for subsequent PRs on an as-needed basis, especially if there is a big divergence from what has been discussed here.

@varungandhi-src varungandhi-src changed the title [DO NOT MERGE] SCIP-based GraphQL API Sketch of SCIP-based GraphQL API Sep 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants