-
Notifications
You must be signed in to change notification settings - Fork 1.2k
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
docstring format for api, possibly across code base? #3176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I don't think we should spend time formalizing this, we have very limited amount of user facing functions really. And we may find better way to spend our time than argue about docstring formatting. |
@Suor I think you brought this topic, right? :) Are we good to close it (discussion around formatting) then and let Jorge decide this, for example? |
I agree it's only important at the moment to have good full code doc for the API functions and I'm happy to just pick one format but would prefer to get some feedback from the @iterative/engineering team. In fact I wanted to summarize the options here (extracted from #3130 (comment) +):
Among doc generators, Sphinx is by far the most popular, with pdoc a remote 2nd place, it seems. Sphinx uses reStructuredText format. pdoc is interesting because it supports Markdown and other nicer-looking formats worth checking out actually (from Numpy and Google). As for IDEs, VSCode intellisense doesn't interpret any format in the docstrings it seems. It mostly focuses on building and inspecting the Python code by itself. From what I can see here, PyCharm auto completion works similarly. So easy to read docstrings e.g. numpydoc or Google style may be best if we expect IDEs code completion will be the main way users reach our docstrings. And especially since we're writing the API documentation manually anyway (not generating it). p.s. An interesting VSCode extension to help write docstrings in any of the above formats is https://marketplace.visualstudio.com/items?itemName=njpwerner.autodocstring, BTW. |
OK... I'm going to suggest we adopt the Google style for modules (files) and functions. Reasons:
All in favor? ✋ |
I am in favor of no style, it will only slow down reviews with no real gain. |
I'm in favour of an easy-to-use style so it doesn't slow down reviews :) |
Also in favor of not slowing down reviews for doc strings. |
I'm in favor of some simple consistent style for the external API functions (better to be more or less consistent internally, but would never block any review for this). If we have a ticket like this #2316 and we deal with this, it's better to agree what are some basic expectations and how this should looks like. I'm fine with Google style. |
I think it's less about exact styles and more about completeness of docstings. I.e. Google format specifies which sections to include in the docstring (basically description, args, result/yield, throws I think). Yes, a style is attached to that but it's pretty simple... And consistency is good!
Yes guys, this issue is mostly about the API code. I think it's totally up to the core engineering team to decide about docstrings in the rest of the code. I'm confused about the voting results so far: 1 for no format, 2 for Google, 2 invalid haha – no consensus yet :( |
What I don't like from Google style is that it "annotates" the arguments with types. I'm afraid that if we start documenting everything, we will end up including useless comments just for the sake of, like: def do_stuff(x, y):
"""A method that do_stuff given x and y
Args:
x (str): This is an X
y (str): This is a Y.
""" Ideally, the method name should be clear enough to understand what it does, so no need to put that on the description, same with argument names. Things that I think are important to document: Side effects Lines 67 to 81 in ec2c8ad
Explain convoluted implementations / What & Why? Lines 26 to 49 in ec2c8ad
Give examples when code is hard to read Lines 271 to 298 in ec2c8ad
DISCLAIMER: I wrote all of those docstrings. I'm completely bias. 🙈 |
I fully agree, except if we start doing intelligent parsing of docstrings in order to auto-gen docs in such a way that requires having to (annoyingly) document everything. I prefer to not document the obvious. And any intelligent parsing should look at the func signature as well as docstrings to pick up args. |
@MrOutis @casperdcl 💯 about the "code must be good enough to read without comments in the first place", no doubts about this! But for this discussion, let's focus on external APIs only. ( My bias and experience here. I really like when I can press a hotkey, get to the source code of the API and see some documentation there (~ short version of what you can get online, but enough in most cases to understand everything, probably no examples, but definitely edge cases, what do we expect from the params, etc). It tremendously simplifies the experience in my case. So, back to the question. Do we want to have a bit more then just an obvious "This is a get method, it gets something" kind of docstrings at all (again, for external, user-facing API)? If not, should we just delete existing one-sentence comments (that do not provide any value whatsoever)? Btw, a solution might be to put just a link the docs? (similar to what we do in CLI). |
Yes, definitely
Only if needed to describe a complex operation
Yes if no genuine value provided
Yes, though I'd recommend adding a travis job which checks for dead links. Overall in all cases I'd always recommend the original dev does the sensible thing, whether it's comment in prose, in bullet points, in code examples, in links - whatever they think will be most useful for people who may read it. There may not be a one-style-fits-all-functions approach. |
💯 ! In our case all operations (dvc.api) are not simple, neither self-descriptive imo.
If we though have some external APIs we want to describe args, exceptions, some edge cases for - it's better to use the same style, right? (And that's all this discussion is about.) And I believe we have those (I would find docstrings def useful for myself - again, I have a bias - I use IDEs and navigate to libraries all the time and prefer it over online docs). |
I also prefer inline to online. I grudgingly accept online if well done. I don't like any duplication. I feel like online should perhaps have prosaic expansion on examples, while inline should be as succinct as possible. |
I don't see this being of much value, you can just open he website and find the function easily on your own.
Unrelated @casperdcl but if you have any experience on this, please participate in iterative/dvc.org/issues/652 OK so still no decision yet on which docstring format 😢 I may have to just start using Google style for now to get the inline doc going but we can revisit later... |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I am against going for completeness too. Google format is a good example of how bloat is created by doing so. The same reason to avoid consistency if that causes bloat. Python stdlib lives happily without that. @shcheklein the doc-string should describe what function is doing, so that people wouldn't need to guess by name. Params should only be described if that is non-obvious, formal descriptions is bloat and will only stop people from reading it or at least slow them down. Any edge cases should never make it to doc-strings, unless they represent some common use case at the same time. |
don't see how does it create bloat. Any docs are valuable for me (list of exceptions to expect if it can be provided, types, brief param descriptions, etc). I do use them extensively if they exist. Agreed that we can omit parameters if they are obvious. On the other hand how is it different from CLI - why do we include brief description to seemingly obvious options?
Not sure I understand why should we rely on this. We can check other very popular libraries? Also, they have extensive docs for a lot of stuff even inside. Have you seen any guidelines they use by chance? Do they have any?
check To reiterate. I don't care that much about philosophy, style guides, stdlib, google, etc. As I said my bias is very simple - I prefer offline to online and I do read and use public APIs docs via IDEs extensively. Plus it's better to be consistent. That's why I voted the way I voted. |
Simple functions, like ones from |
Okay, to summarize. Looks no one wants to introduce a heavy formats like google's, etc (with sections, and whatnot) and more in favor of stdlib's approach, namely: single or multi-line block of text which might (and probably should) include param names, exceptions, etc, and describes what's happening in some reasonable way. I'm totally fine with this as well. If we all agree on this, @jorgeorpinel @efiop let's close this? For the issue of improving (non-) existing docstrings we should create a separate issue? which I think is way more important to be honest than discussing a specific format in the first place. @casperdcl summarized it well enough as well:
|
It sounds pretty vague to me. Is the policy "please write some sort of docstrings when you think its reasonable"? We would be back at square one but I can't really have a strong opinion here so up to you guys 🙂 p.s. just a question: does this mean it's not allowed to use a full format? Or is it totally fee for everyone to chose their favorite docstring flavor? Bc I've already been using Google's a little bit 😋 |
@jorgeorpinel I would avoid mixing different styles - e.g. Google and stdlib. The way I understand stdlib - you just write a "description" that is valuable to the end-user of what's going on, using param names as terms in it.
we go from a format discussion to the usefulness of them, let's move it to another topic please? |
I think we should write docstrings in some sane format. Google's style is a good choice too, but maybe there is something better. It will allow editors to pick them up easily, improving the dev experience. Sloppy docstrings like in stdlib are as good as nothing. If we start bothering with these, we might as well stick to some good format that will be actually useful. I don't like docstrings myself, but python is too high level to live without them. Even the exceptions introduce some non-obvious flow control, that you can't really tell from looking at one function's body. So I think we need to be good developers and use docstrings to document what needs to be documented. |
I just want to add a link to this discussion about exceptions in stdlib docstrings: #3352 (comment) The summary is that they seem to just be mentioned (not even with specific type) some times (when it seems relevant). So, a pretty flexible common-sense approach as well. |
@jorgeorpinel stdlib is quite old, so I would totally believe that they don't use proper docstrings because back in a day those didn't exist for python and nowadays no one wants to go through the whole codebase to consistently use new format everywhere. |
Yeah, I didn't pick that style. I voted for Google style originally, but others weren't convinced and decided to use a super simple "no style" docstrings inspired on stdlib (at least for now), which is why I'm focusing on stdlib for docstring related matters. (Please refer to discussion above.) Recently I've noticed we kind of already use Google style docstrings throughout the cod base, so I'm a bit confused again... But will continue to assume the resolution so far is in #3176 (comment) ^ Totally up to you guys. |
I don't really see a point of the discussions here, feels like bikeshedding. >> help(dvc.api.open) Then, we could start docstring format for api later on if it provides more benefits (and, it's not hard to change formats for just |
I am still against any format, including googles. It only adds bloat. You should only mention things important for a specific function, not a predefined list in a predefined format, half of which people will need to skim through. I don't care if some IDEs would be happy parsing it, I prefer it to be human readable instead. |
The one thing I've seen that would actually be pretty useful is to use back quotes around param names when explaining them in the docstring, since we're avoiding a special Args section. This was discussed here early on and it was mostly Alex against it. We ended up agreeing not to use them but now that I've been writing them I kind of miss them. This is because param names or other literals tend to be common words like def func(path, x, y)
```Calculates something with x and y argument,
based on a file found in the given path arg.``` vs. def func(path, x, y)
```Calculates something with `x` and `y`, based on the file in `path`.``` It's just a bit faster and more obvious using `. Maybe we can even allow this partially? (Not required but allowed when needed) |
So we agree that we need docstrings in one shape or another, right? If so, please take a good look at the discussions in this ticket and notice how we all have different preferences. Most of us remember the constant fights we've had about the coding style a year ago, before we've started using black. We all come from different backgrounds and we all have different preferences, which will be a constant source of fights unless we agree on some style and just stick to it. "no-style" is a bad idea, because we will drown in fights as we did with code style and then we will still choose some style to put the end to our fights. So why not just skip ahead and pick something instead of wasting our energy on constant fights? That is why I'm admittedly pushing towards picking some style and sticking to it right away. So from my research, there is 1 defacto standard tool for checking docstrings: pydocstyle and it supports three styles IMPO, |
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Uh oh!
There was an error while loading. Please reload this page.
We currently don't have a desired style to write docstrings.
Should we come up with one? If so, which one?
The text was updated successfully, but these errors were encountered: