-
Notifications
You must be signed in to change notification settings - Fork 72
Description
We have a few PRs proposing to add Windows support:
- Add Windows browser path checks to Chromium detection #1014
- Expand Windows tools test coverage in CI workflow #1209
- Add Windows terminal backend support #1012
Proposal:
I think maybe we could we consider implementing this separately, separate from core agent-sdk code. In fact, there are two things here that I feel we could consider:
- separate code / loose coupling
- dogfooding / dedicated maintainer(s).
1. Loose coupling:
- maybe Windows-specific tools
- maybe Windows-specific package
- or better abstractions, at least separate files, initialized per platform
- if we implement it directly into core.
I'm concerned about code like conditionals with Windows-specific logic in core implementation. Windows is too different IMHO, which means we can expect much more special cases, much more ifs in random places. I do feel Windows support is a much more demanding feature than the current PRs, and I'm concerned that:
- the agent-sdk codebase is becoming more error-prone and harder to understand
- we're starting something we are not able to maintain.
IMHO we could perhaps think how to design and implement Windows support as separated as possible from the rest of the logic... The WindowsTerminal file is a good example: abstraction with platform-specific import. But the if for a tool prompt could be done differently. It's a prompt piece, we could have, for example, different templates/files. Or even subclass the definition, or maybe separate a component with the prompt, and subclass that.
2. Dogfooding:
As I noted in one of these PR, one detail here that seems relevant to me is that usually, it won't be the same people that work regularly on Windows and Linux/Mac. This matters at multiple levels:
- for one, no maintainer is working on Windows. We could test a PR (on a quick Windows), but how deep is that testing? Are we introducing support for something we don't maintain?
- this also means that when any contributor looks at a screen of such code in their editor, they never have at their disposal, in their sight, a screen of code, they have half the screen. This may sound trivial, it's true, but if it's too many times, it affects ease of debugging/readability/maintainability IMHO.
(as it happens, I was re-reading Richard Gabriel's Patterns of Software, and this struck me: "The primary feature for easy maintenance is locality: Locality is that characteristic of source code that enables a programmer to understand that source by looking at only a small portion of it." Now this refers to a different issue, but it is the kind of thing I value when I say the whole screen of code I'm looking at should ideally be meaningful.)
I would suggest that we design it separate, and, publicize support when we have an interested maintainer who actually uses Windows. I'm concerned this is a too error-prone feature to be incidental.
Otherwise, I fear we are looking at a bunch of bug reports and anger, like we had in V0, if not more.
Note:
Please note also that if we start supporting Windows in the SDK, people will of course expect it in the CLI and Web UI too. In V0, Windows had been implemented only in LocalRuntime. But we weren't able to make understandable the distinction, so people expected it in CLI at least... Which hadn't been implemented, but that came across as "full of bugs", instead of "not supported".
I would love to hear some thoughts. I am aware people would like Windows support, and that's fine, I'm just suggesting that how we do it matters IMO. Maybe we can do better this time around. 😅