Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bracketed paste support #73257

Closed
wants to merge 2 commits into from
Closed

Conversation

adamsitnik
Copy link
Member

fixes #60101

@adamsitnik adamsitnik added this to the 7.0.0 milestone Aug 2, 2022
@adamsitnik adamsitnik requested a review from jozkee August 2, 2022 18:42
@ghost ghost assigned adamsitnik Aug 2, 2022
@ghost
Copy link

ghost commented Aug 2, 2022

Tagging subscribers to this area: @dotnet/area-system-console
See info in area-owners.md if you want to be subscribed.

Issue Details

fixes #60101

Author: adamsitnik
Assignees: -
Labels:

area-System.Console

Milestone: 7.0.0

[InlineData("\u001B[201~\u001B[200~", "")] // same, but with no content
// Terminals that were tested don't allow for pasting empty clipboard to the Terminal.
// In case some allow for that, an empty content is returned and the caller must re-try read() call
[InlineData("\u001B[200~\u001B[201~", "")]
Copy link
Member

@am11 am11 Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would make sense to add one more case for nested start sequence:

[InlineData("\u001B[200~nested \u001B[200~brackets \u001B[201~start \u001B[201~", "nested brackets start ")]

(trailing space is intentional)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@am11 I tested nesting by pasting "nested" to the terminal, reading the input via read(), copying to clipboard with https://github.com/SimonCropp/TextCopy) and pasting to the Terminal again:

image

it seems that in such cases xterm is removing the escape characters from the "nested" tags. I can add a test case for that, but imo it would be a little bit odd (verifying that incomplete tags remain in the input).

[InlineData("\u001B[200~[200~nested[201~\u001B[201~", "[200~nested[201~ ")

but I can definitely add white spaces to show that they are not being trimmed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more test cases (whitespaces + nested)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was testing it on Ubuntu terminal, and noticed this:

$ printf '\u001B[200~nested \u001B[200~brackets \u001B[201~start \u001B[201~'
nested brackets start $

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@am11 I see. But this is something that you have printed to the Terminal, not something that you have read from it. Is it possible for the Terminal to return such data from read?

I am asking because with the current implementation we avoid copying memory and the code is relatively simple. I would prefer to keep it that way if there is no evidence that there is scenario that we don't support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the terminal will remove the escape tag exactly the way this implementation does. If I am reading the change right, your implementation will remove the nested tag and extract the text in between just the way how terminal does it. That is why I was suggesting to test to preserve this behavior if it makes sense. Currently you added:

    [InlineData("\u001B[200~[200~nested[201~\u001B[201~", "[200~nested[201~")]

which is missing \u001B from the nested start tag so it is not testing the nested tag scenario. We may as well remove it since [200~ without \u001B has no meaning.

not something that you have read from it.

That was the intention. printf will send the raw string. To see what value is read, I used xxd to get the hex representation:

$ printf '\u001B[200~nested \u001B[200~brackets \u001B[201~start \u001B[201~' | xxd
00000000: 1b5b 3230 307e 6e65 7374 6564 201b 5b32  .[200~nested .[2
00000010: 3030 7e62 7261 636b 6574 7320 1b5b 3230  00~brackets .[20
00000020: 317e 7374 6172 7420 1b5b 3230 317e       1~start .[201~

# same with 'echo -e'
$ echo -e '\u001B[200~nested \u001B[200~brackets \u001B[201~start \u001B[201~' | xxd
00000000: 1b5b 3230 307e 6e65 7374 6564 201b 5b32  .[200~nested .[2
00000010: 3030 7e62 7261 636b 6574 7320 1b5b 3230  00~brackets .[20
00000020: 317e 7374 6172 7420 1b5b 3230 317e 0a    1~start .[201~.

[InlineData(" test123\u001B[201~", " test123")] // no start tag
[InlineData("\u001B[200~", "")] // just start tag
// "nested" pasted to the Terminal, raw input copied and pasted again (yes, the escape characters got eaten by the xterm)
[InlineData("\u001B[200~[200~nested[201~\u001B[201~", "[200~nested[201~")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[InlineData("\u001B[200~[200~nested[201~\u001B[201~", "[200~nested[201~")]
[InlineData("\u001B[200~\u001B[200~nested\u001B[201~ tag\u001B[201~ ", "nested tag ")]

// b) The second read populates the buffer with the pasted text and the end tag.
// We need to assume that other scenarios can happen. That is why once we remove the tags, the loop checks
// whether the unprocessed buffer is still empty. If it is, the read() is performed again.
ReadOnlySpan<byte> parsableBytes = useNet6KeyParser ? receivedBytes : KeyParser.RemoveBracketedPasteTags(receivedBytes);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about how this fits into the consumer's "world view." I'm untrained here, and have not studied the new VT parsing APIs. However, it looks like here deep in StdInReader, the consumer loses the ability to determine whether an input string came in by paste or simply by virtue of it having all shown up in the same call to read().

Stripping out the bracketing sequences seems like a stopgap that will help get them out of the input stream, but not serve the overarching purpose of paste bracketing. 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stripping out the bracketing sequences seems like a stopgap that will help get them out of the input stream, but not serve the overarching purpose of paste bracketing.

I am not sure if I understand. The code here is Console.ReadKey which is supposed to read a single key at a time. When user pastes "ABC", the first call is going to return A, second B, third C . No matter how big the pasted input will be, the order is going to be preserved.

return receivedBytes;

static bool IsTag(ReadOnlySpan<byte> b)
=> b[0] == '\u001B' && b[1] == '[' && b[2] == '2' && b[3] == '0' && (b[4] == '0' || b[4] == '1') && b[5] == '~';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be reading this wrong, but it looks like this will also pass malformed sequences where the end tag comes before the start tag -- a frame shift could easily result in this happening "in the wild", right? If we miss part of the introducer, we might detect a span flowing from the next finalizer to the next introducer as having been a bracketed paste.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end tag comes before the start tag

As I wrote in the unit tests comments, this handles hypothetical scenario where read() returns end tag from previous pasted content followed by start tag from next pasted content.

Example: user pastes "AB", then "CD".

1st read returns: <startTag>AB
2nd read returns: </endTag><startTag>CD</endTag>

// the test case above are based on manual testing and observations
// the test cases below were not reproduced during manual testing, but hypothetically are possible
[InlineData("\u001B[201~", "")] // just end tag
[InlineData("\u001B[201~\u001B[200~test123", "test123")] // end tag (from previous pasted content) and start tag from next one

I could stick only to the test cases I've confirmed with manual testing, but each Terminal may implement this feature differently and I can't guarantee that we are going to handle all of them properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we validate that we get at least one start and one end tag (and ignore any nested tags that may occur)?

Copy link
Member

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a parsing the right thing to do for bracketing paste? Why did you decided that? #60101 doesn't sound to me like is asking for this; it looks more like a question about the mode as-is being reliable or not.

Wouldn't be better to let .NET developers decide whether they want to honor bracketing paste themselves or not? And if they don't make the decision, anyone using a .NET program can just disable the mode.

// In theory, a single buffer can contain end tag from previous paste followed immediately by start tag from next.
internal static ReadOnlySpan<byte> RemoveBracketedPasteTags(ReadOnlySpan<byte> receivedBytes)
{
// The Terminal is configured to return from read() as soon as at least byte is available for reading.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The Terminal is configured to return from read() as soon as at least byte is available for reading.
// The Terminal is configured to return from read() as soon as at least one byte is available for reading.

int result = ReadStdin(bufPtr, BytesToBeRead);
if (result > 0)
{
ReadOnlySpan<byte> receivedBytes = new (bufPtr, result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ReadOnlySpan<byte> receivedBytes = new (bufPtr, result);
ReadOnlySpan<byte> receivedBytes = new(bufPtr, result);

return receivedBytes;

static bool IsTag(ReadOnlySpan<byte> b)
=> b[0] == '\u001B' && b[1] == '[' && b[2] == '2' && b[3] == '0' && (b[4] == '0' || b[4] == '1') && b[5] == '~';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we validate that we get at least one start and one end tag (and ignore any nested tags that may occur)?

const int TagLength = 6; // "\u001B[200~".Length
while (receivedBytes.Length >= TagLength)
{
if (IsTag(receivedBytes.Slice(0, TagLength)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the escape sequences to a static field and do this instead:

Suggested change
if (IsTag(receivedBytes.Slice(0, TagLength)))
if (receivedBytes.Slice(0, TagLength).SequenceEqual(s_startBracketedPaste))

{
receivedBytes = receivedBytes.Slice(TagLength);
}
else if (IsTag(receivedBytes.Slice(receivedBytes.Length - TagLength)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
else if (IsTag(receivedBytes.Slice(receivedBytes.Length - TagLength)))
else if (receivedBytes.Slice(receivedBytes.Length - TagLength).SequenceEqual(s_endBracketedPaste))

Comment on lines +46 to +47
static bool IsTag(ReadOnlySpan<byte> b)
=> b[0] == '\u001B' && b[1] == '[' && b[2] == '2' && b[3] == '0' && (b[4] == '0' || b[4] == '1') && b[5] == '~';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static bool IsTag(ReadOnlySpan<byte> b)
=> b[0] == '\u001B' && b[1] == '[' && b[2] == '2' && b[3] == '0' && (b[4] == '0' || b[4] == '1') && b[5] == '~';
static bool IsTag(ReadOnlySpan<byte> b)
=> b[0] == Escape && b[1] == '[' && b[2] == '2' && b[3] == '0' && (b[4] == '0' || b[4] == '1') && b[5] == '~';

@adamsitnik
Copy link
Member Author

Is a parsing the right thing to do for bracketing paste? Why did you decided that? #60101 doesn't sound to me like is asking for this; it looks more like a question about the mode as-is being reliable or not.

I think it is the right thing as I would expect the users to prefer not to deal with removing the tags themselves.

cc @daxian-dbw who created the original issue. @daxian-dbw what is your opinion here?

@daxian-dbw
Copy link
Contributor

@adamsitnik We want the tags to be preserved by ReadKey instead of being stripped off.

In PSReadLine, right click pasting causes lots of problems (see PowerShell/PSReadLine#579). So, we want a way to detect a right click pasting, and the Bracketed Paste mode sounds to be the right solution given that Windows Terminal supports it now.

I need to know a Bracketed Paste is started, then just collect the pasted content until seeing the end tag, and then add the content text as a whole to the PSReadLine's buffer. Without seeing the tags, PSReadLine will have to assume the incoming keys are from interactive typing and handle them one by one (existing behavior today), and that won't help in solving the right click pasting issue.

So, what we want would be for ReadKey to return back the tags as is (maybe including the [ char that is missing today?), and also want the confirmation from .NET team that .NET application can depend on this behavior to handle Bracketed Paste.

@adamsitnik
Copy link
Member Author

We want the tags to be preserved by ReadKey instead of being stripped off.

Thanks for the confirmation! In such a case this issue has been fixed with #73257 and the missing [ chars should be back now.

@adamsitnik adamsitnik closed this Aug 10, 2022
@daxian-dbw
Copy link
Contributor

@adamsitnik That's great! Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Sep 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Console.ReadKey returns incomplete VT sequences for bracketed paste
5 participants