io::stdin().read_to_end() drops a byte on certain Unicode input (Windows only) #142847

@QuaeroEtTego

Description

The method io::stdin().read_to_end() appears to drop a byte when reading certain Unicode input on Windows. The issue occurs when the destination buffer is created with Vec::new(), and it affects only certain input strings.

use std::io::{self, Read, Write};
use std::str;

fn main() -> io::Result<()> {
    let mut stdout = io::stdout();

    write!(stdout, "Enter content : ")?;
    stdout.flush()?;

    // let mut buffer = Vec::with_capacity(1024);  // Reads correctly
    let mut buffer = Vec::new();  // Fails to read correctly
    
    // Paste the provided input and press Ctrl+Z (or Ctrl+D on Linux)
    io::stdin().read_to_end(&mut buffer)?;

    println!("\nBytes read : {:?}", buffer);

    match str::from_utf8(&buffer) {
        Ok(s) => println!("UTF-8 - OK: {}", s),
        Err(e) => println!("UTF-8 - ERROR: {}", e),
    }

    Ok(())
}

With the input:

8216]:есть какое нибудь бюджетное

The resulting read contains the sequence of bytes:

... 208, 208, 181 ... 0
Bytes read : [56, 50, 49, 54, 93, 58, 208, 181, 209, 129, 209, 130, 209, 140, 32, 208, 186, 208, 176, 208, 186, 208, 190, 208, 181, 32, 208, 189, 208, 184, 208, 177, 209, 131, 208, 180, 209, 140, 32, 208, 177, 209, 142, 208, 180, 208, 208, 181, 209, 130, 208, 189, 208, 190, 208, 181, 0]

instead of the expected:

... 208, 182, 208, 181 ...
Bytes read : [56, 50, 49, 54, 93, 58, 208, 181, 209, 129, 209, 130, 209, 140, 32, 208, 186, 208, 176, 208, 186, 208, 190, 208, 181, 32, 208, 189, 208, 184, 208, 177, 209, 131, 208, 180, 209, 140, 32, 208, 177, 209, 142, 208, 180, 208, 182, 208, 181, 209, 130, 208, 189, 208, 190, 208, 181, 10]

The expected bytes are what io::stdin().read_line() returns, and what io::stdin().read_to_end() returns on Linux.
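To make the difference between the two byte dumps easier to see, here is a minimal sketch that compares the tail of both sequences (the bytes for "бюдже", copied from the dumps above). The helper name first_difference is hypothetical and only used for illustration.

```rust
use std::str;

/// Hypothetical helper: index of the first position where two byte
/// slices differ, or None if they are identical.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    a.iter()
        .zip(b.iter())
        .position(|(x, y)| x != y)
        .or_else(|| {
            if a.len() != b.len() {
                // One slice is a strict prefix of the other.
                Some(a.len().min(b.len()))
            } else {
                None
            }
        })
}

fn main() {
    // Bytes for "бюдже" as listed in the expected and observed dumps.
    let expected: &[u8] = &[208, 177, 209, 142, 208, 180, 208, 182, 208, 181];
    let observed: &[u8] = &[208, 177, 209, 142, 208, 180, 208, 208, 181];

    // The expected bytes decode cleanly.
    println!("expected decodes to: {:?}", str::from_utf8(expected));

    // The observed bytes do not: the continuation byte of 'ж' is missing.
    println!("observed decodes to: {:?}", str::from_utf8(observed));

    if let Some(i) = first_difference(expected, observed) {
        println!("first difference at index {i}: expected byte {}", expected[i]);
    }

    // 'ж' encodes to two bytes in UTF-8; the second one (182) is the one lost.
    let mut tmp = [0u8; 4];
    println!("'ж' as UTF-8: {:?}", 'ж'.encode_utf8(&mut tmp).as_bytes());
}
```

The dropped byte is the second (continuation) byte of a two-byte character, which is why the result is no longer valid UTF-8.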

Symptoms

  • The issue occurs only with specific Unicode inputs (such as the example above).
  • Removing a character (the first one for example) from the input causes the issue to disappear.
  • When the character loss occurs, read_to_end() appends a null byte at the end of the buffer.
  • The issue only occurs when using:
let mut buffer = Vec::new();
std::io::stdin().read_to_end(&mut buffer)?;
  • The issue is NOT observed when using:
let mut buffer = Vec::with_capacity(1024);
std::io::stdin().read_to_end(&mut buffer)?;

Notes

  • The behavior only appears when using read_to_end() with Vec::new().
  • The behavior is not present when using a preallocated buffer (Vec::with_capacity(1024)), nor when using read_line().
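Based on the observations above, a possible workaround sketch is to always hand read_to_end() a preallocated buffer. The function name read_all_preallocated is hypothetical, and the capacity of 1024 is simply the value that avoided the problem in this report, not a guaranteed threshold. The demo in main uses an in-memory reader so it runs anywhere; in the real program the reader would be io::stdin().lock().

```rust
use std::io::{self, Read};

/// Hypothetical helper: read everything from `reader` into a buffer that
/// already has spare capacity, mirroring the Vec::with_capacity(1024)
/// variant that read correctly in this report.
fn read_all_preallocated<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::with_capacity(1024);
    reader.read_to_end(&mut buffer)?;
    Ok(buffer)
}

fn main() -> io::Result<()> {
    // In-memory stand-in for stdin; replace with `io::stdin().lock()`
    // to read from the console.
    let mut input: &[u8] = "8216]:есть какое нибудь бюджетное\n".as_bytes();

    let bytes = read_all_preallocated(&mut input)?;
    match std::str::from_utf8(&bytes) {
        Ok(s) => println!("UTF-8 - OK: {}", s),
        Err(e) => println!("UTF-8 - ERROR: {}", e),
    }
    Ok(())
}
```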

Meta

rustc --version --verbose:

rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-pc-windows-msvc
release: 1.87.0
LLVM version: 20.1.1

Activity

Jun 21, 2025: added needs-triage.
Jun 21, 2025: added O-windows, T-libs, and A-io; removed needs-triage.
Noratrieb (Member) commented on Jun 21, 2025

ChrisDenton (Member) commented on Jun 22, 2025

This seems to be a long-standing bug. The problem occurs when the buffer is too small (smaller than a single UTF-8 encoded code point). Usually the buffering of stdio appears to avoid this issue, but read_to_end does its own thing.

The easy fix here would be to make read_to_end ensure the Vec always has at least char::MAX_LEN_UTF8 bytes of spare capacity, unless we know for sure the exact size of the input.
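The idea of guaranteeing spare capacity can be sketched in user code as follows. This is only an illustration of the invariant described above, not the standard library's implementation: the helper ensure_room_for_char is hypothetical, and a local constant stands in for the maximum UTF-8 length of a code point (4 bytes).

```rust
/// Local stand-in for the maximum number of bytes one code point needs
/// in UTF-8. Defined here so the sketch does not depend on any
/// particular compiler version.
const MAX_LEN_UTF8: usize = 4;

/// Hypothetical helper: make sure `buf` can accept at least one complete
/// UTF-8 encoded code point without growing in the middle of it.
fn ensure_room_for_char(buf: &mut Vec<u8>) {
    let spare = buf.capacity() - buf.len();
    if spare < MAX_LEN_UTF8 {
        // After this call, capacity >= len + MAX_LEN_UTF8.
        buf.reserve(MAX_LEN_UTF8);
    }
}

fn main() {
    // An empty Vec::new() starts with no spare capacity at all.
    let mut buf: Vec<u8> = Vec::new();
    println!("spare before: {}", buf.capacity() - buf.len());

    ensure_room_for_char(&mut buf);
    println!("spare after:  {}", buf.capacity() - buf.len());

    // No char ever needs more than MAX_LEN_UTF8 bytes.
    println!("'ж' needs {} bytes", 'ж'.len_utf8());
    println!("char::MAX needs {} bytes", char::MAX.len_utf8());
}
```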

QuaeroEtTego (Author) commented on Jun 22, 2025

Why is this a platform-specific issue?
char::MAX_LEN_UTF8 is not stable yet (#121714)

ChrisDenton (Member) commented on Jun 22, 2025

Because on Windows a conversion from UTF-16 to UTF-8 is done for console reads.

The standard library can use unstable things internally so long as it is not observable behaviour.
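To illustrate why a UTF-16 to UTF-8 conversion makes buffer size matter, here is a small sketch. It does not reproduce the standard library's Windows console code; it only shows that one UTF-16 code unit can expand to several UTF-8 bytes, so a destination with too little room cannot hold the whole character. The function name utf16_to_utf8 is hypothetical.

```rust
use std::string::FromUtf16Error;

/// Hypothetical helper: convert UTF-16 code units to UTF-8 bytes.
fn utf16_to_utf8(units: &[u16]) -> Result<Vec<u8>, FromUtf16Error> {
    Ok(String::from_utf16(units)?.into_bytes())
}

fn main() {
    // 'ж' is a single UTF-16 code unit but two UTF-8 bytes.
    let units: Vec<u16> = "ж".encode_utf16().collect();
    let bytes = utf16_to_utf8(&units).expect("valid UTF-16");
    println!("UTF-16 units: {:?}", units);
    println!("UTF-8 bytes:  {:?}", bytes);

    // A destination with only one free byte cannot take the whole character.
    let free_bytes = 1;
    println!(
        "fits in {} free byte(s): {}",
        free_bytes,
        bytes.len() <= free_bytes
    );
}
```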

Metadata

Assignees: No one assigned

Labels:
  • A-Unicode (Area: Unicode)
  • A-io (Area: `std::io`, `std::fs`, `std::net` and `std::path`)
  • C-bug (Category: This is a bug.)
  • O-windows (Operating system: Windows)
  • T-libs (Relevant to the library team, which will review and decide on the PR/issue.)

Participants: @ChrisDenton, @fmease, @rustbot, @Noratrieb, @QuaeroEtTego