Skip to content

Fix: Validate File Paths and Parsing Logic #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: dev-11171
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 18 additions & 18 deletions mypy/config_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,9 +109,6 @@ def split_and_match_files_list(paths: Sequence[str]) -> list[str]:
expanded_paths = []

for path in paths:
if not path:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

continue

path = expand_path(path.strip())
globbed_files = fileglob.glob(path, recursive=True)
if globbed_files:
Expand Down Expand Up @@ -287,6 +284,22 @@ def _find_config_file(

return None

def parse_and_validate_filenames(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a function has an and in the name, this usually indicates that it's doing too many things. Although, validation is usually implied, so I'd drop its mention from the name. Additionally, it's possible to name it more accurately:

Suggested change
def parse_and_validate_filenames(
def convert_raw_files_string_to_list(

Copy link

@webknjaz webknjaz Mar 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be reasonable to make it private, too:

Suggested change
def parse_and_validate_filenames(
def _convert_raw_files_string_to_list(

raw_files: str
) -> list[str]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz add a PEP 257-compliant docstring to this function.

# Split and strip filenames

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use code comments to retell what's already written on the following line. You're just repeating the same thing twice, and that has no value. It's distracting at best. Instead, only use code comments to document motivation/justification for things that code is doing. As a very rare exception, code comments can be used to explain very obscure unobvious logic. But then again, it's best to avoid that and find a way to communicate that through structuring code better, naming the variables, defining abstraction layers clearly.

files_split = [file.strip() for file in raw_files.split(",")]

# Remove trailing empty entry if present
if files_split and files_split[-1] == "":
files_split.pop()

if "" in files_split:
raise ValueError(
"Invalid config: Empty filenames are not allowed except for trailing commas."
)

return files_split

def parse_config_file(
options: Options,
Expand Down Expand Up @@ -322,21 +335,8 @@ def parse_config_file(
else:
section = parser["mypy"]

if "files" in section:
raw_files = section["files"].strip()
files_split = [file.strip() for file in raw_files.split(",")]

# Remove trailing empty entry if present
if files_split and files_split[-1] == "":
files_split.pop()

# Raise an error if there are any remaining empty strings
if "" in files_split:
raise ValueError(
"Invalid config: Empty filenames are not allowed except for trailing commas."
)

options.files = files_split
if "files" in section and isinstance(raw_files := section["files"], str):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only a string in the case of INI configs? Does TOML not hit this code path? Would the string check be unnecessary when it's known it's going through the INI code path?

It seems to me that a better place for injecting this conversion would be inside _parse_individual_file() where it has the differentiation between is_toml() and not. And you'd only apply it in the else-branch there.

This would likely let you drop the str check for good.

options.files = parse_and_validate_filenames(raw_files)

prefix = f"{file_read}: [mypy]: "
updates, report_dirs = parse_section(
Expand Down
190 changes: 36 additions & 154 deletions mypy/test/testconfigparser.py

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't hurt to have a module docstring here.

Original file line number Diff line number Diff line change
@@ -1,168 +1,50 @@
import os
import tempfile
from unittest import TestCase, main

import pytest
from mypy.config_parser import parse_config_file
from mypy.options import Options


class TestConfigParser(TestCase):
def test_parse_config_file_with_single_file(self) -> None:
"""A single file should be correctly parsed."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files = file1.py
"""
)

options = Options()

parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)

self.assertEqual(options.files, ["file1.py"])

def test_parse_config_file_with_no_spaces(self) -> None:
"""Files listed without spaces should be correctly parsed."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files =file1.py,file2.py,file3.py
"""
)

options = Options()

parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)

self.assertEqual(options.files, ["file1.py", "file2.py", "file3.py"])

def test_parse_config_file_with_extra_spaces(self) -> None:
"""Files with extra spaces should be correctly parsed."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files = file1.py , file2.py , file3.py
"""
)

options = Options()

parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)

self.assertEqual(options.files, ["file1.py", "file2.py", "file3.py"])

def test_parse_config_file_with_empty_files_key(self) -> None:
"""An empty files key should result in an empty list."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files =
"""
)

options = Options()

parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)

self.assertEqual(options.files, [])

def test_parse_config_file_with_only_comma(self) -> None:
"""A files key with only a comma should raise an error."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files = ,
"""
)

options = Options()

with self.assertRaises(ValueError) as cm:
parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)

self.assertIn("Invalid config", str(cm.exception))

def test_parse_config_file_with_only_whitespace(self) -> None:
"""A files key with only whitespace should result in an empty list."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files =
"""
)

options = Options()

parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)

self.assertEqual(options.files, [])

def test_parse_config_file_with_mixed_valid_and_invalid_entries(self) -> None:
"""Mix of valid and invalid filenames should raise an error."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files = file1.py, , , file2.py
"""
)
@pytest.mark.parametrize(
"config_content, expected_files, expected_exception",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually nicer to have an iterable of strings for the params list. Sparse structures read better.

Suggested change
"config_content, expected_files, expected_exception",
("config_content", "expected_files", "expected_exception"),

[
# Files listed without spaces
("[mypy]\nfiles =file1.py,file2.py,file3.py", ["file1.py", "file2.py", "file3.py"], None),

options = Options()
# Files listed with adequate space
("[mypy]\nfiles =file1.py, file2.py, file3.py", ["file1.py", "file2.py", "file3.py"], None),

with self.assertRaises(ValueError) as cm:
parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)
# Files with extra spaces
("[mypy]\nfiles = file1.py , file2.py , file3.py", ["file1.py", "file2.py", "file3.py"], None),

self.assertIn("Invalid config", str(cm.exception))
# Files listed with a trailing comma
("[mypy]\nfiles = file1.py, file2.py, file3.py,", ["file1.py", "file2.py", "file3.py"], None),

def test_parse_config_file_with_newlines_between_files(self) -> None:
"""Newlines between file entries should be correctly handled."""
with tempfile.TemporaryDirectory() as tmpdirname:
config_path = os.path.join(tmpdirname, "test_config.ini")
# Empty files key
("[mypy]\nfiles =", [], None),

with open(config_path, "w") as f:
f.write(
"""
[mypy]
files = file1.py,
file2.py,
file3.py
"""
)
# Files key with only a comma
("[mypy]\nfiles = ,", None, ValueError),

options = Options()
# Mixed valid and invalid filenames
("[mypy]\nfiles = file1.py, , , file2.py", None, ValueError),

parse_config_file(options, lambda: None, config_path, stdout=None, stderr=None)
# Files listed with multiple trailing comma
("[mypy]\nfiles = file1.py, file2.py, file3.py,", None, ValueError),

self.assertEqual(options.files, ["file1.py", "file2.py", "file3.py"])
# Newlines between file entries
("[mypy]\nfiles = file1.py,\nfile2.py,\nfile3.py", ["file1.py", "file2.py", "file3.py"], None),
]
)
def test_parse_config_file(tmp_path, config_content, expected_files, expected_exception):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@x612skm do you remember if function-based tests are being picked up? I remember we had some issues with these, but I don't recall the detail. Was it just the filename that was problematic, or having the class was necessary as well?

"""Parameterized test for parse_config_file handling various configurations."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function docstrings should start with a verb so that they reference the action being performed. Only constants / objects need to be described as terms because they don't do anything.

Additionally, it should probably be more accurate. Generic statements aren't very useful because they don't include any specifics and so it's normally pointless to have them like that.

config_path = tmp_path / "test_config.ini"
config_path.write_text(config_content)

options = Options()

if __name__ == "__main__":
main()
if expected_exception:
Copy link

@webknjaz webknjaz Mar 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bad idea to have logic in tests. Introducing branching makes tests more fragile and unreliable. The more complex structures there are, them more uncertainty there is. You'd have to write tests for tests to make sure they function at all.

Negative and positive scenarios are two distinct tests. They shouldn't co-exist in the same test function.

with pytest.raises(expected_exception) as exc_info:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest.raises() must always have a match= regexp set.

parse_config_file(options, lambda: None, str(config_path), stdout=None, stderr=None)
assert "Invalid config" in str(exc_info.value)
else:
parse_config_file(options, lambda: None, str(config_path), stdout=None, stderr=None)
assert options.files == expected_files