fix: preserve multiline string indentation in cell loader by ChidiebereNjoku · Pull Request #9877 · marimo-team/marimo

ChidiebereNjoku · 2026-06-13T11:27:40Z

Description:

Problem

When a cell defines a multiline (triple-quoted) string with indented content,
marimo's cell loader strips leading whitespace from lines inside the string
literal. This mutates user data — the runtime value of the string differs from
what Python itself produces when importing the same file.

Repro:

@app.cell
def _():
    TEXT = """line0
  two_spaces
    four_spaces
      six_spaces"""
    print(TEXT)
    return (TEXT,)

Before (broken):

line0
  two_spaces
four_spaces      ← wrong, 2 spaces stripped from string content
  six_spaces

After (fixed):

line0
  two_spaces
    four_spaces
      six_spaces

Root Cause

textwrap.dedent and the fixed_dedent helper in parse.py are syntax-blind: they strip leading whitespace from every line uniformly, including lines that are content inside multiline string literals. In Python, indentation inside a string literal is part of the string's value and must never be touched.
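The effect described above can be reproduced outside marimo. The following is a minimal sketch using a simplified variant of the repro (the string content is indented deeper than the code so the dedented result stays executable); it only exercises the standard library and is not marimo's loader.

```python
import textwrap

# A simplified cell body, indented as it would be inside `def _():`.
cell_body = (
    '    TEXT = """line0\n'
    '        eight_spaces"""\n'
)

# Value Python produces when the body runs inside a function, untouched.
scope = {}
exec("def _():\n" + cell_body + "    return TEXT\n", scope)
python_value = scope["_"]()

# Value produced after a plain textwrap.dedent of the same body.
dedented_scope = {}
exec(textwrap.dedent(cell_body), dedented_scope)
dedent_value = dedented_scope["TEXT"]

print(repr(python_value))  # 'line0\n        eight_spaces'
print(repr(dedent_value))  # 'line0\n    eight_spaces'
```

The two values differ by exactly the four spaces that the uniform dedent removed from the string's second line.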

Additionally, a stray import tokenize (module) on line 43 of compiler.py shadowed the from tokenize import TokenError, tokenize (function) on line 15, causing a TypeError: 'module' object is not callable in ends_with_semicolon.
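The shadowing is easy to reproduce in isolation. This is a small sketch of the import order described above, not the actual contents of compiler.py.

```python
from tokenize import TokenError, tokenize  # `tokenize` is the function here

import tokenize  # a later bare import rebinds the same name to the module

try:
    # The name now refers to the module, so calling it fails.
    tokenize(iter([b""]).__next__)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # 'module' object is not callable
```

Removing the bare import (or referring to the function as `tokenize.tokenize`) restores the callable.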

Solution

Introduce marimo/_ast/dedent.py with a token-aware smart_dedent function. It uses Python's tokenize module to identify lines that fall inside multiline string literals and marks them as protected. Only unprotected (code) lines participate in the base indentation calculation and stripping.
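To make the approach concrete, here is a minimal sketch of a token-aware dedent. It is an illustration under assumptions, not the code in this PR: the helper name `protected_lines` and the error handling are hypothetical, input is assumed to use `\n` line endings, and Python 3.14 template-string tokens are not handled.

```python
from __future__ import annotations

import io
import tokenize


def protected_lines(code: str) -> set[int]:
    # 0-based indexes of lines that begin inside a multiline string literal.
    protected: set[int] = set()
    fstring_start = getattr(tokenize, "FSTRING_START", None)  # Python 3.12+
    fstring_end = getattr(tokenize, "FSTRING_END", None)
    open_fstrings: list[int] = []  # start rows of f-strings still open
    try:
        for tok in tokenize.generate_tokens(io.StringIO(code).readline):
            if tok.type == tokenize.STRING:
                # Rows are 1-based; every row after the first is string content.
                protected.update(range(tok.start[0], tok.end[0]))
            elif tok.type == fstring_start:
                open_fstrings.append(tok.start[0])
            elif tok.type == fstring_end and open_fstrings:
                protected.update(range(open_fstrings.pop(), tok.end[0]))
    except (tokenize.TokenError, SyntaxError):
        pass  # keep whatever was found before tokenization failed
    return protected


def smart_dedent(code: str) -> str:
    lines = code.split("\n")
    protected = protected_lines(code)
    # Only non-blank code lines take part in the margin calculation.
    indents = [
        len(line) - len(line.lstrip())
        for i, line in enumerate(lines)
        if i not in protected and line.strip()
    ]
    margin = min(indents, default=0)
    if margin == 0:
        return code
    return "\n".join(
        line if i in protected else line[margin:]
        for i, line in enumerate(lines)
    )


cell_body = (
    '    TEXT = """line0\n'
    '  two_spaces\n'
    '    four_spaces\n'
    '      six_spaces"""\n'
    '    N = 1'
)
result = smart_dedent(cell_body)
scope = {}
exec(result, scope)
print(repr(scope["TEXT"]))
# 'line0\n  two_spaces\n    four_spaces\n      six_spaces'
```

Because the string's continuation lines are excluded from the margin calculation, the under-indented `two_spaces` line no longer drags the margin down to two spaces, and the code lines are dedented by the full four.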

Changes

marimo/_ast/dedent.py (new): smart_dedent and _get_protected_lines — token-aware dedent that preserves string literal whitespace
marimo/_ast/compiler.py: replace textwrap.dedent with smart_dedent in cell_factory; remove stray import tokenize that caused TypeError
marimo/_ast/parse.py: replace textwrap.dedent in fixed_dedent with smart_dedent; preserve existing refill logic for AI-generated inconsistent indentation but skip refill on protected (string-content) lines
tests/_ast/test_parse.py: update test_fixed_dedent snapshot to reflect corrected behaviour

Pre-Review Checklist

Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
Tests have been added for the changes made.
Documentation has been updated where applicable.



vercel · 2026-06-13T11:27:47Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
marimo-docs	Ready	Preview, Comment	Jun 25, 2026 10:07am

github-actions · 2026-06-13T11:27:53Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

ChidiebereNjoku · 2026-06-13T11:32:08Z

I have read the CLA Document and I hereby sign the CLA

mscolnick · 2026-06-15T14:16:51Z

@cubic-ai

cubic-dev-ai · 2026-06-15T14:17:06Z

@cubic-ai

@mscolnick I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

1 issue found across 4 files

Architecture diagram

sequenceDiagram
    participant Compiler as compiler.py
    participant Parser as parse.py (fixed_dedent)
    participant Dedent as dedent.py (smart_dedent)
    participant Tokenizer as Python tokenize

    Note over Compiler,Dedent: Cell Loading & Dedentation Flow

    Compiler->>Dedent: CHANGED: cell_factory(), toplevel_cell_factory(), context_cell_factory()
    Note over Compiler: Was calling textwrap.dedent()
    Compiler->>Dedent: smart_dedent(code) - cell code being compiled

    Parser->>Dedent: CHANGED: fixed_dedent() - AI/edited code path
    Note over Parser: Now calls smart_dedent() for final strip

    rect rgb(240, 240, 240)
        Note over Dedent: NEW: Token-Aware Dedent Logic

        Dedent->>Dedent: Split code into lines
        Dedent->>Tokenizer: CHANGED: tokenize.generate_tokens()
        Tokenizer-->>Dedent: Token stream

        Dedent->>Dedent: _get_protected_lines() - identify STRING tokens with newlines
        Note over Dedent: Marks lines inside """ ... """ as protected

        Dedent->>Dedent: Calculate min_indent from unprotected lines only
        alt No indent needed (min_indent == 0 or infinity)
            Dedent-->>Compiler: Return original code unchanged
        else Indent found
            Dedent->>Dedent: Strip min_indent from unprotected lines
            Note over Dedent: Protected (string-content) lines untouched
            Dedent-->>Compiler: Return dedented code with strings preserved
        end
    end

    rect rgb(240, 235, 255)
        Note over Parser: fixed_dedent() Additional Logic

        Parser->>Dedent: Call _get_protected_lines()
        Parser->>Parser: refill() skips protected lines
        Note over Parser: AI-generated inconsistent indentation only fixed for code lines
        Parser->>Dedent: Finally call smart_dedent() on result
    end

    Note over Compiler,Parser: Result: Multiline string literal content is never stripped of leading whitespace


mscolnick · 2026-06-15T14:22:58Z

+    except TokenError:
+        return protected
+    for tok_type, tok_string, tok_start, tok_end, _ in tokens:
+        if tok_type == tokenize.STRING and "\n" in tok_string:


this might not work for f-strings

mscolnick · 2026-06-15T14:23:18Z

Can we add some tests for this specific function? I don't think this works on f-strings currently (hard to tell from just reading the existing tests).

dmadisetti

Thanks!

My largest concern is that this isn't exactly what the user sees in the frontend, but I understand how it may be useful for script mode comparison. We do rewrite the AST, but I wonder if one-way conversions would be better, and only if the line falls under the standard indent.

Happy to be wrong, but it does seem like our current docstrings will get formatted weirdly and round-tripping is not supported. What happens to:

"""Cell to compute sum
e.g. a=1; b=2 \imples c=3
foo
"""
a + b

Under this scheme it should become

def _():
    """Cell to compute sum
e.g. a=1; b=2 \imples c=3
foo
"""
    a + b

That seems a little counterintuitive and not currently in code. Conversely,

def _():
   """Cell to compute sum
   e.g. a=1; b=2 \imples c=3
   foo
   """
   a + b

becomes

"""Cell to compute sum
   e.g. a=1; b=2 \imples c=3
   foo
   """
a + b

So maybe we only trigger in the following case:

def _():
    """Cell to compute sum
  e.g. a=1; b=2 \imples c=3 <--- under indented!
    foo
    """
    a + b

"""Cell to compute sum
  e.g. a=1; b=2 \imples c=3 <--- under indented!
    foo
    """
a + b

Can we also throw in a test for top level functions? e.g.

@app.function
def foo():
    """I wonder what
    Happens to this ?
    """
    return ...

@app.function
def bar():
    """Also wonder what
 Happens to that ?
    """
    return ...

dmadisetti · 2026-06-15T18:01:30Z

@@ -56,22 +56,31 @@ def ast_parse(

 def fixed_dedent(text: str) -> str:


Can you move this to dedent while we're here?

ChidiebereNjoku · 2026-06-17T07:09:18Z

@mscolnick Thanks for flagging this. f-strings work correctly because Python's tokenize module classifies f"""...""" as a STRING token just like regular triple-quoted strings; the token type is the same regardless of the f, r, or b prefix. I'll add explicit tests for f-strings and top-level function docstrings to make this clear.

ChidiebereNjoku · 2026-06-17T07:10:14Z

@dmadisetti Good catches man. I'll move fixed_dedent into dedent.py to consolidate all dedent logic in one place. On the docstring concern, you're right that a docstring whose content lines start at column 0 (less indented than the function body) is a valid edge case. smart_dedent handles it correctly: those lines are marked protected and emitted byte-for-byte, so they won't be touched. I'll add a test covering this case.

ChidiebereNjoku · 2026-06-17T12:02:49Z

@mscolnick You were right. Python 3.12+ tokenizes f-strings as FSTRING_START/FSTRING_MIDDLE/FSTRING_END instead of a single STRING token, so they weren't being protected by the original code. Fixed _get_protected_lines to handle both cases. Added explicit tests for f-strings (test_fstring_interior_preserved, test_fstring_preserved) and top-level function docstrings as requested. All 14 new tests pass.
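The tokenizer difference described above can be checked directly. This is a small sketch using only the standard library; it prints the token names for a multiline f-string and records which style the running interpreter uses.

```python
import io
import sys
import tokenize

source = 'x = f"""a\n  {1 + 1}\n"""\n'

# Collect the symbolic name of every token the tokenizer emits.
names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(sys.version_info[:2], names)

# On Python 3.12+ the literal is split into FSTRING_* tokens;
# on older versions it arrives as a single STRING token.
uses_fstring_tokens = "FSTRING_START" in names
uses_single_string_token = "STRING" in names
```

A check that only looks for `tokenize.STRING` therefore protects f-string content on older interpreters but misses it on 3.12 and later, which matches the behaviour reported in this thread.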

ChidiebereNjoku · 2026-06-17T12:03:32Z

@dmadisetti Done, fixed_dedent is now in dedent.py alongside smart_dedent and _get_protected_lines. Also added test_docstring_content_at_column_zero which confirms that docstring content lines less indented than the function body are correctly preserved byte-for-byte.

dmadisetti · 2026-06-23T22:27:43Z

Woops. I just broke this PR trying to merge res. Sorry this fell off my plate. For reference, hit review to let people know you're ready.

textwrap.dedent blindly strips leading whitespace from all lines, including content inside triple-quoted string literals, mutating user data. Introduce a token-aware smart_dedent in marimo/_ast/dedent.py that marks lines inside multiline strings as protected before stripping base indentation.

Changes:
- Add marimo/_ast/dedent.py with smart_dedent and _get_protected_lines
- Replace textwrap.dedent with smart_dedent in compiler.py cell_factory
- Replace textwrap.dedent with smart_dedent in parse.py fixed_dedent, keeping the refill logic for AI-generated inconsistent indentation but skipping refill on protected (string-content) lines
- Remove stray bare 'import tokenize' that shadowed the imported tokenize function with the module object, causing TypeError
- Update test_fixed_dedent snapshot to reflect corrected behaviour

for more information, see https://pre-commit.ci

…E741)


- Fix _get_protected_lines to handle Python 3.12+ f-string tokens (FSTRING_START/FSTRING_MIDDLE/FSTRING_END) in addition to regular STRING tokens, so f-string content lines are correctly protected
- Eliminate duplicate tokenize loop by passing tokens into _get_protected_lines instead of re-tokenizing
- Move fixed_dedent from parse.py into dedent.py to consolidate all dedent logic in one module (per dmadisetti's request)
- Add tests/_ast/test_dedent.py with 14 tests covering: basic dedent, regular/f-string/r-string multiline preservation, docstrings with content at column zero, top-level functions, inconsistent indentation


ChidiebereNjoku · 2026-06-25T10:11:04Z

Woops. I just broke this PR trying to merge res. Sorry this fell off my plate. For reference, hit review to let people know you're ready.

No worries. All conflicts resolved and CI is green again. Ready for re-review whenever you get a chance. @dmadisetti

Copilot

Pull request overview

This PR fixes a correctness issue in marimo’s AST-based cell loading: dedenting extracted cell code should not mutate the contents of multiline (triple-quoted) string literals, preserving Python’s runtime string semantics.

Changes:

Add token-aware smart_dedent / fixed_dedent utilities to dedent code while preserving multiline string literal interior whitespace.
Switch cell/code extraction to use smart_dedent / fixed_dedent instead of textwrap.dedent.
Update and extend tests to cover multiline string preservation and related edge cases.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`marimo/_ast/dedent.py`	New token-aware dedent implementation (`smart_dedent`, `fixed_dedent`, helpers).
`marimo/_ast/compiler.py`	Use `smart_dedent` when extracting cell code from source.
`marimo/_ast/parse.py`	Replace prior inline `fixed_dedent` with import from `marimo._ast.dedent`.
`tests/_ast/test_parse.py`	Update snapshots/expectations for corrected dedent behavior.
`tests/_ast/test_dedent.py`	New unit tests for `split_source_lines`, `smart_dedent`, and `fixed_dedent`.

+    min_indent: float = float("inf")
+    for i, line in enumerate(lines):
+        if protected[i]:
+            continue
+        stripped = line.lstrip()
+        if not stripped:
+            continue
+        min_indent = min(min_indent, len(line) - len(stripped))
+


+    for tok_type, tok_string, tok_start, tok_end, _ in tokens:
+        # Regular triple-quoted strings: single STRING token spanning lines
+        if tok_type == tokenize.STRING and "\n" in tok_string:
+            start_line = tok_start[0] - 1
+            end_line = tok_end[0] - 1


+    Unlike `str.splitlines()`, this only treats `\n`, `\r`, and `\r\n` as
+    line breaks. `str.splitlines()` additionally splits on `\f`, `\v`, the
+    `\x1c`-`\x1e` separators, and Unicode line separators.
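The difference called out in this docstring excerpt can be seen with a short experiment. The sketch below uses a hypothetical helper, `split_on_newlines`, as a stand-in; it is not the PR's `split_source_lines`, whose implementation is not shown in this thread.

```python
from __future__ import annotations

import re

text = "a = 1\x0cb = 2\nc = 3\r\nd = 4"

# str.splitlines() also breaks on form feed (\x0c), among other characters.
builtin_split = text.splitlines()
print(builtin_split)  # ['a = 1', 'b = 2', 'c = 3', 'd = 4']


def split_on_newlines(source: str) -> list[str]:
    # Hypothetical stand-in: break only on CRLF, CR, or LF.
    return re.split(r"\r\n|\r|\n", source)


newline_split = split_on_newlines(text)
print(newline_split)  # ['a = 1\x0cb = 2', 'c = 3', 'd = 4']
```

Python's tokenizer does not start a new row at a form feed, so row numbers from token positions line up with the second result, not the first. Note that, unlike `str.splitlines()`, this stand-in keeps a trailing empty string when the input ends with a newline.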


mscolnick requested a review from dmadisetti June 15, 2026 13:43

cubic-dev-ai Bot reviewed Jun 15, 2026

Comment thread marimo/_ast/dedent.py Outdated

mscolnick reviewed Jun 15, 2026

dmadisetti reviewed Jun 15, 2026

dmadisetti linked an issue Jun 15, 2026 that may be closed by this pull request

Cell loader dedents lines inside multiline string literals — runtime value differs from Python semantics #9851

Open

dmadisetti self-requested a review June 23, 2026 21:15

dmadisetti added the bug Something isn't working label Jun 23, 2026

ChidiebereNjoku and others added 4 commits June 25, 2026 09:53

[pre-commit.ci] auto fixes from pre-commit.com hooks

c55b5ec

for more information, see https://pre-commit.ci

fix: rename ambiguous variable l to ln (ruff E741)

93eaa57

fix: rename ambiguous variable l to ln in generator expression (ruff …

551f3b7

…E741)

pre-commit-ci Bot and others added 6 commits June 25, 2026 09:53

[pre-commit.ci] auto fixes from pre-commit.com hooks

0c9ccf1

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

1726ea5

for more information, see https://pre-commit.ci

fix: remove unused variable n in _get_protected_lines (ruff F841)

f3c3e65

fix: restore n and lines in _get_protected_lines (ruff F821)

d41b580

fix: bad merge res

44c442e

ChidiebereNjoku force-pushed the fix/smart-dedent-multiline-strings branch from 9f465b7 to 44c442e Compare June 25, 2026 09:58

vercel Bot had a problem deploying to Preview June 25, 2026 09:59 Failure

ChidiebereNjoku and others added 2 commits June 25, 2026 10:01

fix: remove orphaned conflict markers from parse.py

1d9834c

[pre-commit.ci] auto fixes from pre-commit.com hooks

2d19283

for more information, see https://pre-commit.ci

ChidiebereNjoku requested a review from mscolnick June 25, 2026 10:04

fix: remove duplicate split_source_lines import (ruff F811)

1f26248

kirangadhave requested a review from Copilot June 26, 2026 19:45

Copilot started reviewing on behalf of kirangadhave June 26, 2026 19:45 View session

Copilot AI reviewed Jun 26, 2026

4 participants
