chrimaho

One incredibly helpful function in Python is textwrap.dedent. Under the hood, it uses regex to strip the common leading whitespace from every line, while preserving any internal indentation within a chunk of code.

This addition here re-implements the same functionality using native R code.

I've included four unit tests for it.
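
For readers unfamiliar with the idea, here is a minimal base R sketch of dedenting. It is not the implementation from this PR: `dedent_sketch` is a hypothetical name, and the sketch assumes space-only indentation and a single input string.

```r
# Hypothetical helper for illustration only; not the code under review.
dedent_sketch <- function(text) {
  # Append "\n" so strsplit() keeps a trailing empty line, if there is one.
  lines <- strsplit(paste0(text, "\n"), "\n", fixed = TRUE)[[1]]

  # Number of leading spaces on each line.
  indent <- nchar(lines) - nchar(sub("^ +", "", lines))

  # Whitespace-only lines do not count towards the common indentation.
  nonblank <- grepl("[^[:space:]]", lines)
  common <- if (any(nonblank)) min(indent[nonblank]) else 0L

  # Drop the common prefix; deeper lines keep their relative indentation.
  lines[nonblank] <- substring(lines[nonblank], common + 1L)
  paste(lines, collapse = "\n")
}

dedent_sketch("  Hello\n    World")
#> [1] "Hello\n  World"
dedent_sketch("    Hello\n  World")
#> [1] "  Hello\nWorld"
```

Taking the minimum over non-blank lines is what lets a first line sit deeper than the lines after it.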

@arnaudgallou
Contributor

arnaudgallou commented Sep 5, 2024

I'd also like to have a dedent() function in R. Two comments:

  • Wouldn't it make sense to trim leading and trailing whitespace in the output, or add an argument to do so? That would be very useful when making character strings from multi-line text in R:

    str_dedent("
      This is a long sentence that starts on the first line,
      continues on the second,
      and ends on the third one.
    ")

    For readability purposes, it's better to start the text on its own line here. We wouldn't want to keep that blank line at the beginning though. Passing the output to str_trim() every time in that situation would be cumbersome.

  • The function currently doesn't support the following situation (whereas Python's textwrap.dedent() does):

    str_dedent("
        foo
      bar
    ")

    It's probably uncommon enough not to need support, but the limitation could be noted in the function documentation.

Member

@lionel- lionel- left a comment

It'd be nice if this function was consistent with glue::glue(), at least in important cases, so users don't have to remember differences or look up documentation every time.

I think this means in particular trimming the trailing whitespace by default:

glue::glue("\nfoo\n") |> unclass()
#> [1] "foo"

There are other subtle differences where it's probably not helpful to follow glue. For instance, the indentation is computed from all empty lines, including those that are trimmed, in a way I don't fully understand:

glue::glue("\n \n\n ") |> unclass()
#> [1] "\n"
glue::glue("\n  \n\n ") |> unclass()
#> [1] " \n"
glue::glue("\n  \n\n  ") |> unclass()
#> [1] "\n"
glue::glue("\n  \n\n   ") |> unclass()
#> [1] "  \n"

I also don't understand the behaviour with 2 empty lines; I would have expected "" here:

glue::glue("\n") |> unclass()
#> [1] ""
glue::glue("\n\n") |> unclass()
#> [1] "\n"
glue::glue("\n\n\n") |> unclass()
#> [1] "\n"
glue::glue("\n\n\n\n") |> unclass()
#> [1] "\n\n"
glue::glue("\n\n\n\n\n") |> unclass()
#> [1] "\n\n\n"

For comparison (one more empty line than glue, because the trailing one is currently preserved):

str_dedent("\n") |> unclass()
#> [1] ""
str_dedent("\n\n") |> unclass()
#> [1] "\n"
str_dedent("\n\n\n") |> unclass()
#> [1] "\n\n"
str_dedent("\n\n\n\n") |> unclass()
#> [1] "\n\n\n"
str_dedent("\n\n\n\n\n") |> unclass()
#> [1] "\n\n\n\n"

So while I think compatibility with glue in terms of deleting empty trailing lines would be nice for this function, full compat is probably not worth pursuing.

@DavisVaughan
Member

Like @lionel-, I am also surprised by the trailing \n here:

library(stringr)
library(glue)

glue_chr <- function(...) { 
  unclass(glue(...))
}

str_dedent("
  Line 1
  Line 2
  Line 3
")
#> [1] "Line 1\nLine 2\nLine 3\n"
glue_chr("
  Line 1
  Line 2
  Line 3
")
#> [1] "Line 1\nLine 2\nLine 3"

I would have expected the above to give this output

str_dedent("
  Line 1
  Line 2
  Line 3")
#> [1] "Line 1\nLine 2\nLine 3"
glue_chr("
  Line 1
  Line 2
  Line 3")
#> [1] "Line 1\nLine 2\nLine 3"

I think an invariant of this function could be:

Strips all leading and trailing whitespace from the output

which provides a nice symmetry and nice user experience

Comment on lines +31 to +33
#' It does this by removing the common leading indentation from each line
#' (ignoring lines only containing whitespace), and removing the first line,
#' if it only contains whitespace.
Member

Suggested change
- #' It does this by removing the common leading indentation from each line
- #' (ignoring lines only containing whitespace), and removing the first line,
- #' if it only contains whitespace.
+ #' It does this by:
+ #' - Trimming all leading and trailing whitespace
+ #' - Removing the first line if it only contains whitespace
+ #' - Removing the common leading indentation from each line (excluding lines containing only whitespace)

Here is my suggestion for the invariants of how this function should work (if we trim leading and trailing whitespace). (Needs a document() call)
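
One possible reading of the trimming invariant, sketched in base R. It assumes that "leading and trailing whitespace" means whole whitespace-only lines at either end of the string. `trim_blank_lines` is a hypothetical helper, not part of stringr or of this PR.

```r
# Hypothetical helper for illustration only.
trim_blank_lines <- function(text) {
  # Append "\n" so strsplit() keeps a trailing empty line, if there is one.
  lines <- strsplit(paste0(text, "\n"), "\n", fixed = TRUE)[[1]]

  # TRUE for lines that contain at least one non-whitespace character.
  has_content <- grepl("[^[:space:]]", lines)
  if (!any(has_content)) {
    return("")
  }

  # Keep everything between the first and last line with content,
  # including blank lines in the middle.
  keep <- range(which(has_content))
  paste(lines[keep[1]:keep[2]], collapse = "\n")
}

trim_blank_lines("\n  Line 1\n\n  Line 3\n")
#> [1] "  Line 1\n\n  Line 3"
```

The sketch only trims the ends; removing the common indentation would be a separate step.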

@hadley
Member

hadley commented Sep 24, 2025

@DavisVaughan you mean "Strips all leading and trailing whitespace lines from the output" right?

And you both really think we don't want a trailing new line? If you were going to cat() this you would want a trailing \n?

@jennybc
Member

jennybc commented Sep 24, 2025

First, I'll just add some bits from glue's documentation for reference:

> Empty first and last lines are automatically trimmed, as is leading whitespace that is common across all lines.
> ...
> If you want an explicit newline at the start or end, include an extra empty line.
> ...
> Leading and trailing whitespace from the first and last lines is removed.
>
> A uniform amount of indentation is stripped from the second line on, equal to the minimum indentation of all non-blank lines after the first.

As for this:

> And you both really think we don't want a trailing new line? If you were going to cat() this you would want a trailing \n?

If you're just cat()ing the result of 1 str_dedent() call, I think it doesn't matter because the R console basically adds the trailing newline. And in more complicated situations, this is when I'd probably use cli::cat_line() anyway.

return(lines)
}

ws <- str_length(str_extract(lines, "^ *"))
Member

Suggested change
- ws <- str_length(str_extract(lines, "^ *"))
+ ws <- str_length(str_extract(lines, "^[ \t]*"))

Seems like we should also be thinking about tabs?

Member

I think it's better to just ignore them since we don't use them. If we did, I think we'd need to multiply by the tab stop size.

...

Hmmm, I guess we need to think about this because people might be using tabs for indenting.
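
To make the difference between the two patterns concrete, here is a small base R sketch. `prefix_width` is a hypothetical helper that counts characters, so a tab counts as one unit regardless of how wide it displays.

```r
# Hypothetical helper: number of characters matched at the start of each line.
prefix_width <- function(lines, pattern) {
  nchar(lines) - nchar(sub(pattern, "", lines))
}

lines <- c("    four spaces", "\tone tab", "\t  tab then two spaces")

# Spaces only: tab-indented lines look unindented.
prefix_width(lines, "^ *")
#> [1] 4 0 0

# Spaces and tabs: each tab counts as a single character.
prefix_width(lines, "^[ \t]*")
#> [1] 4 1 3
```

Counting characters means one tab and one space are treated as the same width; matching what an editor displays would require a tab stop size, as noted above.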

@hadley
Member

hadley commented Sep 24, 2025

@jennybc that's only true if it's at the top-level (i.e. if you cat multiple times before returning, you need newlines in between them).

I'm not sure why I'm so far apart from the rest of you on trailing newlines. I thought this was a situation where preserving them was "obviously correct".

@jennybc
Member

jennybc commented Sep 24, 2025

> if you cat multiple times before returning, you need newlines in between them

I guess that's when I would use cli::cat_line().

We're talking a lot about glue, which stringr imports. Which makes me wonder ... why isn't str_dedent() just glue::trim()? 🤔

@hadley
Member

hadley commented Sep 24, 2025

That is a good question. If I replace the existing implementation with a direct call to glue::trim() then I get the following failures:

── Failure (test-remove.R:12:3): strips common ws ──
str_dedent("  Hello\n    World") (`actual`) not equal to "Hello\n  World" (`expected`).

`lines(actual)`:   "Hello" "World"  
`lines(expected)`: "Hello" "  World"

── Failure (test-remove.R:13:3): strips common ws ──
str_dedent("    Hello\n  World") (`actual`) not equal to "  Hello\nWorld" (`expected`).

`lines(actual)`:   "Hello"   "World"
`lines(expected)`: "  Hello" "World"

── Failure (test-remove.R:25:3): preserves final newline ──
str_dedent("  Hello\n  World\n") (`actual`) not equal to "Hello\nWorld\n" (`expected`).

`lines(actual)`:   "Hello" "World"   
`lines(expected)`: "Hello" "World" ""

── Failure (test-remove.R:35:3): preserves final newline ──
str_dedent("\n      Hello\n      World\n    ") (`actual`) not equal to "Hello\nWorld\n" (`expected`).

`lines(actual)`:   "  Hello" "  World"   
`lines(expected)`: "Hello"   "World"   ""

I can make most of them go away by adding a leading \n to the strings (which better reflects real use), but the remaining weirdness is this:

cat(
  glue::trim("
    Hello
      World
  ")
)
#>   Hello
#>     World

Created on 2025-09-24 with reprex v2.1.1

I find the extra indent here surprising. But maybe we could fix that?

@lionel-
Member

lionel- commented Sep 25, 2025

> If you're just cat()ing the result of 1 str_dedent() call, I think it doesn't matter because the R console basically adds the trailing newline

The Positron and RStudio consoles do (at top-level as Hadley mentions), but not the R console:

[Screenshot 2025-09-25 at 09 57 51]

> If you were going to cat() this you would want a trailing \n?

I would expect the caller of cat() to add the trailing \n. But I think that's a case where I'd pipe the output to writeLines() (or use cat_line() as Jenny suggests).
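
A small base R sketch of the difference being discussed. `emitted` is a hypothetical helper that captures exactly what a call writes, so the trailing newline (or its absence) is visible in the result.

```r
text <- "Hello\nWorld"  # a dedented string without a trailing newline

# Hypothetical helper: run `write(con)` against an in-memory connection
# and return everything that was written to it as one string.
emitted <- function(write) {
  con <- rawConnection(raw(0), "w")
  on.exit(close(con))
  write(con)
  rawToChar(rawConnectionValue(con))
}

# cat() writes the text as-is; the caller must add "\n" explicitly.
emitted(function(con) cat(text, file = con))
#> [1] "Hello\nWorld"
emitted(function(con) cat(text, "\n", sep = "", file = con))
#> [1] "Hello\nWorld\n"

# writeLines() appends a newline after each element.
emitted(function(con) writeLines(text, con))
#> [1] "Hello\nWorld\n"
```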
