-
Notifications
You must be signed in to change notification settings - Fork 195
Add new str_dedent
function
#514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I'd also like to have a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if this function was consistent with glue::glue()
, at least in important cases, so users don't have to remember differences or look up documentation every time.
I think this means in particular trimming the trailing whitespace by default:
glue::glue("\nfoo\n") |> unclass()
#> [1] "foo"
There are other subtle differences where it's probably not helpful to follow glue. For instance, the indentation is computed from all empty lines, including those that are trimmed, in a way I don't fully understand:
glue::glue("\n \n\n ") |> unclass()
#> [1] "\n"
glue::glue("\n \n\n ") |> unclass()
#> [1] " \n"
glue::glue("\n \n\n ") |> unclass()
#> [1] "\n"
glue::glue("\n \n\n ") |> unclass()
#> [1] " \n"
I also don't understand the behaviour with 2 empty lines, I would have expected ""
here:
glue::glue("\n") |> unclass()
#> [1] ""
glue::glue("\n\n") |> unclass()
#> [1] "\n"
glue::glue("\n\n\n") |> unclass()
#> [1] "\n"
glue::glue("\n\n\n\n") |> unclass()
#> [1] "\n\n"
glue::glue("\n\n\n\n\n") |> unclass()
#> [1] "\n\n\n"
For comparison (one more empty line than glue due to trailing one being preserved as of now):
str_dedent("\n") |> unclass()
#> [1] ""
str_dedent("\n\n") |> unclass()
#> [1] "\n"
str_dedent("\n\n\n") |> unclass()
#> [1] "\n\n"
str_dedent("\n\n\n\n") |> unclass()
#> [1] "\n\n\n"
str_dedent("\n\n\n\n\n") |> unclass()
#> [1] "\n\n\n\n"
So while I think compatibility with glue in terms of deleting empty trailing lines would be nice for this function, full compat is probably not worth pursuing.
Like @lionel-, I think I am also surprised by the trailing library(stringr)
library(glue)
glue_chr <- function(...) {
unclass(glue(...))
}
str_dedent("
Line 1
Line 2
Line 3
")
#> [1] "Line 1\nLine 2\nLine 3\n"
glue_chr("
Line 1
Line 2
Line 3
")
#> [1] "Line 1\nLine 2\nLine 3" I would have expected the above to give this output str_dedent("
Line 1
Line 2
Line 3")
#> [1] "Line 1\nLine 2\nLine 3"
glue_chr("
Line 1
Line 2
Line 3")
#> [1] "Line 1\nLine 2\nLine 3" I think an invariant of this function could be: Strips all leading and trailing whitespace from the output which provides a nice symmetry and nice user experience |
#' It does this by removing the common leading indentation from each line | ||
#' (ignoring lines only containing whitespace), and removing the first line, | ||
#' if it only contains whitespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#' It does this by removing the common leading indentation from each line | |
#' (ignoring lines only containing whitespace), and removing the first line, | |
#' if it only contains whitespace. | |
#' It does this by: | |
#' - Trimming all leading and trailing whitespace | |
#' - Removing the first line if it only contains whitespace | |
#' - Removing the common leading indentation from each line (excluding lines containing only whitespace) |
Here is my suggestion for the invariants of how this function should work (if we trim leading and trailing whitespace). (Needs a document()
call)
@DavisVaughan you mean "Strips all leading and trailing whitespace lines from the output" right? And you both really think we don't want a trailing new line? If you were going to |
First, I'll just add some bits from glue's documentation for reference:
As for this:
If you're just |
return(lines) | ||
} | ||
|
||
ws <- str_length(str_extract(lines, "^ *")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ws <- str_length(str_extract(lines, "^ *")) | |
ws <- str_length(str_extract(lines, "^[ \t]*")) |
Seems like we should also be thinking about tabs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to just ignore them since we don't use them. If we did, I think we'd need to multiply by the tab stop size.
...
Hmmm, I guess we need to think about this because people might be using tabs for indenting.
@jennybc that's only true if it's at the top-level (i.e. if you cat multiple times before returning, you need newlines in between them). I'm not sure why I'm so far apart on trailing newlines than the rest of you. I thought this was a situation where preserving them was "obviously correct". |
I guess that's when I would use We're talking a lot about glue, which stringr imports. Which makes me wonder ... why isn't |
That is a good question. If I replace the existing implementation with a direct call to
I can make most of them go away by adding a leading cat(
glue::trim("
Hello
World
")
)
#> Hello
#> World Created on 2025-09-24 with reprex v2.1.1 I find the extra indent here surprising. But maybe we could fix that? |
One incredibly helpful function in Python is the
textwrap.dedent
function. Under the hood, this function uses regex to strip any leading spaces, while maintaining any internal indentation within a chunk of code.This addition here re-implements the same functionality using native R code.
I've ensured to include 4 different unit tests for the same.