
@gorkachea

Description:

This PR adds R programming language support to LangChain's text splitters. R code can then be split at structural boundaries (function definitions, package loading, control flow) instead of at arbitrary character positions.

Changes:

  • Added an R value (Language.R) to the Language enum
  • Implemented R-specific separators in RecursiveCharacterTextSplitter.get_separators_for_language()
  • Splits R code along:
    • Function definitions (<- function, = function)
    • Package loading (library(), require())
    • Control flow statements (if(), for(), while(), switch())
    • Data structure creation (data.frame(), list())
    • Standard separators (double newlines, single newlines, spaces)
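The separator list itself is not reproduced in this description. The sketch below is therefore hypothetical. It shows how the categories above could be written as regular expressions, and escaping matters because the Testing notes describe the separators as regex-escaped. `R_SEPARATORS` and `pick_separator` are illustrative names, not part of LangChain. The selection rule, which takes the first separator that occurs in the text, is a simplified version of how a recursive splitter decides where to cut.

```python
import re

# Hypothetical R separators, ordered like the list above (function definitions,
# package loading, control flow, data structures), then generic fallbacks.
# Each structural pattern matches a newline followed (via lookahead) by the
# construct, so the construct stays at the start of the next chunk.
R_SEPARATORS = [
    r"\n(?=[\w.]+\s*(?:<-|=)\s*function\b)",             # f <- function(...) / f = function(...)
    r"\n(?=(?:library|require)\()",                       # library(pkg) / require(pkg)
    r"\n(?=(?:if|for|while|switch)\s*\()",                # if ( / for ( / while ( / switch (
    r"\n(?=[\w.]+\s*(?:<-|=)\s*(?:data\.frame|list)\()",  # df <- data.frame(...) / x <- list(...)
    r"\n\n",  # blank lines
    r"\n",    # single newlines
    r" ",     # spaces
    r"",      # individual characters (always applies)
]


def pick_separator(text: str) -> str:
    """Return the first separator pattern that occurs in `text`."""
    for sep in R_SEPARATORS:
        if sep == "" or re.search(sep, text):
            return sep
    return ""


print(repr(pick_separator("x <- 1\nfor (i in 1:3) {\n  print(i)\n}")))
```

For this input the function-definition and package patterns find nothing, so the control-flow pattern (`R_SEPARATORS[2]`) is chosen.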

Why This Matters:

R is widely used in data science, statistical analysis, machine learning, and bioinformatics. With this change, users working with R codebases can chunk R scripts along their code structure for RAG applications and for AI assistants that work with R code.

Example Usage:

```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

r_code = """
library(dplyr)

calculate_mean <- function(x) {
  mean(x, na.rm = TRUE)
}
"""

splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.R,
    chunk_size=100,
    chunk_overlap=0
)

chunks = splitter.split_text(r_code)
# This ~75-character snippet fits in a single 100-character chunk. In longer
# scripts, the R separators let the splitter break at function definitions and
# library calls before falling back to newlines and spaces.
```
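The sketch below shows, using only the standard library, how a longer script ends up split at function boundaries. It is a simplified recursive splitter: pick the coarsest separator present, greedily merge pieces up to `chunk_size`, and recurse into any piece that is still too long. It is not LangChain's implementation (no overlap, lengths counted in characters), and `recursive_split`, `split_keep_start`, and the separator list are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical separator list (coarsest first); the PR's actual patterns may differ.
R_SEPARATORS = [
    r"\n(?=[\w.]+\s*(?:<-|=)\s*function\b)",  # function definitions
    r"\n(?=(?:library|require)\()",           # package loading
    r"\n\n",
    r"\n",
    r" ",
    r"",
]


def split_keep_start(text: str, pattern: str) -> list[str]:
    """Split on a regex, keeping each separator at the start of the piece after it."""
    if pattern == "":
        return list(text)  # last resort: individual characters
    parts = re.split(f"({pattern})", text)  # [before, sep, after, sep, after, ...]
    pieces = [parts[0]] + [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]
    return [p for p in pieces if p]


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Simplified recursive splitter: no overlap, lengths measured in characters."""
    # Use the first separator that occurs in the text; keep the finer ones for recursion.
    sep, finer = separators[-1], []
    for i, candidate in enumerate(separators):
        if candidate == "" or re.search(candidate, text):
            sep, finer = candidate, separators[i + 1 :]
            break

    chunks: list[str] = []
    buffer = ""

    def flush() -> None:
        nonlocal buffer
        if buffer.strip():
            chunks.append(buffer.strip())
        buffer = ""

    for piece in split_keep_start(text, sep):
        if len(piece) > chunk_size and finer:
            flush()  # emit what we have, then split the oversized piece further
            chunks.extend(recursive_split(piece, finer, chunk_size))
        elif len(buffer) + len(piece) <= chunk_size:
            buffer += piece  # greedily merge small pieces
        else:
            flush()
            buffer = piece
    flush()
    return chunks


r_code = """library(dplyr)
library(ggplot2)

calculate_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

scale_values = function(x) {
  (x - min(x)) / (max(x) - min(x))
}
"""

for chunk in recursive_split(r_code, R_SEPARATORS, chunk_size=80):
    print(repr(chunk))
```

With `chunk_size=80`, this yields three chunks: the two `library()` calls together, then each function definition on its own. Because each separator stays at the start of the following piece, every function name lands at the top of its chunk.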

Testing:
Tested locally with various R code samples including package imports, function definitions (both <- and = syntax), control flow statements, and data structure creation. All separators work correctly with proper regex escaping.
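The following is a sketch of the kind of local check described above, assuming the separators are written as regular expressions. It covers both function-assignment styles and shows why escaping matters, since `(` and `.` are regex metacharacters. The patterns and the case list are hypothetical, not the PR's actual tests.

```python
import re

# Hypothetical separator patterns used only for this check.
FUNC_DEF = r"\n(?=[\w.]+\s*(?:<-|=)\s*function\b)"
DATA_FRAME = r"\n(?=[\w.]+\s*(?:<-|=)\s*data\.frame\()"                # "." and "(" escaped
DATA_FRAME_UNESCAPED_DOT = r"\n(?=[\w.]+\s*(?:<-|=)\s*data.frame\()"  # "." left as a wildcard

# Each case: (separator regex, R sample, should the separator match?)
CASES = [
    (FUNC_DEF, "x <- 1\nf <- function(x) x + 1", True),                   # "<-" assignment
    (FUNC_DEF, "x <- 1\nf = function(x) x + 1", True),                    # "=" assignment
    (FUNC_DEF, "x <- 1\nf <- functional(x)", False),                      # \b rejects a longer name
    (DATA_FRAME, "x <- 1\ndf <- data.frame(a = 1)", True),
    (DATA_FRAME, "x <- 1\ndf <- data_frame(a = 1)", False),
    (DATA_FRAME_UNESCAPED_DOT, "x <- 1\ndf <- data_frame(a = 1)", True),  # over-matches
]


def run_cases() -> int:
    """Assert every case and return how many passed."""
    passed = 0
    for pattern, sample, expected in CASES:
        assert (re.search(pattern, sample) is not None) == expected, (pattern, sample)
        passed += 1
    return passed


# A literal like "\nlibrary(" is not a valid regex until "(" is escaped.
try:
    re.compile("\nlibrary(")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

print(run_cases(), unescaped_compiles)
```

Running it prints `6 False`: all six cases pass, and the unescaped `"\nlibrary("` literal fails to compile.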

Issue: Fixes #33824

Dependencies: None

- Add Language.R enum value
- Implement R-specific separators for RecursiveCharacterTextSplitter
- Split along function definitions (<- function, = function)
- Split along package loading (library, require)
- Split along control flow statements (if, for, while, switch)
- Split along data structure creation (data.frame, list)

This enables users to split R code files while preserving their
semantic structure, addressing a feature request from the community.

Signed-off-by: Gorka Bengochea <[email protected]>
github-actions bot added the `text-splitters` label (Related to the package `text-splitters`) on Nov 11, 2025
gorkachea changed the title from "✨ Add R programming language support to text splitters" to "feat(text-splitters): add R programming language support" on Nov 11, 2025

Labels

feature, text-splitters (Related to the package `text-splitters`)

Development

Successfully merging this pull request may close these issues.

Add R programming to langchain_text_splitters.Language
