feat(text-splitters): add R programming language support #33931
+23
−0
Description:
This PR adds R programming language support to LangChain's text splitters, so that R code is split at syntactic boundaries (function definitions, package imports, control flow, data structure creation) rather than at arbitrary character offsets.
Changes:
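- Add a `Language.R` enum value to the `Language` enum
- Add R-specific separators to `RecursiveCharacterTextSplitter.get_separators_for_language()`:
  - Function definitions (`<- function`, `= function`)
  - Package imports (`library()`, `require()`)
  - Control flow (`if()`, `for()`, `while()`, `switch()`)
  - Data structure creation (`data.frame()`, `list()`)
Why This Matters: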
R is widely used in data science, statistical analysis, machine learning, and bioinformatics. This change enables users working with R codebases to properly chunk R scripts for RAG applications and build AI assistants that understand R code structure.
Example Usage:
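As a rough illustration of separator-based splitting on an R script, here is a minimal standard-library sketch. The `R_SEPARATORS` list and the `split_r_code` helper are hypothetical stand-ins modeled on the categories listed under Changes, not the separators this PR adds. With the PR applied, the equivalent call would presumably go through `RecursiveCharacterTextSplitter.from_language(language=Language.R, ...)`.

```python
import re

# Hypothetical separators (not the PR's actual list). re.escape keeps the
# opening parenthesis literal instead of starting a regex group.
R_SEPARATORS = [
    r"\n[\w.]+\s*(?:<-|=)\s*function",  # name <- function / name = function
    "\n" + re.escape("library("),
    "\n" + re.escape("require("),
]


def split_r_code(code: str) -> list[str]:
    """Split R source before each line that matches a separator."""
    # Zero-width lookaheads split *before* the separator, so the keyword
    # stays attached to the chunk it introduces.
    boundary = "|".join(f"(?={sep})" for sep in R_SEPARATORS)
    pieces = re.split(boundary, code)
    return [p.strip() for p in pieces if p.strip()]


sample = """library(dplyr)
library(ggplot2)

clean <- function(df) {
  df[!is.na(df$x), ]
}

summarise_x = function(df) {
  mean(df$x)
}
"""

chunks = split_r_code(sample)
for chunk in chunks:
    print(repr(chunk))
```

The sketch yields one chunk per import and one per function definition. A recursive splitter would additionally merge or further split these pieces to fit a target chunk size, which is omitted here.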
Testing:
Tested locally with various R code samples including package imports, function definitions (both `<-` and `=` syntax), control flow statements, and data structure creation. All separators work correctly with proper regex escaping.
Issue: Fixes #33824
Dependencies: None