feat(mongo reader): field_extractors #18063

david20571015 · 2025-03-08T19:13:45Z

Description

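Add field_extractors, a mapping from field name to a callable, so users can customize how each field's value is extracted from a MongoDB document. Fields without an entry are still converted with str, as before.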
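A minimal sketch of the idea, run against in-memory dicts rather than a real MongoDB cursor. The helper name extract_texts, the sample documents, and the tags extractor are assumptions made for illustration; only the field_extractors.get(name, str)(item[name]) expression comes from the diff in this PR.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for the documents a MongoDB cursor might yield.
docs = [
    {"title": "Intro", "tags": ["db", "nosql"], "views": 12},
    {"title": "Indexes", "tags": ["perf"], "views": 7},
]

# Map a field name to the callable that turns its value into text.
field_extractors: Dict[str, Callable[[Any], str]] = {
    "tags": lambda tags: ", ".join(tags),
}


def extract_texts(
    item: Dict[str, Any],
    field_names: List[str],
    extractors: Dict[str, Callable[[Any], str]],
) -> List[str]:
    """Convert each requested field to text, defaulting to str()."""
    return [extractors.get(name, str)(item[name]) for name in field_names]


for doc in docs:
    print(extract_texts(doc, ["title", "tags", "views"], field_extractors))
```

Without the tags entry, the list would be rendered by str as its Python repr, which is the kind of output a custom extractor is meant to avoid.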

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Yes
No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Yes
No

Type of Change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

I added new unit tests to cover this change
I believe this change is already covered by existing unit tests

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran make format; make lint to appease the lint gods

logan-markewich · 2025-03-09T22:15:31Z

...a-index-integrations/readers/llama-index-readers-mongodb/llama_index/readers/mongodb/base.py

@@ -40,12 +41,6 @@ def __init__(

        self.client = client

-    def _flatten(self, texts: List[Union[str, List[str]]]) -> List[str]:


Why do we remove flatten?

david20571015 · 2025-03-10

texts is already of type list[str], so _flatten(texts) returns it unchanged. Removing _flatten therefore simplifies the code without changing its behavior.
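The identity claim can be checked with a small sketch. The body of flatten below is an assumption: the diff shows only the signature of _flatten, so this is a plausible one-level flatten, not the reader's actual implementation.

```python
from typing import List, Union


def flatten(texts: List[Union[str, List[str]]]) -> List[str]:
    """Hypothetical one-level flatten; not the reader's actual code."""
    result: List[str] = []
    for text in texts:
        if isinstance(text, list):
            result.extend(text)
        else:
            result.append(text)
    return result


item = {"title": "Intro", "views": 12}
texts = [str(item[name]) for name in ["title", "views"]]

# Every element is already a str, so nothing is unpacked.
assert flatten(texts) == texts
# Flattening only matters when an element is itself a list.
assert flatten(["a", ["b", "c"]]) == ["a", "b", "c"]
```

Because each element comes from str(...), the list branch is never taken for the original code path.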

logan-markewich · 2025-03-09T22:16:06Z

...a-index-integrations/readers/llama-index-readers-mongodb/llama_index/readers/mongodb/base.py

        for item in cursor:
            try:
-                texts = [str(item[name]) for name in field_names]
+                texts = [
+                    field_extractors.get(name, str)(item[name]) for name in field_names


.get(name, str) does not work? What does this return when name isn't in the dict?

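david20571015 · 2025-03-10

It is dict.get, so if name isn't in the dict it returns the default, which is the str type itself. The field is then converted with str(item[name]), exactly as in the original code.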
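The fallback behavior is standard dict.get semantics and can be verified in isolation. The sample extractor and field names below are assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict

field_extractors: Dict[str, Callable[[Any], str]] = {"tags": ", ".join}

# A missing key makes dict.get return the default: the str type itself.
fallback = field_extractors.get("title", str)
assert fallback is str
assert fallback(42) == "42"

# A present key returns the registered extractor instead.
assert field_extractors.get("tags", str)(["db", "nosql"]) == "db, nosql"

# Plain indexing would raise for fields without an extractor.
try:
    field_extractors["title"]
except KeyError:
    missing_key_raises = True
else:
    missing_key_raises = False
assert missing_key_raises
```

Using get with a default keeps the extractors optional: callers only list the fields that need special handling.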

dosubot bot added the size:S label (This PR changes 10-29 lines, ignoring generated files.) Mar 8, 2025

logan-markewich reviewed Mar 9, 2025

david20571015 requested a review from logan-markewich March 10, 2025 01:28

david20571015 added 3 commits March 10, 2025 09:39

chore: simplifies the code since it is an identity function here

a7fda0f

feat(mongo reader): field_extractor

e05102c

fix: format

e02b503

david20571015 force-pushed the mongo-reader branch from 391928e to e02b503 Compare March 10, 2025 01:40
