How to add additional meta data to the index from json files #2101

Open
svenfeld opened this issue Oct 31, 2024 · 1 comment

@svenfeld

I have a data setup in which the content is inside PDFs, with additional metadata in JSON files (each JSON file shares its name with the corresponding PDF).
I want to use the JSON files as additional metadata for the PDFs. My approach was to parse the JSON into additional fields on the index, but I did not succeed.

I looked into this
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/customization.md#other-approaches-to-improve-search-results

and tried making changes to searchmanager.py, but could not get the result I wanted. I added additional 'SimpleFields', but these were null after running 'prepdocs.py'.
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/prepdocslib/searchmanager.py#L106

The json is structured like this.

{
    "title":"product",
    "isRelevant":true,
    "location":"en",
    "remarks":"comments",
    "thumbnailURL":"https://",
    "fileName":"file.pdf"
}

Can someone help me create the index correctly? Or suggest another approach that would add the JSON metadata to the PDFs to improve the search?

@pamelafox
Collaborator

If your search index already exists, then you'll need to add additional fields in this if block:

logger.info("Search index %s already exists", self.search_info.index_name)
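A minimal sketch of what adding fields in that branch could look like, assuming the azure-search-documents SDK and, for brevity, a synchronous SearchIndexClient (prepdocs itself may use the async client). The field names are taken from the JSON in the question; METADATA_FIELDS, missing_fields and add_missing_fields are hypothetical names, not part of the repository.

```python
# Hypothetical mapping from the JSON keys in the question to Edm type names.
METADATA_FIELDS = {
    "title": "Edm.String",
    "isRelevant": "Edm.Boolean",
    "location": "Edm.String",
    "remarks": "Edm.String",
    "thumbnailURL": "Edm.String",
}


def missing_fields(existing_names, wanted=METADATA_FIELDS):
    """Return (name, type) pairs from `wanted` that the index does not have yet."""
    existing = set(existing_names)
    return [(name, edm) for name, edm in wanted.items() if name not in existing]


def add_missing_fields(index_client, index_name):
    """Append the missing metadata fields to an index that already exists.

    Assumes `index_client` is an azure.search.documents.indexes.SearchIndexClient.
    """
    # Imported lazily so the pure helper above works without the SDK installed.
    from azure.search.documents.indexes.models import SimpleField

    index = index_client.get_index(index_name)
    to_add = missing_fields(field.name for field in index.fields)
    for name, edm_type in to_add:
        index.fields.append(SimpleField(name=name, type=edm_type, filterable=True))
    if to_add:
        # Only push an update when the schema actually changed.
        index_client.create_or_update_index(index)
    return [name for name, _ in to_add]
```

Checking which fields are already present keeps the step safe to rerun, since prepdocs.py may be run several times against the same index.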

Then, once those search fields exist, you need to change the update_content function to add the new fields to the uploaded document chunks.
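Fields defined on the index stay null unless each uploaded document carries a value for them, which would explain the null 'SimpleFields' in the question. The sketch below shows one way to attach the JSON values to a chunk's document, assuming each PDF has a JSON file with the same base name in the same folder and that each chunk is uploaded as a plain dict. load_sidecar_metadata and with_metadata are hypothetical helpers, not functions from the repository.

```python
import json
from pathlib import Path

# Keys from the JSON in the question that should become index fields (an assumption).
METADATA_KEYS = ("title", "isRelevant", "location", "remarks", "thumbnailURL")


def load_sidecar_metadata(pdf_path):
    """Read <name>.json next to <name>.pdf; return {} if there is no such file."""
    sidecar = Path(pdf_path).with_suffix(".json")
    if not sidecar.is_file():
        return {}
    with sidecar.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Keep only the keys that have a matching index field.
    return {key: data[key] for key in METADATA_KEYS if key in data}


def with_metadata(chunk_document, metadata):
    """Return a copy of a chunk's document dict with the metadata fields added."""
    merged = dict(chunk_document)
    for key, value in metadata.items():
        # Do not overwrite fields the ingestion code already set.
        merged.setdefault(key, value)
    return merged
```

Every chunk of a PDF would get the same metadata, so the JSON only needs to be loaded once per file and can then be merged into each chunk's document before upload.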
