How to add additional meta data to the index from json files #2101

svenfeld · 2024-10-31T16:03:43Z

I have a data setup that contains content inside PDFs, plus JSON files with additional metadata (each JSON file shares its name with the corresponding PDF).
I want to use the JSON files as additional metadata for the PDFs. My approach was to parse the JSON into additional fields on the index, but it did not succeed.

I looked into this
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/customization.md#other-approaches-to-improve-search-results

and tried changing searchmanager.py, but could not get the result I wanted. I added additional 'SimpleField' entries, but these were null after running 'prepdocs.py'.
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/prepdocslib/searchmanager.py#L106

The JSON is structured like this:

{
    "title":"product",
    "isRelevant":true,
    "location":"en",
    "remarks":"comments",
    "thumbnailURL":"https://",
    "fileName":"file.pdf"
}

Can someone help me create the index correctly, or suggest another approach that would attach the JSON metadata to the PDFs to improve the search?


pamelafox · 2024-11-01T19:50:23Z

If your search index already exists, then you'll need to add additional fields in this if block:

azure-search-openai-demo/app/backend/prepdocslib/searchmanager.py

Line 218 in 023dc1b

logger.info("Search index %s already exists", self.search_info.index_name)
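As a rough sketch of that first step, the snippet below works out which metadata fields an existing index is still missing. The field names come from the JSON in the question; the Edm types, the `METADATA_FIELDS` list, and the `fields_to_add` helper are assumptions for illustration and are not part of the sample repo. In the real `if` block, each missing spec would presumably become a `SimpleField` appended to the existing index's `fields` before the index is pushed back with `create_or_update_index`.

```python
# Hypothetical field specs derived from the JSON keys in the question.
# The names match the JSON; the Edm types are an assumption.
METADATA_FIELDS = [
    {"name": "title", "type": "Edm.String"},
    {"name": "isRelevant", "type": "Edm.Boolean"},
    {"name": "location", "type": "Edm.String"},
    {"name": "remarks", "type": "Edm.String"},
    {"name": "thumbnailURL", "type": "Edm.String"},
]


def fields_to_add(existing_field_names, wanted=METADATA_FIELDS):
    """Return the wanted field specs that the existing index does not have yet."""
    existing = set(existing_field_names)
    return [spec for spec in wanted if spec["name"] not in existing]


# Example: an index whose field list already happens to include "title".
missing = fields_to_add(["id", "content", "sourcepage", "sourcefile", "title"])
print([spec["name"] for spec in missing])
```

Checking by name first keeps the update idempotent, so rerunning the ingestion script against an index that already has the fields changes nothing.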

Then, once those search fields exist, you need to change the update_content function to add the new fields to the uploaded document chunks:

azure-search-openai-demo/app/backend/prepdocslib/searchmanager.py

Line 264 in 023dc1b

{
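Fields that are defined on the index stay null unless every uploaded document sets them, which would explain the null values described in the question. The following is a minimal sketch of the second step, assuming each PDF has a sidecar JSON file with the same base name. The helpers `load_sidecar_metadata` and `add_metadata` are hypothetical, as is the shape of the chunk dictionaries; the actual dictionaries are built inside `update_content` in the repo.

```python
import json
import tempfile
from pathlib import Path

# Keys copied from the JSON into each chunk document ("fileName" is skipped
# here on the assumption that the index already tracks the source file).
METADATA_KEYS = ("title", "isRelevant", "location", "remarks", "thumbnailURL")


def load_sidecar_metadata(pdf_path):
    """Read <name>.json stored next to <name>.pdf; return {} if it is missing."""
    sidecar = Path(pdf_path).with_suffix(".json")
    if not sidecar.exists():
        return {}
    with sidecar.open(encoding="utf-8") as f:
        raw = json.load(f)
    return {key: raw[key] for key in METADATA_KEYS if key in raw}


def add_metadata(documents, metadata):
    """Return copies of the chunk documents with the metadata fields merged in."""
    return [{**doc, **metadata} for doc in documents]


# Demo with a temporary sidecar file shaped like the JSON in the question.
with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "file.pdf"
    pdf.with_suffix(".json").write_text(
        json.dumps({"title": "product", "isRelevant": True, "location": "en",
                    "remarks": "comments", "thumbnailURL": "https://",
                    "fileName": "file.pdf"}),
        encoding="utf-8",
    )
    metadata = load_sidecar_metadata(pdf)

# Illustrative chunk documents; the real ones carry more fields.
chunks = [{"id": "file-0", "content": "first chunk", "sourcefile": "file.pdf"}]
enriched = add_metadata(chunks, metadata)
print(enriched[0])
```

Restricting the merge to a fixed set of keys means a stray key in the JSON cannot overwrite core fields such as `id` or `content`.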
