Topic-Based Resource Enrichment #256

@alvaro-mazcu

Motivation

Right now, Twiga depends mainly on textbooks, which is good for staying grounded in the Tanzanian syllabus. But textbooks are not always enough on their own. Some topics would benefit from extra background knowledge, clearer definitions, linked entities, or complementary explanations. That is where Wikidata and, when useful, linked Wikipedia articles can help.

The useful idea here is not “replace the textbook with the internet”. It is to complement the textbook with structured and topic-linked external knowledge. Wikidata is a structured knowledge base built around items, and those items can also link to related Wikimedia pages such as Wikipedia articles. That makes it a good candidate for enriching Twiga’s resource layer in a more controlled way than just doing open web retrieval.

Background

The proposed flow is: first explore the existing resources and identify the important topics, then map those topics to external knowledge, and finally add that material as new resources that Twiga can retrieve from. In practice, “Wikidata articles” is not quite the right target, because Wikidata itself is mainly structured data organized as items rather than article-style pages. A better framing is to use Wikidata for entity/topic mapping and then optionally pull in the corresponding Wikipedia article content when it is useful.

That is why the task should focus on enrichment rather than raw scraping. The value is in finding the right textbook topics, linking them to the right external concepts, and adding that information in a way that improves retrieval without polluting Twiga with noisy or off-syllabus content.

Goal

Build a first version of a resource enrichment pipeline that extracts relevant topics from Twiga’s existing resources, links them to Wikidata entities, and adds useful linked knowledge as additional resources for retrieval.

The outcome should help us answer a practical question: does enriching textbooks with topic-linked Wikidata/Wikipedia content improve Twiga’s coverage and answer quality without making retrieval noisier?
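One way to make that question measurable is to compare retrieval runs with and without the enriched resources. The sketch below is only an illustration: the two metrics (a hit rate at k for coverage, and the share of external chunks in the top k as a rough noise signal) are assumptions, as are the function names and the idea that each retrieved chunk carries an identifier. Answer quality itself would still need a separate judgment step.

```python
def hit_rate_at_k(results, relevant, k=5):
    """Fraction of questions with at least one relevant chunk in the top k.

    results:  {question: [chunk_id, ...]} in ranked order (hypothetical format)
    relevant: {question: {chunk_id, ...}} labeled by hand (hypothetical format)
    """
    if not results:
        return 0.0
    hits = 0
    for question, ranked in results.items():
        if set(ranked[:k]) & relevant.get(question, set()):
            hits += 1
    return hits / len(results)


def external_share_at_k(results, external_ids, k=5):
    """Share of top-k slots occupied by external (non-textbook) chunks."""
    top = [chunk for ranked in results.values() for chunk in ranked[:k]]
    if not top:
        return 0.0
    return sum(chunk in external_ids for chunk in top) / len(top)
```

Running both functions on a textbook-only index and on the enriched index, over the same question set, would show whether coverage goes up and how much of the top k the external material takes over.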

Plan

The developer should start by defining how to extract “relevant topics” from the current resources. This could be based on textbook structure, chapter titles, table of contents, repeated concepts, or chunk-level topic extraction. Once those topics exist, the next step is to map them to Wikidata items in a reliable way.
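As a starting point, the two steps above could look like the following minimal sketch. It assumes topics come from chapter or section headings, and it uses the public Wikidata `wbsearchentities` API action for the lookup. The function names, the heading-cleanup rule, and the `User-Agent` value are hypothetical, and nothing here reflects Twiga's actual code.

```python
import json
import re
import urllib.parse
import urllib.request
from collections import Counter

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Hypothetical identifier; Wikimedia asks API clients to send a descriptive one.
USER_AGENT = "twiga-enrichment-sketch/0.1"


def extract_topics(headings, min_count=1):
    """Strip numbering such as '3.2' or 'Chapter 4:' and count repeated headings."""
    counts = Counter()
    for heading in headings:
        text = re.sub(
            r"^\s*(chapter|topic|unit)?\s*[\d.]+\s*[:.\-]?\s*",
            "",
            heading,
            flags=re.IGNORECASE,
        )
        text = text.strip().lower()
        if text:
            counts[text] += 1
    return [topic for topic, n in counts.most_common() if n >= min_count]


def build_search_url(topic, language="en", limit=5):
    """Build a wbsearchentities request URL for one topic string."""
    params = {
        "action": "wbsearchentities",
        "search": topic,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def parse_candidates(payload):
    """Keep only the fields needed to review candidate items."""
    return [
        {
            "id": hit.get("id", ""),
            "label": hit.get("label", ""),
            "description": hit.get("description", ""),
        }
        for hit in payload.get("search", [])
    ]


def search_wikidata(topic):
    """Network call: return candidate Wikidata items for a topic."""
    request = urllib.request.Request(
        build_search_url(topic), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_candidates(json.load(response))
```

Returning several candidates rather than the first hit leaves room for a disambiguation step, for example comparing each candidate's description against the surrounding textbook text, which is where most of the "reliable mapping" work is likely to sit.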

The next decision is what to ingest. In many cases, Wikidata itself may be best used as the linking and normalization layer, while the human-readable explanatory content comes from the associated Wikipedia article when available. The implementation should stay careful here: the goal is to enrich Twiga with complementary knowledge, not to flood the system with generic external text. A small, well-grounded proof of concept is enough for the first version.
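A minimal sketch of that split, assuming the Wikidata `wbgetentities` action is used to find the linked English Wikipedia title and the Wikipedia `extracts` query (TextExtracts) is used to fetch only the plain-text introduction. The `ExternalResource` record, the `max_chars` cap, and all function names are hypothetical and only illustrate keeping the ingested text small and traceable.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
USER_AGENT = "twiga-enrichment-sketch/0.1"  # hypothetical identifier


def sitelink_url(qid, site="enwiki"):
    """URL asking Wikidata for an item's sitelink to one wiki."""
    params = {"action": "wbgetentities", "ids": qid, "props": "sitelinks",
              "sitefilter": site, "format": "json"}
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def article_title(payload, qid, site="enwiki"):
    """Read the linked article title, or None if the item has no such link."""
    entity = payload.get("entities", {}).get(qid) or {}
    sitelinks = entity.get("sitelinks") or {}  # tolerate an empty list or missing key
    return (sitelinks.get(site) or {}).get("title")


def extract_url(title, lang="en"):
    """URL asking Wikipedia for the plain-text intro of one article."""
    params = {"action": "query", "prop": "extracts", "exintro": 1,
              "explaintext": 1, "redirects": 1, "titles": title, "format": "json"}
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def intro_text(payload):
    """Return the first non-empty extract in the response, or None."""
    for page in payload.get("query", {}).get("pages", {}).values():
        if page.get("extract"):
            return page["extract"]
    return None


@dataclass
class ExternalResource:
    topic: str      # textbook topic that triggered the lookup
    qid: str        # Wikidata item used for linking
    title: str      # Wikipedia article title
    text: str       # capped intro text to index
    source: str = "wikipedia"


def build_resource(topic, qid, title, text, max_chars=2000):
    """Create a small, provenance-tagged record; skip topics without content."""
    if not title or not text:
        return None
    return ExternalResource(topic, qid, title, text[:max_chars].strip())


def fetch_json(url):
    """Network call shared by both lookups."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Storing the topic and item identifier with each record makes it possible to filter or remove external material later if it turns out to hurt retrieval.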

Useful links

  • Wikidata items overview: Wikidata is built around items representing concepts, topics, and objects. (wikidata.org)
  • Wikidata and Wikipedia relationship overview: Wikidata provides structured data and connects with Wikimedia projects such as Wikipedia. (wikidata.org)
