Motivation
Right now, Twiga depends mainly on textbooks, which is good for staying grounded in the Tanzanian syllabus. But textbooks are not always enough on their own. Some topics would benefit from extra background knowledge, clearer definitions, linked entities, or complementary explanations. That is where Wikidata and, when useful, linked Wikipedia articles can help.
The useful idea here is not “replace the textbook with the internet”. It is to complement the textbook with structured and topic-linked external knowledge. Wikidata is a structured knowledge base built around items, and those items can also link to related Wikimedia pages such as Wikipedia articles. That makes it a good candidate for enriching Twiga’s resource layer in a more controlled way than just doing open web retrieval.
Background
The proposed flow has three steps: first explore the existing resources and identify the important topics, then map those topics to external knowledge, and finally add that material as new resources that Twiga can retrieve from. The phrase “Wikidata articles” is misleading, though, because Wikidata is mainly structured data organized as items rather than article-style pages. A better framing is to use Wikidata for entity/topic mapping and then optionally pull in the corresponding Wikipedia article content when it is useful.
That is why the task should focus on enrichment rather than raw scraping. The value is in finding the right textbook topics, linking them to the right external concepts, and adding that information in a way that improves retrieval without polluting Twiga with noisy or off-syllabus content.
Goal
Build a first version of a resource enrichment pipeline that extracts relevant topics from Twiga’s existing resources, links them to Wikidata entities, and adds useful linked knowledge as additional resources for retrieval.
The outcome should help us answer a practical question: does enriching textbooks with topic-linked Wikidata/Wikipedia content improve Twiga’s coverage and answer quality without making retrieval noisier?
Plan
The developer should start by defining how to extract “relevant topics” from the current resources. This could be based on textbook structure, chapter titles, table of contents, repeated concepts, or chunk-level topic extraction. Once those topics exist, the next step is to map them to Wikidata items in a reliable way.
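As a rough illustration of the mapping step, the sketch below builds a request for Wikidata's public `wbsearchentities` search action and filters the returned candidates. It is only a starting point under stated assumptions: the helper names (`build_search_url`, `pick_candidates`) are hypothetical, and the exact-label match is a deliberately strict placeholder for whatever disambiguation rule the developer ends up choosing.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(topic, language="en", limit=5):
    """Build a wbsearchentities request URL for one textbook topic."""
    params = {
        "action": "wbsearchentities",
        "search": topic,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def pick_candidates(response, topic):
    """Keep only hits whose label equals the topic (case-insensitive).

    Assumption: an exact label match is a conservative first filter.
    Ambiguous or unmatched topics should be reviewed by hand rather
    than guessed.
    """
    wanted = topic.strip().lower()
    return [
        {
            "id": hit["id"],
            "label": hit.get("label", ""),
            "description": hit.get("description", ""),
        }
        for hit in response.get("search", [])
        if hit.get("label", "").strip().lower() == wanted
    ]
```

The JSON returned by the URL can be fetched with any HTTP client and passed to `pick_candidates`. Keeping the description next to each item ID makes it easier to spot wrong matches during a manual check.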
After that, the developer should decide what to ingest. In many cases, Wikidata itself may be best used as the linking and normalization layer, while the human-readable explanatory content comes from the associated Wikipedia article when one exists. The implementation should be conservative here: the goal is to enrich Twiga with complementary knowledge, not to flood the system with generic external text. A small, well-grounded proof of concept is enough for the first version.
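One possible shape for the ingestion step is sketched below: look up the English Wikipedia sitelink for a matched item via `wbgetentities`, request only the article's plain-text introduction via the `extracts` property, and cap its length so that external text stays small relative to the textbook. The function names, the resource record fields, and the 1500-character cap are all assumptions for illustration and not part of Twiga's existing schema.

```python
from urllib.parse import urlencode


def sitelink_url(qid, site="enwiki"):
    """URL asking Wikidata for an item's sitelink to one wiki."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "sitelinks",
        "sitefilter": site,
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


def article_title(response, qid, site="enwiki"):
    """Return the linked article title, or None if the item has none."""
    entity = response.get("entities", {}).get(qid, {})
    link = entity.get("sitelinks", {}).get(site)
    return link["title"] if link else None


def intro_url(title):
    """URL asking English Wikipedia for a plain-text intro extract."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "titles": title,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


def build_resource(qid, title, response, max_chars=1500):
    """Turn an extract response into a small resource record.

    The record layout is hypothetical. It keeps provenance (source and
    Wikidata ID) so enriched chunks can be filtered or removed later.
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        text = page.get("extract", "").strip()
        if text:
            return {
                "source": "wikipedia",
                "wikidata_id": qid,
                "title": title,
                "text": text[:max_chars],
            }
    return None  # missing page or empty extract: ingest nothing
```

Returning `None` when there is no sitelink or no extract keeps the pipeline from adding empty or placeholder resources, which matches the aim of a small, well-grounded first version.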
Useful links
- Wikidata items overview: Wikidata is built around items representing concepts, topics, and objects. (wikidata.org)
- Wikidata and Wikipedia relationship overview: Wikidata provides structured data and connects with Wikimedia projects such as Wikipedia. (wikidata.org)