Fundamental concepts
Benjamin Ooghe-Tabanou edited this page Dec 21, 2012 · 1 revision
To provide the best possible hypertextual corpus, HCI has to define a new way of dealing with web pages, aggregates of web pages, and crawling. To do so, it was necessary to define the fundamental concepts the software relies on.
- Our Documentary Model of the Web is the way our system understands the web as a structure. It involves:
  - Reverse URLs (LRUs) are what a URL becomes after tokenization. An LRU is how a URL is expressed in the file system metaphor.
  - Pages are documents returned by a given URL.
  - Nodes are pages for which the system keeps all link information (see Precision_limit).
  - Web entities are lists of LRU prefixes. They are the objects manipulated by the user.
  - Cascading definition:
    - An LRU can be a PAGE, a 404 error, or a Forbidden response (see LRU prefixes).
    - A PAGE can be a NODE, or just a page without links because of Precision_limit.
    - A NODE can be a WEB ENTITY, or be included in another, more generic web entity.
    - Exception in the cascade:
      - An LRU_PREFIX can be a WEB ENTITY.
- Web corpus defines what the user corpus is.
- Memory structure defines the way stems are created and stored for maximal efficiency.
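
To make the relation between URLs, LRUs and web entities more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: this page does not specify the stem syntax, so the `s:`/`t:`/`h:`/`p:`/`q:`/`f:` markers and the `|` separator are assumed, and the helper names `url_to_lru` and `find_web_entity` are hypothetical. Resolving an LRU to the web entity with the longest matching prefix is also an assumption, chosen to reflect the idea that a node can sit inside a more generic web entity.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def url_to_lru(url: str) -> str:
    """Tokenize a URL into stems, ordered from most generic to most specific.

    The stem markers and the '|' separator are assumptions for illustration.
    """
    parts = urlsplit(url)
    stems = [f"s:{parts.scheme}"]
    if parts.port is not None:
        stems.append(f"t:{parts.port}")
    # Reverse the host labels so that the top-level domain comes first.
    host = parts.hostname or ""
    stems += [f"h:{label}" for label in reversed(host.split(".")) if label]
    stems += [f"p:{segment}" for segment in parts.path.split("/") if segment]
    if parts.query:
        stems.append(f"q:{parts.query}")
    if parts.fragment:
        stems.append(f"f:{parts.fragment}")
    # The trailing separator keeps 'h:example|' from matching 'h:examples|'.
    return "|".join(stems) + "|"


def find_web_entity(lru: str, web_entities: Dict[str, List[str]]) -> Optional[str]:
    """Return the name of the web entity owning the longest prefix of `lru`."""
    best_name, best_len = None, -1
    for name, prefixes in web_entities.items():
        for prefix in prefixes:
            if lru.startswith(prefix) and len(prefix) > best_len:
                best_name, best_len = name, len(prefix)
    return best_name


# A web entity is modelled here as a name mapped to a list of LRU prefixes.
web_entities = {
    "Example": ["s:http|h:com|h:example|"],
    "Example blog": ["s:http|h:com|h:example|h:www|p:blog|"],
}

lru = url_to_lru("http://www.example.com/blog/post")
print(lru)                                  # s:http|h:com|h:example|h:www|p:blog|p:post|
print(find_web_entity(lru, web_entities))   # Example blog
```

In this sketch the page `http://www.example.com/blog/post` falls under both prefixes, and the more specific entity wins; a page such as `http://example.com/about` would resolve to the more generic "Example" entity instead.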