Google's repository stores the full HTML of every web page; it is nothing more than a safe storage place.
Every web page is stored as a packet that includes a doc ID identifying the document, the length of the URL, the length of the page, and the URL and HTML of the page. These packets are stored in compressed form to save space.
Finding a web page in the repository requires either a pointer to its location in the file or a scan through the whole file.
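The packet layout and the two lookup paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's actual on-disk format: the 32-bit field widths, the length prefix in front of each compressed packet, the use of zlib here, and the names `pack_page`, `read_at` and `scan_for` are all hypothetical.

```python
import struct
import zlib

# Assumed header: doc ID, URL length, page length (unsigned 32-bit each).
HEADER = struct.Struct("<III")
# Assumed length prefix so a reader can step from one compressed packet to the next.
PREFIX = struct.Struct("<I")


def pack_page(doc_id: int, url: str, html: str) -> bytes:
    """Build one compressed packet: header, then URL, then HTML."""
    url_b = url.encode("utf-8")
    html_b = html.encode("utf-8")
    packet = HEADER.pack(doc_id, len(url_b), len(html_b)) + url_b + html_b
    compressed = zlib.compress(packet)
    return PREFIX.pack(len(compressed)) + compressed


def read_at(repo: bytes, offset: int):
    """Pointer lookup: decode the packet that starts at a known offset."""
    (size,) = PREFIX.unpack_from(repo, offset)
    start = offset + PREFIX.size
    packet = zlib.decompress(repo[start:start + size])
    doc_id, url_len, page_len = HEADER.unpack_from(packet, 0)
    body = packet[HEADER.size:]
    url = body[:url_len].decode("utf-8")
    html = body[url_len:url_len + page_len].decode("utf-8")
    return doc_id, url, html, start + size  # last item: offset of next packet


def scan_for(repo: bytes, wanted_doc_id: int):
    """No pointer available: walk every packet until the doc ID matches."""
    offset = 0
    while offset < len(repo):
        doc_id, url, html, next_offset = read_at(repo, offset)
        if doc_id == wanted_doc_id:
            return url, html
        offset = next_offset
    return None
```

With a stored offset, `read_at` decompresses a single packet; without one, `scan_for` has to decompress every packet in front of the one it wants, which is why a pointer is the cheaper route.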
Hit list:
A hit list records the occurrences of a word in a web page. It does not include the word itself or the word ID corresponding to the word.
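One way to picture this is a table whose key carries the document and word IDs, while each hit list holds only occurrence data. The sketch below assumes a hit is just the word's position in the page, which the text above does not specify; `build_hit_lists` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def build_hit_lists(doc_id, word_ids):
    """Group the positions of each word ID in one page.

    The word ID lives in the key; the hit list itself stores only
    positions (an assumed hit format), never the word or its ID.
    """
    hit_lists = defaultdict(list)
    for position, word_id in enumerate(word_ids):
        hit_lists[(doc_id, word_id)].append(position)
    return dict(hit_lists)


# A page with doc ID 7 whose words map to the IDs 10, 20, 10.
hits = build_hit_lists(7, [10, 20, 10])
```

Here `hits[(7, 10)]` holds two positions because word 10 occurs twice in the page, and neither entry repeats the word ID.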