Yioop uses a summarizer to extract, from a downloaded or otherwise acquired document, the text that it will add to its index. This text is also used to generate search result snippets. Only terms that appear in this summary can be used to look up a document. Scores for text regions, such as sentence scores, are determined when a summary is made and are used in calculating the order of search results.

The <b>Basic</b> summarizer computes a summary by proceeding from top to bottom through the document, looking for block-level tags such as h1, div, and p. A score for each tag's contents is calculated based on the distance from the top of the document, the tag type, and the length of the contents. The highest-scoring regions in the whole document, up to the summary length, are then returned as the summary in the order they appeared in the document.
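
The Basic approach can be illustrated with a minimal Python sketch. It is not Yioop's code: the function name `basic_summary`, the `TAG_WEIGHTS` table, the scoring formula, and the input format (a list of already-parsed `(tag, text)` pairs) are all assumptions made for illustration, since the text above names the scoring factors but not how they are combined.

```python
# Hypothetical tag weights; the actual values Yioop uses are not stated above.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.0, "p": 1.0, "div": 0.5}


def basic_summary(blocks, max_len):
    """Pick high-scoring blocks from (tag, text) pairs given in document order."""
    scored = []
    for pos, (tag, text) in enumerate(blocks):
        # Assumed formula: heavier tags and longer contents score higher,
        # and the score decays with distance from the top of the document.
        score = TAG_WEIGHTS.get(tag, 0.25) * len(text) / (1 + pos)
        scored.append((score, pos, text))

    chosen, used = [], 0
    # Visit blocks from highest to lowest score, keeping those that still fit.
    for score, pos, text in sorted(scored, key=lambda s: -s[0]):
        if used + len(text) <= max_len:
            chosen.append((pos, text))
            used += len(text)

    # Return the kept blocks in their original document order.
    return " ".join(text for _, text in sorted(chosen))


blocks = [
    ("h1", "Yioop Search"),
    ("div", "nav links here"),
    ("p", "Yioop is an open source search engine."),
    ("p", "Footer text"),
]
print(basic_summary(blocks, 55))
```

In this sketch the length budget counts only the characters of the chosen blocks, not the spaces used to join them.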

The <b>Centroid</b> summarizer computes a summary by stripping all tags from the document and then splitting the document into "sentence" units. For each sentence, a vector is built whose components correspond to the terms appearing in the sentence and whose values are the term frequency times the inverse sentence frequency of that term. Using these values for each sentence, an average sentence vector is computed. Sentences are then ranked by their normalized inner product with the average sentence. The top-scoring sentences, up to the summary length, are then returned as the summary in the order they appeared in the document.
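
A rough Python sketch of the Centroid idea follows. It is an illustration under stated assumptions, not Yioop's implementation: the name `centroid_summary`, the regular-expression sentence splitter, and the choice of log(n / sentence count) as the inverse sentence frequency are all hypothetical, and the input is assumed to have had its tags stripped already.

```python
import math
import re
from collections import Counter


def centroid_summary(text, max_len):
    """Rank sentences by cosine similarity to the average tf-isf vector."""
    # Assumed sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    term_lists = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)

    # Number of sentences each term appears in.
    sent_freq = Counter(t for terms in term_lists for t in set(terms))

    # tf * isf vector per sentence, with isf assumed to be log(n / sent_freq).
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append({t: c * math.log(n / sent_freq[t]) for t, c in tf.items()})

    # Average sentence vector (the centroid).
    centroid = Counter()
    for vec in vectors:
        for t, w in vec.items():
            centroid[t] += w / n
    centroid_norm = math.sqrt(sum(w * w for w in centroid.values()))

    def cosine(vec):
        # Normalized inner product of a sentence vector with the centroid.
        dot = sum(w * centroid[t] for t, w in vec.items())
        norm = math.sqrt(sum(w * w for w in vec.values())) * centroid_norm
        return dot / norm if norm else 0.0

    # Take sentences from best to worst while they fit in the budget.
    chosen, used = [], 0
    for i in sorted(range(n), key=lambda i: (-cosine(vectors[i]), i)):
        if used + len(sentences[i]) <= max_len:
            chosen.append(i)
            used += len(sentences[i])

    # Emit the chosen sentences in document order.
    return " ".join(sentences[i] for i in sorted(chosen))


text = "Cats chase mice. Dogs chase cats. Birds sing."
print(centroid_summary(text, 40))
```

With this choice of inverse sentence frequency, a term that occurs in every sentence gets weight zero, so it does not influence the ranking.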

The <b>Centroid-Weighted</b> summarizer computes a summary by stripping all tags from the document and then splitting the document into "sentence" units. For each sentence, it then makes a normalized vector of term frequencies (no inverse sentence frequencies). It next computes a weighted average of these vectors, where the weighting is based on distance from the start of the document. The sentence closest to the average sentence, as measured by inner product, is determined. The components of this sentence are deleted from the average, and the next best sentence is then determined using the residual average. This process continues until text up to the summary length has been extracted. The sentences found are then returned as the summary in the order they appeared in the document.
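
The greedy selection loop of the Centroid-Weighted approach can be sketched in Python as below. This is a sketch under assumptions rather than Yioop's code: the name `centroid_weighted_summary`, the sentence splitter, and the position weight 1 / (1 + i) are hypothetical, because the text says only that the weighting depends on distance from the start of the document.

```python
import math
import re
from collections import Counter


def centroid_weighted_summary(text, max_len):
    """Greedily pick sentences closest to a position-weighted average vector."""
    # Assumed sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Unit-length term frequency vectors (no inverse sentence frequency).
    vectors = []
    for s in sentences:
        tf = Counter(re.findall(r"\w+", s.lower()))
        norm = math.sqrt(sum(c * c for c in tf.values())) or 1.0
        vectors.append({t: c / norm for t, c in tf.items()})

    # Assumed weighting: sentence i contributes with weight 1 / (1 + i).
    weights = [1.0 / (1 + i) for i in range(len(sentences))]
    total = sum(weights) or 1.0
    average = Counter()
    for w, vec in zip(weights, vectors):
        for t, v in vec.items():
            average[t] += w * v / total

    def closeness(i):
        # Inner product of sentence i with the current (residual) average.
        return sum(v * average[t] for t, v in vectors[i].items())

    remaining = set(range(len(sentences)))
    chosen, used = [], 0
    while remaining:
        # Best remaining sentence; ties go to the earlier sentence.
        best = max(remaining, key=lambda i: (closeness(i), -i))
        remaining.discard(best)
        if used + len(sentences[best]) > max_len:
            break  # the next best sentence no longer fits, so stop
        chosen.append(best)
        used += len(sentences[best])
        # Delete this sentence's components from the average.
        for t in vectors[best]:
            average[t] = 0.0

    # Emit the chosen sentences in document order.
    return " ".join(sentences[i] for i in sorted(chosen))


text = "Search engines index pages. Search engines rank pages. Weather is nice."
print(centroid_weighted_summary(text, 50))
```

In the usage example, once the first sentence is chosen and its terms are removed from the average, the near-duplicate second sentence scores below the unrelated third one. Deleting components this way discourages the summary from repeating itself.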

The <b>Graph-Based</b> summarizer computes a summary by stripping all tags from the document and then splitting the document into "sentence" units. A weighted adjacency matrix between sentences is then computed. The distance between two sentences is calculated using a distortion measure. Using this adjacency matrix, a sentence rank is computed using the power method (similar to Google's PageRank). The top-scoring sentences, up to the summary length, are then returned as the summary in the order they appeared in the document.
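
A small Python sketch of graph-based sentence ranking is given below. It is only an approximation of the idea: the distortion measure is not specified above, so Jaccard word overlap is swapped in as a stand-in similarity. The name `graph_summary`, the sentence splitter, the damping factor, and the iteration count are likewise assumptions.

```python
import re


def graph_summary(text, max_len, iterations=50, damping=0.85):
    """Rank sentences with power iteration over a sentence similarity graph."""
    # Assumed sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    term_sets = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(sentences)
    if n == 0:
        return ""

    # Weighted adjacency matrix. Jaccard overlap is a stand-in here,
    # NOT the distortion measure mentioned in the text.
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = term_sets[i] | term_sets[j]
            if i != j and union:
                adj[i][j] = len(term_sets[i] & term_sets[j]) / len(union)

    # Row-normalize; a sentence with no neighbours links uniformly to all.
    for row in adj:
        row_sum = sum(row)
        for j in range(n):
            row[j] = row[j] / row_sum if row_sum else 1.0 / n

    # Power method with PageRank-style damping (assumed parameters).
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(rank[i] * adj[i][j] for i in range(n))
            for j in range(n)
        ]

    # Take sentences from best to worst while they fit in the budget.
    chosen, used = [], 0
    for i in sorted(range(n), key=lambda i: (-rank[i], i)):
        if used + len(sentences[i]) <= max_len:
            chosen.append(i)
            used += len(sentences[i])

    # Emit the chosen sentences in document order.
    return " ".join(sentences[i] for i in sorted(chosen))


text = "Search engines index pages. Search engines rank pages. Weather is nice."
print(graph_summary(text, 60))
```

In the usage example, the two sentences that share vocabulary reinforce each other's rank, while the unrelated third sentence receives only the baseline score and is the first to be dropped when space runs out.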