Web Page Segmentation: A Review

Web pages are typically designed for visual interaction. To support it, they are composed of a number of visual segments that carry different kinds of information. Some segments divide a page into logical sections such as the header, footer, and menu; others distinguish the presentation of different kinds of content, for example the individual news items on a news site. This technical report reviews the state-of-the-art algorithms proposed in the literature for automatically identifying such segments in a web page. It examines the literature with a systematic framework that summarizes the five Ws, the ‘Who, What, Where, When, and Why’ questions that need to be addressed to understand the role of web page segmentation.