Automatic Discovery of Visual Elements of Web Pages

Web pages are typically designed for visual interaction – they include many visual elements to guide the reader. However, when pages are accessed in alternative forms such as audio, these visual elements are not available, and the pages therefore become inaccessible. To address this problem, we propose an approach that identifies the visual elements in a web page and then characterizes the semantic role of each element. Our system architecture has three major components: (1) automatic identification of the visual elements of web pages, (2) automatic generation of heuristics as Jess rules from an ontology, and (3) application of these heuristic-based rules to web pages to automatically annotate visual elements and their roles. This technical report introduces our probabilistic approach and describes its technical details.