Combining Linguistic and Spatial Information for
Document Analysis
Marco Aiello, Christof Monz, and Leon Todoran
In: J. Mariani and D. Harman (Eds.) Proceedings of RIAO'2000
Content-Based Multimedia Information Access, CID,
2000. pp. 266-275.
We present a framework for analyzing color documents with complex
layouts, without making any assumptions about the layout. In a
content-driven, bottom-up approach, the framework combines two
different sources of information: textual and spatial. The text is
analyzed with shallow natural language processing tools, such as
taggers and partial parsers. To infer relations of the logical layout, we
resort to a qualitative spatial calculus closely related to Allen's
interval calculus. We evaluate the system on documents from a color
journal and present the results of extracting the reading order from
the journal's pages. In this case, the analysis is successful: it
extracts the intended reading order from the documents.
@InProceedings{aiel:00comb,
author = {Aiello, M. and Monz, C. and Todoran, L.},
title = {Combining Linguistic and Spatial Information
for Document Analysis},
booktitle = {Proceedings of {RIAO'2000} Content-Based
Multimedia Information Access},
pages = {266--275},
year = 2000,
editor = {Mariani, J. and Harman, D.},
publisher = {CID}
}