Abstract
In this paper, we propose a hybrid machine learning approach to Information Extraction that combines conventional text classification techniques with Hidden Markov Models (HMM). A text classifier generates a locally optimal initial output, which an HMM then refines to produce a globally optimal classification. The proposed approach was evaluated in two case studies, and the experiments revealed a consistent gain in performance through the use of the HMM. In the first case study, the implemented prototype was used to extract information from bibliographic references, reaching a precision rate of 87.48% on a test set of 3000 references. In the second case study, the prototype extracted information from author affiliations, reaching a precision rate of 90.27% on a test set of 300 affiliations.
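The refinement step described in the abstract can be illustrated with a minimal sketch of Viterbi decoding. It assumes that the hidden states are the true field labels and that the observations are the labels emitted by the text classifier; the abstract does not specify the HMM structure, so this modelling choice, the two-label setup, the function name `viterbi`, and every probability below are hypothetical values chosen for illustration, not figures from the paper.

```python
import math


def safe_log(p):
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observed, states, start_p, trans_p, emit_p):
    """Return the most probable hidden label sequence for `observed`."""
    # best[t][s]: log-probability of the best path ending in state s at step t.
    best = [{s: safe_log(start_p[s]) + safe_log(emit_p[s][observed[0]])
             for s in states}]
    back = []  # back[t][s]: predecessor of s on the best path at step t + 1
    for obs in observed[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda q: best[-1][q] + safe_log(trans_p[q][s]))
            scores[s] = (best[-1][prev] + safe_log(trans_p[prev][s])
                         + safe_log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical parameters (assumptions, not taken from the paper).
states = ["author", "title"]
start_p = {"author": 0.9, "title": 0.1}
trans_p = {"author": {"author": 0.8, "title": 0.2},
           "title": {"author": 0.05, "title": 0.95}}
# emit_p[true_label][classifier_label]: how often the classifier is right.
emit_p = {"author": {"author": 0.7, "title": 0.3},
          "title": {"author": 0.3, "title": 0.7}}

# Per-token classifier output with one isolated error at position 1.
classifier_output = ["author", "title", "author", "author", "title", "title"]
refined = viterbi(classifier_output, states, start_p, trans_p, emit_p)
print(refined)
# ['author', 'author', 'author', 'author', 'title', 'title']
```

In this toy run, the classifier's isolated `title` label inside the author span is overridden because a jump from `title` back to `author` is unlikely under the assumed transition probabilities. This is the sense in which sequence-level decoding can correct decisions that were made independently for each token.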
BibTeX
@article{barros2009combining,
title={Combining text classifiers and hidden {M}arkov models for information extraction},
author={Barros, Flavia A and Silva, Eduardo FA and Prud{\^e}ncio, Ricardo BC and Filho, Valmir M and Nascimento, Andre CA},
journal={International Journal on Artificial Intelligence Tools},
volume={18},
number={2},
pages={311--329},
year={2009},
publisher={World Scientific}
}