Abstract

We propose a hybrid machine learning approach to Information Extraction that combines conventional text classification techniques with Hidden Markov Models (HMMs). A text classifier first produces an initial, locally optimal output; an HMM then refines this output to yield a globally optimal classification. The approach was evaluated in two case studies, and in both the experiments showed a consistent performance gain from the use of the HMM. In the first case study, the implemented prototype extracted information from bibliographic references, reaching a precision of 87.48% on a test set of 3000 references. In the second, the prototype extracted information from author affiliations, reaching a precision of 90.27% on a test set of 300 affiliations.
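
The two-stage idea can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation: it assumes that the classifier's per-token labels serve as the HMM's observations, that the hidden states are the true field labels, and that Viterbi decoding is used to find the most probable label sequence. The label set and all probabilities (`STATES`, `START`, `TRANS`, `EMIT`) are hypothetical values chosen by hand for illustration; in practice they would be estimated from labeled training data.

```python
import math

# Hypothetical label set for tokens of a bibliographic reference.
STATES = ["author", "title", "year"]

# Hypothetical HMM parameters, chosen by hand for illustration only.
START = {"author": 0.8, "title": 0.1, "year": 0.1}
TRANS = {  # TRANS[p][s]: probability of moving from true label p to true label s
    "author": {"author": 0.70, "title": 0.25, "year": 0.05},
    "title":  {"author": 0.05, "title": 0.80, "year": 0.15},
    "year":   {"author": 0.10, "title": 0.10, "year": 0.80},
}
EMIT = {  # EMIT[s][o]: probability that the classifier outputs o when the true label is s
    "author": {"author": 0.80, "title": 0.15, "year": 0.05},
    "title":  {"author": 0.20, "title": 0.75, "year": 0.05},
    "year":   {"author": 0.05, "title": 0.05, "year": 0.90},
}


def viterbi(observed):
    """Return the most probable true-label sequence given the classifier's labels."""
    if not observed:
        return []
    # scores[s]: best log-probability of any label path that ends in state s.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observed[0]]) for s in STATES
    }
    backpointers = []
    for obs in observed[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor of s under the transition model.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Per-token classifier output with one isolated error (the fourth label).
classifier_labels = ["author", "author", "title", "author", "title", "year"]
print(viterbi(classifier_labels))
# → ['author', 'author', 'title', 'title', 'title', 'year']
```

With these hand-picked parameters, the isolated `author` label inside the title is corrected because a title-to-author-to-title path is far less probable than staying in `title`. This is the sense in which sequence-level decoding can improve on token-by-token classification.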

BibTeX

@article{barros2009combining,
  title={Combining text classifiers and hidden {M}arkov models for information extraction},
  author={Barros, Flavia A and Silva, Eduardo FA and Prud{\^e}ncio, Ricardo BC and Filho, Valmir M and Nascimento, Andre CA},
  journal={International Journal on Artificial Intelligence Tools},
  volume={18},
  number={02},
  pages={311--329},
  year={2009},
  publisher={World Scientific}
}