BioWizard: Discovering and validating associations between biological entities by integrated analysis of scientific literature and experimental data

Spampinato, Concetto; Giordano, Daniela
2012-01-01

Abstract

In this paper, we present BioWizard, a bioinformatics knowledge discovery tool for extracting and validating implicit associations between biological entities. By mining specialized scientific literature, BioWizard not only generates biological hypotheses in the form of associations between genes, proteins and diseases, but also validates the plausibility of such associations against high-throughput biological data (microarrays) and annotated databases. The main novelties of the proposed approach are that: 1) it infers associations between biological entities by mining full-text papers rather than only abstracts, as existing tools usually do; 2) it uses a named entity recognition step that improves the precision of the derived associations by enriching the vocabularies used in the mining loop with terms extracted directly from the text; and 3) it filters the inferred associations according to their evidence in experimental data. We tested the precision and recall of our system in retrieving known associations (which did not appear in the same document) from gold standards. The results show that BioWizard is able to retrieve valid associations, making it a valuable tool that biomedical researchers can use to speed up scientific progress. © 2012 IEEE
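The evaluation mentioned in the abstract, precision and recall against a gold standard, can be illustrated with a minimal sketch. This is not BioWizard's code: the function name `precision_recall`, the representation of an association as an unordered pair of entity names, and the placeholder entities below are all assumptions made for illustration.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted associations against a gold standard.

    Assumption: an association is an unordered pair of entity names, so
    ("GENE_A", "DISEASE_X") and ("DISEASE_X", "GENE_A") count as the same pair.
    """
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in gold}
    true_pos = len(pred & ref)  # predicted pairs that are also in the gold standard
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical placeholder pairs, not data or results from the paper.
predicted = [("GENE_A", "DISEASE_X"), ("GENE_B", "DISEASE_X"), ("GENE_C", "DISEASE_Y")]
gold = [
    ("DISEASE_X", "GENE_A"),
    ("GENE_C", "DISEASE_Y"),
    ("GENE_D", "DISEASE_Z"),
    ("GENE_E", "DISEASE_Z"),
]
precision, recall = precision_recall(predicted, gold)
```

With these placeholder pairs, two of the three predicted associations appear in the gold standard, and two of the four gold-standard associations are recovered.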
Year: 2012
ISBN: 978-146732051-1
Keywords: Knowledge discovery; data fusion; associations validation
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/90307