Computing efficiently the closeness of word sets in natural language texts

CANTONE, Domenico;PAPPALARDO, Giuseppe
2015-01-01

Abstract

We consider some search problems with applications in statistical text analysis and natural language processing. Given two sets of words A and B, we propose a statistical, corpus-based measure of the "closeness" between A and B in texts. The measure involves searching a text corpus for the words in A and B, under the restriction that these words co-occur within a given maximum distance n. We address the problem of computing this closeness measure efficiently and present algorithms for it.
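
As a rough illustration of the kind of search the abstract describes, the following Python sketch counts the pairs of occurrences, one of a word in A and one of a word in B, that lie at most n token positions apart. This is only an assumed reading of "co-occur within a given maximum distance n": the record does not give the paper's actual definition of the closeness measure, its normalization, or its algorithms, and the name `cooccurrence_count`, the whitespace tokenization, and the assumption that A and B are disjoint are all hypothetical choices made for the example.

```python
from bisect import bisect_left, bisect_right


def cooccurrence_count(tokens, set_a, set_b, n):
    """Count pairs (i, j) with tokens[i] in set_a, tokens[j] in set_b
    and |i - j| <= n.

    Hypothetical helper: an assumed raw co-occurrence count, not the
    closeness measure defined in the paper. Assumes set_a and set_b
    are disjoint.
    """
    # Positions of words from B, in increasing order by construction.
    pos_b = [j for j, word in enumerate(tokens) if word in set_b]

    count = 0
    for i, word in enumerate(tokens):
        if word in set_a:
            # Binary search for the B-positions inside [i - n, i + n].
            lo = bisect_left(pos_b, i - n)
            hi = bisect_right(pos_b, i + n)
            count += hi - lo
    return count


# Toy corpus, tokenized on whitespace (an assumption of this sketch).
corpus = "the cat sat on the mat near the dog".split()
pairs_within_4 = cooccurrence_count(corpus, {"cat"}, {"mat", "dog"}, 4)
```

Here "cat" is at position 1, "mat" at position 5 and "dog" at position 8, so with n = 4 only the pair (cat, mat) is counted. Any real closeness measure would presumably normalize such a count against corpus statistics; that step is deliberately left out because the record does not specify it.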
2015
natural language processing; text search algorithms; closeness measure
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/39921