Computing efficiently the closeness of word sets in natural language texts
CANTONE, Domenico; PAPPALARDO, Giuseppe
2015-01-01
Abstract
We consider some search problems with applications in statistical text analysis and natural language processing. Given two sets of words A and B, we propose a statistical, corpus-based measure of the "closeness" between A and B in texts. The measure involves searching a text corpus for the words in A and B, under the restriction that these words co-occur within a given maximum distance n. We address the problem of computing this closeness measure efficiently and present algorithms for it.