Computing efficiently the closeness of word sets in natural language texts

IRIS


CANTONE, Domenico; CRISTOFARO, S.; PAPPALARDO, Giuseppe

2015-01-01

Abstract

We consider some search problems with applications in statistical text analysis and natural language processing. Given two sets of words A and B, we propose a statistical, corpus-based measure of the "closeness" between A and B in texts. The proposed measure involves searching a text corpus for the words in A and B, under the restriction that these words co-occur within a given maximum distance n. We address the problem of computing this closeness measure efficiently and present algorithms for it.
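The abstract does not spell out the measure itself, so the following is only a minimal sketch of the underlying search step, under stated assumptions: distance is taken to be the difference between token positions, and the quantity counted is the number of position pairs (one word from A, one from B) at most n apart. The function name `count_close_pairs` and this pair-counting definition are illustrative assumptions, not the authors' measure or algorithm.

```python
from bisect import bisect_left, bisect_right


def count_close_pairs(tokens, set_a, set_b, n):
    """Count pairs (i, j), i != j, with tokens[i] in set_a,
    tokens[j] in set_b, and |i - j| <= n.

    Assumption: "distance" means difference in token positions.
    """
    # Positions of words from B, in increasing order by construction.
    pos_b = [j for j, tok in enumerate(tokens) if tok in set_b]

    total = 0
    for i, tok in enumerate(tokens):
        if tok not in set_a:
            continue
        # Binary search for the B positions inside the window [i - n, i + n].
        lo = bisect_left(pos_b, i - n)
        hi = bisect_right(pos_b, i + n)
        total += hi - lo
        # If the word belongs to both sets, do not pair a position with itself.
        if tok in set_b:
            total -= 1
    return total
```

For a corpus of m tokens this takes O(m log m) time rather than the O(m^2) of comparing every pair of positions. A normalised closeness score could be derived from such counts, but the normalisation is left open here because the abstract does not specify it.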


Year: 2015

Keywords: natural language processing; text search algorithms; closeness measure

Appears in types: 1.1 Journal article

Files in this item:

File	Size	Format
Computing-Efficiently-the-Closeness-of-Word-Sets.pdf (archive managers only; type: publisher's version, PDF)	1.13 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/39921
