The CompWHoB Corpus: Computational Construction, Annotation and Linguistic Analysis of the White House Press Briefings Corpus

IRIS


Venuti, Marco; Esposito, F.; Basile, P.; Cutugno, F.

2015-01-01

Abstract

The CompWHoB (Computational White House press Briefings) Corpus, currently being developed at the University of Naples Federico II, is a corpus of spoken American English focusing on political and media communication. It represents a large collection of the White House Press Briefings, namely, the daily meetings between the White House Press Secretary and the news media. At the time of writing, the corpus amounts to more than 20 million words and covers a period of twenty-one years, from 1993 to 2014; it is planned to be extended to the end of the second term of President Barack Obama. The aim of the present article is to describe the composition of the corpus and the techniques used to extract, process and annotate it. Moreover, attention is paid to the use of Temporal Random Indexing (TRI) on the corpus as a tool for linguistic analysis.
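The abstract names Temporal Random Indexing without describing it. What follows is a minimal Python sketch of the general technique, not the authors' implementation. Each word receives a sparse random index vector that is fixed across all time periods. A separate word space is then accumulated for each period from co-occurrence windows, and a word's vectors from two periods are compared with cosine similarity. The function names, the parameter values (`dim`, `nonzero`, `window`, `seed`), the period labels and the toy sentences are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

# Sparse vectors are stored as {dimension: value} dictionaries.


def make_index_vectors(vocab, dim=1000, nonzero=10, seed=42):
    """Give every word a sparse ternary random index vector (a few +1/-1 entries)."""
    rng = random.Random(seed)
    index = {}
    for word in sorted(vocab):  # sorted so the assignment is reproducible for a given seed
        positions = rng.sample(range(dim), nonzero)
        index[word] = {p: rng.choice((-1, 1)) for p in positions}
    return index


def build_space(sentences, index, window=2):
    """Sum the index vectors of the words within +/- `window` tokens of each target."""
    space = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, target in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j == i:
                    continue
                for pos, sign in index[tokens[j]].items():
                    space[target][pos] += sign
    return space


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(val * v.get(pos, 0.0) for pos, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def temporal_spaces(corpus_by_period, dim=1000, nonzero=10, window=2, seed=42):
    """Build one word space per period, all sharing the SAME index vectors,
    so that a word's vectors from different periods are directly comparable."""
    vocab = {w for sents in corpus_by_period.values() for s in sents for w in s}
    index = make_index_vectors(vocab, dim, nonzero, seed)
    return {period: build_space(sents, index, window)
            for period, sents in corpus_by_period.items()}


if __name__ == "__main__":
    # Invented toy sentences with hypothetical period labels; not corpus data.
    toy = {
        "1990s": [["the", "president", "signed", "the", "bill"]],
        "2010s": [["the", "president", "tweeted", "a", "statement"]],
    }
    spaces = temporal_spaces(toy)
    # A low cross-period similarity may indicate a change in the word's contexts.
    print(cosine(spaces["1990s"]["president"], spaces["2010s"]["president"]))
```

Because every period reuses the same random index vectors, the per-period spaces share one coordinate system and need no alignment step before comparison; this sharing is what distinguishes the temporal variant from building independent Random Indexing spaces.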


	Year
	
				2015
			
	ISBN
	
				9788899200626
			
	Short description of contents (Abstract)
	
				The CompWHoB Corpus, under development at the University of Naples Federico II, is a corpus of spoken American English comprising the briefings held by the US press secretaries, known as Press Briefings. At present, the corpus contains more than 20 million words and extends from 1993 to the end of 2014. The aim of this article is to describe the composition of the corpus and the techniques used to extract and annotate the texts, and to show how it can serve as a source for linguistic analysis through the use of Temporal Random Indexing (TRI).
			
	Keywords
	
				Temporal Random Indexing; White House press briefings; Corpus linguistics
			
	Appears in types:
	
				4.1 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/78186
