HAMFAST: Fast hamming distance computation

IRIS

Similarity is a vague concept which can be treated in aquantitative manner only using appropriate mathematicalrepresentation of the objects to compare and a metric onthe space representation. In biology the mathematical representationof structure relies on strings taken from an alphabetof m symbols. Very often binary strings, m = 2, areused. The size of the binary string depends on the complexityof the structure to represent, so the string can be quitelong. The Hamming distance is the most used metric withbinary strings. The computational effort required to computethe Hamming distance linearly depends on the size ofthe string. However even a linear effort case may be computationalheavy if many computations are required. One ofthe fastest computational approach to evaluate Hammingdistances relies on look-up tables. The computational performance,however, rapidly deteriorates with the size of binarystring length, due to cache misses. We present a computationalstrategy and implementation which can handlehuge number of Hamming distance evaluation between binarystrings of arbitrary length keeping computational performancecompetitive.

HAMFAST: Fast hamming distance computation

PAPPALARDO, FRANCESCO;Calonaci C;PENNISI, MARZIO ALFIO;Mastriani E;MOTTA, Santo

2009-01-01

Abstract

Similarity is a vague concept which can be treated in aquantitative manner only using appropriate mathematicalrepresentation of the objects to compare and a metric onthe space representation. In biology the mathematical representationof structure relies on strings taken from an alphabetof m symbols. Very often binary strings, m = 2, areused. The size of the binary string depends on the complexityof the structure to represent, so the string can be quitelong. The Hamming distance is the most used metric withbinary strings. The computational effort required to computethe Hamming distance linearly depends on the size ofthe string. However even a linear effort case may be computationalheavy if many computations are required. One ofthe fastest computational approach to evaluate Hammingdistances relies on look-up tables. The computational performance,however, rapidly deteriorates with the size of binarystring length, due to cache misses. We present a computationalstrategy and implementation which can handlehuge number of Hamming distance evaluation between binarystrings of arbitrary length keeping computational performancecompetitive.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2009
			
	Codice ISBN
	
				9780769535074
			
	Parole chiave
	
				Mathematical representations; Hamming distance; Computational performance
			
	Appare nelle tipologie:
	
				4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
HAMFAST.pdf solo gestori archivio Tipologia: Versione Editoriale (PDF) Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 192.99 kB Formato Adobe PDF Visualizza/Apri	192.99 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/76143

Citazioni

ND

8

ND

social impact