Similarity is a vague concept which can be treated in aquantitative manner only using appropriate mathematicalrepresentation of the objects to compare and a metric onthe space representation. In biology the mathematical representationof structure relies on strings taken from an alphabetof m symbols. Very often binary strings, m = 2, areused. The size of the binary string depends on the complexityof the structure to represent, so the string can be quitelong. The Hamming distance is the most used metric withbinary strings. The computational effort required to computethe Hamming distance linearly depends on the size ofthe string. However even a linear effort case may be computationalheavy if many computations are required. One ofthe fastest computational approach to evaluate Hammingdistances relies on look-up tables. The computational performance,however, rapidly deteriorates with the size of binarystring length, due to cache misses. We present a computationalstrategy and implementation which can handlehuge number of Hamming distance evaluation between binarystrings of arbitrary length keeping computational performancecompetitive.

HAMFAST: Fast hamming distance computation

PAPPALARDO, FRANCESCO;PENNISI, MARZIO ALFIO;MOTTA, Santo
2009-01-01

Abstract

Similarity is a vague concept which can be treated in aquantitative manner only using appropriate mathematicalrepresentation of the objects to compare and a metric onthe space representation. In biology the mathematical representationof structure relies on strings taken from an alphabetof m symbols. Very often binary strings, m = 2, areused. The size of the binary string depends on the complexityof the structure to represent, so the string can be quitelong. The Hamming distance is the most used metric withbinary strings. The computational effort required to computethe Hamming distance linearly depends on the size ofthe string. However even a linear effort case may be computationalheavy if many computations are required. One ofthe fastest computational approach to evaluate Hammingdistances relies on look-up tables. The computational performance,however, rapidly deteriorates with the size of binarystring length, due to cache misses. We present a computationalstrategy and implementation which can handlehuge number of Hamming distance evaluation between binarystrings of arbitrary length keeping computational performancecompetitive.
2009
9780769535074
Mathematical representations; Hamming distance; Computational performance
File in questo prodotto:
File Dimensione Formato  
HAMFAST.pdf

solo gestori archivio

Tipologia: Versione Editoriale (PDF)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 192.99 kB
Formato Adobe PDF
192.99 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/76143
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? ND
social impact