Similarity is a vague concept which can be treated in aquantitative manner only using appropriate mathematicalrepresentation of the objects to compare and a metric onthe space representation. In biology the mathematical representationof structure relies on strings taken from an alphabetof m symbols. Very often binary strings, m = 2, areused. The size of the binary string depends on the complexityof the structure to represent, so the string can be quitelong. The Hamming distance is the most used metric withbinary strings. The computational effort required to computethe Hamming distance linearly depends on the size ofthe string. However even a linear effort case may be computationalheavy if many computations are required. One ofthe fastest computational approach to evaluate Hammingdistances relies on look-up tables. The computational performance,however, rapidly deteriorates with the size of binarystring length, due to cache misses. We present a computationalstrategy and implementation which can handlehuge number of Hamming distance evaluation between binarystrings of arbitrary length keeping computational performancecompetitive.
HAMFAST: Fast hamming distance computation
PAPPALARDO, FRANCESCO;PENNISI, MARZIO ALFIO;MOTTA, Santo
2009-01-01
Abstract
Similarity is a vague concept which can be treated in aquantitative manner only using appropriate mathematicalrepresentation of the objects to compare and a metric onthe space representation. In biology the mathematical representationof structure relies on strings taken from an alphabetof m symbols. Very often binary strings, m = 2, areused. The size of the binary string depends on the complexityof the structure to represent, so the string can be quitelong. The Hamming distance is the most used metric withbinary strings. The computational effort required to computethe Hamming distance linearly depends on the size ofthe string. However even a linear effort case may be computationalheavy if many computations are required. One ofthe fastest computational approach to evaluate Hammingdistances relies on look-up tables. The computational performance,however, rapidly deteriorates with the size of binarystring length, due to cache misses. We present a computationalstrategy and implementation which can handlehuge number of Hamming distance evaluation between binarystrings of arbitrary length keeping computational performancecompetitive.File | Dimensione | Formato | |
---|---|---|---|
HAMFAST.pdf
solo gestori archivio
Tipologia:
Versione Editoriale (PDF)
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
192.99 kB
Formato
Adobe PDF
|
192.99 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.