We propose a simple and natural linear randomized algorithm for the approximate 1-median selection problem in metric spaces. The 1-median of a finite subset S of a metric space is the element of S which minimizes the average distance from the remaining points in S. This problem is extremely important in most applications using clustering of metric spaces, but also in connection with several algorithms in bioinformatics. The only linear approximation algorithm for the 1-median problem, which provably works in any metric space without going through any Euclidean space, has been proposed by Indyk in {[}Proceedings of the 31st Annual ACM Symposium on Theory of Computing, Atlanta, 1999, pp. 428-432]. However, Indyk's algorithm, which is based on sufficiently large sampling, turns out not to be a practical solution. The same holds true even for its heuristic variants which use samplings of smaller size. The algorithm we propose has a simple and efficient implementation, which performs better than Indyk's algorithm in practice. On the other hand, while the performance of Indyk's algorithm is guaranteed by an approximation factor, in the case of our algorithm we are only able to produce experimental evidence of its precision. Extensive experimentation has been performed on both synthetic and real input datasets. Synthetic datasets were generated with uniform and skewed distributions, using various metrics. Real datasets have been extrapolated from real world official databases available on the web. Successful results of the proposed algorithm are reported for several applications in bioinformatics and various classes of approximate search queries.

We propose a simple and natural linear randomized algorithm for the approximate 1-median selection problem in metric spaces. The 1-median of a finite subset S of a metric space is the element of S which minimizes the average distance from the remaining points in S. This problem is extremely important in most applications using clustering of metric spaces, but also in connection with several algorithms in bioinformatics. The only linear approximation algorithm for the 1-median problem, which provably works in any metric space without going through any Euclidean space, has been proposed by Indyk in [Proceedings of the 31st Annual ACM Symposium on Theory of Computing, Atlanta, 1999, pp. 428–432]. However, Indyk’s algorithm, which is based on sufficiently large sampling, turns out not to be a practical solution. The same holds true even for its heuristic variants which use samplings of smaller size. The algorithm we propose has a simple and efficient implementation, which performs better than Indyk’s algorithm in practice. On the other hand, while the performance of Indyk’s algorithm is guaranteed by an approximation factor, in the case of our algorithm we are only able to produce experimental evidence of its precision. Extensive experimentation has been performed on both synthetic and real input datasets. Synthetic datasets were generated with uniform and skewed distributions, using various metrics. Real datasets have been extrapolated from real world official databases available on the web. Successful results of the proposed algorithm are reported for several applications in bioinformatics and various classes of approximate search queries.

An efficient approximate algorithm for the 1-median problem in metric spaces

CANTONE, Domenico;CINCOTTI, Gianluca;FERRO, Alfredo;PULVIRENTI, ALFREDO
2005-01-01

Abstract

We propose a simple and natural linear randomized algorithm for the approximate 1-median selection problem in metric spaces. The 1-median of a finite subset S of a metric space is the element of S which minimizes the average distance from the remaining points in S. This problem is extremely important in most applications using clustering of metric spaces, but also in connection with several algorithms in bioinformatics. The only linear approximation algorithm for the 1-median problem, which provably works in any metric space without going through any Euclidean space, has been proposed by Indyk in {[}Proceedings of the 31st Annual ACM Symposium on Theory of Computing, Atlanta, 1999, pp. 428-432]. However, Indyk's algorithm, which is based on sufficiently large sampling, turns out not to be a practical solution. The same holds true even for its heuristic variants which use samplings of smaller size. The algorithm we propose has a simple and efficient implementation, which performs better than Indyk's algorithm in practice. On the other hand, while the performance of Indyk's algorithm is guaranteed by an approximation factor, in the case of our algorithm we are only able to produce experimental evidence of its precision. Extensive experimentation has been performed on both synthetic and real input datasets. Synthetic datasets were generated with uniform and skewed distributions, using various metrics. Real datasets have been extrapolated from real world official databases available on the web. Successful results of the proposed algorithm are reported for several applications in bioinformatics and various classes of approximate search queries.
2005
We propose a simple and natural linear randomized algorithm for the approximate 1-median selection problem in metric spaces. The 1-median of a finite subset S of a metric space is the element of S which minimizes the average distance from the remaining points in S. This problem is extremely important in most applications using clustering of metric spaces, but also in connection with several algorithms in bioinformatics. The only linear approximation algorithm for the 1-median problem, which provably works in any metric space without going through any Euclidean space, has been proposed by Indyk in [Proceedings of the 31st Annual ACM Symposium on Theory of Computing, Atlanta, 1999, pp. 428–432]. However, Indyk’s algorithm, which is based on sufficiently large sampling, turns out not to be a practical solution. The same holds true even for its heuristic variants which use samplings of smaller size. The algorithm we propose has a simple and efficient implementation, which performs better than Indyk’s algorithm in practice. On the other hand, while the performance of Indyk’s algorithm is guaranteed by an approximation factor, in the case of our algorithm we are only able to produce experimental evidence of its precision. Extensive experimentation has been performed on both synthetic and real input datasets. Synthetic datasets were generated with uniform and skewed distributions, using various metrics. Real datasets have been extrapolated from real world official databases available on the web. Successful results of the proposed algorithm are reported for several applications in bioinformatics and various classes of approximate search queries.
1-median problem; clustroid selection; algorithms for metric spaces; selection algorithms; approximation algorithms; randomized algorithms; Fermat-Weber problem
File in questo prodotto:
File Dimensione Formato  
1median.pdf

solo gestori archivio

Tipologia: Versione Editoriale (PDF)
Dimensione 259.27 kB
Formato Adobe PDF
259.27 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/7309
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 10
  • ???jsp.display-item.citation.isi??? 6
social impact