Bit-Layers Text Encoding for Efficient Text Processing

Cantone Domenico; Faro Simone; Scafiti Stefano
2020-01-01

Abstract

Textual data remains the main format for storing information, which is why text processing is among the most relevant topics in computer science. However, although storage capacity is growing fast, the amount and complexity of textual data are growing even faster. In many cases, the bottleneck is not the size or complexity of the text itself, but rather the representation (or data structure) used to carry out the required processing. In this paper, we show the potential and the benefits of a straightforward text representation, referred to as the Bit-Layers text representation, which is particularly suitable for fast text searching while retaining standard efficiency in the other basic text-processing tasks. To show the advantages of the Bit-Layers representation, we also present a family of simple algorithms, tuned to it, for solving some classical and non-classical string-matching problems. Such algorithms are particularly suitable for implementation in modern hardware and are very fast in practice. Preliminary experimental results show that in some cases these algorithms are considerably faster than their counterparts based on the standard text representation.
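The abstract does not spell out the encoding itself. The following is a minimal sketch, assuming a bit-sliced layout in which a text of 8-bit characters is stored as eight bit-vectors ("layers"), with layer k holding bit k of every character. It illustrates the general idea only and is not the authors' algorithms; the names `build_layers`, `search` and `BITS_PER_CHAR` are hypothetical. Under this assumption, exact matching reduces to shifts and bitwise ANDs over whole layers, so every text position is tested in parallel (here with Python's arbitrary-precision integers standing in for machine words).

```python
from typing import List

BITS_PER_CHAR = 8  # assumption: byte-oriented (8-bit) alphabet


def build_layers(text: bytes) -> List[int]:
    """Hypothetical encoder: bit i of layers[k] is bit k of text[i]."""
    layers = [0] * BITS_PER_CHAR
    for i, c in enumerate(text):
        for k in range(BITS_PER_CHAR):
            if (c >> k) & 1:
                layers[k] |= 1 << i
    return layers


def search(layers: List[int], n: int, pattern: bytes) -> List[int]:
    """Return every start position of pattern in the encoded n-character text."""
    m = len(pattern)
    if m == 0 or m > n:
        return []
    # One bit per candidate start position 0..n-m, all initially alive.
    d = (1 << (n - m + 1)) - 1
    for j, c in enumerate(pattern):
        for k in range(BITS_PER_CHAR):
            # Shifting right by j aligns bit k of text[i + j] with position i.
            shifted = layers[k] >> j
            # Keep position i only if that bit equals bit k of pattern[j].
            d &= shifted if (c >> k) & 1 else ~shifted
        if d == 0:
            return []  # no candidate survived; stop early
    return [i for i in range(n - m + 1) if (d >> i) & 1]
```

For example, `search(build_layers(b"abracadabra"), 11, b"abra")` yields the positions `[0, 7]`. The cost is one shift and one AND per pattern character and layer, independent of where matches occur, which is the kind of word-level parallelism the abstract points to.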
Keywords: experimental algorithms; text processing; representation
Files in this item:
Cantone2020-BitLayers-paper2.pdf (open access; type: post-print; license: Creative Commons; 1.74 MB, Adobe PDF)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/497295