One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Schema for web data can be constructed at different levels, structuring a single pages or a whole site or group of sites. Here we present an approach to give a logical schema to a website, first defining a model for a single page, where its contents is divided into "logical" sections, i.e. parts of a page each collecting related information. Then, we introduce a site model in which both physical and logical links among different page sections are represented: physical are existing hyperlinks, while logical links are links between sections containing semantically related information. We show how such links can be found and classified according to their relevance, also showing how schema is used in a structure-aware browser to improve both browsing and searching.

Extracting logical schema from the web

CARCHIOLO, Vincenza;LONGHEU, ALESSANDRO;MALGERI, Michele Giuseppe
2003-01-01

Abstract

One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Schema for web data can be constructed at different levels, structuring a single pages or a whole site or group of sites. Here we present an approach to give a logical schema to a website, first defining a model for a single page, where its contents is divided into "logical" sections, i.e. parts of a page each collecting related information. Then, we introduce a site model in which both physical and logical links among different page sections are represented: physical are existing hyperlinks, while logical links are links between sections containing semantically related information. We show how such links can be found and classified according to their relevance, also showing how schema is used in a structure-aware browser to improve both browsing and searching.
2003
extraction; World wide web; Semistructured data
File in questo prodotto:
File Dimensione Formato  
Extracting logical schema from the web.pdf

accesso aperto

Tipologia: Documento in Pre-print
Dimensione 1.06 MB
Formato Adobe PDF
1.06 MB Adobe PDF Visualizza/Apri
Extracting logical schema from the web.pdf

solo gestori archivio

Tipologia: Versione Editoriale (PDF)
Dimensione 1.67 MB
Formato Adobe PDF
1.67 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/1286
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 0
social impact