Data Integration Solutions

The Data Integration Challenge

The majority of dynamic Web content is still backed by relational databases (RDBs), as are many enterprise systems. Structured data exposed on the Web, on the other hand, is published using the Resource Description Framework (RDF).

The Web of Data is constantly growing because of its potential to facilitate data integration and retrieval. At the same time, however, RDB systems host a vast amount of structured data in relational tables augmented with integrity constraints. To make this relational data available, it must be mapped to a format suitable for the Web of Data. The advantages of creating an RDF view of relational data are inherited from the Web of Data and can be summarized by the tasks they facilitate:

  • Integration: data in different RDBs can be integrated using RDF semantics and mechanisms; in this sense, the Web of Data can be imagined as one big database. Moreover, information in the database can be integrated with information that comes from other data sources.
  • Retrieval: once data are published on the Web of Data (rather than kept only in relational databases), queries can span different data sources, and more powerful retrieval methods can be built on top of them.
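To make the RDF view and the integration task more concrete, here is a minimal Python sketch that uses only the standard library. It maps the rows of two hypothetical databases (an HR `employee` table and a CRM `customer` table, both invented for illustration) to triples, loosely following the shape of the W3C Direct Mapping: row identifiers of the form `<base><table>/<pk>=<value>` and predicates of the form `<base><table>#<column>`. The base IRIs and helper names are assumptions, and literal values are kept as plain Python values rather than typed RDF literals. Because each source gets its own namespace, merging the two graphs is a plain set union, after which a single query can join across both sources.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def direct_map(conn, base, table, pk_col):
    """Map every row of `table` to (subject, predicate, object) triples.

    Loosely modeled on the W3C Direct Mapping; `base` is a hypothetical
    namespace IRI. Table names must come from trusted configuration,
    since they are interpolated into the SQL string.
    """
    t = quote(table)
    triples = set()
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(cols, row))
        subj = f"{base}{t}/{quote(pk_col)}={quote(str(record[pk_col]))}"
        triples.add((subj, RDF_TYPE, f"{base}{t}"))
        for col, val in record.items():
            if val is not None:  # SQL NULLs produce no triple
                triples.add((subj, f"{base}{t}#{quote(col)}", val))
    return triples

def objects(graph, subj, pred):
    return {o for s, p, o in graph if s == subj and p == pred}

def subjects(graph, pred, obj):
    return {s for s, p, o in graph if p == pred and o == obj}

# Two independent (hypothetical) relational sources.
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
hr.executemany("INSERT INTO employee VALUES (?, ?, ?)",
               [(1, "Ada", "ada@example.org"), (2, "Bob", "bob@example.org")])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (cid INTEGER PRIMARY KEY, email TEXT, tier TEXT)")
crm.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(10, "ada@example.org", "gold")])

HR = "http://hr.example.org/"
CRM = "http://crm.example.org/"

# Integration: distinct namespaces mean the merged graph is a plain set union.
graph = direct_map(hr, HR, "employee", "id") | direct_map(crm, CRM, "customer", "cid")

def employee_tiers(graph):
    """Retrieval across sources: which employees are also customers, and at what tier?"""
    result = {}
    for emp, p, name in graph:
        if p != HR + "employee#name":
            continue
        for email in objects(graph, emp, HR + "employee#email"):
            for cust in subjects(graph, CRM + "customer#email", email):
                for tier in objects(graph, cust, CRM + "customer#tier"):
                    result[name] = tier
    return result

print(employee_tiers(graph))  # {'Ada': 'gold'}
```

A production system would rely on a Direct Mapping or R2RML engine and a real triple store. The point here is only that, once both tables are expressed as triples, the cross-source join is written over one merged graph instead of over two separate SQL schemas.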

The Web of Data is a scalable environment with explicit semantics, where not only can humans navigate information, but machines can also find connections and use them to navigate the information space. The most common way to publish resources on the Web of Data follows the RDF model and uses Uniform Resource Identifiers (URIs) to identify resources, which facilitates the creation of comprehensive and flexible resource descriptions.
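The idea that machines can follow URI-identified connections can be sketched in a few lines of Python. The example below is an illustration, not a real RDF library: it represents URIs and literals as small frozen dataclasses (the hypothetical types `URI` and `Literal`), serializes a triple in N-Triples syntax, and performs a breadth-first walk that follows only URI-valued objects, roughly the way an agent navigates the Web of Data by following links. The `example.org` resources and the `vocab#worksFor` property are invented; `foaf:knows` and `foaf:name` are real FOAF terms.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical resources and vocabulary

graph = {
    (URI(EX + "alice"), URI(FOAF + "knows"), URI(EX + "bob")),
    (URI(EX + "bob"), URI(EX + "vocab#worksFor"), URI(EX + "acme")),
    (URI(EX + "acme"), URI(FOAF + "name"), Literal("ACME Corp")),
    (URI(EX + "carol"), URI(FOAF + "knows"), URI(EX + "alice")),
}

def to_ntriples(triple):
    """Serialize one triple as an N-Triples line (plain string literals only)."""
    def term(t):
        if isinstance(t, URI):
            return f"<{t.value}>"
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    s, p, o = triple
    return f"{term(s)} {term(p)} {term(o)} ."

def reachable(graph, start):
    """Breadth-first 'follow your nose' walk over outgoing URI-valued links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, _, o in graph:
            if s == node and isinstance(o, URI) and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

for t in sorted(graph, key=to_ntriples):
    print(to_ntriples(t))
```

Starting from `alice`, the walk reaches `bob` and then `acme`, but not `carol`: links are followed only in the direction the triples state, so discovering inbound links requires an index or a query instead.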

A consumer of the RDF graph (whether virtual or materialized) can access the RDF data in three ways:
  • Query access, which means the agent issues a SPARQL query against an endpoint exposed by the system and processes the results (typically the result is a SPARQL result set in XML or JSON);
  • Entity-level access, which means the agent performs an HTTP GET on a URI exposed by the system and processes the result (typically the result is an RDF graph);
  • Dump access, which means the agent performs an HTTP GET on a dump of the entire RDF graph, as is done, for example, in Extract, Transform, and Load (ETL) processes.
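The three access modes differ mainly in which URL is requested and which media type the agent asks for. The Python sketch below builds, but does not send, one `urllib.request.Request` per mode, and parses a SPARQL JSON result set. It assumes hypothetical endpoint URLs (`https://data.example.org/sparql` and `https://data.example.org/dump.nt`), uses the SPARQL 1.1 Protocol's query-via-GET form for query access, and assumes the server honors content negotiation (`text/turtle` for entity-level access, `application/n-triples` for the dump); a real deployment may expose different URLs or formats.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical URLs; substitute whatever the system actually exposes.
SPARQL_ENDPOINT = "https://data.example.org/sparql"
DUMP_URL = "https://data.example.org/dump.nt"

def query_request(sparql):
    """Query access: SPARQL 1.1 Protocol, query passed as a GET parameter."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': sparql})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def entity_request(uri):
    """Entity-level access: dereference a resource URI and ask for RDF (Turtle)."""
    return Request(uri, headers={"Accept": "text/turtle"})

def dump_request():
    """Dump access: fetch the entire graph at once, e.g. as the input of an ETL job."""
    return Request(DUMP_URL, headers={"Accept": "application/n-triples"})

def bindings(result_json):
    """Flatten a SPARQL JSON result set into a list of {variable: value} dicts."""
    doc = json.loads(result_json)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in doc["results"]["bindings"]]

# A response body in the SPARQL 1.1 Query Results JSON format.
sample = """{"head": {"vars": ["name"]},
             "results": {"bindings": [
               {"name": {"type": "literal", "value": "Ada"}},
               {"name": {"type": "literal", "value": "Bob"}}]}}"""

req = query_request("SELECT ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name }")
print(req.get_method(), req.full_url)
print(bindings(sample))  # [{'name': 'Ada'}, {'name': 'Bob'}]
```

Passing any of these requests to `urllib.request.urlopen` would perform the actual HTTP GET. Query access returns only the bindings the query asks for, entity-level access returns the description of a single resource, and dump access transfers the whole graph, which is why dumps suit bulk ETL loads rather than interactive lookups.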