

Machine-readable data via Semantic Web as a basis for analyses and standardization

As data sets continue to grow, so does the need to annotate data with context, both to add value and to enable complex analysis by machines. This is especially true for highly interconnected systems such as the internet. On the web, machine readability is achieved through semantic annotations that give terms their context. The result is referred to as the "Semantic Web".

For example, the term "Einstein" has no meaning on its own. Is it a spelling mistake, or does it refer to the famous physicist? A machine cannot easily decide this by itself. The Semantic Web solves this problem by attaching further information to the term, either through references to external knowledge or through categorization via types.
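As a minimal sketch of this idea, the annotations can be written as subject-predicate-object statements ("triples"). The local identifier `_:Einstein`, the use of `owl:sameAs` for the external reference and the helper `context_of` are assumptions made for illustration; they are not prescribed by the Semantic Web standards in this exact form.

```python
# A statement is a (subject, predicate, object) triple.
# "_:Einstein" is a hypothetical local identifier chosen for this sketch.
annotations = [
    ("_:Einstein", "rdf:type", "foaf:Person"),  # categorization via a type
    ("_:Einstein", "owl:sameAs", "wd:Q937"),    # reference to external knowledge
]


def context_of(term, triples):
    """Collect everything that is stated about a term, grouped by predicate."""
    context = {}
    for subject, predicate, obj in triples:
        if subject == term:
            context.setdefault(predicate, []).append(obj)
    return context


einstein = context_of("_:Einstein", annotations)
misspelled = context_of("_:Einstien", annotations)  # no annotations, no context
```

With the annotations in place, a program can tell the annotated term apart from an unknown or misspelled one, because only the former comes with a type and an external reference.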

So-called "ontologies" exist to solve precisely this problem. An ontology describes the concepts of a subject area, their properties and their relationships to each other. To a certain extent, ontologies resemble the class systems we know from programming languages. Many ready-made ontologies for describing knowledge already exist, such as FOAF (Friend of a Friend). With FOAF, people can publish information about themselves, such as name, age and e-mail address, and link to the people they know. As soon as several people have published their FOAF documents, machines can analyze the social relationships fully automatically.
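The automatic analysis of FOAF documents could look roughly like the following sketch. The three people and the `ex:` identifiers are invented for illustration, and each list merely stands in for a published FOAF document; only `foaf:Person`, `foaf:name` and `foaf:knows` are actual FOAF terms.

```python
from collections import defaultdict

# Each list stands in for one person's published FOAF document.
# The "ex:" identifiers are hypothetical.
alice_doc = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
]
bob_doc = [
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:knows", "ex:carol"),
]


def acquaintances(*documents):
    """Merge several documents and index who claims to know whom."""
    knows = defaultdict(set)
    for document in documents:
        for subject, predicate, obj in document:
            if predicate == "foaf:knows":
                knows[subject].add(obj)
    return knows


def mutual_pairs(knows):
    """Return the pairs of people who have each declared that they know the other."""
    return {
        frozenset((a, b))
        for a, others in knows.items()
        for b in others
        if a in knows.get(b, ())
    }


knows = acquaintances(alice_doc, bob_doc)
pairs = mutual_pairs(knows)
```

Here only Alice and Bob form a mutual pair: Bob states that he knows Carol, but no document by Carol confirms it.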

Wikidata is the central store for the structured data of the Wikimedia projects, including Wikipedia. References to Wikidata or other graphs are straightforward, because every entity can be accessed online via a unique IRI (Internationalized Resource Identifier). IRIs are essentially web links that may also contain non-ASCII characters. A small section of the Wikidata graph can be seen in the example graphic below. The rather cryptic-looking IRIs describe Albert Einstein (wd:Q937) and the city of Ulm (wd:Q3012), among others. The prefix "wd:" abbreviates the Wikidata URL, and the part after the colon is the identification number.
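The prefix notation can be resolved mechanically. The following sketch expands an abbreviated name into its full IRI, using the namespace IRIs that Wikidata publishes for "wd:" and "wdt:"; the function name `expand` is an assumption made for this example.

```python
# Namespace IRIs as published by Wikidata for its entity and direct-property prefixes.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(compact):
    """Turn a prefixed name such as 'wd:Q937' into a full IRI."""
    prefix, separator, local_id = compact.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {compact!r}")
    return PREFIXES[prefix] + local_id


einstein_iri = expand("wd:Q937")    # http://www.wikidata.org/entity/Q937
birthplace_iri = expand("wdt:P19")  # http://www.wikidata.org/prop/direct/P19
```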

Imagine the following theoretical scenario for the example below: We have implemented our data management in the form of a graph. This enables us to add new knowledge at will and link it to existing knowledge without changing the underlying data structure. We can also connect external data sources such as Wikidata to enrich our own data. The Semantic Web also greatly simplifies analyses of the entire data set with the help of AI or complex algorithms.

The categories "Person", "Human" and "City" are shown in blue. Instances of those classes are shown in gray and literal values in white. The example shows the following: The instance "_:Einstein" is categorized as "foaf:Person" and has the German-language name "Albert Einstein". An equivalent instance, "wd:Q937", exists in the external Wikidata graph, where it is categorized as "Human". The birthplace of Albert Einstein is set to the "major city" Ulm via the relationship "wdt:P19".
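The graph described above can be sketched as a set of triples together with a simple pattern query. Several details are assumptions: the link between the local and the external instance is modeled with `owl:sameAs`, the Wikidata classification uses `wdt:P31` ("instance of") with `wd:Q5` ("human"), and the class of Ulm is left out because its identifier is not given in the text. The helpers `match` and `birthplaces` are hypothetical names.

```python
# "_:Einstein" is the local node; "wd:" and "wdt:" names belong to Wikidata.
graph = {
    ("_:Einstein", "rdf:type", "foaf:Person"),
    ("_:Einstein", "foaf:name", ("Albert Einstein", "de")),  # literal with language tag
    ("_:Einstein", "owl:sameAs", "wd:Q937"),  # assumed link predicate
    ("wd:Q937", "wdt:P31", "wd:Q5"),          # instance of: human
    ("wd:Q937", "wdt:P19", "wd:Q3012"),       # place of birth: Ulm
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def birthplaces(triples, node):
    """Follow the sameAs link to the external instance and read wdt:P19 there."""
    places = []
    for _, _, external in match(triples, node, "owl:sameAs"):
        places += [o for _, _, o in match(triples, external, "wdt:P19")]
    return places


print(birthplaces(graph, "_:Einstein"))  # ['wd:Q3012']
```

The local data never states a birthplace itself; the answer comes entirely from the linked external instance.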

 

Any number of such references can exist between graphs, and they can point in either direction. Wikidata can also refer to knowledge from other graphs or websites, provided these are accessible online. Linking the graphs creates a single "Giant Global Graph". This vision of Tim Berners-Lee, the inventor of the World Wide Web, aims to link the entire internet.
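A rough sketch of what linking means in practice: two independently maintained graphs are combined by a plain set union, after which a traversal can cross the former boundary. The `owl:sameAs` link and the function name `reachable` are assumptions made for this example.

```python
from collections import deque

# Two graphs that are maintained independently of each other.
local_graph = {
    ("_:Einstein", "rdf:type", "foaf:Person"),
    ("_:Einstein", "owl:sameAs", "wd:Q937"),  # assumed link into the external graph
}
external_graph = {
    ("wd:Q937", "wdt:P19", "wd:Q3012"),
}


def reachable(triples, start):
    """Breadth-first walk along outgoing edges, ignoring the edge labels."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, _, obj in triples:
            if subject == node and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


# Because triples are self-describing, merging needs no schema mapping.
merged = local_graph | external_graph

before = reachable(local_graph, "_:Einstein")
after = reachable(merged, "_:Einstein")
```

Before the merge, the walk ends at the reference `wd:Q937`; afterwards it continues to `wd:Q3012`, the city of Ulm.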

Beyond the simple example shown above, more complex subject matter can also be modeled, for example the contents of relational databases, social networks (FOAF) or concept definitions (SKOS).

Our team of experts will be happy to advise you on the topic of the Semantic Web and support you with your data-driven project. Just get in touch with us.

About the author

 

As a software developer at M&M Software, Pierre Bienert works extensively with graph databases and the Semantic Web. This know-how enables him to extract new knowledge from existing data and capture it in enterprise knowledge graphs.
