Semaphobia 😨 would seem to be some malignant psychological syndrome spreading among software developer communities… It stems, allegedly 😏, from the distress of having to confront such daunting concepts as ontology or the semiotic triangle, the foundational ideas of the semantics that many a journalist or… manager 😉 will bandy about, self-promotingly. A real understanding of semantics per se would warrant a full course in linguistics and philosophy. How is it that, when solving very concrete problems in Information Technology, we come across concepts that originate in pre-Socratic philosophy?
The use of this portentous O-word from Greek philosophy is no terminological coincidence: we are posing, in essence, the same fundamental questions as Parmenides and Plato. How can we properly define and denote an individual, contingent and transitory physical thing, in relation to the supposedly more invariant categories, forms or “universals” that define it? Which has primacy over the other? These are NOT gratuitous intellectual ruminations, but day-to-day problems faced by “knowledge engineers” in all fields of information technology. We will attempt to show why and how.
From ontology to semiotics
As taken up by Information Sciences, the grandiose, millennia-old notion of ontology gets something of a facelift, yet its meaning remains rooted in deeply philosophical ground (see sidebox). An ontology aims at formalizing the meaning of the concepts used in a given “universe of discourse”, such as the one described or referred to by an information system. In such a system, this level of “meta-knowledge” is generally implicit and informal, captured for example by the idiosyncratic choice of identifiers in the developer’s natural language. The aim of this formalization is to make the “sense” of the information conveyed “comprehensible” to a third-party program, and not only to a human user.
To make this very abstract subject slightly more concrete, we will draw all of our examples from the field of the Internet of Things. The knowledge that we try to capture and formalize is used to identify, describe and categorize physical objects through an IoT platform. The notion of “physical object” is taken here in a very inclusive sense, covering devices, things and groupings of these, but also subsets of space, humans or animals.
The formalization of this knowledge relies on the notion of the semiotic triangle, introduced by C. K. Ogden and I. A. Richards in “The Meaning of Meaning” [1].
In order to ascribe semantics to a physical object, we must unambiguously associate it with a supposedly well-known concept, which will itself be formally defined through an ontology. If the “sign” (the “signifier” in semiotics lingo) representing the concept (the signified) to which the object belongs bears an analog relationship (resemblance) to what it represents, it is called an iconic sign, such as those used in road signs like 🚳, pictograms 🚰, and the simplest ideograms 山. Much more common is the use of symbolic signs, for which the relationship between the signifier and the signified is arbitrary and requires prior knowledge of shared conventions (such as ☢, ⛔, or those used by natural languages and alphabetic writing) to be understood. Symbolic systems are frequently cascaded, whereby the signified of a lower system (a Unicode character represented by a sequence of bytes, for example) gets composed into the signifiers of a higher-order system (a variable or keyword in a computer language).
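The cascade of symbolic systems described above can be made tangible with a few lines of Python. This is only an illustrative sketch: it shows three bytes acting as the signifier of one Unicode character under the UTF-8 convention, the arbitrariness of that link (another convention yields different characters), and the character in turn being composed into an identifier. The identifier `山_height_m` is a hypothetical name chosen for the example.

```python
# Tier 1: three bytes, meaningless on their own.
raw = b"\xe5\xb1\xb1"

# Tier 1 -> tier 2: under the UTF-8 convention, these bytes are the
# signifier of a single Unicode character, U+5C71 (山, "mountain").
char = raw.decode("utf-8")
print(hex(ord(char)))             # 0x5c71

# The link is purely conventional: decoding the same bytes under another
# convention (Latin-1) yields three entirely different characters.
print(raw.decode("latin-1"))      # å±±

# Tier 2 -> tier 3: characters are composed into an identifier, a signifier
# in a higher-order system (Python's grammar). The name is hypothetical.
identifier = char + "_height_m"
print(identifier.isidentifier())  # True
```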
In the context of knowledge engineering as currently advocated by the Semantic Web, such two-tiered (supposedly universal) symbols, called URIs (or IRIs when they use Unicode rather than ASCII as their tier-1 symbolic system), are used to denote and identify resources. These can be either individual physical objects (known as Non-Informational Resources, “NIRs”) or classes of such objects, in a sense that gets formally defined by a domain ontology. URIs may also denote digital representations of these objects, which must be identified separately to distinguish them from the physical objects they represent. A URI can provide either the identity of a resource in a naming system, as URNs do, or, for an informational resource, its address in an information system, as in the historical (and by now deprecated) sense of a Web URL. Because the Semantic Web does not make the unique name assumption, multiple URIs may refer to the same resource, whereas a given URI should never refer to different resources. This cross-referential property of URIs is key to the network effects at the heart of Linked Data.
The sign thus defined makes it possible to associate any object with its relevant conceptual apparatus. This knowledge is formalized using so-called semantic languages, including RDF, RDFS and OWL, described in the next section. This conceptual representation should have the following properties:
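Since several URIs may denote the same resource, Linked Data publishers typically state such co-references explicitly with `owl:sameAs`. The following minimal sketch, in plain Python with triples as tuples and no RDF library, groups the URIs declared to denote one resource. It relies on `owl:sameAs` being symmetric and transitive, and uses a union-find structure for that purpose; all example URIs are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def coreference_groups(triples):
    """Group URIs that owl:sameAs statements declare to denote the same resource.

    owl:sameAs is symmetric and transitive, so union-find over its edges
    yields one group per resource known under several names.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)  # merge the two groups

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())

# Hypothetical URIs: one physical gate named by three different systems.
triples = [
    ("http://carpark.example/gate/hoche-B", OWL_SAME_AS, "http://traffic.example/access/42"),
    ("http://traffic.example/access/42", OWL_SAME_AS, "urn:example:gate:hoche-b"),
    ("http://carpark.example/gate/hoche-A", "http://carpark.example/ont#hasState", "Open"),
]
print(coreference_groups(triples))  # one group holding the three gate B URIs
```

Statements that are not `owl:sameAs` are simply ignored here; a real Linked Data consumer would then merge the descriptions attached to each group.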
- Accuracy and completeness aim to produce a precise, explicit and unambiguous representation.
- Formal definition of the model (using mathematical languages derived from first-order logic) makes the semantic description understandable by a software agent, which may itself answer simple queries on the basis of the knowledge contained in the model.
- Collaborative definition by domain experts vouches for the validity of the model and ensures its shared use and dissemination in the community.
Ontological languages
Leaving aside the third property, which requires a major effort to reach consensus within a technical community, the first two properties can be satisfied by using the most expressive model in the spectrum of semantic models, namely the ontology. Ontologies are described using so-called ontological languages. Three languages are recommended by the W3C and generally used in combination:
- First, RDF makes it possible to express simple facts in the form of triples <subject, predicate, object>, such as “this tree is a plant” or “the tree located in the park measures 8 m”. In these triples, the trees are the subjects, the verbs “to be” and “to measure” are the predicates, and the “plant” concept and “8 m” are the objects.
- RDFS introduces structuring axioms for semantic graphs. This language makes it possible to define hierarchies of classes and properties and adds the notion of domain (restricting the set of concepts that can be subject of a given predicate) and range (image of a function in the sense of pure mathematics, it constrains the set of concepts that can be the object of a given predicate) for properties.
- Finally, OWL is the most expressive ontological language. It makes it possible to add further semantic constraints, such as cardinalities, and property characteristics, such as symmetry and transitivity.
RDFS and OWL give ontologists enough expressivity to model a virtually unlimited range of domains of expertise, from the classification of living beings in biology to the modeling of cooking recipes or the analysis of crime scenes. Moreover, the semantics of these languages are grounded in logic: OWL is based on description logics, which are decidable fragments of first-order logic. This provides the formal character needed to open the way to automated reasoning.
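To show what the three layers add to one another, here is a toy forward-chaining sketch in plain Python. It is not a conformant RDFS or OWL reasoner: it stores RDF facts as tuples, applies a few RDFS entailment rules (labelled after rdfs2, rdfs3, rdfs9 and rdfs11 in the W3C RDF 1.1 Semantics recommendation) and the OWL transitive-property rule until nothing new appears. Cardinalities and other OWL constructs are not covered, and every `ex:` name is hypothetical.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"
TRANSITIVE = "owl:TransitiveProperty"

def infer(triples):
    """Forward-chain a handful of RDFS/OWL rules until no new triple appears."""
    g = set(triples)
    while True:
        new = set()
        for s, p, o in g:
            for s2, p2, o2 in g:
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: an instance of a subclass is an instance of the superclass
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # rdfs2 / rdfs3: domain types the subject, range types the object
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
                # owl:TransitiveProperty: p(a, b) and p(b, c) imply p(a, c)
                if p == p2 and o == s2 and (p, TYPE, TRANSITIVE) in g:
                    new.add((s, p, o2))
        if new <= g:
            return g
        g |= new

facts = {
    # RDF: plain facts, as in "this tree is a plant" / "it measures 8 m"
    ("ex:tree1", TYPE, "ex:Tree"),
    ("ex:tree1", "ex:heightInMetres", "8"),
    ("ex:tree1", "ex:locatedIn", "ex:park1"),
    ("ex:park1", "ex:locatedIn", "ex:city1"),
    # RDFS: class hierarchy, domain and range
    ("ex:Tree", SUBCLASS, "ex:Plant"),
    ("ex:Plant", SUBCLASS, "ex:LivingBeing"),
    ("ex:heightInMetres", DOMAIN, "ex:PhysicalObject"),
    ("ex:locatedIn", RANGE, "ex:Place"),
    # OWL: a property characteristic
    ("ex:locatedIn", TYPE, TRANSITIVE),
}

g = infer(facts)
print(("ex:tree1", TYPE, "ex:LivingBeing") in g)        # True
print(("ex:tree1", "ex:locatedIn", "ex:city1") in g)    # True
```

The quadratic loop is deliberately naive; production reasoners rely on indexing and on the decidability guarantees of description logics mentioned in the text.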
A “Smart Cities” example
To illustrate the previous notions, we introduce a very concrete application of semantic interoperability between two IoT platforms used in a city. In this scenario, a car park manager uses an ontology specific to his domain in the platform on which the management system is based. This system aims to optimize the movement of vehicles across the different floors, taking into account the occupancy of the parking spaces. This ontology also makes it possible to model the different gates through which vehicles leave the car park, as illustrated in the following figure, which presents a subset of the knowledge graph:
The ontology models a structural view of the car park, containing knowledge about the building. It incorporates a hierarchy of classes to represent the car park (CarPark) and its access gates (Gate). An object property (an associative relationship between two classes) represents the fact that a car park has an access gate. A data property (a relationship linking a concept to a primitive data type) is also used to specify the state of a gate (open or closed). This ontology is then instantiated, and two open access gates are created.
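The car park knowledge graph just described can be sketched as plain Python triples, together with a small closed-world check that every use of the object property and the data property conforms to the schema. Note the hedges: the name `carPark:hasGate` is hypothetical (the article does not name the object property), `carPark:hasState` and the `"Open"` literal follow the article's SPARQL query, and the instance names are hypothetical. Such a check is closer in spirit to shape validation than to RDFS, whose domain and range statements infer types rather than reject data.

```python
CP = "carPark:"  # prefix as used in the article's SPARQL query
TYPE = "rdf:type"

# Schema: property -> (domain class, range class or set of allowed literals).
# hasGate is a hypothetical name for the object property.
SCHEMA = {
    CP + "hasGate": (CP + "CarPark", CP + "Gate"),       # object property
    CP + "hasState": (CP + "Gate", {"Open", "Closed"}),  # data property
}

# Instance data: one car park with two open gates (hypothetical identifiers).
graph = {
    (CP + "Hoche", TYPE, CP + "CarPark"),
    (CP + "HocheGateA", TYPE, CP + "Gate"),
    (CP + "HocheGateB", TYPE, CP + "Gate"),
    (CP + "Hoche", CP + "hasGate", CP + "HocheGateA"),
    (CP + "Hoche", CP + "hasGate", CP + "HocheGateB"),
    (CP + "HocheGateA", CP + "hasState", "Open"),
    (CP + "HocheGateB", CP + "hasState", "Open"),
}

def violations(graph):
    """List property uses that do not conform to SCHEMA (closed-world check)."""
    types = {}
    for s, p, o in graph:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    errors = []
    for s, p, o in sorted(graph):
        if p not in SCHEMA:
            continue
        domain, rng = SCHEMA[p]
        if domain not in types.get(s, set()):
            errors.append(f"{s} used with {p} but is not a {domain}")
        if isinstance(rng, set):
            # data property: the literal must be one of the allowed values
            if o not in rng:
                errors.append(f"{s} {p} {o!r}: value not in {sorted(rng)}")
        elif rng not in types.get(o, set()):
            # object property: the target must be typed with the range class
            errors.append(f"{o} is the object of {p} but is not a {rng}")
    return errors

print(violations(graph))  # []
```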
Federated view of the city
In order to improve traffic conditions in the town, the city council has engaged the services of a company specialized in the optimization of urban traffic. This operator uses a platform whose ontology formally and generically models the road network and the flows of vehicles circulating on it. This ontology, illustrated in the figure below, can represent different types of roads and their state (congested, free-flowing, empty), as well as the notion of an entry point to the road network (RoadAccess), such as the entrances to the city.
To limit the risk of traffic jams at the car park exits, the traffic management operator’s platform must interoperate with the car park manager’s platform. The aim is to allow the traffic management algorithm to integrate information on outgoing traffic from the car park, but also to allow the car park manager to direct outgoing traffic to the gates leading to streets where traffic is flowing freely.
When implementing this use case, the operator is confronted with two independent ontologies in which the same physical object (the car park access) is represented by two different concepts, namely the Gate class in the CarPark ontology and the RoadAccess class in the Traffic ontology. To carry out the use case, it is therefore necessary to apply one of the following ontology mediation techniques:
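The traffic side can be sketched in the same plain-Python style. In this minimal sketch, the three road states are treated as a closed enumeration (in the spirit of an OWL `oneOf` class), road types form a small hierarchy under a common road class, and a city entrance is typed as a RoadAccess. The names `traffic:Road`, `traffic:Street`, `traffic:Avenue`, the instances and the `"Congested"` literal are hypothetical; `traffic:hasStatus`, `traffic:isAccessTo`, `"Free-flowing"` and `"Empty"` follow the article's SPARQL query.

```python
from enum import Enum

class RoadStatus(Enum):
    """Closed set of road states named in the text (an owl:oneOf-style enumeration)."""
    CONGESTED = "Congested"
    FREE_FLOWING = "Free-flowing"
    EMPTY = "Empty"

# Hypothetical road-type hierarchy: each specific type is a subclass of traffic:Road.
ROAD_SUBCLASS_OF = {"traffic:Avenue": "traffic:Road", "traffic:Street": "traffic:Road"}

# Hypothetical traffic graph: two roads and one city entrance (a RoadAccess).
TRAFFIC = [
    ("traffic:RueHoche", "rdf:type", "traffic:Street"),
    ("traffic:RueHoche", "traffic:hasStatus", "Free-flowing"),
    ("traffic:AvenueFoch", "rdf:type", "traffic:Avenue"),
    ("traffic:AvenueFoch", "traffic:hasStatus", "Congested"),
    ("traffic:NorthEntrance", "rdf:type", "traffic:RoadAccess"),
    ("traffic:NorthEntrance", "traffic:isAccessTo", "traffic:AvenueFoch"),
]

def is_road(cls):
    """Walk up the single-inheritance hierarchy until traffic:Road or the top."""
    while cls is not None:
        if cls == "traffic:Road":
            return True
        cls = ROAD_SUBCLASS_OF.get(cls)
    return False

def road_statuses(triples):
    """Map every road, whatever its specific type, to its status.

    RoadStatus(value) raises ValueError for a literal outside the enumeration.
    """
    roads = {s for s, p, o in triples if p == "rdf:type" and is_road(o)}
    return {s: RoadStatus(o) for s, p, o in triples
            if p == "traffic:hasStatus" and s in roads}

print(road_statuses(TRAFFIC))
```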
- Fusion: build a new ontology by unifying the different ontologies into a single coherent ontology that includes all their concepts.
- Mapping: define mapping rules between the concepts defined by the different ontologies.
- Integration: build a new ontology that integrates only the useful concepts of the source ontologies.
Once this mediation work has been carried out, the unified view of all the knowledge related to car parks and urban traffic allows the car park manager to recommend to customers the gates leading to non-congested streets. To do this, a federated query mechanism allows a query written in the SPARQL semantic query language to be evaluated over several knowledge graphs.
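Of the three techniques, mapping is the lightest, since it leaves both source ontologies untouched. The following sketch shows one hypothetical mapping rule applied in plain Python: every `carPark:Gate` is also a `traffic:RoadAccess`, in the spirit of an `rdfs:subClassOf` alignment. The rule is deliberately one-directional: declaring the classes equivalent would wrongly turn every city entrance into a car park gate. The instance names are hypothetical, and the function applies each rule in a single pass (chained mappings would need a fixpoint loop).

```python
# Hypothetical mapping rule between the two ontologies: every car park gate is
# also an entry point to the road network. The reverse does not hold.
CLASS_MAPPINGS = [("carPark:Gate", "traffic:RoadAccess")]

def apply_class_mappings(triples, mappings):
    """Return the triples plus the rdf:type statements implied by the mappings."""
    out = set(triples)
    for sub, sup in mappings:
        for s, p, o in triples:
            if p == "rdf:type" and o == sub:
                out.add((s, "rdf:type", sup))
    return out

# Hypothetical instance data, one resource from each source graph.
carpark = {("carPark:HocheGateB", "rdf:type", "carPark:Gate")}
traffic = {("traffic:NorthEntrance", "rdf:type", "traffic:RoadAccess")}

merged = apply_class_mappings(carpark | traffic, CLASS_MAPPINGS)
accesses = sorted(s for s, p, o in merged
                  if p == "rdf:type" and o == "traffic:RoadAccess")
print(accesses)  # ['carPark:HocheGateB', 'traffic:NorthEntrance']
```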
SELECT ?gate WHERE {
  ?gate rdf:type carPark:Gate .
  ?gate carPark:hasState "Open" .
  ?gate traffic:isAccessTo ?road .
  ?road traffic:hasStatus ?state .
  FILTER(?state IN ("Free-flowing", "Empty"))
}
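The query above (simplified for the purposes of the article) retrieves the gates that are currently open and that lead onto a road where traffic is light or nil. Note that it applies properties from both ontologies to the same ?gate variable, which only works once mediation has aligned the Gate and RoadAccess concepts. In our example, the instance representing gate B of the Hoche car park is recommended.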
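To make the query's semantics concrete, here is a naive evaluation of its triple patterns and FILTER in plain Python, over the union of two small graphs. This is a sketch, not a SPARQL engine: it assumes mediation has already let both graphs refer to the car park gates by the same identifiers, and all instance names and road data are hypothetical. In practice, SPARQL 1.1 federation would rely on the `SERVICE` keyword or on a federated query engine.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding satisfying all triple patterns (a basic graph pattern)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable matches anything, but must agree with earlier bindings.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant must match exactly
        else:
            yield from match(rest, triples, candidate)

# Hypothetical data, assuming both graphs already share the gate identifiers.
CARPARK = {
    ("carPark:HocheGateA", "rdf:type", "carPark:Gate"),
    ("carPark:HocheGateA", "carPark:hasState", "Open"),
    ("carPark:HocheGateB", "rdf:type", "carPark:Gate"),
    ("carPark:HocheGateB", "carPark:hasState", "Open"),
}
TRAFFIC = {
    ("carPark:HocheGateA", "traffic:isAccessTo", "traffic:AvenueFoch"),
    ("carPark:HocheGateB", "traffic:isAccessTo", "traffic:RueHoche"),
    ("traffic:AvenueFoch", "traffic:hasStatus", "Congested"),
    ("traffic:RueHoche", "traffic:hasStatus", "Free-flowing"),
}

# The article's query, pattern by pattern.
PATTERNS = [
    ("?gate", "rdf:type", "carPark:Gate"),
    ("?gate", "carPark:hasState", "Open"),
    ("?gate", "traffic:isAccessTo", "?road"),
    ("?road", "traffic:hasStatus", "?state"),
]

def recommended_gates(graph):
    # FILTER(?state IN ("Free-flowing", "Empty")) applied to each solution.
    return sorted({b["?gate"] for b in match(PATTERNS, sorted(graph))
                   if b["?state"] in ("Free-flowing", "Empty")})

print(recommended_gates(CARPARK | TRAFFIC))  # ['carPark:HocheGateB']
```

Queried against the car park graph alone, the same patterns return nothing, since the traffic properties live in the other graph: this is precisely why the two knowledge graphs must be queried together.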
Conclusion
Sixteen years have elapsed since Tim Berners-Lee’s foundational article, yet the semantic referencing of Web content has only just begun to take off, through a modest set of “vocabularies” that do not even claim to be ontologies, under the aegis of the schema.org group led by (among others) Google and Microsoft. This pragmatic approach has, to some extent, succeeded where Sir Tim himself had failed: in overcoming the semaphobia of webmasters! However, it is limited in that the schema.org vocabularies define, first and foremost, virtual or informational entities that remain within the confines of the original Web as a document system. In the (vast) domain of the Internet of Things that we have taken as an example here, properly defining physical objects from the real world out there requires taking stock of multiple models (be they properly ontological or semi-formal) defined by a multitude of communities of specialists in their respective fields. This is an ambitious, long-term task, in which top-down ontological modeling will have to be complemented by the bottom-up approach of linked data presented in a previous article, in order to cross-reference and cross-fertilize information from different systems whose respective ontologies offer only a very partial view of a shared world. In a new take on the parable of the blind men and the elephant, the viewpoints of all the blind men (the operators of specialized platforms playing the role of the blind men in our example!) must be jointly taken into account, since each of them is in fact indispensable at its own level and in its own field.
As the notion of a single universal platform with an all-embracing ontology is illusory in the domain of the Internet of Things (a much more fragmented and heterogeneous domain than social networks…), one must also give up on the idea that there could be a single all-encompassing vantage point (that of the sighted observer from the parable…) subsuming the others.
Hereupon we leave readers to their own philosophical musings 🤔.
More info:
Ontology 101
No field of study could be more thanklessly fundamental, or more chimerical, than ontology (etymologically, the science of being). Some of the primary questions that have run through the history of philosophy are, in the proper sense, ontological. The archetypal chasm between materialism and idealism divides two distinct ontologies, giving primacy respectively to the material (as in pre-Socratic philosophy) or to the ideal (as with Plato) in defining the basal essence of being. Materialism has come a long way from such pre-Socratic ontologies as the four elements (water, earth, air and fire) of Empedocles. More relevant to scientific debate would be physicalism, an avatar of materialism endorsed by a few reductionist physicists, for whom biology, and even the social sciences, would be mere chapters of physics… Dualist ontologies (e.g., that of Descartes) give equal status to the material and the ideal. By a rather intriguing twist in the history of ideas, Information Sciences may appear to rekindle this old dualism, wherein the spiritual realm of former dualisms would have morphed into the virtual-informational. Information Science subjugates physicalist reductionism in its own way, by playing the role of “meta-science” for both physics and biology, where the former role of conceptual “Forms” or “Universals” from early philosophy gets ascribed to Information as the overarching organizing principle of both physical systems (from 1950s cybernetics to the complex systems of the 1990s) and life itself (from DNA to epigenetics)…
Related references:
[1] Ogden, C. K. & Richards, I. A., 1923. The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism.
[2] Gruber, T., 1993. A translation approach to portable ontology specifications. Knowledge Acquisition.
[3] Cyganiak, R., Wood, D. & Lanthaler, M., 2014. RDF 1.1 Concepts and Abstract Syntax.
[4] Brickley, D. & Guha, R. V., 2000. Resource Description Framework (RDF) Schema Specification 1.0.
[5] McGuinness, D. & Van Harmelen, F., 2004. OWL Web Ontology Language.