Semantics.gr Use Cases

This section presents some examples of how we have been hosting, curating and using the first vocabularies we developed at EKT to document and enrich our content services and repositories.

Enriching the aggregator infrastructures OpenArchives.gr and SearchCulture.gr

SearchCulture.gr is the national aggregator for Greek cultural content and OpenArchives.gr is the biggest aggregator for Greek scientific content. Both infrastructures harvest metadata and thumbnails from the providers’ distributed repositories using the OAI-PMH protocol, and both provide a public portal offering unified search and access to the digitised resources. The aggregation workflow includes data validation, transformation of the original metadata to the target model of the respective infrastructure, metadata enrichment and publishing as Linked Open Data.
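The harvesting step can be sketched as follows. This is a minimal illustration of a standard OAI-PMH ListRecords request and of reading Dublin Core titles from a response, not the aggregators' actual code. The base URL, the function names and the sample response are assumptions made for the example; only the verb, the parameter names and the XML namespaces come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a provider's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

def parse_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        title = rec.findtext(".//dc:title", namespaces=OAI_NS)
        records.append((identifier, title))
    return records

# Hypothetical response body, shortened to a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample item</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Hypothetical endpoint; no network request is made here.
    print(list_records_url("https://repository.example.org/oai"))
    print(parse_titles(SAMPLE))
```

A real harvester would also follow the resumptionToken returned with large result sets and would validate each record before transformation; both are left out of this sketch.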

In order to enrich the content aggregated by the two infrastructures we used Semantics.gr in a two-step process.

Firstly, we developed the following vocabularies:

  • Cultural heritage item types vocabulary: An original SKOS-based vocabulary that uses the skos:Concept class to describe different types of cultural artifacts. It is hierarchical and bilingual, and the majority of its terms are linked to the Getty Art and Architecture Thesaurus via the skos:exactMatch property. This vocabulary is used to enrich the SearchCulture.gr collections by item type.
  • UNESCO Thesaurus (EKT version): A vocabulary adapted from the UNESCO Thesaurus. We followed the original hierarchical thesaurus structure, whose concepts are grouped in 7 broad thematic areas. The UNESCO Thesaurus is compliant with the ISO 25964 standard. For the EKT version, we selected 1387 terms that are particularly suitable for the contents of the SearchCulture.gr collections. The resulting vocabulary is SKOS compliant, hierarchical and bilingual, and every term is linked to the UNESCO Thesaurus via the skos:exactMatch property. The vocabulary was used to enrich SearchCulture.gr items by subject.
  • Thematic tags: A collection of terms covering historical, geographical and thematic aspects, and other particularities of the SearchCulture.gr contents, that are not already well represented in the UNESCO Thesaurus. The majority of the terms derive from the EKT Greek Terms Thesaurus. The resulting vocabulary is also SKOS compliant, hierarchical and bilingual, and is linked to the UNESCO Thesaurus (EKT version) via the skos:broadMatch property. It has been used to enrich SearchCulture.gr contents by subject, in addition to the UNESCO Thesaurus (EKT version).
  • Greek historical periods: A vocabulary constructed according to the semantic class edm:TimeSpan of Europeana’s EDM model. It contains 94 terms that cover Greek history from 8000 BC to today. It is hierarchical and bilingual, it covers the Greek territory, and some values in the 3rd and 4th levels correspond to individual civilisations. The vocabulary is used to enrich the contents of SearchCulture.gr by historical period.

Secondly, to serve the enrichment processes, we developed in Semantics.gr an original and particularly user-friendly mapping tool for the semi-automatic semantic enrichment of metadata with terms from the vocabularies published on the platform. Using this tool, EKT scientific staff are able to normalise, homogenise and enrich the metadata that are aggregated in SearchCulture.gr and OpenArchives.gr.
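
To make the idea of semi-automatic enrichment concrete, here is a minimal sketch under stated assumptions. It is not the Semantics.gr mapping tool: the concept URIs, the matching rule (case-insensitive comparison against bilingual labels) and all function names are hypothetical. It shows the general pattern in which automatic suggestions are produced first and a curator then adds rules for the values that could not be matched.

```python
from dataclasses import dataclass

@dataclass
class Concept:
    uri: str
    pref_label: dict  # language code -> preferred label

# Hypothetical two-term bilingual vocabulary.
VOCAB = [
    Concept("https://vocab.example.org/item-types/photograph",
            {"en": "Photograph", "el": "Φωτογραφία"}),
    Concept("https://vocab.example.org/item-types/map",
            {"en": "Map", "el": "Χάρτης"}),
]

def normalise(value):
    """Lower-case a value and collapse whitespace before comparison."""
    return " ".join(value.lower().split())

def suggest_mappings(raw_values, vocabulary):
    """Automatic step: match raw metadata values against concept labels."""
    index = {}
    for concept in vocabulary:
        for label in concept.pref_label.values():
            index[normalise(label)] = concept
    suggestions, unmatched = {}, []
    for value in raw_values:
        concept = index.get(normalise(value))
        if concept is not None:
            suggestions[value] = concept.uri
        else:
            unmatched.append(value)  # left for a curator to review
    return suggestions, unmatched

def enrich(record, field_name, mappings):
    """Add the mapped concept URIs to a record without altering the original field."""
    uris = [mappings[v] for v in record.get(field_name, []) if v in mappings]
    return {**record, f"{field_name}_uri": uris}

if __name__ == "__main__":
    suggestions, unmatched = suggest_mappings(
        ["Photograph", "  map ", "photo print"], VOCAB)
    # Manual step: the curator supplies a rule for the unmatched value.
    mappings = {**suggestions, "photo print": VOCAB[0].uri}
    print(enrich({"title": "Harbour view", "type": ["photo print"]}, "type", mappings))
```

Keeping the original field and writing the URIs to a separate one means the provider's metadata stays intact while the aggregated copy gains normalised values.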

EKT Subjects across Scientific Disciplines

A vocabulary based on the OECD Fields of Research and Development (FORD) classification (OECD 2015). It follows the FORD classification for the 6 first-level broad thematic areas and the 42 second-level thematic areas. EKT staff processed the second-level thematic areas in order to create a third, fine-grained level. The resulting SKOS vocabulary comprises 474 unique bilingual subject terms covering the main areas of Science, Technology & Development. The terms are organised by hierarchical relationships (broader / narrower), while semantic links to external open resources are expressed via the “exact match”, “close match” and “related to” attributes. The enrichment with new terms and links is ongoing.
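
The three-level broader / narrower structure described above can be sketched as a simple parent map. The identifiers below are illustrative: "Natural sciences" and "Computer and information sciences" are FORD areas, while the third-level term is a hypothetical example and not necessarily a term in the EKT vocabulary. The function names are likewise assumptions.

```python
# Each term points to its broader term; top-level areas point to None.
BROADER = {
    "artificial-intelligence": "computer-and-information-sciences",  # hypothetical third-level term
    "computer-and-information-sciences": "natural-sciences",
    "natural-sciences": None,
}

def path_to_root(term, broader):
    """Follow broader links from a term up to its top-level thematic area."""
    path = [term]
    seen = {term}
    while broader.get(path[-1]) is not None:
        parent = broader[path[-1]]
        if parent in seen:
            raise ValueError(f"cycle in hierarchy at {parent!r}")
        path.append(parent)
        seen.add(parent)
    return path

def narrower_index(broader):
    """Invert the broader map to obtain the narrower terms of each term."""
    index = {}
    for term, parent in broader.items():
        if parent is not None:
            index.setdefault(parent, []).append(term)
    return index

if __name__ == "__main__":
    print(path_to_root("artificial-intelligence", BROADER))
    print(narrower_index(BROADER))
```

Storing only the broader link and deriving the narrower one avoids the two directions drifting out of sync as new terms are added.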

The vocabulary is being used to enrich the National PhD Archive repository, both retrospectively by EKT staff and through self-archiving by PhD candidates, who choose the subject areas that their dissertation relates to.

The vocabulary will also be used in other aggregation and repository services of EKT, for example to enrich the scientific data aggregator OpenArchives.gr.

Person and Corporate Body vocabularies for EKT’s scientific infrastructures

Two more pivotal vocabularies have been created by EKT staff in Semantics.gr: one for natural persons and one for corporate bodies. These vocabularies are currently being used in the documentation and enrichment processes of EKT’s scientific repositories (the National PhD Archive repository, the OA scientific ePublishing platform and the new EKT institutional repository). The two broad vocabularies “Persons” and “Corporate Bodies” can be further elaborated and indexed, based on their attributes, into different groupings such as “academic institutions”, “PhD holders”, etc.

The vocabularies are being hosted and managed in Semantics.gr. Semantics.gr interoperates with the distributed repositories and in particular with their documentation environments, where the scientific resources are being catalogued. Individual fields such as creator, contributor, editor in the cataloguing forms draw controlled values and are populated with data from the vocabularies in Semantics.gr in real time. In parallel, EKT staff uses the semantic enrichment tools for the mass, semi-automatic retrospective documentation and the homogenisation of the contents of the respective infrastructures.

The process followed has several benefits. Each person and corporate body entity created in Semantics.gr is assigned a unique URI, which is then used uniformly to describe that entity across all of EKT’s digital repositories. All information gathered in this central pool can be easily managed and updated in one place and is automatically synchronised across all repositories. The quality of the data is enhanced and can be better measured. In addition, an entity is linked with all the information relevant to it that lives in different repositories (e.g. a person’s PhD thesis and her articles in an online journal hosted on EKT’s ePublishing platform).
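
The benefit of a single URI per entity can be illustrated with a small sketch. This is an assumption-laden toy model, not the Semantics.gr implementation: the registry class, the URI pattern, the repository names and the record fields are all hypothetical. Repositories store only the URI, so a correction made once in the registry is visible everywhere, and resources from different repositories can be grouped by entity.

```python
from collections import defaultdict

class EntityRegistry:
    """Toy central pool of person and corporate body entities."""

    def __init__(self, base_uri):
        self.base_uri = base_uri.rstrip("/")
        self._entities = {}
        self._next_id = 1

    def create(self, kind, name):
        """Register an entity and return its unique URI."""
        uri = f"{self.base_uri}/{kind}/{self._next_id}"
        self._next_id += 1
        self._entities[uri] = {"kind": kind, "name": name}
        return uri

    def rename(self, uri, name):
        """Update the entity in one place."""
        self._entities[uri]["name"] = name

    def label(self, uri):
        return self._entities[uri]["name"]

def resources_by_entity(repositories):
    """Group (repository, title) pairs by the creator URI they reference."""
    linked = defaultdict(list)
    for repo_name, records in repositories.items():
        for record in records:
            linked[record["creator"]].append((repo_name, record["title"]))
    return dict(linked)

if __name__ == "__main__":
    registry = EntityRegistry("https://vocab.example.org")
    author = registry.create("person", "M. Papadopoulou")
    repositories = {
        "phd-archive": [{"title": "Doctoral thesis", "creator": author}],
        "epublishing": [{"title": "Journal article", "creator": author}],
    }
    registry.rename(author, "Maria Papadopoulou")  # one edit, seen by every repository
    for uri, items in resources_by_entity(repositories).items():
        print(registry.label(uri), items)
```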

The application profiles for the creation of the above-mentioned vocabularies have been modelled on MADS/RDF. Academic researchers make up the first list of natural persons included in these vocabularies, and the vocabulary is being used in the cataloguing process of EKT’s new institutional repository.