Linghub was originally created by McCrae and Cimiano [1]. This updated version, released in 2022, was developed as part of the H2020 Prêt-à-LLOD project. Linghub is a portal that publishes metadata about language resources (including services) gathered from a range of sources and repositories. It uses Linked Data (LD) principles to expose all the metadata through a common interface, and provides faceted browsing and a SPARQL endpoint. More information about the platform can be found in [1]. This documentation gives the essential information needed to navigate Linghub and search for appropriate data.

This new version of Linghub covers resources from major repositories across Europe and the rest of the world, including old.dataHub and, in particular, language resource repositories such as LRE Map, META-SHARE, CLARIN and OLAC, as well as new datasets such as Annohub [2], iLOD [3] and a list of Teanga-compatible services. Each of these makes up a "Collection" in Linghub, and searches can be narrowed down to any of them.

Linghub provides a multilingual, cross-repository data search facility. The data can be searched and explored in the following ways.

Discover

Users can simply browse and discover data by a specific metadata field, using the right sidebar, which displays a selection of metadata that can be explored, such as author, subject, language (ISO code), ODRL policy or type.
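Faceted browsing of this kind amounts to counting the values of one metadata field across records, then filtering on the value the user picks. The minimal Python sketch below illustrates the idea on a few made-up records; it is not Linghub's implementation. The dc.language.iso field name appears on this page, while dc.subject, the records and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# A record maps a metadata field to its list of values.
Record = Dict[str, List[str]]

# Made-up example records (hypothetical; not real Linghub data).
RECORDS: List[Record] = [
    {"title": ["Corpus A"], "dc.language.iso": ["eng", "gle"], "dc.subject": ["corpus"]},
    {"title": ["Lexicon B"], "dc.language.iso": ["eng"], "dc.subject": ["lexicon"]},
    {"title": ["Treebank C"], "dc.language.iso": ["deu"], "dc.subject": ["corpus"]},
]


def facet_counts(records: List[Record], field: str) -> Counter:
    """Count how many records carry each value of `field` (one sidebar facet)."""
    counts: Counter = Counter()
    for record in records:
        # set() so a value repeated within one record is counted only once
        counts.update(set(record.get(field, [])))
    return counts


def filter_by(records: List[Record], field: str, value: str) -> List[Record]:
    """Keep only the records whose `field` contains `value` (clicking a facet)."""
    return [r for r in records if value in r.get(field, [])]
```

For these records, facet_counts(RECORDS, "dc.language.iso") reports two English resources and one each for Irish and German.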

Search / Advanced search

In the top right corner of the home page, a search bar is available for free-text search over all of the data. Clicking the search button opens the search page, and an "advanced option" link appears under the search bar. There it is possible to specify more advanced search criteria.

SPARQL query

The data can also be queried with SPARQL through the provided endpoint. Here is an example query: http://140.203.155.44:8001/dspace/sparql?query=SELECT+*+WHERE+{+?s+?p+?o+.+}+LIMIT+20&output=text
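The same request can be sent from a script. The following is a minimal sketch using only the Python standard library; the endpoint address and the query and output parameters are taken from the example URL above, while the function names are hypothetical, and the endpoint may not support other output formats.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint address taken from the example query on this page; it may change.
ENDPOINT = "http://140.203.155.44:8001/dspace/sparql"


def build_query_url(query: str, output: str = "text") -> str:
    """Build a GET URL carrying the URL-encoded `query` and `output` parameters."""
    return f"{ENDPOINT}?{urlencode({'query': query, 'output': output})}"


def run_query(query: str, output: str = "text", timeout: float = 30.0) -> str:
    """Send the query to the endpoint and return the response body as text."""
    with urlopen(build_query_url(query, output), timeout=timeout) as response:
        # Assumes a UTF-8 response; adjust if the server declares another charset.
        return response.read().decode("utf-8")
```

For example, run_query("SELECT * WHERE { ?s ?p ?o . } LIMIT 20") requests the same 20 triples as the example URL. urlencode percent-encodes characters such as { and ?, which the server decodes back into the same query.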

Some metadata fields of individual resources are marked with "ticks" and "crosses".
At the top of each individual resource page, a tick is displayed if the metadata schemas and fields use only recognised, standard Linked Data vocabularies, and a cross if non-standard vocabularies are present.
When a language has been identified from the original metadata of the resource, we add an extra dc.language.iso field that normalises the language to its standard ISO 639-3 code, and display a tick.
In the metadata fields related to licenses and policies, if the license is expressed correctly (i.e., it names an established license with a version number), a tick is displayed and an extra odrl.Policy field is created, linking to the representation of the license in the ODRL language. If the referenced license is not established or is incomplete, a cross is displayed.
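The language and license checks described above can be sketched as follows. This is not Linghub's implementation: the language table holds only a few hypothetical entries (a real one would cover the full ISO 639-3 code set), and the license pattern is an assumed approximation of "an established license with a version number".

```python
import re
from typing import Optional

# Hypothetical lookup table; a real system would use the full ISO 639-3 code table.
LANGUAGE_NAMES_TO_ISO639_3 = {
    "english": "eng",
    "german": "deu",
    "french": "fra",
    "irish": "gle",
}

# Hypothetical pattern for "an established license with a version number",
# e.g. "CC BY 4.0", "CC-BY-NC-SA 3.0" or "GPL-3.0". Not Linghub's actual rule.
LICENSE_PATTERN = re.compile(
    r"^(CC[ -]BY(?:[ -](?:NC|ND|SA))*|GPL|LGPL|Apache)[ -]v?(\d+(?:\.\d+)*)$",
    re.IGNORECASE,
)


def normalise_language(value: str) -> Optional[str]:
    """Return the ISO 639-3 code for a language name, or None (no tick) if unknown."""
    return LANGUAGE_NAMES_TO_ISO639_3.get(value.strip().lower())


def license_tick(value: str) -> bool:
    """True ("tick") if the value names a known license together with a version."""
    return LICENSE_PATTERN.match(value.strip()) is not None
```

Under these assumptions, "CC BY 4.0" earns a tick, while "CC BY" (no version) and free-text terms such as "free for research use" earn a cross.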

Linghub converts identifiable licenses into the Open Digital Rights Language (ODRL). When such a license is identified, an odrl.Policy field is added to the metadata of the resource. More about the ODRL model can be found in the W3C ODRL Information Model specification. When such a policy is identified, a form appears at the bottom of the page, allowing users to ask the system whether they are allowed to use the resource, depending on the nature of the work that uses the resource and on their status (academia or industry).
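To illustrate the kind of question the form answers, here is a minimal Python sketch that evaluates a heavily simplified, ODRL-style policy against a user's answers. It is an illustration under stated assumptions, not Linghub's policy engine: the policy dictionary, the context keys purpose and status, and all function names are hypothetical, and real ODRL policies are richer RDF documents. The terms permission, prohibition, constraint, leftOperand, operator, rightOperand, eq, neq, use and purpose do come from the ODRL vocabulary.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]
Policy = Dict[str, List[Rule]]

# Hypothetical, heavily simplified ODRL-style policy for a non-commercial
# license: using the resource is permitted, except for a commercial purpose.
NON_COMMERCIAL_POLICY: Policy = {
    "permission": [{"action": "use"}],
    "prohibition": [
        {
            "action": "use",
            "constraint": [
                {"leftOperand": "purpose", "operator": "eq", "rightOperand": "commercial"}
            ],
        }
    ],
}


def constraint_holds(constraint: Rule, context: Dict[str, str]) -> bool:
    """Evaluate one constraint against the user's answers (only eq/neq supported)."""
    actual = context.get(constraint["leftOperand"])
    expected = constraint["rightOperand"]
    if constraint["operator"] == "eq":
        return actual == expected
    if constraint["operator"] == "neq":
        return actual != expected
    raise ValueError(f"unsupported operator: {constraint['operator']}")


def rule_applies(rule: Rule, action: str, context: Dict[str, str]) -> bool:
    """A rule applies if it targets the action and all its constraints hold."""
    return rule["action"] == action and all(
        constraint_holds(c, context) for c in rule.get("constraint", [])
    )


def is_allowed(policy: Policy, action: str, context: Dict[str, str]) -> bool:
    """Allowed if some permission applies and no prohibition applies."""
    permitted = any(rule_applies(r, action, context) for r in policy.get("permission", []))
    prohibited = any(rule_applies(r, action, context) for r in policy.get("prohibition", []))
    return permitted and not prohibited
```

Letting a matching prohibition override a permission is a choice made for this sketch; ODRL itself lets a policy declare how such conflicts are resolved.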

If you have any questions or would like to contribute your own resources, please contact us at linghub-nuig-at-insight-centre.org

1. McCrae, J. P. and Cimiano, P. (2015). Linghub: a Linked Data based portal supporting the discovery of language resources. SEMANTiCS.

2. Abromeit, F., Fäth, C. and Glaser, L. (2020). Annohub – Annotation Metadata for Linked Data Applications. In Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020), online.

3. Nasir, J. A. and McCrae, J. P. (2020). iLOD: InterPlanetary File System based Linked Open Data Cloud. In Proceedings of MEPDaW20 - Managing the Evolution and Preservation of the Data Web, ISWC 2020, November.


Copyright © 2020 All Rights Reserved by Prêt-à-LLOD Project.

Horizon 2020

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.