Using identifierSpace when the entity set uses a mixture of namespaces #139

`identifierSpace` and `schemaSpace` have been discussed before, e.g. in issue #3 and PR #76, and their definitions have shifted over time. The [current definition](https://www.w3.org/community/reports/reconciliation/CG-FINAL-specs-0.2-20230410/#identifier-and-schema-spaces) of `identifierSpace`, in both the latest draft spec and version 0.2, is:

> identifier space
>    The URI namespace (i.e. prefix) for the identifiers of an [entity](https://reconciliation-api.github.io/specs/draft/#dfn-entity) returned by the reconciliation service, for example http://www.wikidata.org/entity/ or https://d-nb.info/gnd/. This URI MAY resolve to a page describing these entities and their identifiers;

We are currently implementing reconciliation API support for Annif (see https://github.com/NatLibFi/Annif/pull/734), and providing the identifierSpace information has been a headache. Returning the service manifest is mandatory, and identifierSpace is in turn mandatory within the manifest: "A reconciliation service MUST define two URIs [...] identifierSpace ... schemaSpace"

Service manifest [Example 1](https://www.w3.org/community/reports/reconciliation/CG-FINAL-specs-0.2-20230410/#example-1) given in the spec uses this identifierSpace:

> "identifierSpace": "http://vocab.getty.edu/doc/#GVP_URLs_and_Prefixes",

(FWIW, I would like to point out that this doesn't seem to match well with the definition - IIRC this is not the URI namespace prefix for any Getty vocabulary, but a URI/URL of a web page explaining them. But that is a separate problem, maybe the example is just outdated.)

Annif uses SKOS vocabularies internally, and often those vocabularies use a specific URI namespace; in my understanding, this would be the natural value for identifierSpace. But Annif is currently unaware of this namespace, and there is nothing in principle preventing a vocabulary from using a mixture of namespaces. For example, a vocabulary could consist of a mixture of Wikidata and GND entities. A perhaps more realistic example would be a mixture of YSO concepts and those of a domain-specific extension vocabulary such as KAUNO (fiction literature), JUHO (public administration) or TERO (health and welfare). Each of these extends YSO (naively, you can think of them as additional concepts added on top of YSO) but uses its own URI namespace, distinct from YSO's.

So what should Annif return in the service manifest for a project that uses a vocabulary whose URI namespace it isn't aware of? Should it look at all the concept URIs and try to infer the longest common prefix? What if the URIs are a mixture of namespaces and have nothing useful in common - say, a mixture of http and https URIs?
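To make the prefix-inference option concrete, here is a rough sketch of what it could look like. This is not code from the Annif PR; `infer_identifier_space` is a hypothetical helper, and the concept URIs in the usage note are made up for illustration. It takes the longest common string prefix, cuts it back to the last `/` or `#`, and gives up if what remains is not a URI with a scheme and a host.

```python
import os.path
from urllib.parse import urlsplit


def infer_identifier_space(uris):
    """Guess a shared URI namespace for the given concept URIs.

    Hypothetical helper: returns the longest common prefix, cut back to
    the last '/' or '#', or None if no usable prefix remains.
    """
    # commonprefix works character by character on any list of strings
    prefix = os.path.commonprefix(list(uris))
    # cut back to a plausible namespace boundary
    cut = max(prefix.rfind("/"), prefix.rfind("#")) + 1
    candidate = prefix[:cut]
    parts = urlsplit(candidate)
    # "http://" or "" is not a meaningful namespace
    if not (parts.scheme and parts.netloc):
        return None
    return candidate
```

With only `http://www.yso.fi/onto/yso/p1` and `http://www.yso.fi/onto/yso/p2` this returns `http://www.yso.fi/onto/yso/`. Adding a URI such as `http://www.yso.fi/onto/kauno/p3` gives `http://www.yso.fi/onto/`, which is a valid prefix but not the namespace of either vocabulary. A mixture of `http://www.wikidata.org/entity/Q42` and `https://d-nb.info/gnd/118540238` shares only `http`, so the function returns `None`.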

Or should the value be something more custom (somewhat like the Getty document in the example) that isn't really a URI namespace at all, but is unique to the vocabulary / entity set? For example, the reconciliation service at /rest/v1/projects/myproject/reconcile could return an identifierSpace of /rest/v1/vocabs/myvocab (i.e. the vocabulary used by myproject). That doesn't seem to match the current definition of identifierSpace, as it talks specifically about URI namespace prefixes, but would at least be a shared identifier that could also be referenced by other endpoints at the same Annif instance which use the same underlying vocabulary.
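As an illustration of this second option, a manifest could be built roughly like this. It is only a sketch: `build_manifest`, the `base_url` parameter, the service name string and the `schemaSpace` value are all assumptions made for the example, not what the Annif PR does or what the spec prescribes.

```python
def build_manifest(project_id, vocab_id, base_url):
    """Build a minimal service manifest for one project (hypothetical)."""
    return {
        "versions": ["0.2"],
        "name": f"Annif reconciliation service for {project_id}",
        # Not a URI namespace prefix: an identifier for the vocabulary
        # itself, shared by every project that uses the same vocabulary.
        "identifierSpace": f"{base_url}/rest/v1/vocabs/{vocab_id}",
        # Assumption: the SKOS namespace as a stand-in value.
        "schemaSpace": "http://www.w3.org/2004/02/skos/core#",
    }


manifest_a = build_manifest("myproject", "myvocab", "https://annif.example.org")
manifest_b = build_manifest("otherproject", "myvocab", "https://annif.example.org")
```

Here `manifest_a` and `manifest_b` get the same identifierSpace because both projects use `myvocab`, so a client could at least tell that identifiers from the two endpoints are comparable.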

Or is it OK to return an identifierSpace of `""` (the current quick-and-dirty solution in the Annif draft PR)? It seems to work fine with OpenRefine, which apparently doesn't use this information at all. If the main client tool that this API targets doesn't actually use identifierSpace, maybe providing it shouldn't be a MUST in the spec.
