Implementation

This section gives best practices and normative requirements for implementing SciMesh at an institution, in particular in an ELN or sample database.

Workflow

_images/workflow.svg

Fig. 13 Possible workflow for two institutes working together with the same samples.

Fig. 13 shows how samples and data move back and forth between two institutes working together. The problem at hand is the experimental work on sample #1, which is created in Institute A but also investigated in Institute B. At the same time, both institutes must be able to view all of the data.

The text-heavy figure contains most of the detail. Therefore, I will only make some remarks in the following.

Both institutes have their own ELN. Those two ELNs are not only different instances on different computers; they may even be different software.

It is important to see that Institute A is the “real” home of the sample since the URI of the sample is actually a URL in the domain of Institute A. However, there is a URL (not URI) for sample #1 at Institute B. Any party needing to collect all data of sample #1 must poll both URLs. Instances may deliver data also found at other instances but this is not guaranteed.

Institute A keeps a list with all URLs that also have data of sample #1, and exports it as part of the SciMesh graph of sample #1.

Two aspects are not covered at all in this figure: caching and permissions. While the former is an optional yet important optimisation, the latter is essential and will be covered later.

Getting the graph

A major challenge in SciMesh is the fact that in general, the data of a sample or an insight is scattered over many instances, possibly in different institutions and countries.

The two most important things to see here are:

  1. All URIs of samples and processes are constant. They never change.

  2. All of these URIs are URLs at the same time.

So, if an ELN wants to show all the data for a certain sample, it first makes an HTTP GET against the sample URI. This yields the sample entity, and possibly some process entities. The ELN then traverses the process graph back in time. Whenever it hits a missing cause process, it makes an HTTP GET against its process URI. This yields a new graph that is merged into the existing one. Then, traversal is continued. At some point, there are no causes to look for any more (all remaining cause fields contain rdf:nil), or their URLs cannot be retrieved (because the servers don’t respond or we don’t have the required permissions). Then, the graph is displayed to the user.
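The traversal described above can be sketched as follows. This is only an illustration under simplifying assumptions, not a reference implementation: the graph is reduced to a plain dictionary that maps each process URI to the list of its cause URIs (an empty list stands for rdf:nil), and `fetch` is a hypothetical callable that performs the HTTP GET and returns such a dictionary, raising `OSError` when the server does not respond or access is denied. A real ELN would parse and merge RDF instead.

```python
from collections import deque


def collect_graph(start_uri, fetch):
    """Merge graph fragments by following missing cause processes back in time.

    `fetch(uri)` is a hypothetical callable returning a dict
    {process_uri: [cause_uri, ...]} for the fragment served at `uri`.
    Returns the merged dict and the set of URIs that could not be retrieved.
    """
    graph = {}
    requested = set()
    unreachable = set()

    def load(uri):
        # Fetch one fragment and merge it; return the newly learned processes.
        requested.add(uri)
        try:
            fragment = fetch(uri)
        except OSError:
            # Server did not respond, or we lack the required permissions.
            unreachable.add(uri)
            return []
        new_processes = [p for p in fragment if p not in graph]
        for process in new_processes:
            graph[process] = list(fragment[process])
        return new_processes

    queue = deque(load(start_uri))
    while queue:
        process = queue.popleft()
        for cause in graph[process]:
            if cause in graph or cause in requested:
                continue  # already known, or already tried
            queue.extend(load(cause))
    return graph, unreachable
```

Each URI is requested at most once, so the loop terminates even if two instances refer to each other's processes.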

Requirements for ELNs

Participating databases or ELNs need to implement the following:

  1. Every process and its process history (i.e. the graph back in time) must be the response to an HTTP GET to that process URI. External processes (i.e. with URIs under the control of other systems) needn’t be included.

  2. An HTTP GET to the sample URI must return the sample entity, the processes it points to in the “state” properties, and the whole process graph. Again, external processes needn’t be included.

  3. An HTTP POST to the sample URI with a JSON payload of the form

    {"state": ["http://example.com/processes/1",
               "http://example.com/processes/2"]
    }
    

    adds the contained process URIs to the sample, i.e. it adds “state” properties with these URIs as objects.

Note that all of these requests – including the POST – may be answered by HTTP 30x codes and need to be repeated with the new URL.
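A client-side sketch of the POST request, including the repetition after a redirect, might look like this. The function names, the `Content-Type` header, the timeout and the redirect limit are assumptions made for the illustration; only the payload shape and the rule to repeat the request at the new URL come from the requirements above.

```python
import http.client
import json
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_post(url, body):
    """POST `body` to `url` without following redirects.

    Returns the status code and the Location header (or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=30)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=30)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        # The Content-Type header is an assumption; the text only says "JSON payload".
        connection.request("POST", target, body=body,
                           headers={"Content-Type": "application/json"})
        response = connection.getresponse()
        response.read()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def add_state(sample_uri, process_uris, send=http_post, max_redirects=5):
    """Add process URIs as "state" properties of a sample.

    The same POST is repeated at the new URL whenever a 30x code is returned.
    Returns the final status code and the URL that answered.
    """
    body = json.dumps({"state": list(process_uris)}).encode("utf-8")
    url = sample_uri
    for _ in range(max_redirects + 1):
        status, location = send(url, body)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return status, url
    raise RuntimeError("too many redirects")
```

The transport is passed in as `send` so that the redirect logic can be exercised without a network connection.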

Considerations about visualisation

Grouping of processes

Visualising agents should treat all processes that form a connected graph via “concurrent” relations as one group.

In Fig. 6 and Fig. 7, the latest non-concurrent process in the group (\(P_2\) in the first figure and \(P_{2,4}\) in the second) represents the state of the group. Consequently, its URI is used to refer to the entire group. Despite that, the concurrent process should be the top-level or outermost component in the visualisation. Following the “cause” relations backwards from the concurrent process leads to the inner regions of the visualisation. It can be thought of as an onion-like structure, although it needn’t be displayed that way.
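Finding the groups amounts to computing connected components over the “concurrent” relations. The following sketch assumes that the relations are available as pairs of process identifiers and that, for the purpose of grouping, their direction does not matter; the function name and the input format are made up for the illustration.

```python
def concurrent_groups(processes, concurrent_pairs):
    """Partition processes into groups connected via "concurrent" relations.

    `processes` is an iterable of process identifiers, `concurrent_pairs`
    an iterable of (process, process) tuples. A process without any
    "concurrent" relation forms a group of its own.
    """
    # Build an undirected adjacency map.
    neighbours = {process: set() for process in processes}
    for first, second in concurrent_pairs:
        neighbours.setdefault(first, set()).add(second)
        neighbours.setdefault(second, set()).add(first)

    groups = []
    seen = set()
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`.
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

Selecting the process that represents each group is left out here because it depends on the “cause” relations as well.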

_images/visualisation_groups.svg

Fig. 14 Graph on the top, resulting visualisation on the bottom. This visualisation is just a serving suggestion, of course. For example, arranging \(P_1\) and \(P_2\) horizontally would be more consistent but possibly less aesthetically pleasing.

Good URIs

We recommend following the guidelines explained in “Cool URIs for the Semantic Web” when you create URIs of any kind, but in particular for samples and processes. Moreover, you may use one of the many PID (persistent identifier) services, some of which specialise in physical samples, but this is not required.

In any case, URIs of samples and processes must at the same time be URLs that yield the corresponding SciMesh graph via HTTPS. (Note that the URI should start with http://, though.)
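One possible way for a client to reconcile an identifier that starts with http:// with retrieval via HTTPS is to rewrite the scheme before the request. This is only one reading of the rule above; a server may just as well answer the http:// request with a redirect. The function name is made up for the illustration, and URIs with an explicit port are not handled specially.

```python
from urllib.parse import urlsplit


def retrieval_url(uri):
    """Return the HTTPS URL under which the graph for `uri` is requested.

    The identifier itself stays unchanged; only the URL used for the
    request gets the "https" scheme.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URI: {uri!r}")
    return parts._replace(scheme="https").geturl()
```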

Authentication

Scientific data may be sensitive for various reasons. Therefore, an ELN will generally refuse to deliver knowledge graphs without authentication and authorisation.

Currently, SciMesh-compliant ELNs must implement the following method of authorisation, called “mutual trust”.

Mutual trust

_images/trust1.svg

Fig. 15 Trust graph for the “mutual trust” method.

This method is very simple. Each ELN instance keeps a list of peer ELNs that it trusts. In practice, this trust is established by a bilateral agreement between the managements of the respective institutions.

Technically, both ELN instances must send a TLS client certificate with each HTTPS request; the peer uses it to authenticate the requesting ELN.

Note that this scenario does not mean that every scientist can access all data of the peer ELN. It only means that the ELN instance can access all data on the peer ELN, but in general will show only a subset of it to the scientist.

Fig. 15 shows the underlying trust graph. As said, both ELNs trust each other, and the scientists trust their respective ELN.
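The TLS side of “mutual trust” could be set up with Python’s standard `ssl` module roughly as follows. The file paths, the function names, and the idea of storing the list of trusted peers as SHA-256 fingerprints of their certificates are assumptions for this sketch; SciMesh itself only requires that client certificates are sent and used for authentication.

```python
import hashlib
import ssl


def client_context(certfile, keyfile):
    """TLS context for outgoing requests that presents our client certificate."""
    context = ssl.create_default_context()  # also verifies the peer's server certificate
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def server_context(certfile, keyfile, peer_ca_file):
    """TLS context for incoming requests that demands a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    context.load_verify_locations(cafile=peer_ca_file)
    context.verify_mode = ssl.CERT_REQUIRED  # reject clients without a certificate
    return context


def fingerprint(der_certificate):
    """SHA-256 fingerprint of a DER-encoded certificate, as a hex string."""
    return hashlib.sha256(der_certificate).hexdigest()


def is_trusted_peer(der_certificate, trusted_fingerprints):
    """Check the presented certificate against the list of trusted peer ELNs.

    Matching by fingerprint is an assumption of this sketch; matching by
    subject name or issuing CA would serve the same purpose.
    """
    return fingerprint(der_certificate) in trusted_fingerprints
```

On the server side, the DER-encoded certificate of the connected peer is available from `SSLSocket.getpeercert(binary_form=True)`. Which subset of the retrieved data is then shown to an individual scientist is decided separately by the ELN.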

Discussion of other methods

_images/trust2.svg

Fig. 16 Trust graph for the “personal trust” model.

The “mutual trust” method may seem overly trusting. Therefore, let’s have a look at a more restricted option.

Fig. 16 shows an alternative trust graph. Here, the ELNs do not need to trust each other. Instead, they trust certain scientists of the other institute.

It is not easy at all for an ELN to implement this. Since no data may be relayed through the peer ELN (which we don’t trust), the sample data sheet with the joint data needs to be compiled in the browser by JavaScript or WebAssembly code (or a proxy web service, for that matter) that can be trusted by the scientist not to forward any data to third parties, in particular, not to the ELN of the scientist’s own institute.

This does not make much sense, given that the scientist and the ELN are under the very same governance. Therefore, SciMesh does not include an authorisation method for such a trust model for the time being. It may make sense, however, if at least one of the ELNs provides its service to scientists of different institutions (governances).