Discover who rents a domain name
The Web application below allows you to provide a domain name to see
who controls this domain and to which machine calls to this address are
routed. Here, for instance, we called that service on the domain name “dbpedia.org”:
https://who.is/whois/dbpedia.org
Choosing a scheme for your URIs
An HTTP URI is a URI created to name anything we want to talk about
but that uses the HTTP scheme in order to be “dereferenceable”, i.e. so that a
person or a piece of software finding that URI (e.g. a Web crawler)
may easily learn more about the resource represented by that URI
just by making an HTTP call to the HTTP address it provides. We don’t use
the term URL (locator) because the thing being represented may
not itself be on the Web at this address.
For example, I may want to
give an HTTP URI to Mytsie (my cat). No matter how hard I try, Mytsie
itself will never be “located” on the Web (its URI is not a URL), but this
adorable cat can be identified on the Web by an HTTP URI, and if you ever
go to that address you will be provided with a description on the Web
of the resource represented by that URI, i.e. my cat.
Now, how do we choose the URIs we are going to use to talk about the things we
want to describe? What should be their structure or scheme?
The generic form of a URI is
scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
For classical HTTP URIs, this form becomes:
http://host[:port][/]path[?query][#fragment]
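To make the generic form more concrete, here is a minimal sketch that splits a URI into the components named above, using Python's standard urllib.parse module. The URI itself is hypothetical and was made up only so that every optional component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI, made up to exercise every optional component.
uri = "http://user:secret@example.org:8080/my/path?lang=en#section-2"

parts = urlsplit(uri)
print(parts.scheme)    # http
print(parts.username)  # user
print(parts.password)  # secret
print(parts.hostname)  # example.org
print(parts.port)      # 8080
print(parts.path)      # /my/path
print(parts.query)     # lang=en
print(parts.fragment)  # section-2
```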
We already mentioned the importance of choosing the domain name well for
the host part of the identifier. But what about the rest of the address?
There
is no unique correct answer to that question and here are two documents
that discuss the different options with pros and cons. As you will see
the answer is neither simple nor closed:
In
many cases, the objects we want to describe already have some kind of
identifier. In theory, you can transform any identifier into an HTTP
URI, for instance, just by choosing a transformation (URI scheme) of the
form
http:///
For example, if I want to identify cats, I could choose the following minting scheme:
cat;1278 → http://animals.org/cat/1278
Then HTTP content negotiation (conneg)
and, possibly, redirections are configured on the server to provide
content in XML, RDF, HTML, JSON, etc. to whoever accesses that address.
Of
course, depending on the type of identifier you initially had, you may
need to use the URI encoding mechanism we introduced before.
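The cat example can be sketched as a small minting function. This is only an illustration: the function name mint_uri and the assumption that local identifiers always have the form kind;number are ours, and the second call uses a made-up identifier to show where URI encoding comes into play.

```python
from urllib.parse import quote

def mint_uri(local_id, base="http://animals.org"):
    """Turn a local identifier such as 'cat;1278' into an HTTP URI."""
    # Assumption: the local identifier has the form "<kind>;<number>".
    kind, _, number = local_id.partition(";")
    # Percent-encode each part so reserved characters cannot alter the path.
    return f"{base}/{quote(kind, safe='')}/{quote(number, safe='')}"

print(mint_uri("cat;1278"))     # http://animals.org/cat/1278
print(mint_uri("cat;12/78 b"))  # http://animals.org/cat/12%2F78%20b
```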
To illustrate that first step, we can mention the real example of the digital object identifier (DOI). There is a way to look up any DOI on the Web through a service implementing a mapping from DOIs to HTTP URIs.
If you take the following DOI for instance:
doi:10.1007/3-540-45741-0_18
You can transform it into the following HTTP URI using the URI minting scheme implemented by doi.org:
http://dx.doi.org/10.1007/3-540-45741-0_18
This HTTP URI will then redirect you to a description of the object identified by the DOI.
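The DOI mapping above amounts to a simple string transformation. The sketch below reproduces it locally, without calling the resolver; the function name doi_to_http_uri is our own, and the resolver address is the one used in the example above.

```python
from urllib.parse import quote

DOI_RESOLVER = "http://dx.doi.org/"  # resolver address used in the example above

def doi_to_http_uri(doi):
    """Map a DOI, with or without its 'doi:' prefix, to an HTTP URI."""
    name = doi[4:] if doi.lower().startswith("doi:") else doi
    # Keep "/" as is: it separates the DOI prefix from the DOI suffix.
    return DOI_RESOLVER + quote(name, safe="/")

print(doi_to_http_uri("doi:10.1007/3-540-45741-0_18"))
# http://dx.doi.org/10.1007/3-540-45741-0_18
```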
So, choosing the URIs will strongly depend on the domain to which the objects you want to describe belong.
However, there are two families of HTTP URIs that can be considered every time you want to choose a naming scheme: “hash URIs” and “slash URIs”. The choice between them has been the subject of a long-running discussion.
When a URI contains a hash (i.e. the symbol # ), this indicates a fragment in the URI:
http://my.domain.name/my/path#the-fragment
The HTTP standard requires the Web client to remove the fragment before
making a request, so if you make an HTTP call on this URI it will in fact
be performed on the address:
http://my.domain.name/my/path
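The fragment removal that a client performs can be reproduced with urldefrag from Python's standard library. This is only a sketch of the client-side step, applied to the example URI above.

```python
from urllib.parse import urldefrag

uri = "http://my.domain.name/my/path#the-fragment"

# urldefrag separates the address actually requested from the fragment,
# which stays on the client side.
address, fragment = urldefrag(uri)
print(address)   # http://my.domain.name/my/path
print(fragment)  # the-fragment
```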
The use of a fragment has two advantages:
- To immediately differentiate, for instance, the name (URL) of a file on
the Web containing descriptions and the names (URIs with fragments) of
the resources it describes;
- The grouping of several descriptions in one file that can be cached,
avoiding several calls to discover different linked resources.
For example, in one source at the address:
http://fabien.gandon.me/my/objects/cars
I can describe several things:
http://fabien.gandon.me/my/objects/cars#bmw1
http://fabien.gandon.me/my/objects/cars#smart1
http://fabien.gandon.me/my/objects/cars#tesla1
…
This approach has “the disadvantages of its advantages”: one cannot obtain the
description of only one resource, since the whole document is retrieved
every time the address is accessed, and this could be costly in terms of
network traffic, memory and processing when the file is large.
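The grouping effect of hash URIs can be illustrated with a short sketch that maps each resource URI to the document a client would actually fetch. The three car URIs are the ones from the example; the variable names are ours.

```python
from urllib.parse import urldefrag

resources = [
    "http://fabien.gandon.me/my/objects/cars#bmw1",
    "http://fabien.gandon.me/my/objects/cars#smart1",
    "http://fabien.gandon.me/my/objects/cars#tesla1",
]

# Group the resource names by the address that is really requested.
documents = {}
for uri in resources:
    address, fragment = urldefrag(uri)
    documents.setdefault(address, []).append(fragment)

print(documents)
# {'http://fabien.gandon.me/my/objects/cars': ['bmw1', 'smart1', 'tesla1']}
```

All three resources map to a single document: one cacheable call is enough to describe them all, but asking for only one of them still transfers the whole file.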
The alternative is to use only the path with slashes (i.e. the symbol / ) to generate identifiers. For instance:
http://fabien.gandon.me/my/objects/cars/bmw1
In that case the server needs to implement a redirection and respond to these addresses with the HTTP status code 303 “See Other” (a redirection code, not an error).
This indicates that the URI identifies a resource that is not
directly available on the Web and redirects the requester to another
URL where a description of that resource is available. A server
should not answer directly (HTTP 200 OK) because that would mean the
object (here, the car) is available on the Web and can be
retrieved through HTTP, which is not true. Again, content negotiation
is used to redirect the requester to a URL
corresponding to the requested content format. For instance in HTML:
http://fabien.gandon.me/my/objects/cars/bmw1.html
or in XML:
http://fabien.gandon.me/my/objects/cars/bmw1.xml
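The redirection logic can be sketched as a pure function that computes what a server could answer for a slash URI. This is an illustration under stated assumptions, not a complete implementation: the function name see_other and the EXTENSIONS table are hypothetical, and the Accept header is handled in a simplified way that ignores q-values.

```python
# Hypothetical mapping from media types to the file extensions used above.
EXTENSIONS = {"text/html": ".html", "application/xml": ".xml"}

def see_other(resource_uri, accept_header, default="text/html"):
    """Return the (status, headers) a server could send for a slash URI."""
    # Simplification: take media types in the order listed, ignore q-values.
    requested = [item.split(";")[0].strip() for item in accept_header.split(",")]
    media_type = next((m for m in requested if m in EXTENSIONS), default)
    location = resource_uri + EXTENSIONS[media_type]
    # 303 tells the client to look for a description at another URL.
    return 303, {"Location": location, "Vary": "Accept"}

status, headers = see_other(
    "http://fabien.gandon.me/my/objects/cars/bmw1", "application/xml"
)
print(status, headers["Location"])
# 303 http://fabien.gandon.me/my/objects/cars/bmw1.xml
```

A client that follows the Location header then makes a second request, and only that second answer returns HTTP 200 OK with the description.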
This
alternative, using slashes, allows us to be much more modular in the
storage and transfer of descriptions. Here, a Web client can retrieve
only the description it is interested in.
Disadvantages include the doubling of HTTP calls (the first access and the
second one after the redirection) and the fragmentation of the data, which
requires multiple calls when one wants to retrieve a collection of
descriptions.
To summarize, fragments can be used for small datasets
where grouping makes sense (unity of content, linked, same life cycle).
This option is also the simplest one as it can be implemented, for
instance, just by hosting a file on a Web server. The redirection by
HTTP 303 is more technical but allows more control over the data served.
Finally, nothing prevents you from using and mixing these two options
even inside the same dataset.
FREE BOOK ONLINE
Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a
Global Data Space (1st edition). Synthesis Lectures on the Semantic Web:
Theory and Technology, 1:1, 1-136. Morgan & Claypool. http://linkeddatabook.com/
To go further...

- LODStats based on the CKAN dataset metadata registry to obtain a comprehensive picture of the current state of the Data Web
- The free HTML version of the book by Tom Heath & Christian Bizer (2011) "Linked Data: Evolving the Web into a Global Data Space". Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.


- Data on the Web Best Practices, W3C Recommendation 31 January 2017:
best practices for publication and usage of data on the Web to
facilitate interaction between publishers and consumers. This document
from 2017 also shows the evolution of W3C activity to facilitate data on
the Web in general: https://www.w3.org/TR/dwbp/