On ObsCore profiles, Obscore extensions and registry matters #73

Open
Bonnarel opened this issue Nov 26, 2024 · 3 comments

Comments

@Bonnarel
Collaborator

If we want to extend the basic ObsCore table with a set of new standard attributes (= columns) in the TAP_SCHEMA, we can define this specific set of columns as an "ObsCore profile".

The profile applies to a "schema" in the tableset, which should contain all of the profile's columns, in whichever tables of that schema they are stored.

It is accepted that in the radio case the basic ObsCore parameters and the radio-extension-specific parameters will be hosted in two different tables belonging to the same "ivoa" schema. The recommendation is that the basic ObsCore table is called ivoa.obscore and that the extension table is called ivoa.obscore_radio.

The main reason is that the same TAP service may contain data in the radio domain and data outside this domain for which the extension parameters are irrelevant.
So storing everything in the same table would imply that many rows in the table will show NULL values for the extension parameters.
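A minimal sketch of the two layouts, using an in-memory SQLite database purely for illustration. The table names drop the "ivoa." prefix because SQLite has no schemas here, the join key obs_publisher_did and the extension columns shown are assumptions made for the example, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obscore (
    obs_publisher_did TEXT PRIMARY KEY,
    dataproduct_type  TEXT,
    em_min            REAL
);
-- Extension table: only radio datasets get a row here.
-- Column names below are illustrative, not taken from the spec text.
CREATE TABLE obscore_radio (
    obs_publisher_did TEXT PRIMARY KEY
        REFERENCES obscore(obs_publisher_did),
    s_fov_min         REAL,
    uv_distance_max   REAL
);
INSERT INTO obscore VALUES
    ('ivo://example/radio1', 'visibility', 0.21),
    ('ivo://example/opt1',   'image',      5.0e-7);
INSERT INTO obscore_radio VALUES
    ('ivo://example/radio1', 0.5, 1200.0);
""")

# Split layout: the natural join returns only the radio datasets.
radio_rows = conn.execute(
    "SELECT obs_publisher_did, dataproduct_type, s_fov_min "
    "FROM obscore NATURAL JOIN obscore_radio"
).fetchall()

# Equivalent of one wide table: non-radio rows carry NULL extension values.
wide_rows = conn.execute(
    "SELECT obs_publisher_did, s_fov_min "
    "FROM obscore LEFT JOIN obscore_radio USING (obs_publisher_did) "
    "ORDER BY obs_publisher_did"
).fetchall()
```

In the wide result the optical dataset shows up with a NULL (Python `None`) extension value, which is the situation the two-table layout avoids.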


But how do we help users discover services that serve these two tables?

Each standard data model has a standardID. This is the case of ObsCore, of EPN-TAP, of RegTAP....

From the registry point of view, the presence of an ObsCore table in a service is recognized via the datamodel element of a service capability. This practice has the drawback of matching a data model with a service rather than with the tables the service exposes.

That's the reason why the practice changed and why EPN-TAP and RegTAP used another method.

This is summarized in this recent IVOA note published by Markus:

https://ivoa.net/documents/Notes/TableReg/20240821/

So if the model is serialized in a single table, let's set the standardID of this model as the value of the utype attribute of the table in the registry record.

And if the model is serialized in several tables, let's set the standardID of this model as the value of the utype attribute of the schema grouping all these tables in the registry record.
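The rule can be sketched as a small function over a dictionary standing in for a tableset schema. The dictionary layout, the function name `declare_model` and the identifier `ivo://example.org/std/mymodel` are all hypothetical; a real registry record would be VODataService XML.

```python
import copy


def declare_model(schema, model_tables, standard_id):
    """Return a copy of `schema` with `standard_id` attached as a utype.

    One table   -> the utype goes on that table.
    Many tables -> the utype goes on the enclosing schema.
    """
    result = copy.deepcopy(schema)
    if len(model_tables) == 1:
        for table in result["tables"]:
            if table["name"] == model_tables[0]:
                table["utype"] = standard_id
    else:
        result["utype"] = standard_id
    return result


MODEL_ID = "ivo://example.org/std/mymodel"  # hypothetical standardID

schema = {
    "name": "demo",
    "tables": [{"name": "demo.main"}, {"name": "demo.aux"}],
}
single = declare_model(schema, ["demo.main"], MODEL_ID)
multi = declare_model(schema, ["demo.main", "demo.aux"], MODEL_ID)
```

Here `single` carries the utype on demo.main only, while `multi` carries it on the schema itself.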

What can happen for the ObsCore extension for radio data? ObsCore and its extension are actually part of the same data model. So they share the same basic standardID "ivo://ivoa.net/std/ObsCore", and the presence of the extension columns may be expressed by a fragment in the URI: "ivo://ivoa.net/std/ObsCore#RadioEXt-1.0".

Now where can we put this standardID in the registry?
Currently the data is organized in two strongly related tables, so they are in the same "schema". However, setting utype="ivo://ivoa.net/std/ObsCore#RadioEXt-1.0" on the schema is not appropriate. It would mean that all the tables in the schema deal with radio data, but we may have datasets in the ObsCore table which are outside the radio domain.

Another solution could be to set utype="ivo://ivoa.net/std/ObsCore#RadioEXt-1.0" on the ivoa.obscore_radio table only.

But this may be confusing and may encourage users to query this table alone, which would make no sense. The obscore_radio table on its own is useless. Only a query on a natural join between obscore and obscore_radio makes sense.

Hence the proposal to set the standardID utype on a "view" table defined as the natural join of the ObsCore and obscore_radio tables. If such a utype is discovered in a service, we know that the schema containing this view will also contain the ObsCore and obscore_radio tables. Of course, in practice it may be more efficient to query the two tables with a direct join instead of this view. But the view is there to inform registry users that this service serves ObsCore with its radio extension.
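A minimal sketch of such a view, again with in-memory SQLite as a stand-in for the TAP backend. The view name `obscore_radio_view`, the join key and the extension column are assumptions for illustration; nothing in the thread fixes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obscore (
    obs_publisher_did TEXT PRIMARY KEY,
    dataproduct_type  TEXT
);
CREATE TABLE obscore_radio (
    obs_publisher_did TEXT PRIMARY KEY,
    s_fov_min         REAL   -- illustrative extension column
);
-- Hypothetical view name; this is the table that would carry the utype.
CREATE VIEW obscore_radio_view AS
    SELECT * FROM obscore NATURAL JOIN obscore_radio;
INSERT INTO obscore VALUES
    ('ivo://example/radio1', 'visibility'),
    ('ivo://example/opt1',   'image');
INSERT INTO obscore_radio VALUES ('ivo://example/radio1', 0.5);
""")

# The view exposes the full column set of the profile ...
view_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(obscore_radio_view)")
]
# ... and only the rows that have extension metadata.
view_rows = conn.execute("SELECT * FROM obscore_radio_view").fetchall()
```

The view's column list is exactly the union of the root and extension columns, which is what makes it usable as a single place to declare the profile.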

By the way, the concept of specific ObsCore profiles, each defined this way as a full set of columns, is agnostic about which physical table holds the columns. Hence, if we decide later to move some of the extension columns into the basic ObsCore, it is very easy to redefine the profile in a new version. This will not break anything.

@msdemlei
Contributor

Just to prevent possible confusion: The current draft says we should have "ivo://ivoa.net/std/ObsCore#radioExt-1.0" in the table's utype (incidentally: let's lowercase that identifier in the spec; I'd like to move away from relying on case-insensitive ivoids).

If I understand François' concern right, it is basically that he argues there is no point in separately discovering that table, because there are probably no conceivable science queries against that table alone.

The premise of that statement is probably right. I don't think I agree with the conclusion, though. You see, I think only a prankster would have obs_radio without ivoa.obscore, and hence if you see a #radioExt-1.0-utyped table, you know you can run queries joining obs_radio and obscore on it. If you don't want to rely on people not being pranksters, we could add a "if you publish obs_radio, you MUST have ivoa.obscore, too" in the spec.
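Such a MUST would be cheap to check mechanically. A minimal sketch, assuming a validator is handed the list of table names from a service's tableset; the function name and the error wording are made up for the example.

```python
def check_radio_extension(table_names):
    """Return a list of problems; an empty list means the rule holds.

    Rule sketched here: a service publishing ivoa.obs_radio must
    also publish ivoa.obscore.
    """
    names = {name.lower() for name in table_names}
    if "ivoa.obs_radio" in names and "ivoa.obscore" not in names:
        return ["ivoa.obs_radio is published without ivoa.obscore"]
    return []


ok = check_radio_extension(["ivoa.obscore", "ivoa.obs_radio"])
broken = check_radio_extension(["ivoa.obs_radio", "tap_schema.tables"])
no_radio = check_radio_extension(["ivoa.obscore"])
```

Only the second call reports a problem; a service with ivoa.obscore alone is unaffected by the rule.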

Against that, creating a view in order to have something usable without a join to me seems to set a few traps for a very limited benefit; for one, operators need to make sure to create these views every time they re-create one of the underlying tables, and if we follow this pattern for other extensions, this can become a chore even for a computer.

Also, it feels a bit odd when you want, say, a service with both a radio and a time extension: You would discover the two views but then use neither of them (assuming we don't want to in addition create a radio-and-time view, too, which to me is too horrible to even contemplate).

So... my vote is for keeping things as they are.

@Bonnarel
Collaborator Author

Sorry for the delay, Markus.

> Just to prevent possible confusion: The current draft says we should have "ivo://ivoa.net/std/ObsCore#radioExt-1.0" in the table's utype (incidentally: let's lowercase that identifier in the spec; I'd like to move away from relying on case-insensitive ivoids).

OK. Done.

> If I understand François' concern right, it is basically that he argues there is no point in separately discovering that table, because there are probably no conceivable science queries against that table alone.

Right, Markus. But not only this. See below.

> The premise of that statement is probably right. I don't think I agree with the conclusion, though. You see, I think only a prankster would have obs_radio without ivoa.obscore, and hence if you see a #radioExt-1.0-utyped table, you know you can run queries joining obs_radio and obscore on it. If you don't want to rely on people not being pranksters, we could add a "if you publish obs_radio, you MUST have ivoa.obscore, too" in the spec.
>
> Against that, creating a view in order to have something usable without a join to me seems to set a few traps for a very limited benefit; for one, operators need to make sure to create these views every time they re-create one of the underlying tables, and if we follow this pattern for other extensions, this can become a chore even for a computer.

The important point is to have a way to identify in one spot that the service (more accurately, the ivoa schema in the service) actually contains radio data described by the root ObsCore plus the extension. If creating a view containing all the attributes is a concern, we can have an empty table instead: only the list of attributes, as a consistent set. And we can forget about the actual view building I was proposing.

Then the standard ID utype I was proposing would be set on this empty table. This would imply that both the root table and the radio table with extension attributes will be there in the service ivoa schema.

> Also, it feels a bit odd when you want, say, a service with both a radio and a time extension: You would discover the two views but then use neither of them (assuming we don't want to in addition create a radio-and-time view, too, which to me is too horrible to even contemplate).

But in that case I don't see the difference: we need an ID for the part of the tables that uses both radio and time extension attributes (pulsar radio data, FRBs, radio source time series, etc.).

Finding a service which contains both the radio extension and the time extension would not, by itself, imply that the service delivers a specific pulsar subset using both.

I think the final correct view will be an extended set of classes and attributes required for dataset descriptions in all electromagnetic domains.

Among all these attributes we can define profiles. The current overall domain ObsCore set will be the root profile.

Then we can have the radio profile, the time profile, the time+radio (pulsar ?) profile, the heig profile, ... all containing the root attributes + the dedicated others

Each profile will have its standardID (the root ObsCore one + a dedicated fragment).

I propose to write some dummy scenarios to show how it could work.
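As one possible starting point for such scenarios, here is a minimal sketch of profiles as column sets with fragment-based identifiers. Only the root ID and the radioExt-1.0 fragment appear in this thread; the other fragment names and all extension column names are hypothetical placeholders.

```python
OBSCORE_ROOT = "ivo://ivoa.net/std/ObsCore"


def profile_id(fragment=None):
    """Root standardID, optionally with a profile fragment appended."""
    return OBSCORE_ROOT if fragment is None else f"{OBSCORE_ROOT}#{fragment}"


# A few root ObsCore columns; the real root profile has many more.
ROOT_COLUMNS = frozenset({"obs_publisher_did", "dataproduct_type", "em_min"})
# Illustrative extension columns (assumptions, not spec content).
RADIO_COLUMNS = frozenset({"s_fov_min", "uv_distance_max"})
TIME_COLUMNS = frozenset({"t_fold_period"})

# Each profile is the root set plus its dedicated columns.
PROFILES = {
    profile_id(): ROOT_COLUMNS,
    profile_id("radioExt-1.0"): ROOT_COLUMNS | RADIO_COLUMNS,
    profile_id("timeExt-1.0"): ROOT_COLUMNS | TIME_COLUMNS,  # hypothetical
    profile_id("radioTimeExt-1.0"):                           # hypothetical
        ROOT_COLUMNS | RADIO_COLUMNS | TIME_COLUMNS,
}


def profiles_satisfied_by(available_columns):
    """IDs of all profiles whose full column set is available."""
    available = set(available_columns)
    return sorted(pid for pid, cols in PROFILES.items() if cols <= available)


radio_only = profiles_satisfied_by(ROOT_COLUMNS | RADIO_COLUMNS)
```

Because a profile is only a set of column names, moving a column from an extension table into the root table changes none of these definitions.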

> So... my vote is for keeping things as they are.

The current Proposed Recommendation contains a version of the registry section with the utype on the ivoa schema, which I now think is wrong because it implies that only radio data are included in the service. So I think you mean going back to your previous proposal, which was:

%\section{Registry Aspects}
%\label{sec:registry}
%
%Services compliant with this specification are registered using
%VODataService \citep{2021ivoa.spec.1102D} tablesets. Compliant tables
%use the utype
%$$
%\hbox{\nolinkurl{ivo://ivoa.net/std/obsradio#table-1.0}.}
%$$
%
%While it is admitted that the table only sits in the tableset of the
%embedding TAP service, implementors are urged to use a separate registry
%record with the main TAP service as an auxiliary capability
%\citep{2019ivoa.spec.0520D}. In this way, meaningful information
%on coverage in space, time, and spectrum as per VODataService 1.2 can
%be communicated to the Registry, which, again, data providers are urged
%to do. There is no expectation that the coverage information only
%pertains to data with entries in \verb|ivoa.obs_radio|, i.e., it may be
%a copy of the coverage of the full Obscore table.\todo{Is that
%acceptable? Or should we require pure radio coverage?}
%
%A sample registry record for an obs_radio table comes with this
%specification\footnote{\auxiliaryurl{sample-record.xml}}.
%
%To obtain access URLs of all TAP services that have compliant tables
%together with their table names (which in this major version are fixed
%to \verb|ivoa.obs_radio|), use a RegTAP \citep{2019ivoa.spec.1011D}
%query like:
%
%\begin{lstlisting}
%SELECT DISTINCT(access_url), table_name
%FROM rr.res_table
% NATURAL JOIN rr.capability
% NATURAL JOIN rr.interface
%WHERE
% standard_id LIKE 'ivo://ivoa.net/std/tap%'
% AND intf_role='std'
% AND table_utype LIKE 'ivo://ivoa.net/std/obsradio#table-1.%'
%\end{lstlisting}
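For comparison, the same discovery query can be parameterized on the utype, so that either the obsradio identifier above or an ObsCore fragment identifier can be tried. This is a minimal sketch that only builds the ADQL string; the function name is made up, and lowercasing the pattern is an assumption based on the lowercase patterns in the draft query.

```python
def regtap_query_for_table_utype(utype_prefix):
    """Build a RegTAP query listing TAP services with matching tables.

    `utype_prefix` is matched as a prefix, so passing a value ending in
    '-1.' selects every minor version of major version 1.
    """
    # Assumption: the registry stores utypes lowercased.
    pattern = utype_prefix.lower().replace("'", "''") + "%"
    return "\n".join([
        "SELECT DISTINCT access_url, table_name",
        "FROM rr.res_table",
        "  NATURAL JOIN rr.capability",
        "  NATURAL JOIN rr.interface",
        "WHERE",
        "  standard_id LIKE 'ivo://ivoa.net/std/tap%'",
        "  AND intf_role='std'",
        f"  AND table_utype LIKE '{pattern}'",
    ])


draft_query = regtap_query_for_table_utype(
    "ivo://ivoa.net/std/obsradio#table-1.")
fragment_query = regtap_query_for_table_utype(
    "ivo://ivoa.net/std/ObsCore#radioExt-1.")
```

The resulting strings could be submitted to any RegTAP endpoint; which of the two identifiers services would actually declare is exactly what this issue is trying to settle.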

@msdemlei
Contributor

msdemlei commented Jan 13, 2025 via email
