On ObsCore profiles, Obscore extensions and registry matters #73
Just to prevent possible confusion: The current draft says we should have "ivo://ivoa.net/std/ObsCore#radioExt-1.0" in the table's utype (incidentally: let's lowercase that identifier in the spec; I'd like to move away from relying on case-insensitive ivoids).
If I understand François' concern right, he argues that there is no point in separately discovering that table, because there are probably no conceivable science queries against that table alone. The premise of that statement is probably right. I don't think I agree with the conclusion, though. You see, I think only a prankster would have obs_radio without ivoa.obscore, and hence if you see a #radioExt-1.0-utyped table, you know you can run queries joining obs_radio and obscore on it. If you don't want to rely on people not being pranksters, we could add an "if you publish obs_radio, you MUST have ivoa.obscore, too" to the spec.
Against that, creating a view in order to have something usable without a join seems to me to set a few traps for a very limited benefit; for one, operators need to make sure to create these views every time they re-create one of the underlying tables, and if we follow this pattern for other extensions, this can become a chore even for a computer. Also, it feels a bit odd when you want, say, a service with both a radio and a time extension: You would discover the two views but then use neither of them (assuming we don't also want to create a radio-and-time view, which to me is too horrible to even contemplate).
So... my vote is for keeping things as they are.
Sorry for the delay, Markus.
OK. Done.
Right, Markus. But not only this. See below.
The important point is to have a way to identify in one spot that the service, (the ivoa schema in the service more accurately) actually contains radio data described by the root obscore + extension. If creating a view containing all attributes is a concern we can have an empty table instead. Only the list of attributes as a consistent set. And forget about the actual view building I was proposing. Then the standard ID utype I was proposing would be set on this empty table. This would imply that both the root table and the radio table with extension attributes will be there in the service ivoa schema.
But in that case I don't see the difference: we need an id for that part of the tables that uses both radio and time extension attributes (pulsar radio data, FRB, radio source time series, etc.), because finding a service which contains both the radio extension and the time extension would not imply that the service delivers a specific pulsar subset which contains both.
I think the final correct view will be an extended set of classes and attributes required for dataset descriptions in all electromagnetic domains. Among all these attributes we can define profiles. The current overall domain ObsCore set will be the root profile. Then we can have the radio profile, the time profile, the time+radio (pulsar?) profile, the heig profile, ... all containing the root attributes + the dedicated others. Each profile will have its standardID (the root ObsCore one + a dedicated fragment). I propose to write some dummy scenarios to show how it could work.
The current Proposed Recommendation contains a version of the registry section with the utype on the ivoa schema, which I now think is wrong because it implies that only radio data are included in the service. So I think you mean to go back to your previous proposal, which was %\section{Registry Aspects}
On Sun, Jan 12, 2025 at 02:52:23PM -0800, François Bonnarel wrote:
> Just to prevent possible confusion: The current draft says we
> should have "ivo://ivoa.net/std/ObsCore#radioExt-1.0" in the
> table's utype (incidentally: let's lowercase that identifier in
[...]
The important point is to have a way to identify in one spot that
the service, (the ivoa schema in the service more accurately)
actually contains radio data described by the root obscore +
extension. If creating a view containing all attributes is a
I would submit that "there is a table with the #radioext-1.0 utype in
this TAP service" is exactly that spot. What do you see missing from
it? Or is it some duplication that you are concerned about?
> Also, it feels a bit odd when you want, say, a service with both
> a radio and a time extension: You would discover the two views
> but then use neither of them (assuming we don't want to in
> addition create a radio-and-time view, too, which to me is too
> horrible to even contemplate).
>
But in that case I don't see the difference : we need an id for
that part of the tables that uses both radio and time extension
attributes (pulsar radio data, FRB, radio source time series,
etc...)
I'm afraid I don't quite get what you are saying here, and I think that's
mainly because I don't really understand what "that part of the
tables that uses both radio and time" would be. Are you thinking of
a rowset here, i.e., something that would be returned if you say
ivoa.obscore
NATURAL JOIN ivoa.obs_radio
NATURAL JOIN ivoa.obs_time
?
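To make the "rowset" reading concrete, here is a minimal sketch using an in-memory SQLite database. Everything in it is an assumption for illustration: the unqualified table names stand in for ivoa.obscore, ivoa.obs_radio and ivoa.obs_time, obs_publisher_did is assumed to be the only shared column, and radio_col and time_col are hypothetical extension columns.

```python
import sqlite3

# In-memory stand-ins for ivoa.obscore, ivoa.obs_radio and ivoa.obs_time.
# Column names other than obs_publisher_did are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obscore (obs_publisher_did TEXT PRIMARY KEY, dataproduct_type TEXT);
CREATE TABLE obs_radio (obs_publisher_did TEXT PRIMARY KEY, radio_col REAL);
CREATE TABLE obs_time (obs_publisher_did TEXT PRIMARY KEY, time_col REAL);
INSERT INTO obscore VALUES
    ('ivo://example/a', 'cube'),
    ('ivo://example/b', 'timeseries'),
    ('ivo://example/c', 'image');
INSERT INTO obs_radio VALUES ('ivo://example/a', 1.0), ('ivo://example/b', 2.0);
INSERT INTO obs_time VALUES ('ivo://example/b', 0.5);
""")

# Only datasets with a row in BOTH extension tables survive the double join.
rows = conn.execute(
    "SELECT obs_publisher_did FROM obscore "
    "NATURAL JOIN obs_radio NATURAL JOIN obs_time"
).fetchall()
print(rows)
```

In this toy data only dataset b has rows in both extension tables, so the rowset is a strict subset of what either extension table covers on its own.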
I'll grant you that it would be (mildly) desirable to be able to
discover "This service has observations that have extra metadata in
*both* the radio and the time extensions", but I think we'd have to pay
a high price for building something like that, and I'd submit we
should first get single-column statistics addressed properly
[shameless plug: https://ivoa.net/documents/Notes/colstatnote/ is
still waiting for feedback and implementations] before even daring to
think about this kind of correlated statistics.
Or are you thinking about column sets? If so, doesn't #radioext-1.0
uniquely specify a well-defined column set that's exactly what people
would want to discover ("Give me all services I can run my query
using the radio extension against")?
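As a rough sketch of that discovery query, the following builds an ADQL string against a RegTAP service. It assumes the RegTAP table rr.res_table with its ivoid, table_name and table_utype columns, and it assumes that utypes are stored lowercased there; the function name is made up for this illustration.

```python
def radio_ext_discovery_query(utype="ivo://ivoa.net/std/obscore#radioext-1.0"):
    """Return an ADQL query listing resources that have a table with this utype.

    Assumes RegTAP's rr.res_table and a lowercased table_utype column.
    """
    escaped = utype.replace("'", "''")  # escape quotes for an ADQL string literal
    return (
        "SELECT DISTINCT ivoid, table_name "
        "FROM rr.res_table "
        f"WHERE table_utype = '{escaped}'"
    )


query = radio_ext_discovery_query()
print(query)
```

The resulting string could be submitted to any RegTAP endpoint with a TAP client; nothing in it depends on a view or on a schema-level utype.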
Because finding out a service which contains both radio extension
and time extension would not imply that the service delivers a
specific pulsar subset which will contain both.
This would suggest you are thinking of row subsets, and as just said
I strongly suspect that'll be too hard for data discovery for years
to come. I also don't think that's a very strong discovery case to
begin with: it's probably almost always faster to just query a few
extra services than to come up with a clever discovery condition.
Among all these attributes we can define profiles. The current
overall domain ObsCore set will be the root profile.
Then we can have the radio profile, the time profile, the
time+radio (pulsar ?) profile, the heig profile, ... all containing
the root attributes + the dedicated others
Each profile will have it's stantardID (root Obscore one + a
dedicate fragment)
But that would be the combinatorial catastrophe I was always afraid
of. I give you we probably won't have 10 extensions for quite a
while, but if we had them, we'd have 2^10=1024 such "profiles". I,
for one, would consider that a nightmare.
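The arithmetic can be checked with a short sketch that enumerates every subset of a list of extensions, counting the empty subset as the root profile. The extension names are placeholders, not a proposal.

```python
from itertools import combinations


def enumerate_profiles(extensions):
    """Return every subset of extensions; the empty tuple is the root profile."""
    return [
        combo
        for size in range(len(extensions) + 1)
        for combo in combinations(extensions, size)
    ]


two = enumerate_profiles(["radio", "time"])
ten = enumerate_profiles([f"ext{i}" for i in range(10)])
print(two)       # the four profiles for two extensions
print(len(ten))  # number of profiles for ten extensions
```

With two extensions there are already four profiles to register and document; each additional extension doubles the count.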
Current Proposed recommendation contains a version of the registry
section with utype on the ivoa schema which I think now is wrong
"On the ivoa schema" would be wrong, right. But at least in my docs
it's the obs_radio table that the utype is on. And I'd argue that's
exactly where it should be.
because it implies that only radio data are included in the
service. So I think you mean go back to your previous proposal
which was
Uh... My checkout (git commit d50ff42) still says
Compliant *tables* use the utype
ivo://ivoa.net/std/ObsCore#radioExt-1.0.
(emphasis mine), and the sample-record.xml also has the utype at the
right position. If the document says the utype is on the schema
somewhere, that's a bug and obviously needs to be fixed.
Where do you see that? [the doc repo is down for me right now, so I
looked at github HEAD, which I think ought to be at least as good...]
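For anyone wanting to check where the utype sits in a record, here is a small sketch that scans a tableset for the utype at schema level and at table level. The XML fragment is a hand-written, simplified stand-in (no namespaces, no columns) and not a copy of sample-record.xml; it assumes that utype appears as a child element of schema or table.

```python
import xml.etree.ElementTree as ET

RADIO_UTYPE = "ivo://ivoa.net/std/ObsCore#radioExt-1.0"

# Simplified, hypothetical tableset; a real registry record carries more detail.
TABLESET = """
<tableset>
  <schema>
    <name>ivoa</name>
    <table>
      <name>ivoa.obscore</name>
    </table>
    <table>
      <name>ivoa.obs_radio</name>
      <utype>ivo://ivoa.net/std/ObsCore#radioExt-1.0</utype>
    </table>
  </schema>
</tableset>
"""


def find_utype(xml_text, utype):
    """Return (schema names, table names) whose direct utype child matches."""
    wanted = utype.lower()  # compare case-insensitively
    root = ET.fromstring(xml_text)

    def matches(element):
        return (element.findtext("utype") or "").strip().lower() == wanted

    schemas = [s.findtext("name") for s in root.iter("schema") if matches(s)]
    tables = [t.findtext("name") for t in root.iter("table") if matches(t)]
    return schemas, tables


schemas, tables = find_utype(TABLESET, RADIO_UTYPE)
print(schemas, tables)
```

A non-empty first list would indicate the schema-level placement that would be the bug; the second list shows the table-level placement.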
If we want to supplement the basic ObsCore table with a set of new standard attributes (= columns) in the TAP_SCHEMA, we can define this specific set of columns as an "ObsCore profile".
The profile applies to a "schema" in the tableset, which should contain all of the profile's columns, in whichever tables inside the schema.
It is accepted that in the radio case the basic ObsCore parameters and the radio-extension-specific parameters will be hosted in two different tables belonging to the same "ivoa" schema. The recommendation is that the basic ObsCore table is called ivoa.obscore and that the extension table is called ivoa.obscore_radio.
The main reason is that the same TAP service may contain data in the radio domain and data outside this domain for which the extension parameters are irrelevant.
So storing everything in the same table would imply that many rows in the table will show NULL values for the extension parameters.
But how do we help users discover services that serve these two tables?
Each standard data model has a standardID. This is the case for ObsCore, EPN-TAP, RegTAP, and so on.
From the registry point of view, the presence of an ObsCore table in a service is recognized via the datamodel element of a service capability. This practice has the drawback of matching a data model with a service rather than with the tables it serves.
That is why the practice changed and why EPN-TAP and RegTAP use another method.
This is summarized in this recent IVOA note published by Markus:
https://ivoa.net/documents/Notes/TableReg/20240821/
So if the model is serialized in a single table, let's set the standardID of this model as the value of the utype attribute of the table in the registry record.
If the model is serialized in several tables, let's set the standardID of this model as the value of the utype attribute of the schema grouping all these tables in the registry record.
What can happen for the ObsCore extension for radio data? ObsCore and its extension are actually part of the same data model. So they have the same basic standardID "ivo://ivoa.net/std/ObsCore", and the presence of the extension columns may be expressed by a fragment in the URI: "ivo://ivoa.net/std/ObsCore#RadioEXt-1.0".
Now where can we put this standardID in the registry?
Currently the thing is organized in two tables which are strongly related, so they are in the same "schema". However, setting utype="ivo://ivoa.net/std/ObsCore#RadioEXt-1.0" on the schema is not appropriate. This would mean all the tables in the schema deal with radio data, but we may have datasets in the ObsCore table which are outside the radio domain.
Another solution could be to set utype="ivo://ivoa.net/std/ObsCore#RadioEXt-1.0" on the ivoa.obscore_radio table only.
But this may be confusing and encourage users to query this table alone, which would make no sense. The obscore_radio table alone is useless. Only a query on a natural join between obscore and obscore_radio makes sense.
Hence the proposal to set the standardID utype on a "view" table defined as the natural join of the ObsCore and obscore_radio tables. If such a utype is discovered in a service, we know that the schema containing this view will also contain the ObsCore and obscore_radio tables. Of course, in practice it may be more efficient to query the two tables with a direct join instead of this view. But the view is there to inform registry users that this service serves ObsCore with its radio extension.
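As a rough illustration of the proposed view, the sketch below defines it in an in-memory SQLite database and checks that it returns the same rows as the direct join. The unqualified table names stand in for ivoa.obscore and ivoa.obscore_radio; the view name obscore_radio_view, the radio column, and the use of obs_publisher_did as the only shared column are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obscore (obs_publisher_did TEXT PRIMARY KEY, em_min REAL);
-- hypothetical_radio_col is a placeholder, not a column from the draft
CREATE TABLE obscore_radio (
    obs_publisher_did TEXT PRIMARY KEY,
    hypothetical_radio_col REAL
);
INSERT INTO obscore VALUES
    ('ivo://example/radio-1', 0.21),
    ('ivo://example/optical-1', 5e-7);
INSERT INTO obscore_radio VALUES ('ivo://example/radio-1', 42.0);

-- The view that would carry the standardID utype in the registry record.
CREATE VIEW obscore_radio_view AS
    SELECT * FROM obscore NATURAL JOIN obscore_radio;
""")

via_view = conn.execute("SELECT * FROM obscore_radio_view").fetchall()
via_join = conn.execute(
    "SELECT * FROM obscore NATURAL JOIN obscore_radio"
).fetchall()
print(via_view == via_join, via_view)
```

The optical dataset has no row in obscore_radio, so it appears neither in the view nor in the direct join, while it remains queryable in obscore itself.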
By the way, the concept of specific ObsCore profiles defined this way, as a full set of columns, is agnostic about which actual table holds the columns. Hence, if we later decide to move some of the extension columns into the basic ObsCore, it is very easy to redefine the profile in a new version. This will not break anything.