Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIKE operators #87

Closed
sauliusg opened this issue Jun 12, 2019 · 8 comments
Closed

LIKE operators #87

sauliusg opened this issue Jun 12, 2019 · 8 comments
Assignees
Labels
topic/filtering-language Issue discussing changes and improvements to the query and filtering language topic/query-string Issues relating to the query string sent to the OPTIMaDe api, excluding the filter language. type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus.

Comments

@sauliusg
Copy link
Contributor

sauliusg commented Jun 12, 2019

It has been suggested (Issue #42, PR #69) to have operator similar to SQL "LIKE" in a query language.
Since there are reservations about implementing 'LIKE "%string%"' construct as being too much geared towards a particular DB implementation, and 'LIKE "string"' would be inventing something superfluously similar to the SQL LIKE but different enough to be confusing, what about the following syntax:r

some_property MATCHES "*C*"

and interpret it as, say a Shell GLOB?

The idea is to have a query expression that is simpler and more widely supported in database engines than full-featured RegExps, but still useful for general substring searches,

@sauliusg sauliusg added the topic/query-string Issues relating to the query string sent to the OPTIMaDe api, excluding the filter language. label Jun 12, 2019
@sauliusg sauliusg self-assigned this Jun 12, 2019
@sauliusg sauliusg added topic/filtering-language Issue discussing changes and improvements to the query and filtering language type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus. labels Jun 12, 2019
@sauliusg
Copy link
Contributor Author

sauliusg commented Jul 12, 2019

It seems from the issue #42 discussions that we have the following possibilities for querrying substrings:

  • STARTS WITH/ENDS WITH/CONTAINS;

  • LIKE with SQL-like semantics;

  • REGEXP of different kinds (PCRE, ERE, with or without Unicode support).

Each has increasing expressive power, but also may be increasingly difficult to implement on back-ends without the native support (for instance, MySQL at some stage required a plug-in to support regexp searches).

I suggest defining Filter language syntax and the following compatibility requirements:

  • STARTS WITH/ENDS WITH/CONTAINS – required (as it is now);

  • LIKE/UNLIKE with the SQL-like semantics – optional;

  • MATCH GLOB, MATCH PCRE, MATCH ERE (e.g. _cod_chemical_name MATCH PCRE "µ.*[Oo]cta[^\s]+[23]") – optional.

In this way, all backends will provide the same basic STARTS/ENDS/CONTAINS functionality (which is easy to implement using LIKE of SQL, REGEXPs or even a post-filtering); but we will also be able to introduce efficient extensions in a compatible way.

What others think?

@merkys
Copy link
Member

merkys commented Jan 21, 2022

This issue was brought up by @ml-evs in today's Web meeting. @rartino asked the providers to express their opinions about it.

Personally, I like the proposal to support different string matching languages (SQL LIKE, GLOB, PCRE etc). However, I think that here we again encounter the same dilemma where by allowing diversity among providers we transfer the burden onto the clients. I easily image every provider preferring a different string matching language, and only that one.

SQL LIKE seems least demanding and, as such, least powerful. Maybe we can start from that and move on to the next ones?

By the way, I would rename LIKE operator to MATCHES SQLLIKE (not sure if _ in operator name would work) just to have simpler grammar. UNLIKE is the same as NOT LIKE, right?

@ml-evs ml-evs removed this from the v1.2 milestone Dec 5, 2022
@sauliusg
Copy link
Contributor Author

I would close the LIKE issue since we have STARTS/ENDS/CONTAINS for simple queries and REGEXP for advanced ones. The LIKE mechanism seems superfluous.

Anybody against closing this issue?

@merkys
Copy link
Member

merkys commented Oct 18, 2024

AFAIK, OPTIMADE does not support REGEXP mechanisms in filters yet. PR #490 only defined OPTIMADE regular expression format, but there is no way to filter using regular expressions.

@sauliusg
Copy link
Contributor Author

Yes indeed. But the REGEXP syntax is key to provide filtering. We should open a separate issue on REGEXP filtering, IMHO.

@merkys
Copy link
Member

merkys commented Oct 18, 2024

Yes indeed. But the REGEXP syntax is key to provide filtering. We should open a separate issue on REGEXP filtering, IMHO.

Agree. We can close this issue then.

@rartino
Copy link
Contributor

rartino commented Oct 18, 2024

I agree and the suggestion matches my view of what we have agreed to-- to skip "LIKE" and work the new REGEXP definition recently added as an appendix into the query language. There is a closed PR that in its final stage more or less did this (except referring to the appendix for the regex definition) adding a REGEX construct to the grammar; maybe we can recover the end state of that PR and continue from there.

@sauliusg
Copy link
Contributor Author

I'm closing this issue in favor of designing the REGEXP match from the blank sheet in #532

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/filtering-language Issue discussing changes and improvements to the query and filtering language topic/query-string Issues relating to the query string sent to the OPTIMaDe api, excluding the filter language. type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus.
Projects
None yet
Development

No branches or pull requests

4 participants