Extend query (filter) language to include substring comparison operators #42

sauliusg · 2018-06-15T07:20:51Z

Let's add the following operators to the filter language:

string_property LIKE "value" # as in SQL: select string_property from data where string_property like '%value%'
string_property STARTS WITH "value" # as in SQL: select string_property from data where string_property like 'value%'
string_property ENDS WITH "value" # as in SQL: select string_property from data where string_property like '%value'
string_property UNLIKE "value" # as in SQL: select string_property from data where string_property not like '%value%'
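A minimal sketch of how a backend built on SQL could translate the four proposed operators into `LIKE` patterns, assuming SQLite. The column name `chemical_formula` and the helpers `escape_like`, `to_sql` and `run` are hypothetical. Note a detail the proposal leaves implicit: `%` and `_` inside the operand are SQL wildcards, so they should be escaped if the operand is meant to match literally. SQLite's `LIKE` is also case-insensitive for ASCII by default, and the proposal does not yet say which behaviour is intended.

```python
import sqlite3

# Hypothetical translation table: proposed filter operator -> (SQL operator,
# LIKE pattern template). "{}" is replaced by the escaped operand.
LIKE_TEMPLATES = {
    "LIKE": ("LIKE", "%{}%"),
    "STARTS WITH": ("LIKE", "{}%"),
    "ENDS WITH": ("LIKE", "%{}"),
    "UNLIKE": ("NOT LIKE", "%{}%"),
}


def escape_like(value: str, esc: str = "\\") -> str:
    """Escape LIKE wildcards (% and _) so the operand is matched literally."""
    return (
        value.replace(esc, esc + esc)
        .replace("%", esc + "%")
        .replace("_", esc + "_")
    )


def to_sql(prop: str, op: str, value: str) -> tuple[str, tuple[str]]:
    """Build a parameterized WHERE clause for one substring comparison.

    `prop` is assumed to be an already validated column name; only the
    operand is passed as a bound parameter.
    """
    sql_op, template = LIKE_TEMPLATES[op]
    pattern = template.format(escape_like(value))
    return f"{prop} {sql_op} ? ESCAPE '\\'", (pattern,)


# Toy table with a hypothetical `chemical_formula` column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (chemical_formula TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?)",
    [("Al2O3",), ("SiO2",), ("Fe2O3",), ("NaCl",)],
)


def run(op: str, value: str) -> list[str]:
    where, params = to_sql("chemical_formula", op, value)
    rows = conn.execute(
        f"SELECT chemical_formula FROM data WHERE {where} ORDER BY chemical_formula",
        params,
    )
    return [row[0] for row in rows]


print(run("LIKE", "O"))          # ['Al2O3', 'Fe2O3', 'SiO2']
print(run("STARTS WITH", "Fe"))  # ['Fe2O3']
print(run("ENDS WITH", "O2"))    # ['SiO2']
print(run("UNLIKE", "O"))        # ['NaCl']
print(run("LIKE", "_"))          # [] because "_" is escaped, not a wildcard
```

Passing the pattern as a bound parameter rather than splicing it into the SQL string keeps user-supplied operands from being interpreted as SQL.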

dwinston · 2018-06-15T09:51:59Z

I agree that the filter language should be extended to support substring comparison. I propose an alternative to the above (or perhaps a superset that keeps the above as syntactic sugar): a single new operator, REGEX. The four cases above would map to the following equivalents:

string_property REGEX "value"
string_property REGEX "^value"
string_property REGEX "value$"
NOT string_property REGEX "value"
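A small check, assuming Python's `re` module as a stand-in regex engine (it is not PCRE 8.39, though these simple anchored patterns should behave alike; this is not a PCRE conformance test), that the four REGEX forms above agree with the substring operators. The helper names are hypothetical. One step the mapping needs is escaping the operand, otherwise a value such as `.` would act as a wildcard.

```python
import re


def regex_for(op: str, value: str) -> str:
    """Map a proposed substring operator to its REGEX equivalent."""
    v = re.escape(value)  # treat the operand literally
    return {"LIKE": v, "STARTS WITH": "^" + v, "ENDS WITH": v + "$", "UNLIKE": v}[op]


def substring_match(op: str, s: str, value: str) -> bool:
    if op == "LIKE":
        return value in s
    if op == "STARTS WITH":
        return s.startswith(value)
    if op == "ENDS WITH":
        return s.endswith(value)
    if op == "UNLIKE":
        return value not in s
    raise ValueError(op)


def regex_match(op: str, s: str, value: str) -> bool:
    found = re.search(regex_for(op, value), s) is not None
    # UNLIKE corresponds to `NOT string_property REGEX "value"`.
    return not found if op == "UNLIKE" else found


OPS = ["LIKE", "STARTS WITH", "ENDS WITH", "UNLIKE"]
for s in ["Al2O3", "SiO2", "NaCl", "a.b"]:
    for value in ["O", "Si", "O2", ".", "a.b"]:
        for op in OPS:
            assert substring_match(op, s, value) == regex_match(op, s, value)

# Edge case: "$" also matches just before a trailing newline.
print(regex_match("ENDS WITH", "SiO2\n", "O2"))      # True
print(substring_match("ENDS WITH", "SiO2\n", "O2"))  # False
```

The last two lines show that the mapping is not exact: by default `$` in both Python and PCRE also matches before a final newline, so `"value$"` and ENDS WITH can disagree on such strings.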

Furthermore, REGEX enables filters more powerful than substring comparison. I propose that the server interpret the value for this operator as a Perl Compatible Regular Expression (PCRE), version 8.39, with UTF-8 support.

rartino · 2018-06-21T09:27:11Z

@dwinston, what do you suggest an API implementation should do if the underlying backend does not support string queries using specifically PCRE version 8.39 with UTF-8 support, but only some other regex dialect? I imagine this would be the typical case, and it seems nasty to require essentially every OPTIMaDe implementation to do some form of regex translation.

dwinston · 2018-06-22T01:44:14Z

I suggest an optional attribute "operator_notes" returned by the base URL info endpoint. This attribute is a dictionary whose keys are operators and whose values are short notes of interest to a client. For the REGEX operator, if an OPTIMaDe implementation uses PCRE 8.39 with UTF-8 support, it MAY provide the field. However, if an implementation interprets a regex differently, it MUST include "operator_notes.regex" with a human-readable value describing its interpretation.
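A minimal client-side sketch of the proposed "operator_notes" attribute, assuming the client has already extracted the attribute dictionary from each database's base URL /info response. The function names, database names and note texts are hypothetical.

```python
from typing import Any, Mapping

# Default dialect named in the proposal above.
DEFAULT_REGEX = "PCRE 8.39 with UTF-8 support"


def regex_semantics(info_attributes: Mapping[str, Any]) -> str:
    """Describe how a server interprets REGEX, per the operator_notes idea."""
    notes = info_attributes.get("operator_notes") or {}
    note = notes.get("regex")
    if note is None:
        # Omitting the note means the server follows the default dialect.
        return DEFAULT_REGEX
    # The note is free text for a human, so it is passed through verbatim.
    return note


def group_by_semantics(
    infos: Mapping[str, Mapping[str, Any]],
) -> dict[str, list[str]]:
    """Group databases by their declared REGEX interpretation."""
    groups: dict[str, list[str]] = {}
    for name, attrs in infos.items():
        groups.setdefault(regex_semantics(attrs), []).append(name)
    return groups


# Hypothetical /info attributes for three databases.
infos = {
    "db_a": {},
    "db_b": {"operator_notes": {"regex": "POSIX extended regular expressions"}},
    "db_c": {"operator_notes": {"like": "case-insensitive"}},
}
print(group_by_semantics(infos))
# {'PCRE 8.39 with UTF-8 support': ['db_a', 'db_c'], 'POSIX extended regular expressions': ['db_b']}
```

Because the note is free text, grouping by it is brittle: a server that uses PCRE 8.39 but chooses to describe it in its own words would land in a separate group, and the client still has to decide by hand how to treat each group.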

I don't see a way around two things: having a default specification of what a regex is (in this case, the dialect supported by MongoDB 3.2+), and, if we insist that implementers may deviate from it and still claim to support the operator of the same name, providing metadata in /info so that client implementations can be informed.

rartino · 2018-06-23T00:08:58Z

What you suggest seems problematic from a user perspective. Someone who wants to send a regex-type query to many databases will now need to manually deal with these differences in support. They would have to go through all target databases to check compatibility, and possibly manually translate between different regex formats. A user that isn't careful will easily end up with a mix of data resulting from different interpretations of their regex.

Individual databases can already support an extended filtering language on a query parameter like _exmpl_filter=..... Maybe it is reasonable to stay with the simpler substring operators in the standard filter language (which hopefully can be supported as specified in any reasonable backend), and defer full support for REGEX to database-specific extended filtering?
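A minimal sketch of how a client might send a portable standard filter alongside a database-specific extended filter, assuming the `_exmpl_` prefix from the comment above. The endpoint URL, the property name and the REGEX syntax inside the extended filter are hypothetical; what an extended filter accepts would be up to each database.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query(
    base_url: str,
    filter_expr: str,
    extended_filter: Optional[str] = None,
    prefix: str = "_exmpl_",
) -> str:
    """Combine the standard `filter` with a database-specific extended filter."""
    params = {"filter": filter_expr}
    if extended_filter is not None:
        # Prefixed parameter name, e.g. `_exmpl_filter`.
        params[prefix + "filter"] = extended_filter
    return f"{base_url}?{urlencode(params)}"


url = build_query(
    "https://example.org/optimade/structures",  # hypothetical endpoint
    'chemical_formula STARTS WITH "Fe"',  # standard substring operator
    'chemical_formula REGEX "^Fe[0-9]*O"',  # hypothetical extended syntax
)
print(url)
print(parse_qs(urlsplit(url).query))
```

Keeping the regex in a prefixed parameter makes it explicit to the user that this part of the query is interpreted by one specific database and is not expected to behave the same elsewhere.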

dwinston · 2018-06-27T18:11:16Z

Okay, I agree that the lack of unity with respect to REGEX interpretation makes adding it to the filter syntax too complex at this time. For now, I withdraw my proposal to add it to the standard filter language. I am in favour of adding the operators as @sauliusg proposed.


giovannipizzi · 2019-06-11T17:15:54Z

This is partially addressed by #69; what remains to be done are the "LIKE" operators.

merkys · 2019-06-26T06:19:35Z

Closing this issue as what remains of it is fully covered in #87.

rartino added a commit to rartino/OPTIMADE that referenced this issue May 12, 2019

Filter changes related to Materials-Consortia#58, Materials-Consortia#17, Materials-Consortia#42, Materials-Consortia#19, Materials-Consortia#16, Materials-Consortia#47 (2b048c8)

rartino mentioned this issue May 13, 2019: Updates to filtering language #69 (Merged)

sauliusg mentioned this issue Jun 12, 2019: LIKE operators #87 (Closed)

merkys closed this as completed Jun 26, 2019

rartino mentioned this issue Dec 30, 2022: SMILES data type #436 (Open)

rartino mentioned this issue Jan 6, 2024: Define OPTIMADE regex format #490 (Merged)
