JSON Schema #10

anseljh · 2020-06-06T07:20:40Z

This PR adds a JSON Schema. It also adds a unit test to tests.py that tries to validate the courts data file against that schema.

I fixed a few typos and updated some bits of the court data so they would conform to the schema. For example, there were many empty strings that I turned into nulls, and other small things like that. You can see these all in commit a7297bc.

Some things I did not fix yet, because I wasn't sure whether the data or the schema should change. Those are:

Georgia Superior Courts level is weird #7: The level of the Georgia Superior Courts is "gjc & iac", which was odd.
Delaware Court of Chancery type is weird ("non-trial") #8: The Delaware Chancery Court is the only court with a type of "non-trial". It's an unusual court, but I suggest we code it as "trial".
Superior Court of Delaware has multiple types #9: Similar to #7, the Superior Court of Delaware has a compound type of "trial & iac".

Those three issues should be the last things remaining before the courts file validates.

I did not understand what is supposed to go in the jurisdiction and case_types items, so I left them mostly blank except for a description of "TODO".

Closes #2

mlissner

A couple little things, but generally this looks great and like a solid and important improvement. I'll let @flooie make the final call though, once he's back.

mlissner · 2020-06-09T06:38:19Z

tests.py

+            with open_schema() as schema_f:
+                schema_data = schema_f.read()
+                schema = json.loads(schema_data)
+
+                try:
+                    jsonschema.validate(
+                        instance=instance,
+                        schema=schema,
+                    )
+                except jsonschema.ValidationError as e:
+                    self.fail("JSON failed validation against schema")


I think this whole thing can be outdented, right?
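
One reading of the outdent suggestion, as a minimal sketch: only the file read needs to sit inside the with block, so the parsed result can be used after the file is closed. Here open_schema() is a hypothetical in-memory stand-in for the helper in tests.py, and the one-line schema is made up for illustration.

```python
import io
import json


def open_schema():
    """Hypothetical stand-in for the open_schema() helper used in tests.py."""
    return io.StringIO('{"type": "array"}')


# Only the read needs the open file handle.
with open_schema() as schema_f:
    schema = json.load(schema_f)

# Outdented: the handle is closed here, but the parsed schema is still usable,
# so validation against it could happen at this level.
print(schema_f.closed, schema)
```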

mlissner · 2020-06-09T06:40:06Z

tests.py

+            with open_schema() as schema_f:
+                schema_data = schema_f.read()
+                schema = json.loads(schema_data)
+
+                try:
+                    jsonschema.validate(
+                        instance=instance,
+                        schema=schema,
+                    )
+                except jsonschema.ValidationError as e:
+                    self.fail("JSON failed validation against schema")


It seems like this validation would fail without a meaningful message. Is there a better way to do this that gives us something more about how the validation failed?

anseljh

Yes, I'm pretty sure that's possible. jsonschema.validate() typically barfs more useful stuff when it raises its exceptions. I'll look into it.

anseljh

Should be fixed; let me know what it does for you.
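
A minimal sketch of the idea, not necessarily what the fixing commit does: pass the text of the caught ValidationError along to the failure message. The schema and records below are invented, validation_failure_message is a hypothetical helper name, and .format() is used instead of an f-string to stay compatible with Python 3.5.

```python
try:
    import jsonschema  # third-party dependency added by this PR
except ImportError:  # lets the sketch load even where it is not installed
    jsonschema = None

# Made-up miniature schema and data, standing in for the real files.
SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"type": {"enum": [None, "appellate", "trial"]}},
    },
}
COURTS = [{"type": "trial"}, {"type": "non-trial"}]


def validation_failure_message(instance, schema):
    """Return None when the instance validates, else a descriptive message."""
    try:
        jsonschema.validate(instance=instance, schema=schema)
    except jsonschema.ValidationError as e:
        # str(e) is the validator's own description of what failed.
        return "JSON failed validation against schema: {}".format(e)
    return None
```

In a unittest method, the returned message could be handed to self.fail() whenever it is not None.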

mlissner · 2020-06-09T06:43:49Z

courts_db/data/courts.json

@@ -17440,7 +17426,7 @@
            }
        ],
        "name": "Decisions of the Federal Communications Commission",
-        "level": "",
+        "level": null,


This is philosophical, but generally I follow Django's approach to empty strings, which is:

In most cases, it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL.

(https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.Field.null)

I've argued about it before, but I think the convention makes enough sense that it's worth it just to be consistent throughout FLP. I guess I'd listen to opposing views, but I usually appreciate knowing that strings with no value are "" instead of either null or "".

Something tells me courts-db is already using null in this way though....hrm. If that's the case, I feel ambivalent about changing all of courts-db to use "" instead of null, though I still kind of feel like it'd be worth it.

anseljh

Oof...I'm not well acquainted with Django-land, but that seems like a really weird convention. To me, empty string means empty string, not absence of data. How does that even really help? You're just doing a different check (if x == "" instead of if x is None).

In any event, for the level element here, it is constrained by an enum so you would never have to guess whether you're getting an empty string or a null. The enum in the schema tells you only one of those is valid. See: https://github.com/freelawproject/courts-db/pull/10/files#diff-11aaec965ced488b7af5aa03d35c580dR32-R43

Courts-DB was in fact using a lot of empty strings, but I changed them to nulls in a7297bc.
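
To illustrate the enum point, here is a hand-rolled sketch (plain Python, not jsonschema itself) of what an enum that lists null but not the empty string accepts. The allowed values are those of the type enum quoted in the failure output later in this thread, used here only as an example; the court records are invented.

```python
# Values of the `type` enum as quoted in the validation failure output;
# None is how JSON null appears once loaded into Python.
ALLOWED_TYPES = [
    None,
    "ag",
    "appellate",
    "bankruptcy",
    "international",
    "special",
    "trial",
]


def invalid_type_entries(courts):
    """Return (index, value) pairs whose "type" is not in the enum."""
    return [
        (i, court["type"])
        for i, court in enumerate(courts)
        if court["type"] not in ALLOWED_TYPES
    ]


# Invented records for illustration only.
courts = [{"type": "trial"}, {"type": None}, {"type": ""}, {"type": "non-trial"}]
print(invalid_type_entries(courts))  # [(2, ''), (3, 'non-trial')]
```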

mlissner

In any event, for the level element here, it is constrained by an enum so you would never have to guess whether you're getting an empty string or a null

That assumes you remember that there's a schema and what's in it. The nice thing about sticking with the convention is that you always know that "no data" is "", no matter where you are in the code base.

You're just doing a different check (if x == "" instead of if x is None)

In Django, you can almost always do x == "" precisely because the convention is there. Without it, you have to do x == "" or x is None, because "no data" can have multiple values.

If I win you over a bit, let's switch it back. If you feel like I'm wrong, and see no value in Django's approach, very well.
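
The difference between the two checks can be sketched in a few lines. The function names and sample values are made up for illustration and are not taken from courts-db.

```python
def is_blank_by_convention(value):
    """With an "empty string means no data" convention, one comparison suffices."""
    return value == ""


def is_blank_mixed(value):
    """Without the convention, both spellings of "no data" must be checked."""
    return value == "" or value is None


levels = ["", None, "iac"]
print([is_blank_by_convention(v) for v in levels])  # [True, False, False]
print([is_blank_mixed(v) for v in levels])  # [True, True, False]
```

The single comparison misses None, which is why it is only safe when the convention holds everywhere in the code base.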

mlissner · 2020-06-10T00:01:28Z

I also just noticed that black is failing you. You should be able to install it pretty easily, and then once you have it, you just do black . in your code directory, and it should do the right thing.

anseljh · 2020-06-13T02:21:31Z

Tests are all green now, which is weird because I expected the validation test to fail with something like this because of #8:

AssertionError: JSON failed validation against schema: 'non-trial' is not one of [None, 'ag', 'appellate', 'bankruptcy', 'international', 'special', 'trial']

Failed validating 'enum' in schema['items']['properties']['type']:
    {'description': 'What kind of court it is, e.g., "appellate"',
     'enum': [None,
              'ag',
              'appellate',
              'bankruptcy',
              'international',
              'special',
              'trial'],
     'type': ['string', 'null']}

On instance[85]['type']:
    'non-trial'

----------------------------------------------------------------------
Ran 5 tests in 3.797s

FAILED (failures=1)

Looks like the culprit is the CI test configuration here, which isn't running that test: https://github.com/freelawproject/courts-db/blob/master/.github/workflows/tests.yml#L25

Is there a reason to limit what tests are run, or can we just make that bit:

    - name: Run tests
      run: |
        python tests.py

mlissner · 2020-06-16T18:15:29Z

No reason, @anseljh. Want to fix the tests too?

mlissner · 2020-06-16T18:15:46Z

(Or rather, fix the CI test config, is what I meant to say.)

anseljh · 2020-06-20T01:07:35Z

Ok, I updated the CI config. Now the test I expect to fail is failing. Once we decide how to resolve #7, #8, and #9, it should be all green.

CLAassistant · 2021-12-18T00:09:01Z

All committers have signed the CLA.

mlissner · 2021-12-18T00:18:05Z

Huh. I just turned on the CLA bot, but didn't realize it'd spam an actual IP lawyer. Ansel, maybe you have thoughts about it. I'm playing with it for this and a couple other smaller repos atm.

We should get this merged too. I thought it was merged ages ago!

anseljh · 2022-01-19T09:21:58Z

Huh. I just turned on the CLA bot, but didn't realize it'd spam an actual IP lawyer. Ansel, maybe you have thoughts about it. I'm playing with it for this and a couple other smaller repos atm.

That's kind of awesome and funny. Happy to have a look at the CLA, but you'll probably need to remind me.

We should get this merged too. I thought it was merged ages ago!

Sure. We'll need to re-run the tests.

mlissner · 2022-01-20T00:25:43Z

That's kind of awesome and funny. Happy to have a look at the CLA, but you'll probably need to remind me.

Take a look up thread, there's a link where you can agree to it.

brianwc · 2022-01-20T05:29:43Z

Wow. Didn't know there was a CLA-bot. I love it!

mlissner · 2022-01-20T06:48:35Z

It's new. I thought I flagged it for you, since I knew how you'd feel. It seems to be working well and uses the CLA you provided in 2009 or so.

anseljh added 5 commits June 6, 2020 00:06

Add JSON schema file

d1adbc5

Fix some typos and update courts data for schema conformity

a7297bc

Add JSON Schema validation to tests

c85732d

Update requirements_dev.txt

b362df6

Oops, move dependency to requirements.txt

490ed00

mlissner reviewed Jun 9, 2020

anseljh added 3 commits June 12, 2020 18:56

Format with black

a09c14a

Improve error message on schema-validation failure

85c5316

Farewell, f-string, we barely knew ye (fails Python 3.5 tests)

ed49c58

Update CI config to run all tests

8cb13cd
