Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema #10

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

JSON Schema #10

wants to merge 9 commits into from

Conversation

anseljh
Copy link
Member

@anseljh anseljh commented Jun 6, 2020

This PR adds a JSON Schema. It also adds a unit test to tests.py that tries to validate the courts data file against that schema.

I fixed a few typos, and updated some bits of the court data so they would conform to the schema. For example, there were many empty strings that I turned into nulls and other small things like that. You scan see these all in commit a7297bc.

Some things I did not fix yet, because I wasn't sure whether the data or the schema should change. Those are:

Those three issues should be the last things remaining before the courts file validates.

I did not understand what is supposed to go in the jurisdiction and case_types items, so I left them mostly blank except for a description of "TODO".

Closes #2

Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple little things, but generally this looks great and like a solid and important improvement. I'll let @flooie make the final call though, once he's back.

tests.py Outdated
Comment on lines 136 to 146
with open_schema() as schema_f:
schema_data = schema_f.read()
schema = json.loads(schema_data)

try:
jsonschema.validate(
instance=instance,
schema=schema,
)
except jsonschema.ValidationError as e:
self.fail("JSON failed validation against schema")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole thing can be outdented, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

tests.py Outdated
Comment on lines 136 to 146
with open_schema() as schema_f:
schema_data = schema_f.read()
schema = json.loads(schema_data)

try:
jsonschema.validate(
instance=instance,
schema=schema,
)
except jsonschema.ValidationError as e:
self.fail("JSON failed validation against schema")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this validation would fail without a meaningful message. Is there a better way to do this that gives us something more about how the validation failed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm pretty sure that's possible. jsonschema.validate() typically barfs more useful stuff when it raises its exceptions. I'll look into it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed, let me know what it does for you.

@@ -17440,7 +17426,7 @@
}
],
"name": "Decisions of the Federal Communications Commission",
"level": "",
"level": null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is philosophical, but generally I follow Django's approach to empty strings, which is:

In most cases, it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL.

(https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.Field.null)

I've argued about it before, but I think the convention makes enough sense that it's worth it just to be consistent throughout FLP. I guess I'd listen to opposing views, but I usually appreciate knowing that strings with no value are "" instead of either null or "".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something tells me courts-db is already using null in this way though....hrm. If that's the case, I feel ambivalent about changing all of courts-db to use "" instead of null, though I still kind of feel like it'd be worth it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof...I'm not well acquainted with Django-land, but that seems like a really weird convention. To me, empty string means empty string, not absence of data. How does that even really help? You're just doing a different check (if x == "" instead of if x is None).

In any event, for the level element here, it is constrained by an enum so you would never have to guess whether you're getting an empty string or a null. The enum in the schema tells you only one of those is valid. See: https://github.com/freelawproject/courts-db/pull/10/files#diff-11aaec965ced488b7af5aa03d35c580dR32-R43

Courts-DB was in fact using a lot of empty strings, but I changed them to nulls in a7297bc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any event, for the level element here, it is constrained by an enum so you would never have to guess whether you're getting an empty string or a null

That assumes you remember that there's a schema and what's in it. The nice thing about sticking with the convention is that you always know that "no data" is "", no matter where you are in the code base.

You're just doing a different check (if x == "" instead of if x is None)

In Django, you can almost always do x == "" precisely because the convention is there. Without it, you have to do x == "" or x is None, because "no data" can have multiple values.

If I win you over a bit, let's switch it back. If you feel like I'm wrong, and see no value in Django's approach, very well.

@mlissner
Copy link
Member

I also just noticed that black is failing you. You should be able to install it pretty easily, and then once you have it, you just do black . in your code directory, and it should do the right thing.

@anseljh
Copy link
Member Author

anseljh commented Jun 13, 2020

Tests are all green now, which is weird because I expected the validation test to fail with something like this because of #8:

AssertionError: JSON failed validation against schema: 'non-trial' is not one of [None, 'ag', 'appellate', 'bankruptcy', 'international', 'special', 'trial']

Failed validating 'enum' in schema['items']['properties']['type']:
    {'description': 'What kind of court it is, e.g., "appellate"',
     'enum': [None,
              'ag',
              'appellate',
              'bankruptcy',
              'international',
              'special',
              'trial'],
     'type': ['string', 'null']}

On instance[85]['type']:
    'non-trial'

----------------------------------------------------------------------
Ran 5 tests in 3.797s

FAILED (failures=1)

Looks like the culprit is the CI test configuration here, which isn't running that test: https://github.com/freelawproject/courts-db/blob/master/.github/workflows/tests.yml#L25

Is there a reason to limit what tests are run, or can we just make that bit:

    - name: Run tests
      run: |
        python tests.py

@mlissner
Copy link
Member

No reason, @anseljh. Want to fix the tests too?

@mlissner
Copy link
Member

(Or rather, fix the CI test config, is what I meant to say.)

@anseljh
Copy link
Member Author

anseljh commented Jun 20, 2020

Ok, I updated the CI config. Now the test I expect to fail is failing. Once we decide how to resolve #7, #8, and #9, it should be all green.

@CLAassistant
Copy link

CLAassistant commented Dec 18, 2021

CLA assistant check
All committers have signed the CLA.

@mlissner
Copy link
Member

Huh. I just turned on the CLA bot, but didn't realize it'd spam an actual IP lawyer. Ansel, maybe you have thoughts about it. I'm playing with it for this and a couple other smaller repos atm.

We should get this merged too. I thought it was merged ages ago!

@anseljh
Copy link
Member Author

anseljh commented Jan 19, 2022

Huh. I just turned on the CLA bot, but didn't realize it'd spam an actual IP lawyer. Ansel, maybe you have thoughts about it. I'm playing with it for this and a couple other smaller repos atm.

That's kind of awesome and funny. Happy to have a look at the CLA, but you'll probably need to remind me.

We should get this merged too. I thought it was merged ages ago!

Sure. We'll need to re-run the tests.

@mlissner
Copy link
Member

That's kind of awesome and funny. Happy to have a look at the CLA, but you'll probably need to remind me.

Take a look up thread, there's a link where you can agree to it.

@brianwc
Copy link

brianwc commented Jan 20, 2022 via email

@mlissner
Copy link
Member

mlissner commented Jan 20, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Pull Requests
Development

Successfully merging this pull request may close these issues.

Add a JSON Schema
4 participants