Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] dbt list --output json erroring with Object of type date is not JSON serializable #10556

Closed
2 tasks done
jeremyyeo opened this issue Aug 12, 2024 · 15 comments · Fixed by #11187
Closed
2 tasks done
Labels
bug Something isn't working

Comments

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Aug 12, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Doing a list with json output results in an error Object of type date is not JSON serializable.

Expected Behavior

No error?

Steps To Reproduce

  1. dbt project setup.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
   my_dbt_project:
      +materialized: table

# models/schema.yml
models:
  - name: foo
    tests:
      - my_test
-- models/foo.sql
select 1 id

-- tests/generic/my_test.sql
{% test my_test(model) -%}
  {{ config(start_date=modules.datetime.date(2024, 5, 1)) }}
  select '{{ config.get("start_date") }}' as c
{% endtest %}
  1. Do a list with json output on 1.8:
$ dbt list --output json
10:07:24  Running with dbt=1.8.5
10:07:24  Registered adapter: postgres=1.8.2
10:07:24  Unable to do partial parsing because saved manifest not found. Starting full parse.
{"name": "foo", "resource_type": "model", "package_name": "my_dbt_project", "original_file_path": "models/foo.sql", "unique_id": "model.my_dbt_project.foo", "alias": "foo", "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "group": null, "materialized": "table", "incremental_strategy": null, "persist_docs": {}, "post-hook": [], "pre-hook": [], "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "on_configuration_change": "apply", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "contract": {"enforced": false, "alias_types": true}, "access": "protected"}, "tags": [], "depends_on": {"macros": [], "nodes": []}}
10:07:25  Encountered an error:
Object of type date is not JSON serializable
10:07:25  Traceback (most recent call last):
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 218, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 247, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 294, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 332, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/cli/main.py", line 507, in list
    results = task.run()
              ^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/list.py", line 172, in run
    return self.output_results(generator())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/list.py", line 176, in output_results
    for result in results:
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/list.py", line 139, in generate_json
    yield json.dumps(
          ^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type date is not JSON serializable
  1. Do a list on 1.7:
$ dbt --version
Core:
  - installed: 1.7.17
  - latest:    1.8.5  - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.7.16 - Update available!
  - snowflake:  1.7.5  - Update available!
  - redshift:   1.7.7  - Update available!
  - postgres:   1.7.17 - Update available!
  - spark:      1.7.1  - Update available!

$ dbt list --output json

10:14:14  Running with dbt=1.7.17
10:14:14  Registered adapter: postgres=1.7.17
10:14:14  Unable to do partial parsing because saved manifest not found. Starting full parse.
10:14:15  Found 1 model, 1 test, 0 sources, 0 exposures, 0 metrics, 402 macros, 0 groups, 0 semantic models
{"name": "foo", "resource_type": "model", "package_name": "my_dbt_project", "original_file_path": "models/foo.sql", "unique_id": "model.my_dbt_project.foo", "alias": "foo", "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "group": null, "materialized": "table", "incremental_strategy": null, "persist_docs": {}, "post-hook": [], "pre-hook": [], "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "on_configuration_change": "apply", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "contract": {"enforced": false, "alias_types": true}, "access": "protected"}, "tags": [], "depends_on": {"macros": [], "nodes": []}}
10:14:15  Encountered an error:
Object of type date is not JSON serializable
10:14:15  Traceback (most recent call last):
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 91, in wrapper
    result, success = func(*args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 76, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 169, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 198, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 245, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/cli/requires.py", line 278, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/cli/main.py", line 556, in list
    results = task.run()
              ^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/task/list.py", line 163, in run
    return self.output_results(generator())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/task/list.py", line 167, in output_results
    for result in results:
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.7.latest/lib/python3.11/site-packages/dbt/task/list.py", line 130, in generate_json
    yield json.dumps(
          ^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/.pyenv/versions/3.11.9/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type date is not JSON serializable

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11.9
- dbt: dbt-core==1.8.5 dbt-postgres==1.8.2

Which database adapter are you using with dbt?

postgres, snowflake

Additional Context

Originally raised: https://getdbt.slack.com/archives/CBSQTAPLG/p1722637389979709

There's an interesting thing happening with dbt Cloud though... when on 1.7, we don't emit an error

image

But when users go to versionless:

image

We can see from repro above that on 1.7, we can also trigger this error if we do a json output... so it's not specifically a 1.8 thing - but rather a 1.8/versionless thing when in dbt Cloud.

We can trigger this error on 1.7 in cloud like so:

image

Couple of similar oldies:
#2373
#5357

@jeremyyeo jeremyyeo added bug Something isn't working triage labels Aug 12, 2024
@aranke
Copy link
Member

aranke commented Aug 12, 2024

@jeremyyeo Summarizing your findings, is my understanding correct?

version environment json error
1.7 cloud
1.8/versionless cloud
1.7 cloud

We think we have a fix for 1.8, can you confirm if this issue occurs in 1.7 + core? Thanks!

@jeremyyeo
Copy link
Contributor Author

jeremyyeo commented Aug 12, 2024

I think I'd put it this way:

On dbt-core 1.6, 1.7, 1.8 using dbt-core CLI - this errors whenever we explicitly issue a command to output json

dbt ls --output json

In dbt Cloud - on 1.7:

image

^ There is no initial error.

image

^ Up until you try to do what we did in core CLI - invoke a command and explicitly set --output json.

In dbt Cloud - on versionless (1.8+):

It immediately errored (screenshot copied from above when I opened the issue - see note below):

image

Which is blocking any further operation - users cant run anything.

There's something special about versionless in Cloud attempting to do json things behind the scenes - that isn't done on 1.7.


Note: As of this writing though - this is no longer the case on versionless:

image

Likely cause of the reversion of the streaming manifest internal PR.


Essentially:

  1. This is an error that has existed for a while (1.6, 1.7, 1.8) in the dbt-core lib.
  2. This error only presented itself when you tried to output json explicitly (not sure what else does serialization to json).
  3. In dbt Cloud on 1.7, there was no behind-the-scenes-magic that tried to do (2).
  4. In dbt Cloud on versionless (core 1.8+), we added some behind-the-scenes-magic that tried to do (2) - therefor users ran into errors.
  • behind-the-scenes-magic: Some operations specific to dbt Cloud which I know not off.

@dbeatty10
Copy link
Contributor

Here's the same reproducible example supplied by @jeremyyeo originally, but simplified slightly:

Reprex

models/foo.sql

{{ config(
    start_date=modules.datetime.date(2024, 5, 1),
)
}}

select '{{ config.get("start_date") }}' as c
dbt list --output json
...
TypeError: Object of type date is not JSON serializable

Workaround

But if the start_date is cast to a string (using the Jinja string filter), then it works just fine:

    start_date=modules.datetime.date(2024, 5, 1) | string,
dbt list --output json
{"name": "foo", "resource_type": "model", "package_name": "my_project", "original_file_path": "models/foo.sql", "unique_id": "model.my_project.foo", "alias": "foo", "config": {"enabled": true, "alias": null, "schema": null, "database": null, "tags": [], "meta": {}, "group": null, "materialized": "view", "incremental_strategy": null, "persist_docs": {}, "post-hook": [], "pre-hook": [], "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "on_configuration_change": "apply", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "contract": {"enforced": false, "alias_types": true}, "access": "protected", "start_date": "2024-05-01"}, "tags": [], "depends_on": {"macros": [], "nodes": []}}

@dbeatty10 dbeatty10 removed the triage label Aug 13, 2024
@gshank
Copy link
Contributor

gshank commented Aug 14, 2024

This is setting a custom field in the node config, correct? So it just goes into the _extra dictionary, and since there is no type associated with those dictionary keys, it's not serializable. I think the possibilities here are 1) cast everything that's set in an extra key to string, or 2) raise an error when somebody sets an extra key to anything except a string.

@gshank
Copy link
Contributor

gshank commented Aug 14, 2024

Note that the same is true of the "meta" dictionary.

@dbeatty10
Copy link
Contributor

@gshank What are we doing right now with the "meta" dictionary? Does it have the same issue as reported here? Or are we 1) casting everything to a string or 2) raising an error for non-strings?

@gshank
Copy link
Contributor

gshank commented Aug 20, 2024

We don't do anything different for meta, just like extra keys. I think that people might be less likely to set it in the sql file though? Because it would be difficult or impossible to get objects set via yaml, so this is mostly an issue from the greater flexibility that people have in setting the keys in jinja.

@dbeatty10
Copy link
Contributor

@gshank Per your options in #10556 (comment), I'd advocate for:

  1. raise an error when somebody sets an extra key to anything except a string

That way there isn't any surprising gotchas due to us doing an implicit type cast to string. Rather, we'd ask the user to do an explicit type cast of their choosing.

We could do an error message like this:

Key SOME_KEY in file some/file/path.sql is not JSON serializable. Cast it to a serializable type (e.g. string).

@jeremyyeo
Copy link
Contributor Author

  1. raise an error when somebody sets an extra key to anything except a string

If we do this - hopefully it not an exception and potentially breaks users existing runs. Can already predict having to explain to existing customers who have this pattern on versionless that we accidentally broke their runs - and then them asking us what about our promise of versionless not breaking runs.

Just thought I'd call this out while ya'll figure out the best path forward :)

@dbeatty10
Copy link
Contributor

If we do this - hopefully it not an exception and potentially breaks users existing runs.

💯 @jeremyyeo -- thanks for calling this out.

Maybe we could just catch errors like TypeError: Object of type <type> is not JSON serializable and re-raise with a targeted error message.

What do you think @gshank?

@StanleySane
Copy link

Actually, to reproduce this error it's not necessary to input Python-like code like modules.datetime.date() etc.
The YAML specification allows implicit conversion of some strings into date and datetime instance.

Assuming I have a model error_demostration.sql:

{% if execute %}
    {% for var_name in ['SOME_VAR', 'ANOTHER_VAR'] %}
        {{ print(var_name ~ " value: " ~ var(var_name)) }}
        {{ print(var_name ~ " repr: " ~ "{0!r}".format(var(var_name))) }}
    {% endfor %}
{% endif %}

This model just prints out two required variable values (SOME_VAR and ANOTHER_VAR) in two ways: simple string value (like python str method) and wide string representation (like python repr method) which helps to recognize real underlying type of the value (just for illustration purposes).

Now I use dbt Core dbt compile command with --vars option, which, according to the documentation, represents a YAML string:

PS C:\dev\dbt> dbt compile --select error_demostration --vars '{SOME_VAR: 2024-01-01, ANOTHER_VAR: "2024-01-01"}'  
20:59:39  Running with dbt=1.8.5
20:59:40  Registered adapter: postgres=1.8.2
...
20:59:43  
20:59:44  Concurrency: 1 threads (target='dev')
20:59:44  
SOME_VAR value: 2024-01-01
SOME_VAR repr: datetime.date(2024, 1, 1)
ANOTHER_VAR value: 2024-01-01
ANOTHER_VAR repr: datetime.date(2024, 1, 1)
20:59:44  Compiled node 'error_demostration' is:

As you can see, both variables have been converted into datetime.date instances, despite one of them being enclosed in quotes in command line.
To force the parser treat them as strings, you need to use YAML tags:

PS C:\dev\dbt> dbt compile --select error_demostration --vars '{SOME_VAR: !!str 2024-01-01, ANOTHER_VAR: "2024-01-01"}'
21:06:35  Running with dbt=1.8.5
21:06:36  Registered adapter: postgres=1.8.2
...
21:06:39  
21:06:40  Concurrency: 1 threads (target='dev')
21:06:40  
SOME_VAR value: 2024-01-01
SOME_VAR repr: '2024-01-01'
ANOTHER_VAR value: 2024-01-01
ANOTHER_VAR repr: datetime.date(2024, 1, 1)
21:06:40  Compiled node 'error_demostration' is:

Note the added !!str token in the --vars option.

Now SOME_VAR value is pure string.

Finally, we can reproduce the issue - we just need to forget to pass a value for the required variable (ANOTHER_VAR in my case):

PS C:\dev\dbt> dbt compile --select error_demostration --vars '{SOME_VAR: "2024-01-01"}'                             
21:16:06  Running with dbt=1.8.5
21:16:06  Registered adapter: postgres=1.8.2
...
21:16:10  
21:16:10  Concurrency: 1 threads (target='dev')
21:16:10  
SOME_VAR value: 2024-01-01
SOME_VAR repr: datetime.date(2024, 1, 1)
21:16:10  Unhandled error while executing 
Object of type date is not JSON serializable
21:16:10  Encountered an error:
Runtime Error
  Object of type date is not JSON serializable

As you can see, I never indicated that I was passing a date. The YAML parser itself understood that the string "2024-01-01" is a date.
To bypass this error we need to explicitly mark string as string:

PS C:\dev\dbt> dbt compile --select error_demostration --vars '{SOME_VAR: !!str 2024-01-01}'
21:21:15  Running with dbt=1.8.5
21:21:15  Registered adapter: postgres=1.8.2
...
21:21:19  
21:21:19  Concurrency: 1 threads (target='dev')
21:21:19  
SOME_VAR value: 2024-01-01
SOME_VAR repr: '2024-01-01'
21:21:19  Encountered an error:
Runtime Error
  Compilation Error in model error_demostration (models\_experiments\error_demostration.sql)
    Required var 'ANOTHER_VAR' not found in config:
    Vars supplied to error_demostration = {
        "SOME_VAR": "2024-01-01"
    }

Now we see the expected error message (about missing required var ANOTHER_VAR).

@owenprough-sift
Copy link

This error was driving me crazy, but @StanleySane 's workaround of forcing YAML string tag in --vars worked for me. Thanks!

@internetcoffeephone
Copy link
Contributor

internetcoffeephone commented Dec 4, 2024

I'm experiencing this error using dbt-redshift===1.9.0rc1 specifically with the new incremental microbatch feature.
Having defined in a model.sql:

{{
    config(
        materialized="incremental",
        incremental_strategy='microbatch',
        begin=var('start_date'),
        ...
    )
}}

and in dbt_project.yml:

vars:
  start_date: "2012-01-01"

The error appears when running dbt list --output json.

Casting the variable to string with jinja templating does not work however.

@tuantran0910
Copy link

tuantran0910 commented Dec 20, 2024

@internetcoffeephone Did you find any solution for this ? I am facing this problem too

@internetcoffeephone
Copy link
Contributor

@tuantran0910 this was solved in #11187

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants