Skip to content

Commit

Permalink
add support for running mul invocations in parallel (#4909)
Browse files Browse the repository at this point in the history
this pr clarifies that the Cloud CLI now support running multiple
invocations in parallel. This is based on @dichenqiandbt 's demo.

Before that cloud CLI only supports run one invocation at one time.

This pr has grown to also address parallel execution, what it means,
where it's supported, and modify the current dbt commands table to
further explain this.

Resolves #4952
  • Loading branch information
mirnawong1 authored Feb 21, 2024
2 parents ee63228 + 23b7cd3 commit ef182f4
Show file tree
Hide file tree
Showing 3 changed files with 65 additions and 37 deletions.
10 changes: 6 additions & 4 deletions website/docs/docs/cloud/configure-cloud-cli.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ Once you install the dbt Cloud CLI, you need to configure it to connect to a dbt

With your repo recloned, you can add, edit, and sync files with your repo.

### Set environment variables
## Set environment variables

To set environment variables in the dbt Cloud CLI for your dbt project:

Expand All @@ -94,9 +94,11 @@ To set environment variables in the dbt Cloud CLI for your dbt project:

## Use the dbt Cloud CLI

- The dbt Cloud CLI uses the same set of [dbt commands](/reference/dbt-commands) and [MetricFlow commands](/docs/build/metricflow-commands) as dbt Core to execute the commands you provide. For example, use the [`dbt environment`](/reference/commands/dbt-environment) command to view your dbt Cloud configuration details.
- It allows you to automatically defer build artifacts to your Cloud project's production environment.
- It also supports [project dependencies](/docs/collaborate/govern/project-dependencies), which allows you to depend on another project using the metadata service in dbt Cloud.
The dbt Cloud CLI uses the same set of [dbt commands](/reference/dbt-commands) and [MetricFlow commands](/docs/build/metricflow-commands) as dbt Core to execute the commands you provide. For example, use the [`dbt environment`](/reference/commands/dbt-environment) command to view your dbt Cloud configuration details. With the dbt Cloud CLI, you can:

- Run [multiple invocations in parallel](/reference/dbt-commands) and ensure [safe parallelism](/reference/dbt-commands#parallel-execution), which is currently not guaranteed by `dbt-core`.
- Automatically defers build artifacts to your Cloud project's production environment.
- Supports [project dependencies](/docs/collaborate/govern/project-dependencies), which allows you to depend on another project using the metadata service in dbt Cloud.
- Project dependencies instantly connect to and reference (or `ref`) public models defined in other projects. You don't need to execute or analyze these upstream models yourself. Instead, you treat them as an API that returns a dataset.

:::tip Use the <code>--help</code> flag
Expand Down
83 changes: 50 additions & 33 deletions website/docs/reference/dbt-commands.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,45 +5,62 @@ title: "dbt Command reference"
You can run dbt using the following tools:

- In your browser with the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud)
- On the command line interface using the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) or open-source [dbt Core](/docs/core/installation-overview), both of which enable you to execute dbt commands. The key distinction is the dbt Cloud CLI is tailored for dbt Cloud's infrastructure and integrates with all its [features](/docs/cloud/about-cloud/dbt-cloud-features).
- On the command line interface using the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) or open-source [dbt Core](/docs/core/installation-overview).

A key distinction with the tools mentioned, is that dbt Cloud CLI and IDE are designed to support safe parallel execution of dbt commands, leveraging dbt Cloud's infrastructure and its comprehensive [features](/docs/cloud/about-cloud/dbt-cloud-features). In contrast, `dbt-core` _doesn't support_ safe parallel execution for multiple invocations in the same process. Learn more in the [parallel execution](#parallel-execution) section.

The following sections outline the commands supported by dbt and their relevant flags. For information about selecting models on the command line, consult the docs on [Model selection syntax](/reference/node-selection/syntax).
## Parallel execution

### Available commands
dbt Cloud allows for parallel execution of commands, enhancing efficiency without compromising data integrity. This enables you to run multiple commands at the same time, however it's important to understand which commands can be run in parallel and which can't.

<VersionBlock firstVersion="1.6">
In contrast, [`dbt-core` _doesn't_ support](/reference/programmatic-invocations#parallel-execution-not-supported) safe parallel execution for multiple invocations in the same process, and requires users to manage concurrency manually to ensure data integrity and system stability.

To ensure your dbt workflows are both efficient and safe, you can run different types of dbt commands at the same time (in parallel) &mdash; for example, `dbt build` (write operation) can safely run alongside `dbt parse` (read operation) at the same time. However, you can't run `dbt build` and `dbt run` (both write operations) at the same time.

dbt commands can be `read` or `write` commands:

All commands in the table are compatible with either the dbt Cloud IDE, dbt Cloud CLI, or dbt Core.

You can run dbt commands in your specific tool by prefixing them with `dbt`. For example, to run the `test` command, type `dbt test`.

| Command | Description | Compatible tools | <div style={{width:'220px'}}>Version</div> |
| ------- | ----------- | ---------------- | ------- |
| [build](/reference/commands/build) | Build and test all selected resources (models, seeds, snapshots, tests) | All | All [supported versions](/docs/dbt-versions/core) |
| cancel | Cancels the most recent invocation.| dbt Cloud CLI | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [clean](/reference/commands/clean) | Deletes artifacts present in the dbt project | All | All [supported versions](/docs/dbt-versions/core) |
| [clone](/reference/commands/clone) | Clone selected models from the specified state | All | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [compile](/reference/commands/compile) | Compiles (but does not run) the models in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [debug](/reference/commands/debug) | Debugs dbt connections and projects | dbt Cloud IDE <br /> dbt Core | All [supported versions](/docs/dbt-versions/core) |
| [deps](/reference/commands/deps) | Downloads dependencies for a project | All | All [supported versions](/docs/dbt-versions/core) |
| [docs](/reference/commands/cmd-docs) | Generates documentation for a project | All | All [supported versions](/docs/dbt-versions/core) |
| [environment](/reference/commands/dbt-environment) | Enables you to interact with your dbt Cloud environment. | dbt Cloud CLI | Requires [dbt v1.5 or higher](/docs/dbt-versions/core) |
| help | Displays help information for any command | dbt Core <br /> dbt Cloud CLI | All [supported versions](/docs/dbt-versions/core) |
| [init](/reference/commands/init) | Initializes a new dbt project | dbt Core | All [supported versions](/docs/dbt-versions/core) |
| [list](/reference/commands/list) | Lists resources defined in a dbt project | All | All [supported versions](/docs/dbt-versions/core) |
| [parse](/reference/commands/parse) | Parses a project and writes detailed timing info | All | All [supported versions](/docs/dbt-versions/core) |
| reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | dbt Cloud CLI | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [retry](/reference/commands/retry) | Retry the last run `dbt` command from the point of failure | All | Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [run](/reference/commands/run) | Runs the models in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [run-operation](/reference/commands/run-operation) | Invoke a macro, including running arbitrary maintenance SQL against the database | All | All [supported versions](/docs/dbt-versions/core) |
| [seed](/reference/commands/seed) | Loads CSV files into the database | All | All [supported versions](/docs/dbt-versions/core) |
| [show](/reference/commands/show) | Preview table rows post-transformation | All | All [supported versions](/docs/dbt-versions/core) |
| [snapshot](/reference/commands/snapshot) | Executes "snapshot" jobs defined in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [source](/reference/commands/source) | Provides tools for working with source data (including validating that sources are "fresh") | All | All [supported versions](/docs/dbt-versions/core) |
| [test](/reference/commands/test) | Executes tests defined in a project | All | All [supported versions](/docs/dbt-versions/core) |
| [--version](/reference/commands/version) | Displays the currently installed version of dbt CLI | dbt Core <br /> dbt Cloud CLI | All [supported versions](/docs/dbt-versions/core) |
| Command type | Description | <div style={{width:'200px'}}>Example</div> |
|------|-------------|---------|
| **Write** | These commands perform actions that change data or metadata in your data platform.<br /><br /> Limited to one invocation at any given time, which prevents any potential conflicts, such as overwriting the same table in your data platform at the same time. | `dbt build`<br />`dbt run` |
| **Read** | These commands involve operations that fetch or read data without making any changes to your data platform.<br /><br /> Can have multiple invocations in parallel and aren't limited to one invocation at any given time. This means read commands can run in parallel with other read commands and a single write command.| `dbt parse`<br />`dbt compile`|

## Available commands

<VersionBlock firstVersion="1.6">

The following sections outline the commands supported by dbt and their relevant flags. They are available in all tools and all [supported versions](/docs/dbt-versions/core) unless noted otherwise. You can run these commands in your specific tool by prefixing them with `dbt` &mdash; for example, to run the `test` command, type `dbt test`.

For information about selecting models on the command line, refer to [Model selection syntax](/reference/node-selection/syntax).

Commands marked with a '❌' indicate write commands and those with a checkmark ('✅') indicate read commands.

| Command | Description | Parallel execution | <div style={{width:'250px'}}>Caveats</div> |
|---------|-------------| :-----------------:| ------------------------------------------ |
| [build](/reference/commands/build) | Build and test all selected resources (models, seeds, snapshots, tests) || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| cancel | Cancels the most recent invocation. | N/A | dbt Cloud CLI <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [clean](/reference/commands/clean) | Deletes artifacts present in the dbt project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [clone](/reference/commands/clone) | Clone selected models from the specified state || All tools <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [compile](/reference/commands/compile) | Compiles (but does not run) the models in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [debug](/reference/commands/debug) | Debugs dbt connections and projects || dbt Cloud IDE, dbt Core <br /> All [supported versions](/docs/dbt-versions/core) |
| [deps](/reference/commands/deps) | Downloads dependencies for a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [docs](/reference/commands/cmd-docs) | Generates documentation for a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [environment](/reference/commands/dbt-environment) | Enables you to interact with your dbt Cloud environment. | N/A | dbt Cloud CLI <br /> Requires [dbt v1.5 or higher](/docs/dbt-versions/core) |
| help | Displays help information for any command | N/A | dbt Core, dbt Cloud CLI <br /> All [supported versions](/docs/dbt-versions/core) |
| [init](/reference/commands/init) | Initializes a new dbt project || dbt Core<br /> All [supported versions](/docs/dbt-versions/core) |
| [list](/reference/commands/list) | Lists resources defined in a dbt project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [parse](/reference/commands/parse) | Parses a project and writes detailed timing info || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | dbt Cloud CLI <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [retry](/reference/commands/retry) | Retry the last run `dbt` command from the point of failure || All tools <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [run](/reference/commands/run) | Runs the models in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [run-operation](/reference/commands/run-operation) | Invoke a macro, including running arbitrary maintenance SQL against the database | Write || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [seed](/reference/commands/seed) | Loads CSV files into the database || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [show](/reference/commands/show) | Preview table rows post-transformation || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [snapshot](/reference/commands/snapshot) | Executes "snapshot" jobs defined in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [source](/reference/commands/source) | Provides tools for working with source data (including validating that sources are "fresh") || All tools<br /> All [supported versions](/docs/dbt-versions/core) |
| [test](/reference/commands/test) | Executes tests defined in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [--version](/reference/commands/version) | Displays the currently installed version of dbt CLI | N/A | dbt Core, dbt Cloud CLI <br /> All [supported versions](/docs/dbt-versions/core) | <br />

Note that some have "N/A" since they're not relevant to the parallelization of dbt commands.
</VersionBlock>

<VersionBlock lastVersion="1.5">
Expand Down
9 changes: 9 additions & 0 deletions website/docs/reference/programmatic-invocations.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,15 @@ for r in res.result:
print(f"{r.node.name}: {r.status}")
```

## Parallel execution not supported

[`dbt-core`](https://pypi.org/project/dbt-core/) doesn't support [safe parallel execution](/reference/dbt-commands#parallel-execution) for multiple invocations in the same process. This means it's not safe to run multiple dbt commands at the same time. It's officially discouraged and requires a wrapping process to handle sub-processes. This is because:

- Running simultaneous commands can unexpectedly interact with the data platform. For example, running `dbt run` and `dbt build` for the same models simultaneously could lead to unpredictable results.
- Each `dbt-core` command interacts with global Python variables. To ensure safe operation, commands need to be executed in separate processes, which can be achieved using methods like spawning processes or using tools like Celery.

To run [safe parallel execution](/reference/dbt-commands#available-commands), you can use the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) or [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), both of which does that additional work to manage concurrency (multiple processes) on the your behalf.

## `dbtRunnerResult`

Each command returns a `dbtRunnerResult` object, which has three attributes:
Expand Down

0 comments on commit ef182f4

Please sign in to comment.