Updated README
calum-mcg committed Jul 9, 2023
1 parent a878742 commit 8f71d0b
Showing 1 changed file (README.md) with 70 additions and 11 deletions.

# dbt-fuzzy-text

dbt macros for fuzzy text matching, designed to keep your models agnostic of the underlying data warehouse.

Current coverage:

| Algorithm | Snowflake | BigQuery |
| :--- | :---: | :---: |
| _Edit distance based_ | | |
| Levenshtein distance | ✔️ | ✔️ |
| Jaro-Winkler similarity | ✔️ | |

# Installation instructions

New to dbt packages? Read more about them [here](https://docs.getdbt.com/docs/building-a-dbt-project/package-management/).

1. Add this package to your `packages.yml` file (check [here](https://hub.getdbt.com/dbt-labs/calum-mcg/latest/) for the latest version number):

   ```yml
   packages:
     - package: calum-mcg/dbt-fuzzy-text
       version: X.X.X # update to the latest version here
   ```
2. Run `dbt deps` to install the package.

# Macros

## levenshtein_distance ([source](macros/levenshtein.sql))

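This macro generates SQL that computes the Levenshtein distance between two strings: the minimum number of single-character insertions, deletions or substitutions needed to turn one string into the other.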

### Arguments

- `str1` (required): First string to compare
- `str2` (required): Second string to compare
- `max` (optional, default=none): Maximum distance to compute (integer)

### Usage

Add the macro call to a model, or to a statement tab in the dbt Cloud IDE, and compile your code:

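```sql
{# 'my_model' is a hypothetical model containing both string columns #}
select
    {{ fuzzy_text.levenshtein_distance('input_string_column', 'comparison_string_column') }} as levenshtein_distance
from {{ ref('my_model') }}
```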
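When checking the values the macro produces, a plain-Python reference sketch of the same metric can help. This is not the package's implementation (which compiles to warehouse SQL); it is the textbook dynamic-programming algorithm, and the function name is hypothetical. The `max_distance` parameter stands in for the macro's `max` argument and *assumes* it caps the returned value, which is how the optional limit behaves in Snowflake's `EDITDISTANCE` and BigQuery's `EDIT_DISTANCE`. The sketch is case-sensitive.

```python
from typing import Optional


def levenshtein_distance(str1: str, str2: str, max_distance: Optional[int] = None) -> int:
    """Hypothetical reference helper: minimum number of single-character
    insertions, deletions or substitutions that turn str1 into str2."""
    # previous[j] = distance between the prefix of str1 processed so far and str2[:j]
    previous = list(range(len(str2) + 1))
    for i, c1 in enumerate(str1, start=1):
        current = [i]  # turning str1[:i] into "" takes i deletions
        for j, c2 in enumerate(str2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # substitute (free when characters match)
            ))
        previous = current
    distance = previous[-1]
    # Assumption: the macro's `max` argument caps the returned value.
    if max_distance is not None:
        distance = min(distance, max_distance)
    return distance


print(levenshtein_distance("kitten", "sitting"))     # 3
print(levenshtein_distance("kitten", "sitting", 2))  # 2
```

Only two rows of the table are kept at any time, so memory grows with the length of `str2` rather than with the product of both lengths.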
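## jaro_winkler ([source](macros/jaro_winkler.sql))

This macro generates SQL that computes the Jaro-Winkler similarity between two strings, a score based on the number of matching characters and transpositions, with a bonus for a shared prefix. Higher values mean more similar strings.

### Arguments

- `str1` (required): First string to compare
- `str2` (required): Second string to compare

### Usage

Add the macro call to a model, or to a statement tab in the dbt Cloud IDE, and compile your code:

```sql
{# 'my_model' is a hypothetical model containing both string columns #}
select
    {{ fuzzy_text.jaro_winkler('input_string_column', 'comparison_string_column') }} as jaro_winkler
from {{ ref('my_model') }}
```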
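A plain-Python reference sketch of Jaro-Winkler similarity can help when checking the macro's output by hand. It follows the textbook definition: a matching window of `floor(max(len1, len2) / 2) - 1`, transpositions counted as half the out-of-order matches, and a Winkler prefix bonus with scale 0.1 over at most 4 characters. These constants and the function names are common defaults assumed here, not taken from the macro source. The sketch returns a float between 0 and 1; the macro (and warehouse built-ins such as Snowflake's `JAROWINKLER_SIMILARITY`, which returns an integer from 0 to 100) may use a different scale, so rescale before comparing.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Hypothetical reference helper: Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as matches within this distance of each other
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both sets of matched characters in order; differing pairs are out of order
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    m = matches
    return (m / len1 + m / len2 + (m - transpositions) / m) / 3


def jaro_winkler_similarity(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to max_prefix)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
print(round(jaro_winkler_similarity("DWAYNE", "DUANE"), 4))   # 0.84
```

The prefix bonus only moves the score towards 1, so strings that share their first few characters (typical of typos near the end of names) rank higher than under plain Jaro similarity.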
# Contribution guidelines

Pull requests are the best way to propose changes to the codebase. To contribute:

1. Open an issue in this repo describing the problem, bug or improvement.
2. Create a branch from `main` with a descriptive name, e.g. `feature/add-cool-thing`.
3. Add tests for the supported adapters in the `integration_tests` folder.
4. If needed, update this README with usage notes and an example.
5. Open a pull request that:
   - describes the changes
   - has a reviewer assigned
   - references the original issue (from step 1)
