Merge pull request #3 from rajagurunath/feature/delta-rs
Feature/delta rs
rajagurunath authored Oct 14, 2021
2 parents 64319d2 + c7d004c commit 7cba6ff
Showing 9 changed files with 309 additions and 364 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/tests.yaml
@@ -42,6 +42,10 @@ jobs:
- name: Run tests
  run: python -m pytest --junitxml=junit/test-results.xml --cov-report=xml tests

# - name: Setup tmate session
#   if: ${{ failure() }}
#   uses: mxschmitt/action-tmate@v3

- name: Upload pytest test results
  uses: actions/upload-artifact@v1
  with:
35 changes: 34 additions & 1 deletion README.md
@@ -8,7 +8,7 @@ To Try out the package:
pip install dask-deltatable
```

-Features:
+### Features:
1. Reads parquet files in parallel with the Dask engine, based on the Delta log
2. Supports all three filesystems: s3, azurefs, gcsfs
3. Supports some delta features like
@@ -17,3 +17,36 @@ Features:
- parquet filters
- row filter
- partition filter
4. Query Delta commit info (history)
5. Vacuum old/unused parquet files
6. Load different versions of the data using a datetime
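
Feature 1 relies on the Delta transaction log to decide which parquet files belong to a given table version. The following is a minimal sketch of that idea, not the code used by this package: it replays simplified `add`/`remove` actions from hypothetical in-memory commit files. The `commits` data and the `active_files` helper are assumptions made for illustration, and real log entries carry more fields than `path`.

```python
import json

# Hypothetical commit files from a table's _delta_log directory, oldest first.
# Each line of a commit file is one JSON action (simplified to the path only).
commits = [
    ['{"add": {"path": "part-0000.parquet"}}',
     '{"add": {"path": "part-0001.parquet"}}'],
    ['{"remove": {"path": "part-0000.parquet"}}',
     '{"add": {"path": "part-0002.parquet"}}'],
]


def active_files(commits, version=None):
    """Replay add/remove actions up to and including `version`.

    Illustrative helper (an assumption, not part of dask_deltatable).
    With version=None, every commit is replayed.
    """
    files = set()
    last = len(commits) - 1 if version is None else version
    for commit in commits[: last + 1]:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


print(active_files(commits))             # files in the latest version
print(active_files(commits, version=0))  # files in version 0
```

Once the list of active files is known, each file can be read independently, which is what makes the parallel read possible.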

### Usage:

```
import dask_deltatable as ddt

# read the delta table
ddt.read_delta_table("delta_path")
# read the delta table at a specific version
ddt.read_delta_table("delta_path", version=3)
# read the delta table as of a specific datetime
ddt.read_delta_table("delta_path", datetime="2018-12-19T16:39:57-08:00")
# read the complete delta history
ddt.read_delta_history("delta_path")
# read the delta history up to a given limit
ddt.read_delta_history("delta_path", limit=5)
# vacuum the table: delete old/unused parquet files
ddt.vacuum("delta_path", dry_run=False)
# can also read from S3, Azure, GCS, etc.
ddt.read_delta_table("s3://bucket_name/delta_path", version=3)
# please ensure the credentials are configured as environment variables
# or in ~/.aws/credentials
```
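
The `datetime` argument shown in the usage block selects a table version by time. One plausible way to resolve a timestamp to a version is to pick the latest commit made at or before it. This is a sketch under that assumption, not the package's actual implementation: `commit_times` and `version_at` are hypothetical names, and the timestamps are made up.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical commit timestamps, one per table version (list index = version).
commit_times = [
    datetime.fromisoformat("2018-12-19T10:00:00-08:00"),
    datetime.fromisoformat("2018-12-19T16:00:00-08:00"),
    datetime.fromisoformat("2018-12-20T09:30:00-08:00"),
]


def version_at(commit_times, when):
    """Return the latest version committed at or before `when` (ISO 8601 string).

    Illustrative helper (an assumption, not part of dask_deltatable).
    """
    target = datetime.fromisoformat(when)
    # bisect_right counts the commits with timestamp <= target.
    idx = bisect_right(commit_times, target)
    if idx == 0:
        raise ValueError("no version exists at or before the given datetime")
    return idx - 1


print(version_at(commit_times, "2018-12-19T16:39:57-08:00"))
```

Binary search keeps the lookup cheap even for long histories, provided the commit timestamps are sorted in version order.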
2 changes: 1 addition & 1 deletion dask_deltatable/__init__.py
@@ -1 +1 @@
-from .core import read_delta_table
+from .core import read_delta_history, read_delta_table, vacuum