Merge pull request #27 from couyang24/dev
🚀 release for version 0.1.2
couyang24 authored Apr 20, 2022
2 parents fe133f3 + 4d5a5d2 commit d92ce0a
Showing 23 changed files with 349 additions and 86 deletions.
11 changes: 10 additions & 1 deletion .circleci/config.yml
@@ -33,6 +33,12 @@ jobs:
pkg-manager: pip
app-dir: ~/project/ # If your requirements.txt isn't in the root directory.
# pip-dependency-file: test-requirements.txt # if you have a different name for your requirements file, maybe one that combines your runtime and test requirements.
- run: &install_packages
name: Install system packages
command: |
# pyspark needs to run Java, so we install openjdk.
sudo apt-get update
sudo apt-get install -y default-jre-headless
- run:
name: Install firelink
command: pip install .
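The CircleCI job installs `default-jre-headless` because PySpark starts a Java virtual machine. A hedged local equivalent is to check for a Java runtime before running the Spark tests. `java_available` below is a hypothetical helper, not part of firelink; it mirrors the lookup order of Spark's launcher scripts (`$JAVA_HOME/bin/java` when `JAVA_HOME` is set, otherwise `java` on `PATH`) and assumes POSIX paths.

```python
import os
import shutil


def java_available() -> bool:
    """Return True if a Java runtime PySpark could launch is reachable (hypothetical helper)."""
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        # When JAVA_HOME is set, Spark's launcher uses $JAVA_HOME/bin/java.
        return os.path.isfile(os.path.join(java_home, "bin", "java"))
    # Otherwise it falls back to whatever `java` is on PATH.
    return shutil.which("java") is not None


if not java_available():
    print("No Java runtime found: install one (e.g. default-jre-headless) before using pyspark")
```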
@@ -47,4 +53,7 @@ workflows:
validation: # This is the name of the workflow, feel free to change it to better match your workflow.
# Inside the workflow, you define the jobs you want to run.
jobs:
- build
- build:
branches:
ignore:
- gh-pages
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -1,6 +1,6 @@
name: Bug Report
description: Report incorrect behavior in the firelink library
title: "BUG: "
title: ":bug: "
labels: [Bug, Needs Triage]

body:
@@ -17,7 +17,7 @@ body:
[latest version](https://pypi.org/project/firelink/) of firelink.
required: true
- label: >
I have confirmed this bug exists on the main branch of pandas.
I have confirmed this bug exists on the main branch of firelink.
- type: textarea
id: example
attributes:
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/documentation_improvement.yml
@@ -1,12 +1,12 @@
name: Documentation Improvement
description: Report wrong or missing documentation
title: "DOC: "
title: ":memo: "
labels: [Docs, Needs Triage]

body:
- type: checkboxes
attributes:
label: Pandas version checks
label: Firelink version checks
options:
- label: >
I have checked that the issue still exists on the latest versions of the docs
33 changes: 0 additions & 33 deletions .github/ISSUE_TEMPLATE/feature_request.md

This file was deleted.

67 changes: 67 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,67 @@
name: Feature Request
description: Suggest an idea for the firelink library
title: ":sparkles: "
labels: [Enhancement, Needs Triage]

body:
- type: textarea
id: problem
attributes:
label: Identify the Problem
description: >
Please provide a description of what the problem is.
value: >
<details>
I wish I could use firelink to do [...]
</details>
validations:
required: true
- type: textarea
id: solution
attributes:
label: Feature Request
description: >
Please describe the solution you would like, and try to write a docstring for the desired feature.
value: >
<details>
`firelink.pandas_transform` should get a new parameter `Agg` that [...]
</details>
validations:
required: true
- type: textarea
id: api
attributes:
label: Implications on API
description: >
Please provide a description of how this feature will affect the API.
validations:
required: true
- type: textarea
id: alternative
attributes:
label: Alternative Solution
description: >
Please provide a description of any alternative solutions or features you've considered.
validations:
required: true
- type: textarea
id: other
attributes:
label: Additional Information
description: >
Please provide any other context, code examples, or references to existing implementations about the feature request here.
placeholder: >
# Your code here, if applicable
...
render: python
validations:
required: true
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/installation_issue.yml
@@ -1,6 +1,6 @@
name: Installation Issue
description: Report issues installing the firelink library on your system
title: "BUILD: "
title: ":package: "
labels: [Build, Needs Triage]

body:
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/performance_issue.yml
@@ -1,6 +1,6 @@
name: Performance Issue
description: Report slow performance or memory issues when running firelink code
title: "PERF: "
title: ":zap: "
labels: [Performance, Needs Triage]

body:
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/submit_question.yml
@@ -1,6 +1,6 @@
name: Submit Question
description: Ask a general question about pandas
title: "QST: "
title: ":bulb: "
labels: [Usage Question, Needs Triage]

body:
20 changes: 18 additions & 2 deletions .github/workflows/python-package.yml
@@ -17,10 +17,15 @@ jobs:
fail-fast: false
matrix:
python-version: [3.7, 3.8, 3.9, '3.10']
spark-version: [3.0.3, 3.1.2, 3.2.0]
hadoop: [3.2]
include:
- python-version: 3.8
- python-version: 3.7
spark-version: 2.4.8
hadoop: 2.7
env:
PYTHON_VERSION: ${{ matrix.python-version }}
SPARK_VERSION: ${{ matrix.spark-version }}

steps:
- uses: actions/checkout@v2
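The new matrix crosses four Python versions with three Spark 3.x versions (all on `hadoop: 3.2`), then uses `include` to add one Spark 2.4.8 / Hadoop 2.7 job on Python 3.7. The sketch below models that expansion under a simplified reading of GitHub's documented `include` rule: an entry is merged into every combination whose original matrix values it would not overwrite, and otherwise becomes a new combination. `expand_matrix` is a hypothetical helper, not the runner's code. Versions are written as strings here; in the workflow `'3.10'` is quoted because YAML would otherwise parse `3.10` as the number 3.1.

```python
from itertools import product


def expand_matrix(axes, include=()):
    """Simplified model of GitHub Actions matrix expansion (assumption, not the runner's code)."""
    keys = list(axes)
    # Cartesian product of the original matrix axes.
    combos = [dict(zip(keys, values)) for values in product(*(axes[k] for k in keys))]
    originals = [dict(c) for c in combos]  # original values, used for the overwrite check
    for extra in include:
        matched = False
        for combo, original in zip(combos, originals):
            # An include entry may not change any original matrix value of this combination.
            if all(original[k] == v for k, v in extra.items() if k in original):
                combo.update(extra)
                matched = True
        if not matched:
            # It conflicts with every combination, so it becomes a job of its own.
            combos.append(dict(extra))
    return combos


jobs = expand_matrix(
    {
        "python-version": ["3.7", "3.8", "3.9", "3.10"],
        "spark-version": ["3.0.3", "3.1.2", "3.2.0"],
        "hadoop": ["3.2"],
    },
    include=[{"python-version": "3.7", "spark-version": "2.4.8", "hadoop": "2.7"}],
)
print(len(jobs))  # 13: 4 x 3 x 1 cross-product jobs plus the extra Spark 2.4.8 job
print(jobs[-1])
```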
@@ -30,10 +35,21 @@ jobs:
with:
python-version: ${{ matrix.python-version }}

- name: Setup Java JDK
uses: actions/[email protected]
with:
java-version: 1.8

- name: Install Spark
run: |
wget -q -O spark.tgz https://archive.apache.org/dist/spark/spark-${{ matrix.spark-version }}/spark-${{ matrix.spark-version }}-bin-hadoop${{ matrix.hadoop }}.tgz
tar xzf spark.tgz
rm spark.tgz
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install pytest
python -m pip install pytest pytest-spark pypandoc
python -m pip install pyspark==${{ matrix.spark-version }}
python -m pip install .
- name: Test with pytest
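The *Install Spark* step builds its download URL from two matrix values. This small Python mirror of the `wget` template can help check which archive each job fetches; `spark_archive_url` is a hypothetical helper, and the pattern is copied from the workflow line above.

```python
ARCHIVE = "https://archive.apache.org/dist/spark"


def spark_archive_url(spark_version: str, hadoop: str) -> str:
    """Mirror of the workflow's wget URL template (hypothetical helper)."""
    tarball = f"spark-{spark_version}-bin-hadoop{hadoop}.tgz"
    return f"{ARCHIVE}/spark-{spark_version}/{tarball}"


print(spark_archive_url("3.2.0", "3.2"))
# https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz
print(spark_archive_url("2.4.8", "2.7"))
# https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
```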
4 changes: 2 additions & 2 deletions README.md
@@ -80,8 +80,8 @@ spark_pipe
sdf = spark_pipe.fit_transform(sdf)
sdf.show()
add1 = Assign(**{"Country": "Canada"})
add2 = Assign(**{"City": "Toronto"})
add1 = Assign({"Country": "Canada"})
add2 = Assign({"City": "Toronto"})
pandas_pipe = FirePipeline([("Add Country", add1), ("Add City", add2)])
pandas_pipe.fit_transform(df)
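The README now passes new columns to `Assign` as a single dict instead of unpacking the dict into keyword arguments. `Assign`'s source is not part of this diff, so the sketch below is hypothetical. It assumes firelink's transformers follow scikit-learn's convention of reading constructor parameters back by name (the demo imports `BaseEstimator` and `TransformerMixin`), and it reimplements a simplified `get_params` with `inspect` so it runs without scikit-learn. Under that convention, values captured by `**kwargs` are invisible, because scikit-learn skips `VAR_KEYWORD` parameters, so `sklearn.base.clone` would rebuild such an object without its columns; a named dict parameter round-trips.

```python
import inspect


class ParamsMixin:
    """Simplified stand-in for scikit-learn's BaseEstimator.get_params (assumption)."""

    def get_params(self):
        params = inspect.signature(type(self).__init__).parameters.values()
        # Like scikit-learn, ignore `self` and any **kwargs catch-all.
        names = [p.name for p in params if p.name != "self" and p.kind != p.VAR_KEYWORD]
        return {name: getattr(self, name) for name in names}


class AssignKwargs(ParamsMixin):
    """Hypothetical old-style signature: Assign(**{"Country": "Canada"})."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class AssignDict(ParamsMixin):
    """Hypothetical new-style signature: Assign({"Country": "Canada"})."""

    def __init__(self, assignments):
        self.assignments = assignments

    def transform(self, X, y=None):
        # With pandas, DataFrame.assign adds (or overwrites) one column per key.
        return X.assign(**self.assignments)


print(AssignKwargs(**{"Country": "Canada"}).get_params())  # {}
print(AssignDict({"Country": "Canada"}).get_params())      # {'assignments': {'Country': 'Canada'}}
```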
1 change: 0 additions & 1 deletion docs/conf.py
@@ -16,7 +16,6 @@
sys.path.insert(0, os.path.abspath(".."))
import sphinx_rtd_theme


# -- Project information -----------------------------------------------------

project = "Firelink"
4 changes: 4 additions & 0 deletions docs/refresher.sh
@@ -0,0 +1,4 @@
rm setup.rst
rm modules.rst
rm firelink.rst
sphinx-apidoc -o . ..
37 changes: 37 additions & 0 deletions docs/tests.rst
@@ -0,0 +1,37 @@
tests package
=============

Submodules
----------

tests.conftest module
---------------------

.. automodule:: tests.conftest
:members:
:undoc-members:
:show-inheritance:

tests.test\_pandas\_transform module
------------------------------------

.. automodule:: tests.test_pandas_transform
:members:
:undoc-members:
:show-inheritance:

tests.test\_spark\_transform module
-----------------------------------

.. automodule:: tests.test_spark_transform
:members:
:undoc-members:
:show-inheritance:

Module contents
---------------

.. automodule:: tests
:members:
:undoc-members:
:show-inheritance:
6 changes: 4 additions & 2 deletions firelink/pandas_transform.py
@@ -64,11 +64,11 @@ class Astype(Firstflame):
def __init__(self, dtype, copy=True, errors="raise"):
self.dtype = dtype
self.copy = copy
self.errors = erros
self.errors = errors

def transform(self, X, y=None):
"""transform"""
return X.astype(self.type, self.copy, self.errors)
return X.astype(self.dtype, self.copy, self.errors)


class Apply(Firstflame):
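The two `Astype` fixes fail at different times. `erros` was an undefined name, so the old constructor raised `NameError` as soon as `Astype(...)` was called; `self.type` was never set, so the old `transform` would have raised `AttributeError` had construction succeeded. The sketch below reproduces the removed constructor and the corrected class without the `Firstflame` base. `FakeFrame` is a hypothetical stand-in for a pandas DataFrame, whose `astype(dtype, copy, errors)` takes the same three arguments in pandas 1.x.

```python
class FakeFrame:
    """Hypothetical stand-in for a pandas DataFrame that records the astype call."""

    def astype(self, dtype, copy=True, errors="raise"):
        return ("astype", dtype, copy, errors)


class OldAstype:
    """Constructor as it was before this commit."""

    def __init__(self, dtype, copy=True, errors="raise"):
        self.dtype = dtype
        self.copy = copy
        self.errors = erros  # noqa: F821  the typo: this name is never defined


class Astype:
    """Corrected version from this commit (without the Firstflame base class)."""

    def __init__(self, dtype, copy=True, errors="raise"):
        self.dtype = dtype
        self.copy = copy
        self.errors = errors

    def transform(self, X, y=None):
        return X.astype(self.dtype, self.copy, self.errors)


try:
    OldAstype("int64")
except NameError as exc:
    print("old version failed:", type(exc).__name__)  # old version failed: NameError

print(Astype("int64").transform(FakeFrame()))  # ('astype', 'int64', True, 'raise')
```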
@@ -108,6 +108,8 @@ def __init__(
self.axis = axis
self.level = level
self.as_index = as_index
self.sort = (sort,)
self.group_keys = (group_keys,)
self.squeeze = squeeze
self.observed = observed
self.dropna = dropna
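One detail in the two lines added to `Groupby.__init__`: the parentheses with a trailing comma make `self.sort` and `self.group_keys` one-element tuples, unlike the neighbouring attributes, which store the argument unchanged. Any non-empty tuple is truthy, so `sort=False` still tests as true wherever the stored value is checked. A minimal stdlib illustration, where both classes are hypothetical reductions of `Groupby` that keep only the affected attributes:

```python
class GroupbyAsCommitted:
    """The two added assignments as they appear in this commit."""

    def __init__(self, sort=True, group_keys=True):
        self.sort = (sort,)              # one-element tuple
        self.group_keys = (group_keys,)  # one-element tuple


class GroupbyIntended:
    """Presumably intended form: store the flags unchanged, like self.dropna."""

    def __init__(self, sort=True, group_keys=True):
        self.sort = sort
        self.group_keys = group_keys


committed = GroupbyAsCommitted(sort=False)
intended = GroupbyIntended(sort=False)
print(committed.sort, bool(committed.sort))  # (False,) True
print(intended.sort, bool(intended.sort))    # False False
```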
7 changes: 5 additions & 2 deletions firelink/spark_transform.py
@@ -1,5 +1,8 @@
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

from firelink.fire import Firstflame
from pyspark.sql import SparkSession, functions as F


class WithColumn(Firstflame):
"""with column"""
@@ -35,5 +38,5 @@ def __init__(self, col, new_col, val, fill, result):
def transform(self, X, y=None):
return X.withColumn(
self.new_col,
when(X[self.col].isin(self.val), self.result).otherwise(self.fill),
F.when(X[self.col].isin(self.val), self.result).otherwise(self.fill),
)
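The fix qualifies `when` with the `F` alias for `pyspark.sql.functions`; the bare name was never imported, so `transform` would have raised `NameError`. As a reading aid, here is a plain-Python model of what `X.withColumn(new_col, F.when(X[col].isin(val), result).otherwise(fill))` computes row by row. It is a sketch, not Spark code: `when_isin` is a hypothetical helper, rows are dicts, and `None` stands for SQL NULL. Following SQL semantics, a NULL input makes `isin` NULL, which `when` treats like false, so such rows get `fill`; as with `withColumn`, an existing column named `new_col` is replaced.

```python
def when_isin(rows, col, new_col, val, fill, result):
    """Plain-Python model of withColumn + when(isin).otherwise (hypothetical helper)."""
    out = []
    for row in rows:
        value = row[col]
        # NULL.isin(...) is NULL, and when() sends NULL conditions to otherwise().
        matched = value is not None and value in val
        # Build a new dict so the input rows are left untouched.
        out.append({**row, new_col: result if matched else fill})
    return out


rows = [{"city": "Toronto"}, {"city": "Paris"}, {"city": None}]
for row in when_isin(rows, "city", "in_canada", ["Toronto", "Montreal"], fill="no", result="yes"):
    print(row)
```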
24 changes: 15 additions & 9 deletions playground/demo.py
@@ -25,19 +25,15 @@


import catboost as cgb
import firelink
import lightgbm as lgb

# +
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal
import seaborn as sns
import xgboost as xgb
from firelink.fire import Firstflame
from firelink.pipeline import FirePipeline
from firelink.pandas_transform import Drop_duplicates, Filter
from pandas.testing import assert_frame_equal
from sklearn import set_config
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
@@ -60,6 +56,11 @@
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

import firelink
from firelink.fire import Firstflame
from firelink.pandas_transform import Drop_duplicates, Filter
from firelink.pipeline import FirePipeline

# %load_ext autoreload
# %autoreload 2
# -
@@ -136,7 +137,10 @@ def transform(self, X):
categorical_transformer = FirePipeline(
steps=[
("cimputer", SimpleImputer(strategy="most_frequent")),
("ordinalencoder", OrdinalEncoder()),
(
"ordinalencoder",
OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
),
("nimputer", SimpleImputer(strategy="median")),
("scaler", StandardScaler()),
]
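The demo's categorical pipeline now builds `OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)`. With scikit-learn's default `handle_unknown="error"`, a category seen at transform time but not during `fit` raises an error; with this setting it is encoded as `-1` instead. Because `-1` is not a missing value, the following median `SimpleImputer` leaves it in place. A stdlib sketch of that behaviour for a single column, assuming categories are learned in sorted order as with scikit-learn's default `categories="auto"`; `TinyOrdinalEncoder` is hypothetical and omits the rest of the scikit-learn API.

```python
class TinyOrdinalEncoder:
    """Single-column sketch of OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)."""

    def __init__(self, unknown_value=-1):
        self.unknown_value = unknown_value

    def fit(self, column):
        # Learn the categories in sorted order, like categories="auto".
        self.categories_ = sorted(set(column))
        self._codes = {cat: code for code, cat in enumerate(self.categories_)}
        return self

    def transform(self, column):
        # Categories not seen during fit map to unknown_value instead of raising.
        return [self._codes.get(value, self.unknown_value) for value in column]


enc = TinyOrdinalEncoder().fit(["S", "C", "Q", "S"])
print(enc.categories_)                 # ['C', 'Q', 'S']
print(enc.transform(["Q", "S", "X"]))  # [1, 2, -1]
```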
@@ -272,9 +276,11 @@ def transform(self, X):

# ## Spark Transformation

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

from firelink.spark_transform import WithColumn
from firelink.transform import Assign
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("spark_session").enableHiveSupport().getOrCreate()

@@ -292,8 +298,8 @@ def transform(self, X):
sdf = spark_pipe.fit_transform(sdf)
sdf.show()

add1 = Assign(**{"Country": "Canada"})
add2 = Assign(**{"City": "Toronto"})
add1 = Assign({"Country": "Canada"})
add2 = Assign({"City": "Toronto"})
pandas_pipe = FirePipeline([("Add Country", add1), ("Add City", add2)])

pandas_pipe.fit_transform(df)
1 change: 1 addition & 0 deletions requirements.txt
@@ -7,6 +7,7 @@ matplotlib
numpy
pandas>=1.0.0,<=1.4.2
pre-commit
pyspark
pytest
scikit-learn
xgboost