Task three #45

Open
wants to merge 17 commits into
base: master
3 changes: 3 additions & 0 deletions .dvc/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
4 changes: 4 additions & 0 deletions .dvc/config
@@ -0,0 +1,4 @@
[core]
    remote = storage
['remote "storage"']
    url = gdrive://13pwaK4fUnV7nnWltM3n4DdqjcU3wvkqQ
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
2 changes: 2 additions & 0 deletions Data/Climate/.gitignore
@@ -0,0 +1,2 @@
IPCC_ClimateZoneMap.ipynb

98 changes: 0 additions & 98 deletions Data/Climate/IPCC_ClimateZoneMap.ipynb

This file was deleted.

4 changes: 4 additions & 0 deletions Data/Climate/IPCC_ClimateZoneMap.ipynb.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 5277092d7a4a5c248a593a46493f1617
  size: 4405
  path: IPCC_ClimateZoneMap.ipynb
11 changes: 11 additions & 0 deletions Data/Soil/.gitignore
@@ -0,0 +1,11 @@

/data
/metadata
/Global Soil Organic Carbon Density in kg Carbon_m2 to 1 meter depth.zip
/CONTENTS.txt
/TERMS OF USE.txt
/soilcarbon.ovr
/processed
/GlobalSoilOrganicCarbonDensityinkgCm_1mDepth.tif
/.botmKQAeq3934dqKuUWyps.tmp
/.kfLrvZdTQqnxpJRSCrHh96.tmp
4 changes: 2 additions & 2 deletions Data/Soil/GlobalSoilOrganicCarbonDensitykgCm2to1mdepth.ipynb
@@ -58,7 +58,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
@@ -72,7 +72,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.8.10"
   }
  },
  "nbformat": 4,
4 changes: 4 additions & 0 deletions Data/Soil/soilcarbon.ovr.dvc
@@ -0,0 +1,4 @@
outs:
- md5: 33956e4dd24c1caa1dcea956e85e1f5f
  size: 131473
  path: soilcarbon.ovr
46 changes: 46 additions & 0 deletions GlobalSoilOrganicCarbonDensityinkgCarbon_m2to1meterdepth.yml
@@ -0,0 +1,46 @@
name: Process Soil Data

on:
  push:
    branches:
      - main

jobs:
  update_soil_data:
    name: Update Soil Data
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
          cache: 'pip'

      - name: Install Dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y libgdal-dev gdal-bin
          pip install numpy
          pip install GDAL
          pip install dvc[gdrive]

      - name: Download and Process Data
        run: |
          wget -O soil_data.zip "https://databasin2-filestore.s3.amazonaws.com/a4cb6d367eae4e52a08902874f8bfedf/download/a4cb6d367eae4e52a08902874f8bfedf_1_zip_en.zip?Signature=ne2Aa6KK3wnmbjRWPPNV0TTecMw%3D&Expires=1679157927&AWSAccessKeyId=AKIAI4RK5BEPK3FCQPUQ"
          unzip -o soil_data.zip -d Data/Soil
          cd Data/Soil
          python process.py
          gdalwarp -to SRC_METHOD=NO_GEOTRANSFORM -s_srs EPSG:4326 -t_srs EPSG:4326 -tr 0.5 0.5 -r near -te -180.0 -90.0 180.0 90.0 -te_srs EPSG:4326 -of GTiff soilcarbon.ovr GlobalSoilOrganicCarbonDensityinkgCarbon_m2to1meterdepth.tif

      - name: Commit and Push Changes
        env:
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "GitHub Actions"
          cd Data/Soil
          dvc add GlobalSoilOrganicCarbonDensityinkgCarbon_m2to1meterdepth.tif

23 changes: 23 additions & 0 deletions dvc.lock
@@ -0,0 +1,23 @@
schema: '2.0'
stages:
  clean:
    cmd: rm -rf tmp_unzip_path
  load:
    cmd:
    - dvc push
  extract:
    cmd:
    - python scripts/extract.py
    - python scripts/rename_files.py
  transform:
    cmd:
    - "python scripts/transform.py --input tmp_unzip_path/data/commonData_Data0_soilcarbon.ovr\
      \ \\\n --output Data/Soil/processed/GlobalSoilOrganicCarbonDensityinkgCm_1mDepth.tif"
    deps:
    - path: tmp_unzip_path/data/commonData_Data0_soilcarbon.ovr
      md5: 33956e4dd24c1caa1dcea956e85e1f5f
      size: 131473
    outs:
    - path: Data/Soil/processed/GlobalSoilOrganicCarbonDensityinkgCm_1mDepth.tif
      md5: 96f78155b79a835f56d019586d4c1f14
      size: 1038282
21 changes: 21 additions & 0 deletions dvc.yaml
@@ -0,0 +1,21 @@
stages:
  extract:
    cmd:
      - python scripts/extract.py
      - python scripts/rename_files.py
  transform:
    cmd:
      - >-
        python scripts/transform.py --input tmp_unzip_path/data/commonData_Data0_soilcarbon.ovr \
          --output Data/Soil/processed/GlobalSoilOrganicCarbonDensityinkgCm_1mDepth.tif
    deps:
      - tmp_unzip_path/data/commonData_Data0_soilcarbon.ovr
    outs:
      - Data/Soil/processed/GlobalSoilOrganicCarbonDensityinkgCm_1mDepth.tif
  load:
    cmd:
      - dvc push
  clean:
    cmd: rm -rf tmp_unzip_path


8 changes: 8 additions & 0 deletions requirements.txt
@@ -0,0 +1,8 @@
gdal-utils==3.5.0.0
jsonschema==4.3.3
jupyter==1.0.0
jupyter-client==5.3.5
jupyter-console==5.2.0
jupyter-core==4.10.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.0
32 changes: 32 additions & 0 deletions scripts/extract.py
@@ -0,0 +1,32 @@
import argparse
import requests
import zipfile
import io
import os

URL = "https://databasin2-filestore.s3.amazonaws.com/a4cb6d367eae4e52a08902874f8bfedf/download/a4cb6d367eae4e52a08902874f8bfedf_1_zip_en.zip?Signature=148HP0SJFI49y7HmTmOaNMAlUDw%3D&Expires=1680448953&AWSAccessKeyId=AKIAI4RK5BEPK3FCQPUQ"

def ensure_url_is_accessible(URL):
    r = requests.get(URL)
    if not r.ok:
        print("Download link expired. Please update download link")
    else:
        download_and_unzip_files(r.content)

def download_and_unzip_files(content):
    current_directory = os.getcwd()
    target_parent_dir = os.path.join(current_directory, r'tmp_unzip_path')
    if not os.path.exists(target_parent_dir):
        os.mkdir(target_parent_dir)
    try:
        z = zipfile.ZipFile(io.BytesIO(content))
        z.extractall(target_parent_dir)
    except Exception as e:
        print(e)
    else:
        print("unzipped successfully")

ensure_url_is_accessible(URL)
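
Review note on `scripts/extract.py`: the download URL carries an `Expires=1680448953` query parameter, so an expired link could be detected before any request is made instead of inferring it from a failed response. The sketch below assumes that value is a Unix timestamp in seconds, which is how query-string signed S3 URLs of this shape usually encode it. `presigned_url_expired` and its `now` parameter are hypothetical names and are not part of this PR.

```python
import time
from urllib.parse import parse_qs, urlparse


def presigned_url_expired(url, now=None):
    """Return True if the URL's `Expires` query parameter lies in the past.

    Assumption: `Expires` is a Unix timestamp in seconds.
    Returns False when the parameter is absent, since nothing can be inferred.
    """
    query = parse_qs(urlparse(url).query)
    expires = query.get("Expires")
    if not expires:
        return False
    # `now` can be injected for testing; default to the current time.
    current = time.time() if now is None else now
    return int(expires[0]) < current
```

Calling `presigned_url_expired(URL)` before `requests.get(URL)` would let the script print the "link expired" message without downloading anything, while a response that is not OK for other reasons could be reported separately.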



12 changes: 12 additions & 0 deletions scripts/rename_files.py
@@ -0,0 +1,12 @@
import os

current_directory = os.getcwd()
target_parent_dir = os.path.join(current_directory, r'tmp_unzip_path/data')
if os.path.exists(target_parent_dir):
    for file_name in os.listdir(target_parent_dir):
        if '\\' in file_name:
            old_file_name = os.path.join(target_parent_dir, file_name)
            filename = os.fsdecode(file_name)
            changed_name = filename.replace("\\", "_")
            new_file_name = os.path.join(target_parent_dir, changed_name)
            os.rename(old_file_name, new_file_name)
22 changes: 22 additions & 0 deletions scripts/transform.py
@@ -0,0 +1,22 @@
import subprocess
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--input', help="Directory of file to transform")
parser.add_argument('--output', help="Directory for transformed files")
args = vars(parser.parse_args())


def run_shell_cmd(cmd):
    try:
        p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
        last_stdout_bytes, last_stderr_bytes = p.communicate()
        if last_stdout_bytes:
            return last_stdout_bytes.decode('utf-8', 'replace')
        else:
            return last_stderr_bytes
    except Exception as e:
        print(e)

run_shell_cmd("gdalwarp -s_srs EPSG:4326 -t_srs EPSG:4326 -to SRC_METHOD=NO_GEOTRANSFORM -tr 0.5 0.5 -r near -te -180.0 -90.0 180.0 90.0 -te_srs EPSG:4326 -of GTiff " + args.get('input') + " " + args.get('output'))
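
Review note on `scripts/transform.py`: `run_shell_cmd` builds one string and calls `cmd.split()`, which would break on an input or output path containing spaces. Because only `stdout` is piped, `last_stderr_bytes` is always `None`, and the process exit code is never inspected. One possible alternative, sketched under the assumption that the `gdalwarp` options stay exactly as in this PR, is to pass an argument list and return the exit code. `build_gdalwarp_args` and `run_command` are hypothetical names, not part of this PR.

```python
import subprocess


def build_gdalwarp_args(src, dst):
    """Build the gdalwarp argument list with the same options as transform.py."""
    return [
        "gdalwarp",
        "-s_srs", "EPSG:4326",
        "-t_srs", "EPSG:4326",
        "-to", "SRC_METHOD=NO_GEOTRANSFORM",
        "-tr", "0.5", "0.5",
        "-r", "near",
        "-te", "-180.0", "-90.0", "180.0", "90.0",
        "-te_srs", "EPSG:4326",
        "-of", "GTiff",
        src,  # kept as a single argument even if the path contains spaces
        dst,
    ]


def run_command(args):
    """Run a command given as a list; return (returncode, stdout, stderr) as text."""
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout, completed.stderr
```

With this shape the caller could exit non-zero when `run_command(build_gdalwarp_args(args['input'], args['output']))` reports a non-zero return code, so that a failed `gdalwarp` run also fails the `transform` stage.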