
Merge branch 'time-series'
# Conflicts:
#	url_handlers/basic_stats.py
#	url_handlers/coplots_pl.py
#	url_handlers/histogram.py
#	url_handlers/scatter_plot.py
#	webserver.py
Magdalena5 committed Jun 4, 2021
2 parents 4f31c6f + 294ef5d commit e424c6f
Showing 116 changed files with 29,099 additions and 14,434 deletions.
10 changes: 5 additions & 5 deletions Dockerfile
@@ -8,15 +8,15 @@ RUN apt-get update && \
pip install pipenv && \
pipenv install --ignore-pipfile --deploy --system



WORKDIR /app

ENV FLASK_ENV production
# not really needed as waitress ignores this option
ENV FLASK_APP webserver.py

ENV FLASK_RUN_HOST 0.0.0.0
ENV TZ=Europe/Berlin

EXPOSE 5428
EXPOSE 80

-CMD [ "waitress-serve", "--port", "80", "--call", "webserver:main" ]
+CMD [ "waitress-serve","--port","80","--call", "webserver:main" ]
19 changes: 9 additions & 10 deletions Pipfile
@@ -15,10 +15,12 @@ pyyaml = "*"
nltk = "*"
redis = "*"
ipython = "*"
-pybind11 = "*"
+"pybind11" = "*"
python-dotenv = "*"
click = "*"
flask = "*"
misc = {git = "https://bitbucket.org/bmmalone/misc.git"}
autosklearn = {ref = "development",git = "https://github.com/automl/auto-sklearn.git"}
flask-redis = "*"
sklearn = "*"
seaborn = "*"
@@ -31,18 +33,15 @@ pytest = "*"
apscheduler = "*"
waitress = "*"
ipdb = "*"
sqlalchemy = "*"

psycopg2 = "*"
statsmodels = "*"
gower = "*"
umap = "*"
flask_cors="*"
requests = "*"

[dev-packages]
pylint = "*"

[requires]
python_version = "3.7"

[packages.misc]
git = "https://bitbucket.org/bmmalone/misc.git"

[packages.autosklearn]
ref = "development"
git = "https://github.com/automl/auto-sklearn.git"
1,076 changes: 672 additions & 404 deletions Pipfile.lock


63 changes: 8 additions & 55 deletions README.md
@@ -16,76 +16,29 @@ Currently setup for deployment and not development
#### Usage ####
* `docker-compose up`

-### Setup Instructions Development ###
+### Setup Instructions Development [(detailed documentation)](https://github.com/dieterich-lab/medex/tree/PostgreSQL/documentation) ###
Not recommended for pure deployment.

#### Requirements ####
* [Python](https://www.python.org/) >= 3.7
* [pipenv](https://docs.pipenv.org/en/latest/) >= 2018.10.13
* [redis](https://redis.io/) >= 5.x
* [Docker-CE](https://docs.docker.com/install/) >= 18.09.07
* [docker-compose](https://docs.docker.com/compose/overview/) >= 1.24.0
* Linux/MacOS

#### Usage ####
-* `pipenv install` installs the latest depencies
+* `pipenv install` installs the latest dependencies
* `pipenv shell` enters the virtual environment
* `docker-compose up` creates the container for the PostgreSQL database
* `./scripts/start.sh`
* Develop


## Data Import ##
* Database imports run every night at 5:05 and at startup.
* The database is only updated if there is new data to import.

### Importing new data ###
To add new data, place a new `entities.csv` and `dataset.csv` in the `./import` folder.

The files should have the same format as the example files already in that directory.

The current format of the dataset.csv file comes from the research-warehouse export format of the data we analyse with this tool:

`Patient_ID,Billing_ID,Date,Time,Key,Value`

Example file starts like this:
```
f96ae85e2c3598e7eefa593a927fe1c8,d41d8cd98f00b204e9800998ecf8427e,2012-07-13,4:51:9,Gender,male
f96ae85e2c3598e7eefa593a927fe1c8,d41d8cd98f00b204e9800998ecf8427e,1999-03-13,15:26:20,Jitter_rel,0.25546
```
Billing_ID, Date, and Time are currently unused and optional; required are only a unique identifier of the data instance (patient), a parameter name, the respective value, and the six-column format, so this line works as well:
```
Patient1,,,,A_numeric_parameter,5.8
```

An `entities.csv` file is also required; it specifies each entity's data type, which can be String or Double.
In our example that would be a file starting like this:
```
entity,datatype
Gender,String
Jitter_rel,Double
```

Example files can be found in `./dataset_examples`. To test them, copy them to `./import` and restart the tool.
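The two formats above can be checked before an import run. The following is a minimal, hypothetical validation sketch (it is not part of the tool itself; the function and constant names are made up for illustration) that verifies the six-column layout of `dataset.csv` and looks up each parameter's declared datatype from `entities.csv`:

```python
import csv
import io

# Hypothetical validation sketch -- not part of the MedEx codebase.
# Checks the six-column format (Patient_ID, Billing_ID, Date, Time,
# Key, Value) and that every Key is declared in entities.csv, with a
# parseable value when the datatype is Double.

ENTITIES_CSV = """entity,datatype
Gender,String
Jitter_rel,Double
"""

DATASET_CSV = """\
f96ae85e2c3598e7eefa593a927fe1c8,d41d8cd98f00b204e9800998ecf8427e,2012-07-13,4:51:9,Gender,male
Patient1,,,,Jitter_rel,0.25546
"""

def load_entities(text):
    # entities.csv has a header row: entity,datatype
    reader = csv.DictReader(io.StringIO(text))
    return {row["entity"]: row["datatype"] for row in reader}

def validate_dataset(text, entities):
    errors = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != 6:
            errors.append(f"line {lineno}: expected 6 columns, got {len(row)}")
            continue
        patient_id, _billing, _date, _time, key, value = row
        if not patient_id:
            errors.append(f"line {lineno}: missing patient identifier")
        if key not in entities:
            errors.append(f"line {lineno}: unknown entity {key!r}")
        elif entities[key] == "Double":
            try:
                float(value)
            except ValueError:
                errors.append(f"line {lineno}: {key} expects a number, got {value!r}")
    return errors

entities = load_entities(ENTITIES_CSV)
print(validate_dataset(DATASET_CSV, entities))  # -> [] (both rows are valid)
```

Note that the second sample row exercises the minimal form described above: Billing_ID, Date, and Time are left empty but the six-column shape is kept.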


### Controlling the Data Import Scheduler ###
To learn more about the scheduler and its configuration, read [here](https://apscheduler.readthedocs.io/en/latest/modules/triggers/cron.html#module-apscheduler.triggers.cron).
The scheduler can be controlled by four environment variables:
* `IMPORT_DISABLED` disables the scheduler if set to any value
* `IMPORT_DAY_OF_WEEK` sets the days on which the import runs
* `IMPORT_HOUR` sets the hour it runs at, e.g. 5 means 5 a.m.
* `IMPORT_MINUTE` sets the minute the import runs
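The four variables above map naturally onto APScheduler's cron trigger arguments. A rough sketch of that mapping follows (assumed behaviour for illustration; the actual wiring in `webserver.py` may differ, and `cron_kwargs_from_env` is a made-up helper name):

```python
import os

# Sketch: translate the IMPORT_* environment variables into keyword
# arguments for apscheduler.triggers.cron.CronTrigger, e.g.
#   CronTrigger(**cron_kwargs_from_env(os.environ))
# Returns None when the scheduler is disabled.

def cron_kwargs_from_env(env):
    if "IMPORT_DISABLED" in env:  # any value disables the scheduler
        return None
    return {
        "day_of_week": env.get("IMPORT_DAY_OF_WEEK", "*"),
        "hour": int(env.get("IMPORT_HOUR", 5)),
        "minute": int(env.get("IMPORT_MINUTE", 5)),
    }

print(cron_kwargs_from_env({}))  # defaults: every day at 5:05
```

With no variables set, this reproduces the nightly 5:05 schedule described in the Data Import section.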



## Deploy Debian Based ##
Please keep in mind that the application uses port 80 by default; this can be changed in `./docker-compose.yml`.
The systemd service below manages the application startup automatically, i.e. autostart and restart on crash.

1. Open the service file `./scripts/data-warehouse.service`
2. Change `WorkingDirectory` to the current main directory
3. Copy to systemd folder; for instance, `/etc/systemd/system/`
4. Run `sudo systemctl daemon-reload`
5. Run `sudo systemctl enable data-warehouse.service`
6. Run `sudo systemctl start data-warehouse.service`
7. Check whether it runs properly
* In order to add new data, add a new `header.csv`, `entities.csv`, and `dataset.csv` to the `./import` folder.
* The `header.csv`, `entities.csv`, and `dataset.csv` files should look like those in the `dataset_examples` directory [(detailed documentation)](https://github.com/dieterich-lab/medex/tree/time-series/dataset_examples/Data_import.md).



