Release v3
Martin Gleize committed Mar 9, 2020
1 parent 90c0ec6 commit b07259b
Showing 914 changed files with 810,913 additions and 193,932 deletions.
8 changes: 8 additions & 0 deletions .dockerignore
@@ -0,0 +1,8 @@
**

!Dockerfile
!data
!target/*.war
!*.properties
!LICENSE
!README.md
29 changes: 29 additions & 0 deletions .gitignore
@@ -1 +1,30 @@
/target/
/output/
**/.DS_Store
scripts/bcts.supervised.properties
logs/
bct-s.properties
scripts/*.properties
scripts/res*.txt
test_ie_index/
PropertiesExplained.properties
PropertiesExplained
com.ibm.drl.hbcp.inforetrieval.apr/*.properties
trec_eval/trec_eval

# Binaries
*.class

# eclipse project file
.settings/
.classpath
.project

# IntelliJ files
.idea/
hbcpIE.iml
/nbproject/

# Grobid-related files/folders
/grobid-0.5.3
grobid.log*
20 changes: 20 additions & 0 deletions .travis.yml
@@ -0,0 +1,20 @@
# You can validate this file at: https://lint.travis-ci.org

language: java
jdk:
- openjdk8

cache:
directories:
- $HOME/data/
- $HOME/data_resources/

#Use the default script command of travis with maven which is mvn test -B
#Before running script, for maven projects travis executes 'mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V'

before_script: mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V

script:
- mvn javadoc:javadoc
- mvn test -B

11 changes: 11 additions & 0 deletions Dockerfile
@@ -0,0 +1,11 @@
# OpenJDK JRE for Java 8, with Tomcat 9.0.30 server
FROM tomcat:9.0.30-jdk8-openjdk
# Tomcat server port
EXPOSE 8080
# places the app in the Tomcat app folder (at the root of the server)
RUN rm -rf /usr/local/tomcat/webapps/*
COPY ./target/hbcpIE-0.0.1-SNAPSHOT.war /usr/local/tomcat/webapps/ROOT.war
# set the max size of the JVM heap
ENV CATALINA_OPTS -Xms512m -Xmx8g
# run catalina (Tomcat's servlet container)
CMD ["catalina.sh", "run"]
13 changes: 13 additions & 0 deletions LICENSE
@@ -0,0 +1,13 @@
Copyright 2018 - UCL

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
224 changes: 72 additions & 152 deletions README.md
100755 → 100644
@@ -1,190 +1,110 @@
# Human Behaviour Change Project Information Extraction (hbcpIE)
version 2.0.0
# Human Behaviour Change Project (HBCP)

The Human Behaviour-Change Project (HBCP) is a collaboration between behavioral scientists, computer scientists and
system architects that aims to revolutionize methods for synthesizing evidence in real time and generate new insights on
behavior change.

### Update:
demo currently down due to security issue.
The system can be run locally following the instructions below.
This repository includes code for two important tasks in the project: behavior change entity extraction and prediction
for behavior change (e.g., outcome value given a set of population and intervention entities).

## Getting Started

17/04/2019
These instructions will get you a copy of the project up and running on your local machine.

Copyright (c): Apache License Version 2.0
### Prerequisites

Created by HBCP team, IBM Research, Dublin
hbcpIE uses Java 1.8 and requires [Maven](https://maven.apache.org/) to compile and run the project.

### Installing

* Code contributions: Debasis Ganguly, Martin Gleize, Yufang Hou, Charles Jochim, Francesca Bonin
After cloning the project, go to the root `hbcpIE` directory.

Java version: "1.8.0"

The project represents the first version of the code of the HBCP project.
It contains two main packages: an information extraction pipeline (extractor) and a prediction pipeline (predictor). Both have APIs that can be tested via a Swagger UI.


## Extractor
The package executes the following actions:
1. Indexes a collection of documents
2. Stores them in a Lucene index
3. Extracts pieces of information from arbitrary passages.


To cite the work please cite:

Debasis Ganguly, Léa A. Deleris, Pol Mac Aonghusa, Alison J. Wright, Ailbhe N. Finnerty, Emma Norris, Marta M. Marques, Susan Michie:
Unsupervised Information Extraction from Behaviour Change Literature. MIE 2018: 680-684
```
@inproceedings{DBLP:conf/mie/GangulyAAWFNMM18,
author = {Debasis Ganguly and
L{\'{e}}a A. Deleris and
Pol Mac Aonghusa and
Alison J. Wright and
Ailbhe N. Finnerty and
Emma Norris and
Marta M. Marques and
Susan Michie},
title = {Unsupervised Information Extraction from Behaviour Change Literature},
booktitle = {Building Continents of Knowledge in Oceans of Data: The Future of
Co-Created eHealth - Proceedings of {MIE} 2018, Medical Informatics
Europe, Gothenburg, Sweden, April 24-26, 2018},
pages = {680--684},
year = {2018}
}
Compile the code:
```
mvn clean compile -U
```
[bibtex](https://dblp.uni-trier.de/rec/bibtex/conf/mie/GangulyAAWFNMM18)


### Description
The project takes PDFs (scientific articles on behaviour change interventions) as input and returns several attributes encoded by the Behaviour Change Intervention Ontology developed by UCL in the context of the Human Behaviour Change Project (HBCP) (https://www.humanbehaviourchange.org/). A REST API with a Swagger UI is provided, so anyone can test the system from a web browser: [http://23.97.177.82:8180/swagger-ui.html](http://23.97.177.82:8180/swagger-ui.html)

Supported entities for the moment:

Population characteristics: min age, max age, gender and mean age.

Behavioural Change Techniques: Goal Setting (Behaviour), Problem Solving, Action Planning, Feedback on behaviour, Self-monitoring of behaviour, Social support (unspecified), Information about health consequences, Information about social and environmental consequences, Pharmacological support and Reduce negative emotions.

Outcome : outcome value

## Predictor
The package executes the following actions:
1. Takes as input the entities and the documents from which they have been extracted
2. Builds a relation graph where entities are nodes and co-occurrences in the same document are edges
3. Creates an embedded space via Node2Vec
4. Allows querying the space in order to find an entity given one or more other entities.

### Description
The project takes as input PDFs (scientific articles on behaviour change interventions) and a JSON of either extracted or manually annotated entities. It allows the user to write a query and retrieves the most relevant outcome value for that particular query. A REST API with a Swagger UI is provided, so anyone can test the system from a web browser: [http://23.97.177.82:8180/swagger-ui.html](http://23.97.177.82:8180/swagger-ui.html)


### Important Note:
Please consider that this is a work in progress and we are currently working on improving/expanding the project.
Feel free to download the code and use it, and to send us feedback and bug reports.


## Requirements
Most recent version of Maven: https://maven.apache.org/download.cgi

## What has been released in this repository

We are releasing:
- code for supervised, semi-supervised and unsupervised retrieval of a selection of BCTs
- Swagger api facilities
- Java documentation for each class
- 17 fully annotated open access papers

## Dataset

In the context of the HBCP (https://www.humanbehaviourchange.org/), 244 behaviour science intervention papers have been annotated for 10 behaviour change techniques and 4 population characteristics (min age, max age, mean age and gender) according to the Behaviour Change Intervention Ontology. Our pre-trained model is trained on a subset of 111 papers. We release 17 papers (that the model has not been trained on) as a sample dataset. Those 17 papers are open access and publicly available.



## Quickstart for behavioural science users
### Information Extraction
## Example Usage

- Visit this page to demo the system http://23.97.177.82:8180/swagger-ui.html
- Click on extractor-controller
- Choose the entity you want to extract from your pdf:
- all BCT present in the paper (allbcts)
- all BCT present in the paper using a supervised method (allbcts/supervised)
- detect the presence of the bct specified in the parameter "code" (bct)
- detect the presence of the bct specified in the parameter "code" using a supervised method (bct/supervised)
- the gender of participants (gender)
- the minimum age of participants (minage)
- the maximum age of participants (maxage)
- the mean age of participants (meanage)
- Upload the PDF with the "choose file" button
- Click "Try it out!"
Once you've confirmed that Maven can build the code, there are two ways to quickly test our APIs.
1. [Use with Maven commands](#with-maven-commands)
1. [Use with Docker](#with-docker)

Results will be shown in this format:
### With Maven commands

The easiest way to test our entity extraction and prediction APIs is via a [Swagger](https://swagger.io/) interface
using your own PDFs of behavior change literature. Before doing that, we need to build the indexes used by extraction
and prediction. This is done with the following commands:
```
{
"code": "Minage",
"docName": "Volpp 2009 primary paper.pdf",
"extractedValue": "18"
}
mvn exec:java@indexer
```
and
```
mvn exec:java@extractor
```

For example, looking for the minimum age in the PDF Volpp 2009 primary paper.pdf, the extracted value is 18.


### Prediction
Visit this page to demo the system http://23.97.177.82:8180/swagger-ui.html

Click on predictor-controller

You can either:
- get one attribute given an ID
- get the entire set of attributes (interventions, gender, settings, etc.)
- predict an outcome given a population query and an intervention query expressed as follows:

- For population: C:<attributeID>:value, e.g. C:4507435:18 for Mean age=18
- For interventions: I:<attributeID>:value, e.g. I:3673271:1 for Goal setting.
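
The query-term format above can be checked in a script before a request is sent. The following is a minimal sketch, assuming only the `C:<attributeID>:value` and `I:<attributeID>:value` shapes described above; the function name `parse_query_term` and its output layout are hypothetical and not part of the hbcpIE code.

```shell
# Split a query term such as "C:4507435:18" into kind, attribute ID and value.
# Prints "<kind> <attributeID> <value>", or returns 1 for malformed input.
# (Hypothetical helper for illustration; not part of the project.)
parse_query_term() {
  term=$1
  # Require the PREFIX:ID:VALUE shape with a C or I prefix.
  case $term in
    [CI]:?*:?*) ;;
    *) echo "malformed query term: $term" >&2; return 1 ;;
  esac
  prefix=${term%%:*}   # text before the first colon
  rest=${term#*:}      # text after the first colon
  id=${rest%%:*}       # attribute ID
  value=${rest#*:}     # everything after the attribute ID
  case $prefix in
    C) kind=population ;;
    I) kind=intervention ;;
  esac
  printf '%s %s %s\n' "$kind" "$id" "$value"
}

parse_query_term "C:4507435:18"   # prints: population 4507435 18
parse_query_term "I:3673271:1"    # prints: intervention 3673271 1
```

The sketch uses only POSIX parameter expansion, so it should run under `sh` as well as `bash`.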
(you should see after each of these something like `[INFO] BUILD SUCCESS`)

Click "Try it out!"
Results will be shown in this format:
Next we will start the server that will allow us to access the Swagger interface:
```
mvn spring-boot:run
```

## Quickstart for coders
This will take several seconds to start. After it has started, open a web browser and go to
http://127.0.0.1:8080/swagger-ui.html. You can then follow the instructions on that page to see how to use the extractor and
predictor APIs.

### With Docker

### For information extraction
From the command line, run the following command to index the collection.
First build the project with:
```
mvn exec:java@indexer
mvn clean install
```
The next step is to execute the following command to run the IE pipeline.
Build the docker image with:
```
mvn exec:java@extractor
docker build .
```
You may need to increase the memory available to Docker to at least 8GB.

### For predictions
Run a docker container exposing the API on port 8080:
```
mvn exec:java@predict
docker run -t -p 8080:8080 [your_image_id]
```
It will take a while, as it is indexing and running the extraction. After it has started, open a web browser and go to http://127.0.0.1:8080/swagger-ui.html. You can then follow the instructions on that page to see how to use the extractor and predictor APIs.
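
Because the container indexes and runs the extraction before it serves requests, it can help to poll the Swagger page from a script rather than refresh the browser. The following is a minimal sketch, assuming `curl` is installed; the helper name `wait_for_url` and its default attempt count and delay are choices made for this example, not part of the project.

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_url URL [MAX_ATTEMPTS] [DELAY_SECONDS]
# (Hypothetical helper for illustration; defaults are arbitrary.)
wait_for_url() {
  url=$1
  max_attempts=${2:-30}
  delay=${3:-10}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempt(s)" >&2
  return 1
}
```

For example, after `docker build -t hbcpie .` and `docker run -d -p 8080:8080 hbcpie` (where `hbcpie` is an arbitrary tag chosen here so the image can be referenced by name instead of by ID), `wait_for_url http://127.0.0.1:8080/swagger-ui.html` returns once the page responds, assuming the server answers that URL with a success status when it is ready.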

### Test REST API with Swagger UI
The project uses Spring Boot and Springfox with Swagger UI.
## Publications

From the command line, go to the project root (hbcpIE/) and execute
```
mvn spring-boot:run
```
Open a browser and go to this [URL](http://localhost:8180/swagger-ui.html)
If you use the extractor please cite:

* Debasis Ganguly, Yufang Hou, Léa A. Deleris, Francesca Bonin:
[Information Extraction of Behavior Change Intervention Descriptions](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6568066/). AMIA Joint Summits on Translational Science
proceedings 2019:182–191.

## Java Documentation
[Javadoc](apidocs/index.html)
#### Other Related Publications
* Yufang Hou, Debasis Ganguly, Léa A. Deleris, Francesca Bonin:
[Extracting Factual Min/Max Age Information from Clinical Trial Studies](https://www.aclweb.org/anthology/W19-1914/).
Proceedings of the 2nd Clinical Natural Language Processing Workshop 2019: 107-116.
* Debasis Ganguly, Léa A. Deleris, Pol Mac Aonghusa, Alison J. Wright, Ailbhe N. Finnerty, Emma Norris, Marta M. Marques, Susan Michie:
[Unsupervised Information Extraction from Behaviour Change Literature](http://ebooks.iospress.nl/publication/48878). MIE 2018: 680-684
* Susan Michie, James Thomas, Marie Johnston, Pol Mac Aonghusa, John Shawe-Taylor, Michael P. Kelly, Léa A. Deleris,
Ailbhe N. Finnerty, Marta M. Marques, Emma Norris, Alison O’Mara-Eves, Robert West: [The Human Behaviour-Change Project:
harnessing the power of artificial intelligence and machine learning for evidence synthesis and interpretation](https://doi.org/10.1186/s13012-017-0641-5).
Implementation Science 12, 121 (2017).


## THANKS
Thanks to the UCL annotators that developed the Behaviour Change Intervention Ontology.
## Team Members
* Debasis Ganguly
* Martin Gleize
* Yufang Hou
* Charles Jochim
* Francesca Bonin
* Pierpaolo Tommasi

## CHANGES
- version 2.0
## Acknowledgments
Thanks to the UCL annotators that developed the Behaviour Change Intervention Ontology.

## LICENSE
This program is free software; you can redistribute it and/or
modify it under the terms of the Apache License Version 2.0.
## License
This program is free software; you can redistribute it and/or modify it under the terms of the [Apache License
Version 2.0](./LICENSE).

## Contact

For help or issues using the HBCP code, please submit a GitHub issue.
For personal communication related to the project, please contact the HBCP team ([email protected])
1 change: 0 additions & 1 deletion apr/qrels/.gitignore

This file was deleted.

1 change: 0 additions & 1 deletion apr/res/.gitignore

This file was deleted.
