
2. Set up a project with bionitio

Bernie Pope edited this page Jul 8, 2019 · 86 revisions

These instructions explain how to set up a new project using bionitio.

In the examples below, $ is the command prompt; do not type it.

1. Choose a language

  • There are currently 12 languages available for setting up a bionitio project.
  • Choose one of c, clojure, cpp, csharp, haskell, java, js, perl5, python, r, ruby, or rust.
  • In this example, we will use python (the names are case sensitive).

2. Choose a project name

  • A good name will be informative and, ideally, unique amongst bioinformatics tools.
  • You should not already have a repository in GitHub with this name.
  • Try to stick with alphanumeric characters.
  • In this example, we will call our project "biodemo".

3. Choose a license

  • If you aren't sure which license is best for your project, try the Choose a License website (choosealicense.com) for advice.
  • By default, bionitio uses the MIT license for new projects; however, you can override this with the -c option.
  • The license will be written to a file called LICENSE.

4. Run bionitio-boot.sh to set up your project

You need to provide the following information:

  • programming language; specified with -i
  • project name; specified with -n

These are optional:

  • your GitHub username; specified with -g
  • your name; specified with -a
  • your email address; specified with -e

If you specify -g, bionitio-boot.sh will create a new GitHub project for you. If you specify your name and email address these will be used in the README and code documentation, otherwise generic placeholders will be used.

  • At the command-line navigate into the directory where you want to start a new project (such as $HOME/code).
  • Run the following command and replace the placeholders with real values. The placeholders are: GITHUB_USERNAME, YOUR NAME and YOUR_EMAIL_ADDRESS.
$ curl -sSf https://raw.githubusercontent.com/bionitio-team/bionitio/master/boot/bionitio-boot.sh \
 | bash -s -- -i python -n biodemo -g GITHUB_USERNAME -a 'YOUR NAME' -e YOUR_EMAIL_ADDRESS

The script will create a new directory with the same name as your project (in this case "biodemo"). It will initialise a new git software repository in that directory, and it will push the repository to GitHub.

Depending on your git/GitHub setup, you may be prompted up to two times for your GitHub password, and possibly for your GitHub username. Note: check carefully whether each prompt is asking for your username or your password.

If prompted for a password only, enter the GitHub password for GITHUB_USERNAME:
Enter host password for user 'GITHUB_USERNAME':
If prompted for a username and password, enter your GitHub username and password:
Username for 'https://github.com': GITHUB_USERNAME
Password for 'https://GITHUB_USERNAME@github.com':

After a few seconds the setup will be complete. You will have a completely new software project in the biodemo directory, with a copy on GitHub at https://github.com/GITHUB_USERNAME/biodemo.

The above command is rather complex, so let's break it into parts.

The first part is:

curl -sSf https://raw.githubusercontent.com/bionitio-team/bionitio/master/boot/bionitio-boot.sh

This uses curl to download a copy of the bionitio-boot.sh shell script.

The second part is:

bash -s -- -i python -n biodemo -g GITHUB_USERNAME -a 'YOUR NAME' -e YOUR_EMAIL_ADDRESS

This runs the bionitio-boot.sh script on your local computer, and supplies important information via command line arguments.

The two parts are joined together using a Unix pipe |, which means that the output of the curl command is fed directly into the bash command.
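
The way bash receives a script on standard input and arguments after -- can be tried locally without downloading anything. The following is only a sketch: the one-line stand-in script is hypothetical and simply echoes its arguments, whereas the real bionitio-boot.sh does far more. It assumes bash is installed.

```python
import subprocess

# Hypothetical stand-in for the downloaded script: it only echoes its arguments.
script_text = 'echo "args: $@"\n'

# Equivalent of:  <script on stdin> | bash -s -- -i python -n biodemo
# -s tells bash to read commands from standard input, and -- ends bash's own
# options, so everything after it is passed to the script as its arguments.
result = subprocess.run(
    ["bash", "-s", "--", "-i", "python", "-n", "biodemo"],
    input=script_text,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # prints: args: -i python -n biodemo
```

In the real command, curl plays the role of script_text: whatever it downloads becomes the standard input of bash.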

If you have Docker installed on your computer you can set up a new project with the following command, which will achieve the same result as the example above using curl.

docker run -it -v "$(pwd):/out" --rm bionitio/bionitio-boot \
    -i python -n biodemo -g GITHUB_USERNAME -a 'YOUR NAME' -e YOUR_EMAIL_ADDRESS

5. Inspect the contents of the new project

You will now have a copy of various directories and files that act as a template for a bioinformatics tool. If you list the contents of the biodemo directory you will see:

$ ls biodemo
biodemo functional_tests LICENSE README.md requirements-dev.txt setup.py
  • LICENSE is a copy of the MIT license
  • README.md describes the overall project, including how to install it and how to use it
  • functional_tests is a directory that contains data and a script for automated testing of the program. This will be useful for integration testing.
  • setup.py describes metadata for the Python packaging system, including library dependencies that must be installed before the program can be used (typically this is processed by pip).
  • biodemo is a directory that contains the source code for the program.

You should also observe that the contents of your new "biodemo" directory are reflected in the new GitHub repository created for your project.

6. Install and run your program

Your project includes a README.md file.

  • In the terminal, run less README.md to display this file.
  • Check the "Installing" section for information specific to the programming language your project is using.
  • In this case, we will install as per the instructions for Python.

Create a virtual environment

This will create an isolated place for you to install your program and its dependencies.

  • Note: this virtual environment information is only relevant for Python.
  • Navigate to the directory in which you want to run your virtual environment (such as $HOME/scratch/)
  • Set up a virtual environment. You can call it anything you want. A good heuristic is to base it on the name of your project. For example, we will use biodemo_dev.
$ python3 -m venv biodemo_dev
  • Activate this environment. This must be done before you can use it:
$ source biodemo_dev/bin/activate
  • Update pip (the Python package installer) to the latest version. This is not always necessary, but it can help avoid problems with older versions of pip.
$ pip install --upgrade pip
  • Install biodemo into the virtual environment
$ pip install -U /path/to/biodemo/repository

If you are in the same folder as setup.py, then this command will be pip install -U .

  • Test that biodemo is now in your PATH:
$ which biodemo

Test that it can be run (this will ask biodemo to print a help message):

$ biodemo -h
  • You can deactivate your virtual environment by running the deactivate command, or by exiting the shell. You will need to activate again the next time you want to use the virtual environment.
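
For the curious, the creation step can also be driven from Python through the standard venv module, which is what python3 -m venv uses. This is a rough sketch rather than a replacement for the commands above: the create_env helper and the temporary scratch directory are illustrative assumptions, and with_pip=False is used only to keep the sketch quick (the command-line form installs pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent_dir, name="biodemo_dev"):
    """Create a virtual environment, much like `python3 -m venv biodemo_dev`."""
    env_dir = Path(parent_dir) / name
    # with_pip=False skips installing pip so the sketch runs quickly.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a throwaway directory and check it was created.
with tempfile.TemporaryDirectory() as scratch:
    env_dir = create_env(scratch)
    # Every virtual environment contains a pyvenv.cfg file at its top level.
    created = (env_dir / "pyvenv.cfg").is_file()
    print(env_dir.name, created)
```

The isolation comes from the environment having its own directory tree, so packages installed there do not affect the rest of the system.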

Run the program on some test data

Sample test data is provided in the functional_tests/test_data sub-directory of the project.

Change to the directory containing the test data, and try the program on various test cases. For example:

$ biodemo one_sequence.fasta two_sequence.fasta
FILENAME	NUMSEQ	TOTAL	MIN	AVG	MAX
one_sequence.fasta	1	237	237	237	237
two_sequence.fasta	2	357	120	178	237
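
To make the columns concrete, here is a minimal sketch of how such statistics could be computed. It is not the biodemo source: the function names are made up for illustration, the in-memory FASTA text only mimics the sequence lengths reported above (120 and 237), and the average is assumed to be rounded down, which is consistent with the AVG of 178 shown for two_sequence.fasta.

```python
import io


def fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA-formatted text stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A header line starts a new record; emit the previous one first.
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarise(lengths):
    """Return (NUMSEQ, TOTAL, MIN, AVG, MAX) for a collection of lengths."""
    lengths = list(lengths)
    if not lengths:
        # Placeholder choice for this sketch when there are no sequences.
        return (0, 0, None, None, None)
    total = sum(lengths)
    # Integer division rounds the average down (an assumption, see above).
    return (len(lengths), total, min(lengths), total // len(lengths), max(lengths))


# Made-up records whose lengths match the example output: 237 and 120 bases.
fake_fasta = ">seq1\n" + "A" * 237 + "\n>seq2\n" + "C" * 60 + "\n" + "C" * 60 + "\n"
stats = summarise(fasta_lengths(io.StringIO(fake_fasta)))
print(stats)  # prints: (2, 357, 120, 178, 237)
```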

7. Test your program more thoroughly using the biodemo-test.sh script

In the functional_tests sub-directory of the project you will find a shell script called biodemo-test.sh. It can be used to automatically run various tests of the program and check that the output is correct.

Make sure you have your virtual environment activated.

Change to the functional_tests directory and run the script like so:

$ ./biodemo-test.sh -p biodemo -d test_data -v
biodemo-test.sh Testing stdout and exit status: biodemo one_sequence.fasta
biodemo-test.sh Testing stdout and exit status: biodemo two_sequence.fasta
biodemo-test.sh Testing stdout and exit status: biodemo --minlen 200 two_sequence.fasta
biodemo-test.sh Testing stdout and exit status: biodemo --minlen 200 < two_sequence.fasta
biodemo-test.sh Testing stdout and exit status: biodemo empty_file
biodemo-test.sh Testing stdout and exit status: biodemo --minlen 1000 two_sequence.fasta
biodemo-test.sh Testing exit status: biodemo --this_is_not_a_valid_argument > /dev/null 2>&1
biodemo-test.sh Testing exit status: biodemo this_file_does_not_exist.fasta > /dev/null 2>&1
biodemo passed all 8 successfully

The -p option specifies the program that should be tested (biodemo), the -d option specifies the location of the test data and expected results, and the -v option tells the test program to generate "verbose" output.

Input test data and the corresponding expected outputs are found in the functional_tests/test_data sub-directory.

If you make changes to the program you can run this test script again to make sure nothing is broken. You can also add new test cases and expected outputs as new features are added to the program.
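
The idea behind each "Testing stdout and exit status" line can be sketched in a few lines of Python. This is an illustration of the technique only, not a translation of biodemo-test.sh: check_stdout_exit is a hypothetical helper, and a one-line Python command stands in for the biodemo program.

```python
import subprocess
import sys


def check_stdout_exit(command, expected_stdout, expected_status):
    """Return True if the command's stdout and exit status both match."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout == expected_stdout and result.returncode == expected_status


# Stand-in program: prints a two-column header and exits with status 0.
command = [sys.executable, "-c", "print('FILENAME\\tNUMSEQ')"]

# Passes: the output and the exit status are both as expected.
passed = check_stdout_exit(command, "FILENAME\tNUMSEQ\n", 0)
# Fails: the expected output does not match what the program printed.
mismatch = check_stdout_exit(command, "", 0)
print(passed, mismatch)  # prints: True False
```

Checking the exit status as well as the output is what lets the suite cover error cases, such as an invalid argument or a missing input file.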

This process can be automated using Travis continuous integration testing, as explained in the next section.

8. Set up Travis continuous integration testing

What is Travis?

  • Travis is a continuous integration testing tool which integrates with GitHub.
  • It allows you to run automated tests on your code each time a commit is made to the repository.
  • Your new biodemo project comes with the necessary configuration files for Travis. However, you must log into Travis in order to activate testing.
  • You can log into Travis using your existing GitHub account credentials.

Inspect the Travis configuration files

The following things are needed for Travis testing:

  1. A .travis.yml file in the top directory of the project. This contains important metadata that Travis uses to configure and run your testing environment.
  2. A script to tell Travis how to install necessary code dependencies for your project. We've put this in .travis/install-dependencies.sh.
  3. Some test cases, and some method to run them. We use the functional_tests/biodemo-test.sh script as described earlier. We also point to some unit tests in the file .travis/unit-test.sh, which points to biodemo/biodemo_test.py.

Configure Travis to test your program and trigger a new test

  • Go to travis-ci.org and sign in with GitHub.
  • In the top right, click on your profile picture and select "Profile" from the drop-down menu.
  • There is a list of all your repositories. If your new biodemo project repository is not listed, click the "Sync account" button at the top right (or: sign out and sign in again).
  • Click the tick next to your biodemo project repository (it should turn green when testing is enabled).

Travis will not begin running tests until a new commit is made to the repository. For the sake of demonstration we will make a trivial change to trigger the initial test run.

  • Go to the GitHub repository for your biodemo project.
  • Make a minor change in the README, such as adding an extra space. Click "Commit changes".
  • Go back to the Travis page for your project. It will be something like https://travis-ci.org/GITHUB_USERNAME/biodemo
  • This latest push should have triggered a Travis test. Be patient, it takes a few seconds before Travis acknowledges the change and starts testing.

Your README.md for the project contains some HTML code to display a Travis testing icon. It is green when testing succeeds, and red when it fails. Clicking on the icon will take you to the Travis testing page for the project.

9. Add a breaking change

Try adding a change to one of your bionitio files that will cause the tests to break.

  • For example, use sed to delete line 230 from biodemo.py, which will cause it to skip printing the header line in the output (an intentional mistake for the sake of demonstration).
$ cd $HOME/code/biodemo/biodemo
$ cp biodemo.py biodemo.py.old
$ sed -e '230d' biodemo.py > biodemo.py.broken
$ cp biodemo.py.broken biodemo.py

Install this intentionally broken version into your virtual environment:

$ pip install -U $HOME/code/biodemo

Re-run the test suite:

$ cd $HOME/code/biodemo/functional_tests/ 
$ ./biodemo-test.sh -p biodemo -d test_data -v
# test output omitted here
  • Did any of the tests fail?
  • If working on the code in the command line, you can try to git add, commit, and push your change to your GitHub repository.
  • This should trigger a Travis build, which should also fail.
  • You should now "fix" the intentional error in the program and re-commit and push.
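
To see why a missing header makes the tests fail, the following sketch mimics the line deletion in Python and compares the result with the expected output. It is an illustration under assumptions: delete_line is a hypothetical helper that behaves like sed -e 'Nd', and for brevity it is applied to a copy of the expected output rather than to biodemo.py itself.

```python
import difflib


def delete_line(text, line_number):
    """Remove one line (counting from 1), like `sed -e 'Nd'`."""
    lines = text.splitlines(keepends=True)
    del lines[line_number - 1]
    return "".join(lines)


expected = (
    "FILENAME\tNUMSEQ\tTOTAL\tMIN\tAVG\tMAX\n"
    "one_sequence.fasta\t1\t237\t237\t237\t237\n"
)
# Simulate the broken program: its output lacks the header line.
broken = delete_line(expected, 1)

# A unified diff shows the header as a removed ("-") line.
diff = list(
    difflib.unified_diff(expected.splitlines(), broken.splitlines(), lineterm="")
)
print("\n".join(diff))
```

Any difference between the actual and expected output is enough for a stdout test to fail, which is why a single deleted line is caught.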

10. Add a new feature to biodemo

First, we will revert to the working version of biodemo.

$ cd $HOME/code/biodemo/biodemo
$ cp biodemo.py.old biodemo.py 
$ pip install -U $HOME/code/biodemo

We will add a new feature to the program.

For example, add a --maxlen threshold (we already have --minlen).

We will use “test-driven development”: first write a failing test that specifies what we want, then change the code to make the test pass.

First, we will make a test to see if the --maxlen argument works. (Obviously it won’t yet, as we haven't added it to the program.)

Start by making a file for the expected output when --maxlen is 200.

Create a file called two_sequence.fasta.maxlen_200.stdin.expected in the test_data directory and put the following contents into it. (This is the result we want our program to achieve.)

$ nano two_sequence.fasta.maxlen_200.stdin.expected
# a blank file will open. Paste this in:
FILENAME	NUMSEQ	TOTAL	MIN	AVG	MAX
stdin	1	120	120	120	120
# save and exit (Ctrl+X, then Y and Enter)

Now we need to run our tool to see if it can produce this result. We add a test to the biodemo-test.sh file in functional_tests.

Open the biodemo-test.sh file for editing (e.g. $ nano biodemo-test.sh)

Add this to the "Run tests" section:

test_stdout_exit "$test_program --maxlen 200 < two_sequence.fasta" \
    two_sequence.fasta.maxlen_200.stdin.expected 0

Run the tests again. There are now 9 tests and 1 will fail.

Now we need to add the functionality to the program to make it pass the test and produce the correct output when we give it the “--maxlen 200” parameter on the command line.

In $HOME/code/biodemo/biodemo/biodemo.py, in "def parse_args()", add this to the argument parser after the “--minlen” argument:

parser.add_argument(
        '--maxlen',
        metavar='N',
        type=int,
        help='Maximum length sequence to include in stats (default = None)')
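
The behaviour of the new argument can be checked in isolation with a small, self-contained parser. This is a sketch, not the parse_args function from biodemo.py: build_parser is a hypothetical name, only the two length options are included, and the value given to DEFAULT_MIN_LEN here is an assumption.

```python
import argparse

DEFAULT_MIN_LEN = 0  # assumed value for this sketch only


def build_parser():
    """Build a parser containing only the two length options."""
    parser = argparse.ArgumentParser(description="Sketch of the length options")
    parser.add_argument(
        '--minlen',
        metavar='N',
        type=int,
        default=DEFAULT_MIN_LEN,
        help='Minimum length sequence to include in stats')
    parser.add_argument(
        '--maxlen',
        metavar='N',
        type=int,
        help='Maximum length sequence to include in stats (default = None)')
    return parser


# With the option given, the value is converted to an int.
options = build_parser().parse_args(['--maxlen', '200'])
print(options.maxlen)  # prints: 200

# Without it, argparse falls back to None because no default was specified.
defaults = build_parser().parse_args([])
print(defaults.maxlen)  # prints: None
```

Leaving the default as None is what lets the program distinguish "no maximum requested" from any particular number.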

Change the “def from_file” definition to include maxlen_threshold=None:

def from_file(self, fasta_file, minlen_threshold=DEFAULT_MIN_LEN, maxlen_threshold=None):

Then, in the Results if statement:

if this_len >= minlen_threshold 

append this clause to the condition:

and (maxlen_threshold is None or this_len <= maxlen_threshold):
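
The combined condition can be tried on its own before editing the program. The sketch below wraps it in a hypothetical helper called keep_sequence (this function is not part of biodemo.py, and the minimum threshold default of 0 is an assumption) and applies it to the two sequence lengths from two_sequence.fasta.

```python
def keep_sequence(this_len, minlen_threshold=0, maxlen_threshold=None):
    """Return True if a sequence of this length should be counted."""
    # A maxlen_threshold of None means "no upper limit".
    return this_len >= minlen_threshold and (
        maxlen_threshold is None or this_len <= maxlen_threshold)


lengths = [120, 237]  # the sequence lengths in two_sequence.fasta

# With --maxlen 200 only the 120-base sequence is kept.
kept = [n for n in lengths if keep_sequence(n, maxlen_threshold=200)]
print(kept)  # prints: [120]

# With no maximum, both sequences are kept, as before the change.
kept_all = [n for n in lengths if keep_sequence(n)]
print(kept_all)  # prints: [120, 237]
```

This matches the expected output file created earlier, where a single sequence of length 120 is reported.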

Finally, pass options.maxlen to from_file in two places (each marked with a HERE comment in this code block):

def process_files(options):
    '''Compute and print FastaStats for each input FASTA file specified on the
    command line. If no FASTA files are specified on the command line then
    read from the standard input (stdin).

    Arguments:
       options: the command line options of the program
    Result:
       None
    '''
    if options.fasta_files:
        for fasta_filename in options.fasta_files:
            logging.info("Processing FASTA file from %s", fasta_filename)
            try:
                fasta_file = open(fasta_filename)
            except IOError as exception:
                exit_with_error(str(exception), EXIT_FILE_IO_ERROR)
            else:
                with fasta_file:
                    stats = FastaStats().from_file(fasta_file, options.minlen, options.maxlen)  # HERE
                    print(stats.pretty(fasta_filename))
    else:
        logging.info("Processing FASTA file from stdin")
        stats = FastaStats().from_file(sys.stdin, options.minlen, options.maxlen)  # HERE
        print(stats.pretty("stdin"))

Save the file, re-install biodemo (pip install -U $HOME/code/biodemo) so the changes take effect, and run the tests again. All 9 tests should now pass.