Running Fusera

Matt Bianchi edited this page Apr 30, 2018 · 14 revisions

Access the help with fusera -h:

NAME:
   fusera - A FUSE interface to the NCBI Sequence Read Archive (SRA)

USAGE:
   fusera <command> [<flags>] mountpoint
   
VERSION:
   0.0.-beta
   
COMMANDS:
  mount, m  to mount a folder

    FLAGS:
      --ngc value        path to file that authenticates access [$DBGAP_CREDENTIALS]
      --acc value        comma separated list of accessions [$DBGAP_ACC]
      --acc-file value   path to a cart file, listing accession numbers [$DBGAP_ACCFILE]
      --loc value        preferred region [$DBGAP_LOC]
      --debug            Enable debugging output. [$FUSERA_DEBUG]
      --endpoint value   Change the endpoint fusera uses to communicate with NIH API. Only to be used for advanced purposes. [$DBGAP_ENDPOINT]
      --aws-batch value  Adjust the amount of accessions fusera puts in one request to the Name Resolver API when using an aws location. Only to be used for advanced purposes. (default: 0) [$DBGAP_AWSBATCH]
      --gcp-batch value  Adjust the amount of accessions fusera puts in one request to the Name Resolver API when using a gcp location. Only to be used for advanced purposes. (default: 0) [$DBGAP_GCPBATCH]

  unmount, u  to unmount a folder

    FLAGS:
      --debug  Enable debugging output. [$FUSERA_DEBUG]

MISC OPTIONS:
   --help, -h     Print this help text and exit successfully.
   --version, -v  print the version

A simple run of Fusera:

$ fusera mount --ngc [path/to/ngcfile] --acc [comma separated list of SRR#s] --loc [s3.us-east-1|gs.US] <mountpoint>

Tips and Tricks

Shortening the call length

All of these flags have equivalent environment variables ($DBGAP_CREDENTIALS, $DBGAP_ACC, $DBGAP_LOC, etc.), which can be handier when automating fusera across multiple machines, or when you find yourself repeatedly invoking fusera with the same flags. With all of the environment variables set, a call to fusera could look like this:

$ fusera mount ~/studies
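As a rough sketch of how the environment variables might be set before such a call: the variable names below come from the help text, but the .ngc path, the accession numbers, and the MOUNTPOINT variable are hypothetical placeholders to replace with your own values.

```shell
# Hypothetical values: replace the path and accessions with your own.
export DBGAP_CREDENTIALS="$HOME/prj_1234.ngc"   # stands in for --ngc
export DBGAP_ACC="SRR0000001,SRR0000002"        # stands in for --acc
export DBGAP_LOC="s3.us-east-1"                 # stands in for --loc

# MOUNTPOINT is an illustrative shell variable, not something fusera reads.
MOUNTPOINT="$HOME/studies"

# Only call fusera when it is installed; otherwise report what would run.
if command -v fusera >/dev/null 2>&1; then
    fusera mount "$MOUNTPOINT"
else
    echo "fusera not found on PATH; would run: fusera mount $MOUNTPOINT" >&2
fi
```

Putting the export lines in a shell profile or a file sourced on each machine is one way to keep the invocation short everywhere.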

Another way to ease the use of fusera is to run it on a compute instance on either AWS or GCP. When fusera is not given a location through the flag or environment variable, it attempts to determine where it is running on that cloud platform and uses the location it finds.

Running fusera in the background

If you want to run fusera in the background, you can do so with shell commands. Example:

fusera mount ~/tmp > output.log  2>&1 &
[1] 12464
disown %1

Breakdown:

> output.log
Redirects stdout to a file named output.log. If you don't want the output, use > /dev/null instead.

2>&1
Redirects stderr to the same place as stdout, so it is also captured in output.log (or /dev/null).

&
Runs the process in the background so you can continue using the shell.

disown %1
Keeps fusera running even if the terminal you ran it from is closed. The %1 indicates you want to disown the first thing you started running in the background, hence the 1 in brackets that gets printed after running fusera this way: [1] 12464. If your terminal prints a different number in the brackets, use it instead.

Running fusera's unmount command on the folder where fusera is mounted will kill the process, as long as nothing is using the file system at that time.
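For scripted use, the same idea can be wrapped in a small helper. This is only a sketch: start_detached is a hypothetical function name, and it swaps in nohup in place of disown, since disown is tied to the interactive shell's job table. It assumes a POSIX shell with nohup available.

```shell
# start_detached LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background with hangup signals
# ignored (nohup), writes stdout and stderr to LOGFILE, and prints the
# PID of the background process.
start_detached() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo "$!"
}

# Example use (not executed here):
#   pid=$(start_detached output.log fusera mount "$HOME/studies")
#   fusera unmount "$HOME/studies"   # later, to stop fusera
```

Capturing the PID makes it possible to check later whether the process is still running, while unmounting remains the way to stop fusera described above.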

Advice

The <mountpoint> must be an existing, empty directory, to which the user has read and write permissions.

It is recommended that the mountpoint be a directory owned by the user. System directories such as /mnt and /tmp have special uses on Unix systems, so creating the mountpoint in them should be avoided.
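The mountpoint requirements above can be checked before calling fusera. The following is a minimal sketch; check_mountpoint is a hypothetical helper, not part of fusera, and it only tests the conditions stated here (existing directory, readable and writable, empty).

```shell
# check_mountpoint DIR
# Hypothetical helper: returns 0 if DIR is an existing, empty directory
# that the current user can read and write; otherwise prints a reason to
# stderr and returns 1.
check_mountpoint() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not an existing directory: $dir" >&2
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "need read and write permission on: $dir" >&2
        return 1
    fi
    # ls -A lists every entry except . and .., so any output means non-empty.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "directory is not empty: $dir" >&2
        return 1
    fi
    return 0
}

# Example use (not executed here):
#   check_mountpoint "$HOME/studies" && fusera mount "$HOME/studies"
```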

Because of the nature of FUSE systems, only the user who ran fusera will be able to read the mounted files. This can be changed by editing a config file (reference) on the machine to enable allow_other, but be warned that there are security implications to consider: https://github.com/libfuse/libfuse#security-implications.

Accessions can be specified on the command line using the --acc flag, or by pointing the --acc-file option at a file of space- or comma-separated accessions. The union of these two sets of accessions is used to build the FUSE file system, with duplicates eliminated.
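To preview which accessions would end up in that union, something like the following can help. It is an illustration of the set logic only, not fusera's own implementation; merge_accessions is a hypothetical helper and assumes the file contains accessions separated by spaces, tabs, commas, or newlines.

```shell
# merge_accessions ACC_LIST ACC_FILE
# Hypothetical helper: prints the sorted, de-duplicated union of the
# comma-separated ACC_LIST and the accessions found in ACC_FILE,
# one accession per line.
merge_accessions() {
    acc_list=$1
    acc_file=$2
    {
        printf '%s\n' "$acc_list"
        cat "$acc_file"
    } | tr ', \t' '\n\n\n' | grep -v '^$' | sort -u
}

# Example use (not executed here; the names are placeholders):
#   merge_accessions "$DBGAP_ACC" "$DBGAP_ACCFILE"
```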
