Running Fusera
Access the help with fusera help:
A FUSE interface to the NCBI Sequence Read Archive (SRA)
Usage:
fusera [command]
Available Commands:
help Help about any command
mount Mount a running instance of Fusera to a folder.
unmount Unmount a running instance of Fusera.
version Print the version number of Fusera
Flags:
-d, --debug Enable debug output.
-h, --help help for fusera
Use "fusera help [command]" for more information about a command.
The 'mount' command builds a filesystem presenting the files associated with a collection of SRA accession numbers. The 'unmount' command tears down a fusera-created filesystem, and terminates the associated fusera invocation.
$ fusera help mount
Mount a running instance of Fusera to a folder.
Usage:
fusera mount [flags] /path/to/mountpoint
Flags:
-a, --accession string A list of accessions to mount or path to accession file.
EXAMPLES: ["SRR123,SRR456" | local/accession/file | https://<bucket>.<region>.s3.amazonaws.com/<accession/file>]
NOTE: If using an s3 url, the proper aws credentials need to be in place on the machine.
Environment Variable: [$DBGAP_ACCESSION]
--aws-batch int ADVANCED: Adjust the amount of accessions put in one request to the SDL API when using an AWS location.
Environment Variable: [$DBGAP_AWS-BATCH] (default 50)
--eager ADVANCED: Have fusera request that urls be signed by the API on start up.
Environment Variable: [$DBGAP_EAGER]
-e, --endpoint string ADVANCED: Change the endpoint used to communicate with SDL API.
Environment Variable: [$DBGAP_ENDPOINT] (default "https://www.ncbi.nlm.nih.gov/Traces/sdl/1/retrieve")
-f, --filetype string A list of the only file types to copy.
EXAMPLES: "cram,crai,bam,bai"
Environment Variable: [$DBGAP_FILETYPE]
--gcp-batch int ADVANCED: Adjust the amount of accessions put in one request to the SDL API when using a GCP location.
Environment Variable: [$DBGAP_GCP-BATCH] (default 25)
-h, --help help for mount
-l, --location string Cloud provider and region where files should be located.
FORMAT: [cloud.region]
EXAMPLES: [s3.us-east-1 | gs.US]
NOTE: This can be auto-resolved if running on AWS or GCP.
Environment Variable: [$DBGAP_LOCATION]
-n, --ngc string A path to an ngc file used to authorize access to accessions in dbGaP.
EXAMPLES: [local/ngc/file | https://<bucket>.<region>.s3.amazonaws.com/<ngc/file>]
NOTE: If using an s3 url, the proper aws credentials need to be in place on the machine.
Environment Variable: [$DBGAP_NGC]
Global Flags:
-d, --debug Enable debug output.
Most of the options and environment variables are intended for advanced users and debugging. The only options intended for regular use by users are for passing the ngc file and specifying the list of accessions.
A simple run of Fusera:
$ fusera mount --ngc ~/file.ngc --accession "SRR123,SRR456" --location s3.us-east-1 ~/studies
NOTE: fusera must keep running for the mount to work, so this command will not return to a terminal prompt until fusera is quit (CTRL-C) or unmounted from another shell (using fusera unmount ~/studies). fusera can be run in the background, as described below.
For ease of use, all command-line flags have equivalent environment variables ($DBGAP_NGC, $DBGAP_ACCESSION, $DBGAP_LOCATION, etc.). With those variables set, a call to fusera can be as short as:
$ fusera mount ~/studies
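As a sketch of the environment-variable approach, the snippet below exports the three variables named above and then mounts with only the mountpoint as an argument. The values are the placeholders from the earlier example, not real files or accessions, and the check for fusera on the PATH is an addition so the snippet is harmless on a machine where fusera is not installed.

```shell
# Placeholder values from the example above; substitute your own.
export DBGAP_NGC="$HOME/file.ngc"
export DBGAP_ACCESSION="SRR123,SRR456"
export DBGAP_LOCATION="s3.us-east-1"

# Assumed mountpoint: it must already exist and be empty.
mountpoint="$HOME/studies"

# With the variables exported, only the mountpoint is passed.
if command -v fusera >/dev/null 2>&1; then
    fusera mount "$mountpoint"
else
    echo "fusera not found on PATH; would run: fusera mount $mountpoint"
fi
```

Putting the export lines in a shell profile or a small wrapper script keeps the ngc path and accession list out of every command line.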
fusera is also easier to use on a compute instance on AWS or GCP. When no location is given through the flag or the environment variable, fusera attempts to detect which cloud platform and region it is running in and uses the location it finds.
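When the location is set by hand, it has to follow the cloud.region format shown in the help text. The following is a minimal sketch of a format check that could run before calling fusera. The function name split_location is hypothetical, and the check only looks at the shape of the string; it says nothing about how fusera itself parses the value or which clouds and regions it accepts.

```shell
# split_location LOCATION
# Hypothetical helper: splits "cloud.region" at the first dot into
# $cloud and $region. Returns 1 if there is no dot or if the text
# before or after the dot is empty.
split_location() {
    case $1 in
        ?*.?*) ;;
        *) return 1 ;;
    esac
    cloud=${1%%.*}
    region=${1#*.}
}

# The two examples from the help text, plus a malformed value.
split_location "s3.us-east-1" && echo "cloud=$cloud region=$region"
split_location "gs.US" && echo "cloud=$cloud region=$region"
split_location "us-east-1" || echo "not in cloud.region form"
```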
If you want to run fusera in the background you can do so with shell commands. Example:
$ fusera mount ~/tmp > output.log 2>&1 &
[1] 12464
$ disown %1
Breakdown:
> output.log
This redirects stdout to a file named output.log. If you don't want the output, use > /dev/null instead.
2>&1
This redirects stderr to the same place as stdout, so it is captured in output.log (or /dev/null) as well.
&
Run this process in the background so you can continue using the shell.
[1] 12464
This is an example of the printout that appears after entering the whole command. The number after the brackets will most likely differ from this example. The line means that this is the first ([1]) command started in the background from this terminal and that its process id is 12464. The only part that matters here is the job number in the brackets, which is what gets passed to the disown command described below.
disown %1
This keeps fusera running even if the terminal is closed. This example passes %1 because a 1 was in the brackets of the output after executing the fusera command above. If a different number is displayed, use that number instead.
Using fusera's unmount command on the folder fusera is mounted to will kill the process, as long as nothing is using the file system at that time.
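The background steps above can also be wrapped in a small helper for use in scripts. This is a sketch under assumptions: the function name start_in_background is hypothetical, and it uses the standard nohup utility in place of the disown builtin, since disown depends on interactive job control. The redirections are the same ones explained in the breakdown.

```shell
# start_in_background LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND so that it ignores the hangup signal
# sent when the terminal closes, with stdout and stderr both written to
# LOGFILE. The background process id is left in $bg_pid.
start_in_background() {
    log=$1
    shift
    nohup "$@" >"$log" 2>&1 &
    bg_pid=$!
}

# Example usage (commented out so the snippet has no side effects):
# start_in_background output.log fusera mount "$HOME/tmp"
# echo "fusera running as process $bg_pid"
# ...later, when nothing is using the filesystem:
# fusera unmount "$HOME/tmp"
```

Keeping the process id in a variable is optional, since fusera unmount on the mountpoint is the documented way to stop the process.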
The <mountpoint> must be an existing, empty directory to which the user has read and write permissions.
It is recommended that the mountpoint be a directory owned by the user. System directories such as /mnt, /dev, and /tmp have special uses on Unix systems, so avoid creating the mountpoint in them.
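The mountpoint requirements can be checked before mounting. The function below is a minimal sketch: check_mountpoint is a hypothetical name, not part of fusera, and it tests only the three stated conditions (existing directory, readable and writable, empty).

```shell
# check_mountpoint DIR
# Hypothetical pre-flight check: succeeds only if DIR is an existing,
# empty directory that the current user can read and write.
check_mountpoint() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir is not an existing directory" >&2
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "$dir is not readable and writable by $(id -un)" >&2
        return 1
    fi
    # ls -A lists hidden entries too, so dotfiles count as "not empty".
    if [ -n "$(ls -A "$dir")" ]; then
        echo "$dir is not empty" >&2
        return 1
    fi
}

# Example: create a mountpoint under the home directory, then check it.
# mkdir -p "$HOME/studies" && check_mountpoint "$HOME/studies"
```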
Because of the nature of FUSE systems, only the user who ran fusera will be able to read the mounted files. This can be changed by editing a config file (reference) on the machine to enable allow_other, but be warned that there are security implications to consider (see https://github.com/libfuse/libfuse#security-implications).
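The text above does not name the config file. On systems using libfuse it is typically /etc/fuse.conf, where an uncommented user_allow_other line permits non-root users to mount with the allow_other option. The sketch below only checks for that line. Both the file path and the hypothetical function name allow_other_enabled are assumptions, as is whether fusera then exposes the mount to other users, so verify this against your own system.

```shell
# allow_other_enabled [CONF]
# Succeeds if CONF (default /etc/fuse.conf, an assumption about which
# config file is meant) contains an uncommented user_allow_other line.
allow_other_enabled() {
    conf=${1:-/etc/fuse.conf}
    [ -r "$conf" ] && grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

if allow_other_enabled; then
    echo "user_allow_other is enabled in /etc/fuse.conf"
else
    echo "user_allow_other is not enabled (or /etc/fuse.conf is unreadable)"
fi
```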