Updated get_started docs
Acribbs committed Jan 1, 2025
1 parent 145bf4e commit 76807f9
Showing 4 changed files with 115 additions and 10 deletions.
7 changes: 7 additions & 0 deletions docs/getting_started/examples.md
@@ -222,3 +222,10 @@ P.run(statement)

When running the pipeline, make sure to specify `--no-cluster` as a command line option.

### Troubleshooting

- **Common Issues**: If you encounter errors during pipeline execution, ensure that all dependencies are installed and paths are correctly set.
- **Logs**: Check the log files generated during the pipeline run for detailed error messages.
- **Support**: For further assistance, refer to the [CGAT-core documentation](https://cgat-developers.github.io/cgat-core/) or raise an issue on our [GitHub repository](https://github.com/cgat-developers/cgat-core/issues).


32 changes: 30 additions & 2 deletions docs/getting_started/installation.md
@@ -12,6 +12,35 @@ The preferred method of installation is using Conda. If you do not have Conda in
conda install -c conda-forge -c bioconda cgatcore
```

### Prerequisites

Before installing `cgatcore`, ensure that you have the following prerequisites:

- **Operating System**: Linux or macOS
- **Python**: Version 3.6 or higher
- **Conda**: Recommended for dependency management
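
The prerequisites above can be checked from Python before installing. The following is a minimal sketch, not part of `cgatcore`: the function name `check_prerequisites` is hypothetical, and the only facts it encodes are the operating systems and minimum Python version listed above.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 6)                      # minimum version listed above
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}  # platform.system() reports macOS as "Darwin"


def check_prerequisites(system=None, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    system = system or platform.system()
    version = tuple(version or sys.version_info[:2])
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported operating system: {system}")
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.6")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
    # Conda is recommended rather than required, so only report whether it is on PATH.
    print("conda found:", shutil.which("conda") is not None)
```

The optional arguments exist only so the check can be exercised without changing the machine it runs on.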

### Troubleshooting

- **Conda Issues**: If you encounter issues with Conda, ensure that the Bioconda and Conda-Forge channels are added and prioritized correctly.
- **Pip Dependencies**: When using pip, manually install any missing dependencies listed in the error messages.
- **Script Errors**: If the installation script fails, check the script's output for error messages and ensure all prerequisites are met.

### Verification

After installation, verify it by starting a Python interpreter and importing the package:

```bash
python
```

```python
import cgatcore
print(cgatcore.__version__)
```

This should display the installed version of `cgatcore`.
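
If the `__version__` attribute is not available in your installation, the version can also be read from the package metadata. This is a hedged alternative rather than the documented method: it assumes the distribution is registered under the name `cgatcore`, and it relies on `importlib.metadata`, which is in the standard library only from Python 3.8 onwards. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(distribution="cgatcore"):
    """Return the installed version string, or None if the distribution is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "cgatcore is not installed in this environment")
```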

## Pip installation

We recommend installation through Conda because it manages dependencies automatically. However, `cgatcore` is generally lightweight and can also be installed using the `pip` package manager. Note that you may need to install other dependencies manually:
@@ -86,5 +115,4 @@ To set this variable permanently, add the following line to your `.bashrc` file
export DRMAA_LIBRARY_PATH=/usr/lib/libdrmaa.so.1.0
```
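
Before submitting jobs, it can help to confirm that `DRMAA_LIBRARY_PATH` is set and points to an existing file. The sketch below is illustrative only: `check_drmaa_library` is a hypothetical helper, and it checks nothing beyond the environment variable shown above.

```python
import os


def check_drmaa_library(environ=None):
    """Return (ok, message) describing the state of DRMAA_LIBRARY_PATH."""
    environ = os.environ if environ is None else environ
    path = environ.get("DRMAA_LIBRARY_PATH")
    if not path:
        return False, "DRMAA_LIBRARY_PATH is not set"
    if not os.path.isfile(path):
        return False, f"DRMAA_LIBRARY_PATH points to a missing file: {path}"
    return True, f"DRMAA library found at {path}"


if __name__ == "__main__":
    ok, message = check_drmaa_library()
    print(("OK: " if ok else "WARNING: ") + message)
```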

[Conda documentation](https://conda.io)
67 changes: 63 additions & 4 deletions docs/getting_started/run_parameters.md
@@ -6,18 +6,77 @@ This configuration file allows you to override the default settings. To view the

For an example of configuring a PBSPro workload manager, see the provided [config example](https://github.com/AntonioJBT/pipeline_example/blob/master/Docker_and_config_file_examples/cgat.yml).

The `.cgat.yml` file in your home directory takes precedence over the default cgatcore settings. For instance, adding the following configuration to `.cgat.yml` sets up the cluster for SLURM:

```yaml
memory_resource: mem

options: --time=00:10:00 --cpus-per-task=8 --mem=1G

queue_manager: slurm

queue: NONE

parallel_environment: "dedicated"
```
This setup specifies the memory resource name (`mem`), a runtime limit (`--time`), CPU and memory requests (`--cpus-per-task`, `--mem`), and the SLURM queue manager, among other settings. Adjust the parameters to match your cluster environment so that the workload manager is optimised for your pipeline runs.

## Default Parameters

The following are some of the default parameters in `cgatcore` that can be overridden in your `.cgat.yml` file:

- **memory_resource**: Defines the memory resource name (e.g., `mem` for PBSPro).
- **options**: Specifies additional options for job submission (e.g., `-l walltime=00:10:00`).
- **queue_manager**: The queue manager to be used (e.g., `pbspro`, `slurm`).
- **queue**: The default queue for job submission.
- **parallel_environment**: Specifies the parallel environment settings.

## Additional Parameters

The following additional parameters can also be configured in your `.cgat.yml` file:

- **cluster_queue**: Specifies the cluster queue to use (default: `all.q`).
- **cluster_priority**: Sets the priority of jobs in the cluster queue (default: `-10`).
- **cluster_num_jobs**: Limits the number of jobs to submit to the cluster queue (default: `100`).
- **cluster_memory_resource**: Name of the consumable resource to request memory (default: `mem_free`).
- **cluster_memory_default**: Default amount of memory allocated for each job (default: `4G`).
- **cluster_memory_ulimit**: Ensures requested memory is not exceeded via ulimit (default: `False`).
- **cluster_options**: General cluster options for job submission.
- **cluster_parallel_environment**: Parallel environment for multi-threaded jobs (default: `dedicated`).
- **cluster_queue_manager**: Specifies the cluster queue manager (default: `sge`).
- **cluster_tmpdir**: Directory specification for temporary files on cluster nodes. If set to `False`, the general `tmpdir` parameter is used.

These parameters allow you to customize the cluster environment to better suit your pipeline's needs.
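
To illustrate how overrides interact with defaults, the sketch below layers user settings on top of the default values listed above. It is not how `cgatcore` loads its configuration internally; the function `effective_settings` and the override values are hypothetical, and the YAML parsing step is omitted so the example needs only the standard library.

```python
from collections import ChainMap

# Defaults copied from the list above; a real installation may define more keys.
DEFAULTS = {
    "cluster_queue": "all.q",
    "cluster_priority": -10,
    "cluster_num_jobs": 100,
    "cluster_memory_resource": "mem_free",
    "cluster_memory_default": "4G",
    "cluster_memory_ulimit": False,
    "cluster_parallel_environment": "dedicated",
    "cluster_queue_manager": "sge",
}


def effective_settings(user_config):
    """Layer user overrides on top of the defaults; user values win."""
    return dict(ChainMap(user_config, DEFAULTS))


# Hypothetical overrides, as they might look after parsing a user's .cgat.yml.
overrides = {"cluster_queue_manager": "slurm", "cluster_memory_default": "8G"}
settings = effective_settings(overrides)
print(settings["cluster_queue_manager"])  # slurm
print(settings["cluster_queue"])          # all.q
```

Keys that the user does not override keep their default values, which is the behaviour described for `.cgat.yml` above.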

## Example Configurations

### SLURM Configuration

```yaml
memory_resource: mem
options: --time=00:10:00 --cpus-per-task=8 --mem=1G
queue_manager: slurm
queue: NONE
parallel_environment: "dedicated"
```

### Torque Configuration

```yaml
memory_resource: mem
options: -l walltime=00:10:00 -l nodes=1:ppn=8
queue_manager: torque
queue: NONE
parallel_environment: "dedicated"
```

These configurations specify memory allocation, runtime limits, and other settings specific to each workload manager. Adjust these parameters to suit your cluster environment.
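
The two `options` strings differ only in scheduler syntax. As a rough sketch of that mapping, the hypothetical helper below builds a string in the style of each example; it covers only the flags shown above and is not a `cgatcore` function.

```python
def scheduler_options(queue_manager, walltime, cpus, memory=None):
    """Build an `options` string in the style of the examples above."""
    if queue_manager == "slurm":
        parts = [f"--time={walltime}", f"--cpus-per-task={cpus}"]
        if memory:
            parts.append(f"--mem={memory}")
    elif queue_manager == "torque":
        # The Torque example above does not request memory, so none is added here.
        parts = [f"-l walltime={walltime}", f"-l nodes=1:ppn={cpus}"]
    else:
        raise ValueError(f"no template for queue manager: {queue_manager}")
    return " ".join(parts)


print(scheduler_options("slurm", "00:10:00", 8, "1G"))
# --time=00:10:00 --cpus-per-task=8 --mem=1G
print(scheduler_options("torque", "00:10:00", 8))
# -l walltime=00:10:00 -l nodes=1:ppn=8
```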
19 changes: 15 additions & 4 deletions docs/getting_started/tutorial.md
@@ -40,7 +40,7 @@ This will generate a `pipeline.yml` file containing configuration parameters tha

### Step 3: Run the pipeline

To run the pipeline, execute the following command in the directory containing the `pipeline.yml` file:

```bash
cgatshowcase transdiffexpres make full -v5 --no-cluster
@@ -54,7 +54,19 @@ The `--no-cluster` flag will run the pipeline locally if you do not have access
cgatshowcase --help
```

The `make full` command starts the pipeline. Monitor the output for any errors or warnings.

### Step 4: Review Results

Once the pipeline completes, review the output files generated in the `showcase_test_data` directory. These files contain the results of the pseudoalignment.

### Troubleshooting

- **Common Issues**: If you encounter errors during execution, ensure that all dependencies are installed and paths are correctly set.
- **Logs**: Check the log files generated during the pipeline run for detailed error messages.
- **Support**: For further assistance, refer to the [CGAT-core documentation](https://cgat-core.readthedocs.io/en/latest/) or raise an issue on our [GitHub repository](https://github.com/cgat-developers/cgat-core/issues).
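
Scanning a log for error lines can be automated. The sketch below assumes the log is a plain-text file named `pipeline.log` in the working directory; that file name and the severity markers in the pattern are assumptions, so adjust both to match your run. The helper `find_problem_lines` is hypothetical.

```python
import re
from pathlib import Path

# Match common severity markers; adjust the pattern to your log format.
PROBLEM_PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback)\b")


def find_problem_lines(log_text):
    """Return (line_number, line) pairs for lines that look like errors."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if PROBLEM_PATTERN.search(line)
    ]


if __name__ == "__main__":
    log_path = Path("pipeline.log")  # assumed log file name; change as needed
    if log_path.exists():
        for number, line in find_problem_lines(log_path.read_text(errors="replace")):
            print(f"{number}: {line}")
    else:
        print(f"{log_path} not found in the current directory")
```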

### Step 5: Generate a report

The final step is to generate a report to display the output of the pipeline. We recommend using `MultiQC` for generating reports from commonly used bioinformatics tools (such as mappers and pseudoaligners) and `Rmarkdown` for generating custom reports.

@@ -68,5 +80,4 @@ This will generate a `MultiQC` report in the folder `MultiQC_report.dir/` and an

## Conclusion

This completes the tutorial for running the `transdiffexpres` pipeline for `cgat-showcase`. We hope you find it as useful as we do for writing workflows in Python.
