diff --git a/.markdown-link-check.json b/.markdown-link-check.json
index b394a03..1b4f096 100644
--- a/.markdown-link-check.json
+++ b/.markdown-link-check.json
@@ -13,8 +13,8 @@
"replacement": "https://github.com/STRIDES/NIHCloudLabAzure/tree/main/docs"
},
{
- "pattern": "^/tutorials",
- "replacement": "https://github.com/STRIDES/NIHCloudLabAzure/tree/main/tutorials"
+ "pattern": "^/notebooks",
+ "replacement": "https://github.com/STRIDES/NIHCloudLabAzure/tree/main/notebooks"
}
],
diff --git a/README.md b/README.md
index 9f63a79..a1caeb9 100644
--- a/README.md
+++ b/README.md
@@ -1,100 +1,91 @@
-
-
>This repository falls under the NIH STRIDES Initiative. STRIDES aims to harness the power of the cloud to accelerate biomedical discoveries. To learn more, visit https://cloud.nih.gov.
-# NIH Cloud Lab for Azure
----------------------------------
+# Microsoft Azure Tutorial Resources
+
NIH Cloud Lab’s goal is to make Cloud easy and accessible for you, so that you can spend less time on administrative tasks and focus more on research.
-Use this repository to learn about how to use Azure by exploring the linked resources and walking through the tutorials. If you are a beginner, we suggest you begin with this jumpstart section. If you already have foundational knowledge of Azure and Cloud, feel free to skip ahead to the [tutorials](/tutorials/) section for in-depth examples of how to run specific workflows such as genomic variant calling and medical image analysis.
+Use this repository to learn about how to use Azure by exploring the linked resources and walking through the tutorials. If you are a beginner, we suggest you start with the jumpstart section on the [Cloud Lab website](https://cloud.nih.gov/resources/cloudlab/) before returning here.
+---------------------------------
## Overview of Page Contents
-+ [Getting Started](#gs)
-+ [Overview](#ov)
-+ [Resource Groups](#rg)
-+ [Command Line Tools](#cli)
-+ [Azure Marketplace](#mark)
-+ [Ingest and Store Data](#sto)
-+ [Virtual Machines](#vm)
-+ [Azure Functions](#vm)
-+ [Disk Images](#disk)
-+ [Azure Machine Learning](#sag)
-+ [Clusters](#clu)
-+ [Creating a Conda Environment](#co)
-+ [Azure Container Registry](#con)
-+ [GitHub](#gh)
-+ [Billing and Benchmarking](#bb)
-+ [Cost Optimization](#cost)
-+ [Getting Support](#sup)
-+ [Additional Training](#tr)
-
-## **Getting Started**
-You can learn a lot of what is possible on Azure in the Azure Getting Started [Tutorials Page](https://azure.microsoft.com/en-us/get-started/) and we recommend you go there and explore some of the tutorials on offer. Nonetheless, it can be hard to know where to start if you are new to the cloud. To help you, we thought through some of the most common tasks you will encounter doing cloud-enabled research, and gathered tutorials and guides specific to those topics. We hope the following materials are helpful as you explore using Azure!
-
-## **Overview**
-There are three primary ways you can run analyses using Azure: using **Virtual Machines**, **Jupyter Notebook instances**, and **Managed services**. We give a brief overview of each of these here and go into more detail in the sections below. [Virtual Machines](https://azure.microsoft.com/en-us/products/virtual-machines/) are like desktop computers, but you access them through the cloud console and you get to pick the operating system and the specifications such as CPU and memory. In Azure, these virtual machines are called VMs for short. Jupyter Notebook instances are virtual machines with a preconfigured Jupyter Lab. On Azure these are run through [Azure Machine Learning](https://azure.microsoft.com/en-us/products/machine-learning/#product-overview), which is also Azure's ML/AI platform. You decide what kind of virtual machine you want to 'spin up' and then you can run Juptyer notebooks on those virtual machines. Finally, Serverless services are services that allow you to run things, an analysis, an app, a website, and not have to deal with your own servers (VMs). There are still servers running somewhere, you just don't have to manage them. All you have to do is call a command that runs your analysis in the background, and copies the output files to a storage account. [Azure Batch](https://learn.microsoft.com/en-us/azure/batch/batch-technical-overview) is a common example.
-
-## **Resource Groups**
-A resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. You decide how you want to allocate resources to resource groups based on what makes the most sense for your use case. Generally, add resources that share the same lifecycle to the same resource group so you can easily deploy, update, and delete them as a group. Each resource group stores metadata about the underlying resources. Therefore, when you specify a location for the resource group, you are specifying where that metadata is stored. For compliance reasons, you may need to ensure that your data is stored in a particular region.
-
-To see more information on how to manage resource groups, visit our docs about [Managing Resource Groups](/docs/resource_groups.md).
-
-## **Command Line Tools**
-Most tasks in Azure can be done without the command line, but the command line tools will generally make your life easier in the long run. Command line interface (CLI) tools are those that you use directly in a terminal/shell as opposed to clicking within the Azure portal's graphical user interface (GUI). The primary tool you will need is the Azure CLI, which will allow you to interact with Virtual Machines (VMs) or Storage Accounts (see below) from your local terminal. Instructions for the CLI can be found [here](https://learn.microsoft.com/en-us/cli/azure/). If you are unable to install locally, you can use all the CLI commands from within VM and Machine Learning instances, or from the [Cloud Shell](https://learn.microsoft.com/en-us/azure/cloud-shell/overview).
-
-To install and configure Azure CLI, redirect to [Get started with Azure CLI](https://learn.microsoft.com/en-us/cli/azure/get-started-with-azure-cli), which provides detailed instructions on installation as well as documentation on common Azure CLI commands. Microsoft Azure also has a cloud native service called [Microsoft Genomics](https://www.microsoft.com/en-us/genomics/) which offers cloud implementation of the Burrows-Wheeler Aligner (BWA) and the Genome Analysis Toolkit (GATK) for secondary analysis. Find documentation on how to use Microsoft Genomics [here](https://learn.microsoft.com/en-us/azure/genomics/overview-what-is-genomics).
-
-## **Azure Marketplace**
-The [Microsoft Azure Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/) is an online store in Azure that contains thousands of software applications and services to fit your research needs. For example, you can find VMs configured for Microsoft Genomics or NVIDIA machine learning. Within Cloud Lab, the most common use case for the Marketplace will likely be [CycleCloud](https://learn.microsoft.com/en-us/azure/cyclecloud/tutorials/tutorial?view=cyclecloud-8), which is Azure's High Performance Computing solution. If interested in CycleCloud, please contact us at `CloudLab@nih.gov` so we can help set this up in your Cloud Lab account.
-
-## **Ingest and Store Data using Azure Storage Accounts**
-Microsoft's object storage solution for the cloud is called Azure Blob. Blob is optimized for storing massive amounts of unstructured data. Azure also offers many other storage solutions listed [here](https://azure.microsoft.com/en-us/products/category/storage/). To get started you must create a [Storage Account](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-create?tabs=azure-portal). Users can grant limited access to Azure storage resources using [Shared Access Signatures](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview)(SAS). You can also read our guide to Storage Accounts and moving data in and out of Cloud Lab [here](/docs/create_storage_account.md). This [Microsoft guide](https://microsoft.github.io/Genomics-Community/mydoc_data_migration.html) for moving genomic data is also very helpful.
-
-## **Virtual Machines**
-Virtual machines (VMs) on Azure can be accessed via SSH or from the Azure portal. More information on VMs can be found [here](https://azure.microsoft.com/en-us/products/virtual-machines/#overview) as well as this [guide](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/ssh-from-windows) on how to use SSH keys with windows in Azure. To view the different types of VMs available in Azure check out the [Virtual Machine Series](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/series/).
-
-You can also spin up preconfigured VMs, such as the Azure Data Science VM, which has many data science tools preinstalled and may save you time on environment set up. Read more in [our docs](/docs/Azure_Data_Science_VMs.md).
-
-Also, for best VM provisioning experience, please see this link for VM best practices in [our docs](/docs/Virtual-machine-best-practices.md).
-
-## **Azure Functions**
-Azure Functions is a serverless solution that allows you to write less code, maintain less infrastructure, and save on costs. Instead of worrying about deploying and maintaining servers, the cloud infrastructure provides all the up-to-date resources needed to keep your applications running. For more information click [here](https://learn.microsoft.com/en-us/azure/azure-functions/). In general, you can consider functions for automating workflows.
-
-## **Disk Images**
-Part of the power of virtual machines is that they offer a blank slate for you to configure as desired. [Azure VM Image Builder](https://azure.microsoft.com/en-us/products/image-builder/#overview) simplifies the image building process allowing for custom built images to be saved. You can later redeploy these images to spin up a new machine with data or environments already installed.
-
-## **Launch a Machine Learning Workspace (Jupyter Environment)**
-[Azure Machine Learning studio](https://learn.microsoft.com/en-us/azure/machine-learning/overview-what-is-azure-machine-learning) is Azure's ML/AI solution. ML studio allows for you to run your own code in managed Jupyter notebooks. Follow the [Quickstart](https://learn.microsoft.com/en-us/azure/machine-learning/quickstart-run-notebooks) page to begin running Jupyter Notebooks in studio. Note that you will need to start and stop your compute environment, which is run separately from the notebook. Once in the AzureML portal, go to compute, then you can select Jupyter, Notebooks, or VS Code, which means a lot of flexibility in the way you utilize the compute environment.
-
-The Azure file share account of your Azure Machine Learning workspace is mounted as a drive on the compute instance. This drive is the default working directory for Jupyter, Jupyter Labs, RStudio, and Posit Workbench. This means that the notebooks and other files you create in Jupyter, JupyterLab, RStudio, or Posit are automatically stored on the file share and available to use in other compute instances as well.
-
-If you are running complex ML models, look at this Microsoft [blog post](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/azureml-observability-a-scalable-and-extensible-solution-for-ml/ba-p/3474066) for an overview of Microsoft's overvability solution. The source code is [here](https://github.com/microsoft/AzureML-Observability).
-
-## **Clusters**
-One great thing about the cloud is its ability to scale with demand. When you submit a job to a traditional cluster, you specify up front how many CPUs and memory you want to give to your job, and you may over- or under-utilize these resources. With managed resources like serverless and clusters you can leverage a feature called autoscaling, where the compute resources will scale up or down with demand. This is more efficient and keeps costs down when demand is low, but prevents latency when demand is high (think about workshop participants all submitting jobs at the same time to a cluster). For most users of Cloud Lab, the best way to leverage scaling is to use Azure Batch, but in some cases, maybe for a whole lab group or large project, it may make sense to spin up a [Kubernetes cluster](https://azure.microsoft.com/en-us/products/kubernetes-service/).
-
-If you are interested in using a more traditional scheduler like SLURM or Sun Grid Engine, you can use Azure CycleCloud, which has an easy to use GUI as well as CLI options. If interested in CycleCloud, please contact us at `CloudLab@nih.gov` and we will provision a CycleCloud instance for you.
-
-## **Creating a Conda Environment**
-Virtual environments allow you to manage package versions without having package conflicts. For example, if you needed Python 3 for one analysis, but Python 2.7 for another, you could create separate environments to use the two versions of Python. One of the most popular package managers used for creating virtual environments is the [conda package manager](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html#:~:text=A%20conda%20environment%20is%20a,NumPy%201.6%20for%20legacy%20testing). We also made a quick guide that you can reference [here](/docs/create_conda_env.md)
-
-## **Managing Containers with Azure Container Registry**
-You can host or pull containers with Azure Container Registry. See [Microsoft's documentation](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-get-started-portal?tabs=azure-cli) on how to use this service.
-
-## **GitHub**
-GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere. This [tutorial](https://docs.github.com/en/get-started/quickstart/hello-world) teaches you GitHub essentials like repositories, branches, commits, and pull requests. You'll create your own Hello World repository and learn GitHub's pull request workflow, a popular way to create and review code. Since Microsoft owns GitHub, it integrates nicely with Azure.
-
-## **Billing and Benchmarking**
-Many Cloud Lab users are interested in understanding how to estimate the price of a large-scale project using a reduced sample size. Generally, you should be able to benchmark with a few representative samples to get an idea of time and cost required for a larger scale project. Follow our [Cost Management Guide](/docs/billing_and_cost_management.md) to see how to tag specific resources for workflow benchmarking.
-
-In terms of cost, the best way to estimate costs is to use the Azure pricing calculator [here](https://azure.microsoft.com/en-us/pricing/calculator/) for an initial figure, which is a pricing tool that forecasts costs based on products and usage. Then, you can run some benchmarks and double check that everything is acting as you expect. See [our docs](/docs/Using_The_Azure_Price_Calculator.md) on best practices for using this tool.
-
-## **Cost Optimization**
-Follow our [Cost Management Guide](/docs/billing_and_cost_management.md) for details on how to monitor costs, set up budget alerts, and cost-benchmark specific analyses using resource tagging. In addition, here are a few tips to help you stay on budget. You can also configure auto-shutdown on your VM instances following [this guide](/docs/auto-shutdown-instance.md) to prevent you from accidentally leaving instances running.
-
-## **Getting Support**
-As part of your participation in Cloud Lab you will be added to the Cloud Lab Teams channel where you can chat with other Cloud Lab users, and gain support from the Cloud Lab team. For NIH Intramural users, you can submit a support ticket to Service Now. For issues related to the cloud environment, feel free to request [Azure Enterprise Support](/docs/request_enterprise_support.md). For issues related to scientific use cases, such as, `how can I best run an RNAseq pipeline in Azure?`, email us at `CloudLab@nih.gov`.
-
-## **Additional Training**
-This repo only scratches the surface of what can be done in the cloud. If you are interested in additional cloud training opportunities, please visit the [STRIDES Training page](https://cloud.nih.gov/training/). For more information on the STRIDES Initiative at the NIH, visit [our website](https://cloud.nih.gov) or contact the NIH STRIDES team at STRIDES@nih.gov for more information.
++ [Artificial Intelligence](#ai)
++ [Clinical Informatics](#ci)
++ [Medical Imaging](#mi)
++ [Genomics on Azure](#bio)
++ [GWAS](#gwas)
++ [BLAST](#blast)
++ [VCF Query](#vcf)
++ [RNAseq](#rna)
++ [scRNAseq](#sc)
++ [Long Read Sequencing Analysis](#long)
++ [Open Data](#open)
+
+## **Artificial Intelligence**
+Machine learning is a subfield of artificial intelligence that focuses on the development of algorithms and models that enable computers to learn from and make predictions or decisions based on data, without being explicitly programmed. Artificial intelligence and machine learning algorithms are being applied to a variety of biomedical research questions, ranging from image classification to genomic variant calling. Azure offers AI services through Azure AI Studio and Azure Machine Learning.
+
+See our suite of tutorials to learn more about [Gen AI on Azure](/notebooks/GenAI/) that highlight Azure products such as [Azure AI Studio](/notebooks/GenAI/Azure_AI_Studio_README.md), [Azure OpenAI](/notebooks/GenAI/Azure_Open_AI_README.md) and [Azure AI Search](/notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb) and external tools like [Langchain](/notebooks/GenAI/notebooks/AzureAIStudio_langchain.ipynb). These notebooks walk you through how to deploy, train, and query models, as well as how to implement techniques like [Retrieval-Augmented Generation (RAG)](/notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb). If you are interested in configuring a model to work with structured data like csv or json files, we've created tutorials that walk you through how to index your csv using the [Azure UI](/docs/create_index_from_csv.md) and query your database using a [notebook within Azure ML](/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb). We also have another [tutorial that runs all the necessary steps directly from a notebook](/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_notebook.ipynb).
+
+## **Clinical Informatics with FHIR**
+Azure Health Data Services is a set of services that enables you to store, process, and analyze medical data in Azure. These services are designed to help organizations quickly connect disparate health data sources and formats, such as structured, imaging, and device data, and normalize it to be persisted in the cloud. At its core, Azure Health Data Services possesses the ability to transform and ingest data into FHIR (Fast Healthcare Interoperability Resources) format. This allows you to transform health data from legacy formats, such as HL7v2 or CDA, or from high-frequency IoT data in device proprietary formats to FHIR. This makes it easier to connect data stored in Azure Health Data Services with services across the Azure ecosystem, like Azure Synapse Analytics, and Azure Machine Learning (Azure ML).
+
+Azure Health Data Services includes support for multiple health data standards for the exchange of structured data, and the ability to deploy multiple instances of different service types (FHIR, DICOM, and MedTech) that seamlessly work with one another. Services deployed within a workspace also share a compliance boundary and common configuration settings. The product scales automatically to meet the varying demands of your workloads, so you spend less time managing infrastructure and more time generating insights from health data.
+
+Copying healthcare data stored in Azure FHIR Server to Synapse Analytics allows researchers to leverage a cloud-scale data warehousing and analytics tool to extract insights from their data as well as build scalable research pipelines.
+For information on how to perform this export and downstream analytics, please visit [this repository](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/healthcare-apis/fhir/copy-to-synapse.md).
+
+You can also see hands-on examples of using [FHIR on Azure](https://github.com/microsoft/genomicsnotebook/tree/main/fhirgenomics), but note that you will need to supply your own VCF files as these are not provided with the tutorial content.
+
+## **Medical Imaging Analysis**
+Medical imaging analysis requires the analysis of large image files and often requires elastic storage and accelerated computing. Microsoft Azure offers cloud-based medical imaging analysis capabilities through its Azure Healthcare APIs and Azure Medical Imaging solutions. Azure's DICOM Service allows for the secure storage, management, and processing of medical images in the cloud, using industry standard DICOM (Digital Imaging and Communications in Medicine) format. The DICOM Service provides features like high availability, disaster recovery, and scalable storage options, making it an ideal solution for pipelines that need to store, manage, and analyze large amounts of medical imaging data. In addition, the server integrates with other Azure services like Azure ML, facilitating the use of advanced machine learning algorithms for image analysis tasks such as object detection, segmentation, and classification. Read about how to deploy the service [here](https://learn.microsoft.com/en-us/azure/healthcare-apis/dicom/deploy-dicom-services-in-azure).
+
+Microsoft has several medical imaging notebooks that showcase different medical imaging use cases on Azure Machine Learning. These notebooks demonstrate various data science techniques such as manual model development with PyTorch, automated machine learning, and MLOps-based examples for automating the machine learning lifecycle in medical use cases, including retraining.
+These notebooks are available [here](https://github.com/Azure/medical-imaging). Make sure you select a kernel that includes PyTorch; otherwise, installing the dependencies can be challenging. Note also that you need to use a GPU VM for most of the notebook cells, but you can create several compute environments and switch between them as needed. Be sure to shut them off when you are finished.
+
+For Cloud Lab users interested in multi-modal clinical informatics, DICOMcast provides the ability to synchronize data from a DICOM service to a FHIR service, allowing users to integrate clinical and imaging data. DICOMcast expands the use cases for health data by supporting both a streamlined view of longitudinal patient data and the ability to effectively create cohorts for medical studies, analytics, and machine learning. For more information on how to utilize DICOMcast please visit Microsoft’s [documentation](https://learn.microsoft.com/en-us/azure/healthcare-apis/dicom/dicom-cast-overview) or the open-source [GitHub repository](https://github.com/microsoft/dicom-server/blob/main/docs/quickstarts/deploy-dicom-cast.md).
+
+For users hoping to train deep learning models on imaging data, InnerEye-DeepLearning (IE-DL) is a toolbox that Microsoft developed for easily training deep learning models on 3D medical images. Simple to run both locally and in the cloud with Azure Machine Learning, it allows users to train and run inference on the following:
++ Segmentation models
++ Classification and regression models
++ Any PyTorch Lightning model, via a bring-your-own-model setup
+
+This project exists in a separate [GitHub repository](https://github.com/microsoft/InnerEye-DeepLearning).
+
+## **Microsoft Genomics**
+Microsoft has several genomics-related offerings that will be useful to many Cloud Lab users. For a broad overview, visit the [Microsoft Genomics Community site](https://microsoft.github.io/Genomics-Community/index.html). You can also get an overview of different execution options from [this blog](https://techcommunity.microsoft.com/t5/healthcare-and-life-sciences/genomic-workflow-managers-on-microsoft-azure/ba-p/3747052), and a detailed analysis of Nextflow with Azure Batch in [this blog](https://techcommunity.microsoft.com/t5/healthcare-and-life-sciences/rna-sequencing-analysis-on-azure-using-nextflow-configuration/ba-p/3738854). We highlight a few key services here:
++ [Genomics Notebooks](https://github.com/microsoft/genomicsnotebook): These example notebooks highlight many common use cases in genomics research. The Bioconductor/RStudio notebook will not work in Cloud Lab. To run RStudio, look at [Posit Workbench from the Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/rstudio-5237862.rstudioserverprostandard).
++ [Cromwell on Azure](https://github.com/microsoft/CromwellOnAzure): Documentation on how to spin up the resources needed to run Cromwell on Azure. Note that this service will not work within Cloud Lab because you need high-level permissions, but we list it here for demonstration purposes.
++ [Microsoft Genomics](https://learn.microsoft.com/en-us/azure/genomics/quickstart-run-genomics-workflow-portal): Run BWA and GATK using this managed service. Note that it uses Python 2.7 and thus is not compatible with AzureML (which uses Python 3), but you can run it from any other shell environment.
++ [Nextflow on Azure](https://microsoft.github.io/Genomics-Community/mydoc_nextflow.html): Run Nextflow workflows using Azure Batch.
++ [NVIDIA Parabricks for Secondary Genomics Analysis on Azure](https://techcommunity.microsoft.com/t5/healthcare-and-life-sciences/benchmarking-the-nvidia-clara-parabricks-for-secondary-genomics/ba-p/3722434): Follow this guide to run Parabricks on a VM by pulling the Docker container directly from NVIDIA.
+
+## **Genome Wide Association Studies**
+Genome-wide association studies (GWAS) are large-scale investigations that analyze the genomes of many individuals to identify common genetic variants associated with traits, diseases, or other phenotypes.
+- This [NIH CFDE written tutorial](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud) walks you through running a simple GWAS on AWS, so we converted it to Azure in [this notebook](/notebooks/GWAS). Note that the CFDE page has a few other bioinformatics-related tutorials, such as BLAST and Illumina read simulation.
+- This blog post [illustrates some of the costs associated](https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-to-accelerate-genome-wide-analysis-study/ba-p/2644120) with running GWAS on Azure.
+
+## **NCBI BLAST+**
+NCBI BLAST (Basic Local Alignment Search Tool) is a widely used bioinformatics program provided by the National Center for Biotechnology Information (NCBI) that compares nucleotide or protein sequences against a large database to identify similar sequences and infer evolutionary relationships, functional annotations, and structural information.
+- [This Microsoft Blog](https://techcommunity.microsoft.com/t5/azure-high-performance-computing/running-ncbi-blast-on-azure-performance-scalability-and-best/ba-p/2410483) explains how to optimize BLAST analyses on Azure VMs. Feel free to install BLAST+ on a VM or an AzureML notebook and run queries there.
+
+## **Query a VCF file in Azure Synapse**
+- You can use SQL to rapidly query a VCF file in Azure Synapse. This requires converting the file from VCF to Parquet format, a common format for databases. Read more about how to do this in Azure on [this Microsoft blog](https://techcommunity.microsoft.com/t5/healthcare-and-life-sciences/genomic-data-in-parquet-format-on-azure/ba-p/3150554). Although the notebooks for this tutorial are bundled with the other genomics notebooks, to get them to work you will need to use Azure Databricks or Synapse Analytics, not AzureML.
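
To make the conversion step more concrete, here is a minimal sketch of the first half of that process: turning VCF text into tabular rows that a Parquet writer could consume. It is not the method used in the Microsoft blog or its notebooks. The sample records, the `vcf_to_rows` helper, and the filter at the end are all hypothetical and exist only for illustration. Writing the rows out as Parquet would require an additional library and is not shown.

```python
import io

# Hypothetical three-record VCF used only for illustration.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10177\trs367896724\tA\tAC\t100\tPASS\tAF=0.425\n"
    "chr1\t10352\trs555500075\tT\tTA\t100\tPASS\tAF=0.4375\n"
    "chr2\t10616\t.\tC\tG\t50\tq10\tAF=0.01\n"
)


def vcf_to_rows(handle):
    """Yield one dict per variant record, keyed by the VCF column names."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            # Skip blank lines and '##' meta-information lines.
            continue
        if line.startswith("#"):
            # The single '#CHROM ...' line names the columns.
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("VCF column header line not found")
        row = dict(zip(header, line.split("\t")))
        row["POS"] = int(row["POS"])  # positions are integers
        yield row


rows = list(vcf_to_rows(io.StringIO(SAMPLE_VCF)))

# Once the data is tabular, a SQL-style filter becomes straightforward.
# This list comprehension mimics:
#   SELECT * FROM variants WHERE CHROM = 'chr1' AND FILTER = 'PASS'
chr1_pass = [r for r in rows if r["CHROM"] == "chr1" and r["FILTER"] == "PASS"]
print(len(rows), len(chr1_pass))
```

Each row is a flat record with one key per VCF column, which is the shape a columnar format like Parquet expects. Real VCF files also carry per-sample genotype columns and a structured INFO field, which this sketch leaves as plain strings.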
+
+## **RNAseq**
+RNA-seq analysis is a high-throughput sequencing method that allows the measurement and characterization of gene expression levels and transcriptome dynamics. Workflows are typically run using workflow managers, and final results can often be visualized in notebooks.
+- You can run this [Nextflow on Azure tutorial](https://microsoft.github.io/Genomics-Community/mydoc_nextflow.html) for RNAseq in a variety of ways on Azure. Following the instructions outlined above, you could use Virtual Machines, Azure Machine Learning, or Azure Batch.
+- For a notebook version of a complete RNAseq pipeline, from FASTQ files to Salmon quantification, use this [notebook](/notebooks/rnaseq-myco-tutorial-main) from the NIGMS Sandbox Program, which we rewrote to work on Azure.
+
+## **Single Cell RNAseq**
+Single-cell RNA sequencing (scRNA-seq) is a technique that enables the analysis of gene expression at the individual cell level, providing insights into cellular heterogeneity, identifying rare cell types, and revealing cellular dynamics and functional states within complex biological systems.
+- This [NVIDIA blog](https://developer.nvidia.com/blog/accelerating-single-cell-genomic-analysis-using-rapids/) details how to run an accelerated scRNAseq pipeline using RAPIDS. You can find many example notebooks in the accompanying GitHub repository [here](https://github.com/clara-parabricks/rapids-single-cell-examples). For each example use case they show benchmarking data with time and cost for CPU vs. GPU machine types on AWS. You will see that most runs cost less than $1.00 with GPU machines (priced on AWS). If you want a CPU version that uses Scanpy, you can use this [notebook](https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/hlca_lung_cpu_analysis.ipynb). Pay careful attention to the environment setup, as there are a lot of dependencies for these notebooks. Create a conda environment in the terminal, then run the notebook. Consider using [mamba](https://github.com/mamba-org/mamba) to speed up environment creation. We created a [guide](/docs/create_conda_env.md) for conda environment setup as well.
+
+## **Long Read Sequence Analysis**
+Long read DNA sequence analysis involves analyzing sequencing reads typically longer than 10 thousand base pairs (bp) in length, compared with short read sequencing where reads are about 150 bp in length.
+Oxford Nanopore offers a fairly complete set of notebook tutorials for handling long read data, covering variant calling, RNAseq, SARS-CoV-2 analysis, and much more. Access the notebooks [here](https://labs.epi2me.io/nbindex/) and on [GitHub](https://github.com/epi2me-labs). These notebooks expect that you are running locally and accessing the epi2me notebook server. To run them in Cloud Lab, skip the first cell that connects to the server; the rest of the notebook should then run correctly with a few tweaks. Oxford Nanopore also offers a host of [Nextflow workflows](https://labs.epi2me.io/wfindex/) that will allow you to run a variety of long read pipelines.
+
+## **Open Data**
+These publicly available datasets can save you time on data discovery and preparation by being curated and ready to use in your workflows.
++ The [COVID-19 Data Lake](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-covid-19-data-lake) contains COVID-19 related datasets from various sources. It covers testing and patient outcome tracking data, social distancing policy, hospital capacity and mobility.
++ In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the [COVID-19 Open Research Dataset (CORD-19)](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-covid-19-open-research?tabs=azure-storage). This dataset is a free resource of over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community. This dataset mobilizes researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease.
++ [The Genomics Data Lake](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-genomics-data-lake) provides various public datasets that you can access for free and integrate into your genomics analysis workflows and applications. The datasets include genome sequences, variant info, and subject/sample metadata in BAM, FASTA, VCF, and CSV file formats: [Illumina Platinum Genomes](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-illumina-platinum-genomes), [Human Reference Genomes](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-human-reference-genomes), [ClinVar Annotations](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-clinvar-annotations), [SnpEff](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-snpeff), [Genome Aggregation Database (gnomAD)](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-gnomad), [1000 Genomes](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-1000-genomes), [OpenCravat](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-open-cravat), [ENCODE](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-encode), [GATK Resource Bundle](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-gatk-resource-bundle).
diff --git a/docs/create_index_from_csv.md b/docs/create_index_from_csv.md
index 81d610f..bc1563f 100644
--- a/docs/create_index_from_csv.md
+++ b/docs/create_index_from_csv.md
@@ -69,7 +69,7 @@ Navigate to `Indexes` on the left panel and wait until your index shows as many
![Check index](/docs/images/10_check_index.png)
-And that is it! Now return to [the tutorial notebook to run queries against this csv using GPT-4]( /tutorials/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb).
+And that is it! Now return to [the tutorial notebook to run queries against this csv using GPT-4]( /notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb).
diff --git a/tutorials/CycleCloud/CycleCloud_CustomRole.json b/envs/CycleCloud_CustomRole.json
similarity index 100%
rename from tutorials/CycleCloud/CycleCloud_CustomRole.json
rename to envs/CycleCloud_CustomRole.json
diff --git a/tutorials/notebooks/GWAS/GWAS_coat_color.ipynb b/notebooks/GWAS/GWAS_coat_color.ipynb
similarity index 70%
rename from tutorials/notebooks/GWAS/GWAS_coat_color.ipynb
rename to notebooks/GWAS/GWAS_coat_color.ipynb
index 21b9b35..fd6bf6d 100644
--- a/tutorials/notebooks/GWAS/GWAS_coat_color.ipynb
+++ b/notebooks/GWAS/GWAS_coat_color.ipynb
@@ -2,447 +2,571 @@
"cells": [
{
"cell_type": "markdown",
+ "id": "7a244bb3",
+ "metadata": {},
+ "source": [
+ "# Running Genome Wide Association Studies in the cloud"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
"source": [
- "# GWAS in the cloud\n",
- "We adapted the NIH CFDE tutorial from [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) and fit it to a notebook. We have greatly simplified the instructions, so if you need or want more details, look at the full tutorial to find out more.\n",
+ "## Overview\n",
+ "Genome Wide Association Study (GWAS) analyses are typically run on the command line, mostly with Bash commands, and plotting is often done in Python or R. Here, we adapted an [NIH CFDE tutorial](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) and fit it to a notebook. We have greatly simplified the instructions, so if you need or want more details, look at the full tutorial.\n",
+ "\n",
"Most of this notebook is bash, but expects that you are using a Python kernel, until step 3, plotting, you will need to switch your kernel to R."
- ],
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
- "id": "7a244bb3"
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have provisioned a compute environment in Azure ML Studio."
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {},
"source": [
- "## 1. Setup\n",
- "### Download the data\n",
- "use %%bash to denote a bash block. You can also use '!' to denote a single bash command within a Python notebook"
- ],
+ "## Learning objectives\n",
+ "+ Learn how to run a GWAS analysis and visualize the results in Azure ML Studio"
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
- "id": "8fbf6304"
+ "source": [
+ "## Get started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8fbf6304",
+ "metadata": {},
+ "source": [
+ "### Download the data\n",
+ "Use `%%bash` to denote a bash block. You can also use `!` to run a single bash command within a Python notebook."
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "8ec900bd",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"%%bash\n",
"mkdir GWAS\n",
"curl -LO https://de.cyverse.org/dl/d/E0A502CC-F806-4857-9C3A-BAEAA0CCC694/pruned_coatColor_maf_geno.vcf.gz\n",
"curl -LO https://de.cyverse.org/dl/d/3B5C1853-C092-488C-8C2F-CE6E8526E96B/coatColor.pheno"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "8ec900bd"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "4d43ae73",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"%%bash\n",
"mv *.gz GWAS\n",
"mv *.pheno GWAS\n",
"ls GWAS"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "4d43ae73"
+ ]
},
{
"attachments": {},
"cell_type": "markdown",
+ "id": "28aadbf8",
+ "metadata": {},
"source": [
- "### Install dependencies\n",
+ "### Install packages\n",
"Here we install mamba, which is faster than conda. You could also skip this install and just use conda since that is preinstalled in the kernel."
- ],
- "metadata": {},
- "id": "28aadbf8"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "b3ba3eef",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"%%bash\n",
"curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
"bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "b3ba3eef"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#add to your path\n",
- "import os\n",
- "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
- ],
- "outputs": [],
"execution_count": null,
+ "id": "ae20d01c",
"metadata": {
"gather": {
"logged": 1686580882939
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "ae20d01c"
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "b219074a",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"! mamba install -y -c bioconda plink vcftools"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "b219074a"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "## 2. Analyze"
- ],
+ "id": "013d960d",
"metadata": {},
- "id": "3de2fc4c"
- },
- {
- "cell_type": "markdown",
"source": [
"### Make map and ped files from the vcf file to feed into plink"
- ],
- "metadata": {},
- "id": "013d960d"
+ ]
},
{
"cell_type": "code",
- "source": [
- "cd GWAS"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "e91c7a01",
"metadata": {
"gather": {
"logged": 1686579597925
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "e91c7a01"
+ "outputs": [],
+ "source": [
+ "cd GWAS"
+ ]
},
{
"cell_type": "code",
- "source": [
- "ls GWAS"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "9b770f7f",
"metadata": {
"gather": {
"logged": 1686579600325
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "9b770f7f"
+ "outputs": [],
+ "source": [
+ "ls"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "6570875d",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --plink --out coatColor"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "6570875d"
+ ]
},
{
"cell_type": "markdown",
+ "id": "b9a38761",
+ "metadata": {},
"source": [
"### Create a list of minor alleles.\n",
"For more info on these terms, look at step 2 at https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/analyze/"
- ],
- "metadata": {},
- "id": "b9a38761"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#unzip vcf\n",
- "! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --recode --out pruned_coatColor_maf_geno"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "6c868a67",
"metadata": {
"gather": {
"logged": 1686581972147
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "6c868a67"
+ "outputs": [],
+ "source": [
+ "#unzip vcf\n",
+ "! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --recode --out pruned_coatColor_maf_geno"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#create list of minor alleles\n",
- "! cat pruned_coatColor_maf_geno.recode.vcf | awk 'BEGIN{FS=\"\\t\";OFS=\"\\t\";}/#/{next;}{{if($3==\".\")$3=$1\":\"$2;}print $3,$5;}' > minor_alleles"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "8e11f991",
"metadata": {
"gather": {
"logged": 1686581979545
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "8e11f991"
+ "outputs": [],
+ "source": [
+ "#create list of minor alleles\n",
+ "! cat pruned_coatColor_maf_geno.recode.vcf | awk 'BEGIN{FS=\"\\t\";OFS=\"\\t\";}/#/{next;}{{if($3==\".\")$3=$1\":\"$2;}print $3,$5;}' > minor_alleles"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "8cff47e3",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"! head minor_alleles"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "8cff47e3"
+ ]
},
{
"cell_type": "markdown",
+ "id": "56d901c7",
+ "metadata": {},
"source": [
"### Run quality controls"
- ],
- "metadata": {},
- "id": "56d901c7"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#calculate missingness per locus\n",
- "! plink --file coatColor --make-pheno coatColor.pheno \"yellow\" --missing --out miss_stat --noweb --dog --reference-allele minor_alleles --allow-no-sex --adjust"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "dafa14a6",
"metadata": {
"gather": {
"logged": 1686582023237
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "dafa14a6"
+ "outputs": [],
+ "source": [
+ "#calculate missingness per locus\n",
+ "! plink --file coatColor --make-pheno coatColor.pheno \"yellow\" --missing --out miss_stat --noweb --dog --reference-allele minor_alleles --allow-no-sex --adjust"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#take a look at lmiss, which is the per locus rates of missingness\n",
- "! head miss_stat.lmiss"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "5cf5f51b",
"metadata": {
"gather": {
"logged": 1686582030150
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "5cf5f51b"
+ "outputs": [],
+ "source": [
+ "#take a look at lmiss, which is the per locus rates of missingness\n",
+ "! head miss_stat.lmiss"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#peek at imiss which is the individual rates of missingness\n",
- "! head miss_stat.imiss"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "915bb263",
"metadata": {
"gather": {
"logged": 1686582034753
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "915bb263"
+ "outputs": [],
+ "source": [
+ "#peek at imiss which is the individual rates of missingness\n",
+ "! head miss_stat.imiss"
+ ]
},
{
"cell_type": "markdown",
+ "id": "4c11ca71",
+ "metadata": {},
"source": [
"### Convert to plink binary format"
- ],
- "metadata": {},
- "id": "4c11ca71"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "3b8f2d7f",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"! plink --file coatColor --allow-no-sex --dog --make-bed --noweb --out coatColor.binary"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "3b8f2d7f"
+ ]
},
{
"cell_type": "markdown",
+ "id": "e36f6cd7",
+ "metadata": {},
"source": [
"### Run a simple association step (the GWAS part!)"
- ],
- "metadata": {},
- "id": "e36f6cd7"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "f926ef9b",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"! plink --bfile coatColor.binary --make-pheno coatColor.pheno \"yellow\" --assoc --reference-allele minor_alleles --allow-no-sex --adjust --dog --noweb --out coatColor"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "f926ef9b"
+ ]
},
{
"cell_type": "markdown",
+ "id": "b397d484",
+ "metadata": {},
"source": [
"### Identify statistical cutoffs\n",
"This code finds the equivalent of 0.05 and 0.01 p value in the negative-log-transformed p values file. We will use these cutoffs to draw horizontal lines in the Manhattan plot for visualization of haplotypes that cross the 0.05 and 0.01 statistical threshold (i.e. have a statistically significant association with yellow coat color)"
- ],
- "metadata": {},
- "id": "b397d484"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "id": "b94e1e2a",
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
"source": [
"%%bash\n",
"unad_cutoff_sug=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.05' | head -n1 | awk '{print $3}')\n",
"unad_cutoff_conf=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.01' | head -n1 | awk '{print $3}')"
- ],
- "outputs": [],
- "execution_count": null,
- "metadata": {},
- "id": "b94e1e2a"
+ ]
},
{
"cell_type": "markdown",
+ "id": "1f52e97c",
+ "metadata": {},
"source": [
- "## 3. Plotting\n",
+ "### Plotting\n",
"In this tutorial, plotting is done in R. Azure gets a bit funny about running these R commands, so we recommend just runnning the rest of the commands in the Terminal. Run `R` before running the commands. Otherwise you can just download the inputs and run locally in R studio."
- ],
- "metadata": {},
- "id": "1f52e97c"
+ ]
},
{
"cell_type": "markdown",
+ "id": "effb5acd",
+ "metadata": {},
"source": [
"### Install qqman"
- ],
- "metadata": {},
- "id": "effb5acd"
+ ]
},
{
"cell_type": "code",
- "source": [
- "install.packages('qqman', contriburl=contrib.url('http://cran.r-project.org/'))"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "60feed89",
"metadata": {
"gather": {
"logged": 1686582094642
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "60feed89"
+ "outputs": [],
+ "source": [
+ "install.packages('qqman', contriburl=contrib.url('http://cran.r-project.org/'))"
+ ]
},
{
"cell_type": "markdown",
+ "id": "d3f1fcd2",
+ "metadata": {},
"source": [
"### Run the plotting function"
- ],
- "metadata": {},
- "id": "d3f1fcd2"
+ ]
},
{
"cell_type": "code",
- "source": [
- "#make sure you are still CD in GWAS, when you change kernel it may reset to home\n",
- "setwd('GWAS')"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "a7e8cd2b",
"metadata": {
"gather": {
"logged": 1686584355516
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "a7e8cd2b"
+ "outputs": [],
+ "source": [
+ "# make sure you are still in the GWAS directory; changing the kernel may reset the working directory to home\n",
+ "setwd('GWAS')"
+ ]
},
{
"cell_type": "code",
- "source": [
- "require(qqman)"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "7946a3a7",
"metadata": {
"gather": {
"logged": 1686584356532
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "7946a3a7"
+ "outputs": [],
+ "source": [
+ "require(qqman)"
+ ]
},
{
"cell_type": "code",
- "source": [
- "data=read.table(\"coatColor.assoc\", header=TRUE)"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "0d28ef2c",
"metadata": {
"gather": {
"logged": 1686584364339
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "0d28ef2c"
+ "outputs": [],
+ "source": [
+ "data=read.table(\"coatColor.assoc\", header=TRUE)"
+ ]
},
{
"cell_type": "code",
- "source": [
- "data=data[!is.na(data$P),]"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "8e5207be",
"metadata": {
"gather": {
"logged": 1686584368241
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "8e5207be"
+ "outputs": [],
+ "source": [
+ "data=data[!is.na(data$P),]"
+ ]
},
{
"cell_type": "code",
- "source": [
- "manhattan(data, p = \"P\", col = c(\"blue4\", \"orange3\"),\n",
- " suggestiveline = 12,\n",
- " genomewideline = 15,\n",
- " chrlabs = c(1:38, \"X\"), annotateTop=TRUE, cex = 1.2)"
- ],
- "outputs": [],
"execution_count": null,
+ "id": "6330b1e0",
"metadata": {
"gather": {
"logged": 1686584371278
+ },
+ "vscode": {
+ "languageId": "r"
}
},
- "id": "6330b1e0"
+ "outputs": [],
+ "source": [
+ "manhattan(data, p = \"P\", col = c(\"blue4\", \"orange3\"),\n",
+ " suggestiveline = 12,\n",
+ " genomewideline = 15,\n",
+ " chrlabs = c(1:38, \"X\"), annotateTop=TRUE, cex = 1.2)"
+ ]
},
{
"cell_type": "markdown",
+ "id": "26787d84",
+ "metadata": {},
"source": [
"In our graph, haplotypes in four parts of the genome (chromosome 2, 5, 28 and X) are found to be associated with an increased occurrence of the yellow coat color phenotype.\n",
"\n",
"The top associated mutation is a nonsense SNP in the gene MC1R known to control pigment production. The MC1R allele encoding yellow coat color contains a single base change (from C to T) at the 916th nucleotide."
- ],
+ ]
+ },
+ {
+ "cell_type": "markdown",
"metadata": {},
- "id": "26787d84"
+ "source": [
+ "## Conclusions\n",
+ "Here you learned how to run a GWAS analysis and visualize the results using a notebook in Azure ML Studio."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Clean Up\n",
+ "Make sure you stop your compute instance and, if desired, delete the resource group associated with this tutorial."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
}
],
"metadata": {
+ "kernel_info": {
+ "name": "ir"
+ },
"kernelspec": {
- "name": "ir",
+ "display_name": "R",
"language": "R",
- "display_name": "R"
+ "name": "ir"
},
"language_info": {
- "name": "R",
"codemirror_mode": "r",
- "pygments_lexer": "r",
- "mimetype": "text/x-r-source",
"file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
"version": "4.2.2"
},
"microsoft": {
@@ -450,13 +574,10 @@
"ms_spell_check_language": "en"
}
},
- "kernel_info": {
- "name": "ir"
- },
"nteract": {
"version": "nteract-front-end@1.0.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
-}
\ No newline at end of file
+}
diff --git a/tutorials/notebooks/GenAI/Azure_AI_Studio_README.md b/notebooks/GenAI/Azure_AI_Studio_README.md
similarity index 98%
rename from tutorials/notebooks/GenAI/Azure_AI_Studio_README.md
rename to notebooks/GenAI/Azure_AI_Studio_README.md
index 5a0fbbd..8c5573b 100644
--- a/tutorials/notebooks/GenAI/Azure_AI_Studio_README.md
+++ b/notebooks/GenAI/Azure_AI_Studio_README.md
@@ -5,11 +5,11 @@ Microsoft Azure migrated the AI front end from Azure OpenAI to Azure AI Studio.
Welcome to this repository, a comprehensive collection of examples that will help you chat with your data using the Azure OpenAI Studio Playground, create highly efficient large language model prompts, and build Azure OpenAI embeddings.
-The purpose of this workshop is to equip participants with the necessary skills to make the most out of the Azure OpenAI Playground, Prompt Engineering, and Azure OpenAI Embeddings in Python. You can view in-depth info on these topics in the [workshop slides](/tutorials/notebooks/GenAI/search_documents/aoai_workshop_content.pdf).
+The purpose of this workshop is to equip participants with the necessary skills to make the most out of the Azure OpenAI Playground, Prompt Engineering, and Azure OpenAI Embeddings in Python. You can view in-depth info on these topics in the [workshop slides](/notebooks/GenAI/search_documents/aoai_workshop_content.pdf).
You can also learn a lot about the details of using Azure AI at this [site](https://azure.microsoft.com/en-us/products/ai-studio).
-We recommend you 1) go through the steps in this README, 2) complete the general notebook called `notebooks/AzureOpenAI_embeddings.ipynb`, then 3) explore the other notebooks at [this directory](/tutorials/notebooks/GenAI/notebooks)
+We recommend you 1) go through the steps in this README, 2) complete the general notebook called `notebooks/AzureOpenAI_embeddings.ipynb`, then 3) explore the other notebooks in [this directory](/notebooks/GenAI/notebooks).
## Overview of Page Contents
+ [Azure AI Playground Prerequisites](#Azure-OpenAI-Playground-Prerequisites)
@@ -89,7 +89,7 @@ On the far right under *Configuration*, you can modify which model you are deplo
![modify deployment](/docs/images/19_deployment.png)
-Finally, you can select the `parameters` tab to modify the model parameters. Review [this presentation](/tutorials/notebooks/GenAI/search_documents/aoai_workshop_content.pdf) to learn more about the parameters.
+Finally, you can select the `parameters` tab to modify the model parameters. Review [this presentation](/notebooks/GenAI/search_documents/aoai_workshop_content.pdf) to learn more about the parameters.
![modify parameters](/docs/images/20_parameters.png)
@@ -396,7 +396,7 @@ Creating embeddings of search documents allows you to use vector search, which i
### Environment Setup
Navigate to your [Azure Machine Learning Studio environment](https://github.com/STRIDES/NIHCloudLabAzure#launch-a-machine-learning-workspace-jupyter-environment-). If you have not created your environment, [create one now](https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-cloud-workstation?view=azureml-api-2).
-Navigate to `Notebooks`, then clone this Git repo into your environment and navigate to the notebook called [AzureOpenAI_embeddings.ipynb](/tutorials/notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb).
+Navigate to `Notebooks`, then clone this Git repo into your environment and navigate to the notebook called [AzureOpenAI_embeddings.ipynb](/notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb).
You will need a variety of parameters to authenticate with the API. You can find these within the Playground by clicking **View Code**. Input these parameters into the notebook cell when asked.
diff --git a/tutorials/notebooks/GenAI/Azure_Open_AI_README.md b/notebooks/GenAI/Azure_Open_AI_README.md
similarity index 99%
rename from tutorials/notebooks/GenAI/Azure_Open_AI_README.md
rename to notebooks/GenAI/Azure_Open_AI_README.md
index 427234c..5f7ae1c 100644
--- a/tutorials/notebooks/GenAI/Azure_Open_AI_README.md
+++ b/notebooks/GenAI/Azure_Open_AI_README.md
@@ -8,11 +8,11 @@ Welcome to this repository, a comprehensive collection of examples that will hel
- 4 Python scripts that demonstrate how to use Azure OpenAI Embeddings to create embedding applications.
- 42 in-depth content slides on the information covered in this workshop. Please find ```aoai_workshop_content.pdf``` in [search_documents](https://github.com/t-cjackson/Azure-OpenAI-Workshop/tree/main/search_documents) folder in this repository.
-The purpose of this workshop is to equip participants with the necessary skills to make the most out of the Azure OpenAI Playground, Prompt Engineering, and Azure OpenAI Embeddings in Python. You can view in-depth info on these topics in the [workshop slides](/tutorials/notebooks/GenAI/search_documents/aoai_workshop_content.pdf).
+The purpose of this workshop is to equip participants with the necessary skills to make the most out of the Azure OpenAI Playground, Prompt Engineering, and Azure OpenAI Embeddings in Python. You can view in-depth info on these topics in the [workshop slides](/notebooks/GenAI/search_documents/aoai_workshop_content.pdf).
You can also learn a lot about the details of using Azure OpenAI at this [site](https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart?tabs=command-line&pivots=programming-language-studio).
-We recommend you 1) go through the steps in this README, 2) complete the general notebook called `notebooks/AzureOpenAI_embeddings.ipynb`, then 3) explore the other notebooks at [this directory](/tutorials/notebooks/GenAI/notebooks)
+We recommend you 1) go through the steps in this README, 2) complete the general notebook called `notebooks/AzureOpenAI_embeddings.ipynb`, then 3) explore the other notebooks in [this directory](/notebooks/GenAI/notebooks).
## Overview of Page Contents
+ [Azure OpenAI Playground Prerequisites](#Azure-OpenAI-Playground-Prerequisites)
diff --git a/tutorials/notebooks/GenAI/LICENSE b/notebooks/GenAI/LICENSE
similarity index 100%
rename from tutorials/notebooks/GenAI/LICENSE
rename to notebooks/GenAI/LICENSE
diff --git a/tutorials/notebooks/GenAI/embedding_demos/acs_embeddings.py b/notebooks/GenAI/embedding_demos/acs_embeddings.py
similarity index 100%
rename from tutorials/notebooks/GenAI/embedding_demos/acs_embeddings.py
rename to notebooks/GenAI/embedding_demos/acs_embeddings.py
diff --git a/tutorials/notebooks/GenAI/embedding_demos/aoai_embeddings.py b/notebooks/GenAI/embedding_demos/aoai_embeddings.py
similarity index 100%
rename from tutorials/notebooks/GenAI/embedding_demos/aoai_embeddings.py
rename to notebooks/GenAI/embedding_demos/aoai_embeddings.py
diff --git a/tutorials/notebooks/GenAI/example_scripts/example_azureaisearch_openaichat_zeroshot.py b/notebooks/GenAI/example_scripts/example_azureaisearch_openaichat_zeroshot.py
similarity index 100%
rename from tutorials/notebooks/GenAI/example_scripts/example_azureaisearch_openaichat_zeroshot.py
rename to notebooks/GenAI/example_scripts/example_azureaisearch_openaichat_zeroshot.py
diff --git a/tutorials/notebooks/GenAI/example_scripts/example_langchain_openaichat_zeroshot.py b/notebooks/GenAI/example_scripts/example_langchain_openaichat_zeroshot.py
similarity index 100%
rename from tutorials/notebooks/GenAI/example_scripts/example_langchain_openaichat_zeroshot.py
rename to notebooks/GenAI/example_scripts/example_langchain_openaichat_zeroshot.py
diff --git a/tutorials/notebooks/GenAI/example_scripts/workshop_embedding.py b/notebooks/GenAI/example_scripts/workshop_embedding.py
similarity index 100%
rename from tutorials/notebooks/GenAI/example_scripts/workshop_embedding.py
rename to notebooks/GenAI/example_scripts/workshop_embedding.py
diff --git a/tutorials/notebooks/GenAI/example_scripts/workshop_search.py b/notebooks/GenAI/example_scripts/workshop_search.py
similarity index 100%
rename from tutorials/notebooks/GenAI/example_scripts/workshop_search.py
rename to notebooks/GenAI/example_scripts/workshop_search.py
diff --git a/tutorials/notebooks/GenAI/microsoft-earnings.csv b/notebooks/GenAI/microsoft-earnings.csv
similarity index 100%
rename from tutorials/notebooks/GenAI/microsoft-earnings.csv
rename to notebooks/GenAI/microsoft-earnings.csv
diff --git a/tutorials/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_notebook.ipynb b/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_notebook.ipynb
similarity index 98%
rename from tutorials/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_notebook.ipynb
rename to notebooks/GenAI/notebooks/AzureAIStudio_index_structured_notebook.ipynb
index 3b097d6..66b7253 100644
--- a/tutorials/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_notebook.ipynb
+++ b/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_notebook.ipynb
@@ -13,9 +13,15 @@
"metadata": {},
"source": [
"## Overview\n",
- "LLMs work best when querying vector databases (DBs). In a few of our tutorials in this repo, we have created vector DBs from unstructured data like PDF documents. Here, we create a vector DB from structured data, which is technically complex and requires additional steps. Here we will vectorize (embed) a csv file, index our DB using Azure AI Search, and then query our vector DB using a GPT model deployed within Azure AI Studio.\n",
- "\n",
- "Note that we assume you have already deployed a model to your AI Studio Environment and have access to your keys and other variables. "
+ "LLMs work best when querying vector databases (DBs). In a few of our tutorials in this repo, we have created vector DBs from unstructured data like PDF documents. Here, we create a vector DB from structured data, which is technically complex and requires additional steps. Here we will vectorize (embed) a csv file, index our DB using Azure AI Search, and then query our vector DB using a GPT model deployed within Azure AI Studio."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have access to Azure AI Studio and Azure AI Search Service and have already deployed an LLM."
]
},
{
@@ -34,7 +40,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Get Started"
+ "## Get started"
]
},
{
@@ -777,7 +783,7 @@
"id": "0459e0ae-5183-4b6a-9eca-41c97b0b8a8c",
"metadata": {},
"source": [
- "## Clean Up"
+ "## Clean up"
]
},
{
diff --git a/tutorials/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb b/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb
similarity index 96%
rename from tutorials/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb
rename to notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb
index 77491d2..5ad4ee9 100644
--- a/tutorials/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb
+++ b/notebooks/GenAI/notebooks/AzureAIStudio_index_structured_with_console.ipynb
@@ -20,9 +20,15 @@
"## Overview\n",
"LLMs work best when querying vector databases (DBs). In a few of our tutorials in this repo, we have created vector DBs from unstructured data like PDF documents. Here, we create a vector DB from structured data, which is technically complex and requires additional steps. Here we will vectorize (embed) a csv file, index our DB using Azure AI Search, and then query our vector DB using a GPT model deployed within Azure AI Studio.\n",
"\n",
- "This notebook differs slightly from the tutorial titled `AzureAIStudio_index_structured_notebook.ipynb` in that here we create the index within Azure AI Search directly, rather than in the notebook. We also use NIH grant data here rather than a Kaggle dataset. \n",
- "\n",
- "Note that we assume you have already deployed a model to your AI Studio Environment and have access to your keys and other variables. We also assume you have an Azure Search Service and can upload your csv data to create the index through the console."
+ "This notebook differs slightly from the tutorial titled `AzureAIStudio_index_structured_notebook.ipynb` in that here we create the index within Azure AI Search directly, rather than in the notebook. We also use NIH grant data here rather than a Kaggle dataset. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have access to both Azure AI Studio and Azure AI Search Service, and have already deployed an LLM."
]
},
{
@@ -41,7 +47,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Get Started"
+ "## Get started"
]
},
{
@@ -351,7 +357,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Clean Up"
+ "## Clean up"
]
},
{
diff --git a/tutorials/notebooks/GenAI/notebooks/AzureAIStudio_langchain.ipynb b/notebooks/GenAI/notebooks/AzureAIStudio_langchain.ipynb
similarity index 96%
rename from tutorials/notebooks/GenAI/notebooks/AzureAIStudio_langchain.ipynb
rename to notebooks/GenAI/notebooks/AzureAIStudio_langchain.ipynb
index 40b3ef2..3cce50b 100644
--- a/tutorials/notebooks/GenAI/notebooks/AzureAIStudio_langchain.ipynb
+++ b/notebooks/GenAI/notebooks/AzureAIStudio_langchain.ipynb
@@ -25,9 +25,15 @@
"source": [
"## Overview\n",
"Models you deploy to your Azure AI Studio can be accessed via API calls. [Langchain](https://python.langchain.com/docs/get_started/introduction) is a development framework for applications power by language models. \n",
- "This tutorial gives you the basics of using langchain to work with Large Language Models (LLMs) for document summarization and basic chat bot functionality. You could take what we have here to build a front end application using something like streamlit, or other further iterations.\n",
- "\n",
- "We assume you have already deployed a model to your AI Studio Environment and have access to your keys and other variables. "
+ "This tutorial gives you the basics of using langchain to work with Large Language Models (LLMs) for document summarization and basic chat bot functionality. You could take what we have here to build a front end application using something like streamlit, or other further iterations."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have access to Azure AI Studio and have already deployed an LLM."
]
},
{
@@ -44,7 +50,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Get Started"
+ "## Get started"
]
},
{
diff --git a/tutorials/notebooks/GenAI/notebooks/AzureAIStudio_sql_chatbot.ipynb b/notebooks/GenAI/notebooks/AzureAIStudio_sql_chatbot.ipynb
similarity index 100%
rename from tutorials/notebooks/GenAI/notebooks/AzureAIStudio_sql_chatbot.ipynb
rename to notebooks/GenAI/notebooks/AzureAIStudio_sql_chatbot.ipynb
diff --git a/tutorials/notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb b/notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb
similarity index 98%
rename from tutorials/notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb
rename to notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb
index cd22cae..2b6885e 100644
--- a/tutorials/notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb
+++ b/notebooks/GenAI/notebooks/AzureOpenAI_embeddings.ipynb
@@ -18,9 +18,15 @@
"metadata": {},
"source": [
"## Overview\n",
- "Models you deploy to Azure OpenAI can be accessed via API calls. This tutorial gives you the basics of creating local embeddings from custom data and querying over those.\n",
- "\n",
- "We assume you have already deployed a model to your Azure OpenAI Environment and have access to your keys and other variables. "
+ "Models you deploy to Azure OpenAI can be accessed via API calls. This tutorial gives you the basics of creating local embeddings from custom data and querying over those."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have access to Azure AI Studio and have already deployed an LLM."
]
},
{
diff --git a/tutorials/notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb b/notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb
similarity index 98%
rename from tutorials/notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb
rename to notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb
index bed2f8e..ebea506 100644
--- a/tutorials/notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb
+++ b/notebooks/GenAI/notebooks/Azure_Pubmed_chatbot.ipynb
@@ -11,14 +11,17 @@
{
"cell_type": "markdown",
"metadata": {},
- "source": []
+ "source": [
+ "## Overview\n",
+ "[PubMed](https://pubmed.ncbi.nlm.nih.gov/about/) supports the search and retrieval of biomedical and life sciences literature with the aim of improving health both globally and personally. Here we create a chatbot that is grounded on PubMed data. Most Azure command line tools are already installed and it is recommended to use the **AzureML** kernel in your Jupyter notebook."
+ ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Overview\n",
- "[PubMed](https://pubmed.ncbi.nlm.nih.gov/about/) supports the search and retrieval of biomedical and life sciences literature with the aim of improving health–both globally and personally. Here we create a chatbot that is grounded on PubMed data. Most Azure command line tools are already installed and it is recommended to use the **AzureML** kernel in your Jupyter notebook. Here we assume you have already deployed an LLM within Azure AI Studio."
+ "## Prerequisites\n",
+ "We assume you have access to both Azure AI Studio and Azure AI Search, and have already deployed an LLM."
]
},
{
@@ -38,7 +41,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Get Started"
+ "## Get started"
]
},
{
@@ -54,7 +57,7 @@
"id": "9dbd13e7-afc9-416b-94dc-418a93e14587",
"metadata": {},
"source": [
- "In this tutorial we will be using Azure OpenAI which you can learn how to deploy [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=cli). This tutorial utilizes the model **gpt-35-turbo** version 0301 and the embeddings model **text-embedding-ada-002** version 2."
+ "In this tutorial we will be using Azure OpenAI. If you haven't deployed it already, you can learn how [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=cli). This tutorial utilizes the model **gpt-35-turbo** version 0301 and the embeddings model **text-embedding-ada-002** version 2."
]
},
{
@@ -662,7 +665,10 @@
" SearchIndexerDataContainer,\n",
" SearchIndexerDataSourceConnection,\n",
" SearchIndex,\n",
- " SearchIndexer\n",
+ " SearchIndexer,\n",
+ " SearchableField,\n",
+ " SearchFieldDataType,\n",
+ " SimpleField,\n",
")\n",
"\n",
"endpoint = \"https://{}.search.windows.net/\".format(service_name)\n",
@@ -1562,7 +1568,7 @@
"id": "a178c1c6-368a-48c5-8beb-278443b685a2",
"metadata": {},
"source": [
- "## Clean Up"
+ "## Clean up"
]
},
{
diff --git a/tutorials/notebooks/GenAI/requirements.txt b/notebooks/GenAI/requirements.txt
similarity index 100%
rename from tutorials/notebooks/GenAI/requirements.txt
rename to notebooks/GenAI/requirements.txt
diff --git a/tutorials/notebooks/GenAI/search_documents/Hurricane_Irene_(2005).pdf b/notebooks/GenAI/search_documents/Hurricane_Irene_(2005).pdf
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/Hurricane_Irene_(2005).pdf
rename to notebooks/GenAI/search_documents/Hurricane_Irene_(2005).pdf
diff --git a/tutorials/notebooks/GenAI/search_documents/Koutros_et_al_2023.pdf b/notebooks/GenAI/search_documents/Koutros_et_al_2023.pdf
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/Koutros_et_al_2023.pdf
rename to notebooks/GenAI/search_documents/Koutros_et_al_2023.pdf
diff --git a/tutorials/notebooks/GenAI/search_documents/New_York_State_Route_373.pdf b/notebooks/GenAI/search_documents/New_York_State_Route_373.pdf
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/New_York_State_Route_373.pdf
rename to notebooks/GenAI/search_documents/New_York_State_Route_373.pdf
diff --git a/tutorials/notebooks/GenAI/search_documents/Rai_et_al_2023.pdf b/notebooks/GenAI/search_documents/Rai_et_al_2023.pdf
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/Rai_et_al_2023.pdf
rename to notebooks/GenAI/search_documents/Rai_et_al_2023.pdf
diff --git a/tutorials/notebooks/GenAI/search_documents/Silverman_et_al_2023.pdf b/notebooks/GenAI/search_documents/Silverman_et_al_2023.pdf
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/Silverman_et_al_2023.pdf
rename to notebooks/GenAI/search_documents/Silverman_et_al_2023.pdf
diff --git a/tutorials/notebooks/GenAI/search_documents/aoai_workshop_content.pdf b/notebooks/GenAI/search_documents/aoai_workshop_content.pdf
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/aoai_workshop_content.pdf
rename to notebooks/GenAI/search_documents/aoai_workshop_content.pdf
diff --git a/tutorials/notebooks/GenAI/search_documents/grant_data_sub1.txt b/notebooks/GenAI/search_documents/grant_data_sub1.txt
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/grant_data_sub1.txt
rename to notebooks/GenAI/search_documents/grant_data_sub1.txt
diff --git a/tutorials/notebooks/GenAI/search_documents/grant_data_sub2.txt b/notebooks/GenAI/search_documents/grant_data_sub2.txt
similarity index 100%
rename from tutorials/notebooks/GenAI/search_documents/grant_data_sub2.txt
rename to notebooks/GenAI/search_documents/grant_data_sub2.txt
diff --git a/tutorials/notebooks/SRADownload/SRA-Download.ipynb b/notebooks/SRADownload/SRA-Download.ipynb
similarity index 72%
rename from tutorials/notebooks/SRADownload/SRA-Download.ipynb
rename to notebooks/SRADownload/SRA-Download.ipynb
index aad19bb..963e317 100644
--- a/tutorials/notebooks/SRADownload/SRA-Download.ipynb
+++ b/notebooks/SRADownload/SRA-Download.ipynb
@@ -18,12 +18,30 @@
"DNA sequence data are typically deposited into the NCBI Sequence Read Archive, and can be accessed through the SRA website, or via a collection of command line tools called SRA Toolkit. Individual sequence entries are assigned an Accession ID, which can be used to find and download a particular file. For example, if you go to the [SRA database](https://www.ncbi.nlm.nih.gov/sra) in a browser window, and search for `SRX15695630`, you should see an entry for _C. elegans_. Alternatively, you can search the SRA metadata using Amazon Athena and generate a list of accession numbers. Here we are going to generate a list of accessions using Athena, use tools from the SRA Toolkit to download a few fastq files, then copy those fastq files to a cloud bucket. We really only scratch the surface of how to search Athena using SQL. If you want more examples, you can also try the notebooks from [this SRA GitHub repo](https://github.com/ncbi/ASHG-Workshop-2021). "
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Learning objectives\n",
+ "+ Learn how to set up an Athena Database\n",
+ "+ Learn how to use AWS Glue to scrape the SRA metadata\n",
+ "+ Query Athena to find target Accession numbers\n",
+ "+ Use SRA tools to download genomic sequence data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Get started"
+ ]
+ },
{
"cell_type": "markdown",
"id": "39f62f42",
"metadata": {},
"source": [
- "### 1) Set up your Athena Database\n",
+ "### Set up your Athena Database\n",
"You need to set up your Athena database in the Athena console before you start this notebook. Follow our [guide](https://github.com/STRIDES/NIHCloudLabAWS/blob/main/docs/create_athena_database.md) to walk you through it."
]
},
@@ -32,7 +50,7 @@
"id": "7aed7098",
"metadata": {},
"source": [
- "### 2) Install Dependencies"
+ "### Install packages\n"
]
},
{
@@ -99,7 +117,7 @@
"id": "ddc46609",
"metadata": {},
"source": [
- "### 3) Setup Directory Structure and Create a Staging Bucket"
+ "### Set up Directory Structure and Create a Staging Bucket"
]
},
{
@@ -114,18 +132,10 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": null,
"id": "827f2447",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "/home/ec2-user/SageMaker/NIHCloudLabAWS/tutorials/notebooks/SRADownload/data\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"cd data/"
]
@@ -158,7 +168,7 @@
"id": "086a50c1",
"metadata": {},
"source": [
- "### 4) Create Accession List using Athena"
+ "### Create Accession List using Athena"
]
},
{
@@ -327,7 +337,7 @@
"id": "01437b57",
"metadata": {},
"source": [
- "### 5) Download FASTQ files with fasterq dump"
+ "### Download FASTQ files with fasterq dump"
]
},
{
@@ -340,18 +350,10 @@
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": null,
"id": "4764f355",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "/home/ec2-user/SageMaker/NIHCloudLabAWS/tutorials/notebooks/SRADownload/data/fasterqdump\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"cd fasterqdump/"
]
@@ -366,30 +368,10 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": null,
"id": "80c2e3b4",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "spots read : 2,054,166\n",
- "reads read : 4,108,332\n",
- "reads written : 4,108,332\n",
- "spots read : 25,734,849\n",
- "reads read : 51,469,698\n",
- "reads written : 25,734,849\n",
- "reads 0-length : 25,734,849\n",
- "spots read : 18,624,005\n",
- "reads read : 37,248,010\n",
- "reads written : 18,624,005\n",
- "reads 0-length : 18,624,005\n",
- "CPU times: user 6.18 s, sys: 1.26 s, total: 7.44 s\n",
- "Wall time: 6min 36s\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"%%time\n",
"!for x in `cat ../list_of_accessionIDS.txt`; do fasterq-dump -f -O raw_fastq -e 8 -m 4G $x ; done"
@@ -408,7 +390,7 @@
"id": "55bd52cd",
"metadata": {},
"source": [
- "### 6) Download FASTQ files with prefetch + fasterq dump"
+ "### Download FASTQ files with prefetch + fasterq dump"
]
},
{
@@ -421,57 +403,20 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": null,
"id": "ddefec2d",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "/home/ec2-user/SageMaker/NIHCloudLabAWS/tutorials/notebooks/SRADownload/data/prefetch_fasterqdump\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"cd ../prefetch_fasterqdump"
]
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": null,
"id": "935f6ca2",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- "2022-08-30T15:45:12 prefetch.2.11.0: 1) Downloading 'SRR3617061'...\n",
- "2022-08-30T15:45:12 prefetch.2.11.0: Downloading via HTTPS...\n",
- "2022-08-30T15:45:16 prefetch.2.11.0: HTTPS download succeed\n",
- "2022-08-30T15:45:17 prefetch.2.11.0: 'SRR3617061' is valid\n",
- "2022-08-30T15:45:17 prefetch.2.11.0: 1) 'SRR3617061' was downloaded successfully\n",
- "\n",
- "2022-08-30T15:45:17 prefetch.2.11.0: 2) Downloading 'SRR8435254'...\n",
- "2022-08-30T15:45:17 prefetch.2.11.0: Downloading via HTTPS...\n",
- "2022-08-30T15:45:23 prefetch.2.11.0: HTTPS download succeed\n",
- "2022-08-30T15:45:24 prefetch.2.11.0: 'SRR8435254' is valid\n",
- "2022-08-30T15:45:24 prefetch.2.11.0: 2) 'SRR8435254' was downloaded successfully\n",
- "2022-08-30T15:45:24 prefetch.2.11.0: 'SRR8435254' has 0 dependencies\n",
- "\n",
- "2022-08-30T15:45:24 prefetch.2.11.0: 3) Downloading 'SRR8435252'...\n",
- "2022-08-30T15:45:24 prefetch.2.11.0: Downloading via HTTPS...\n",
- "2022-08-30T15:45:28 prefetch.2.11.0: HTTPS download succeed\n",
- "2022-08-30T15:45:29 prefetch.2.11.0: 'SRR8435252' is valid\n",
- "2022-08-30T15:45:29 prefetch.2.11.0: 3) 'SRR8435252' was downloaded successfully\n",
- "2022-08-30T15:45:29 prefetch.2.11.0: 'SRR8435252' has 0 dependencies\n",
- "CPU times: user 290 ms, sys: 37.5 ms, total: 327 ms\n",
- "Wall time: 17 s\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"%%time\n",
"!prefetch --option-file ../list_of_accessionIDS.txt -O raw_fastq -f yes"
@@ -479,18 +424,10 @@
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": null,
"id": "7eece75e",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\u001b[0m\u001b[01;34mSRR3617061\u001b[0m/ \u001b[01;34mSRR8435252\u001b[0m/ \u001b[01;34mSRR8435254\u001b[0m/\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"ls raw_fastq/"
]
@@ -505,30 +442,10 @@
},
{
"cell_type": "code",
- "execution_count": 24,
+ "execution_count": null,
"id": "1852a71a",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "spots read : 2,054,166\n",
- "reads read : 4,108,332\n",
- "reads written : 4,108,332\n",
- "spots read : 25,734,849\n",
- "reads read : 51,469,698\n",
- "reads written : 25,734,849\n",
- "reads 0-length : 25,734,849\n",
- "spots read : 18,624,005\n",
- "reads read : 37,248,010\n",
- "reads written : 18,624,005\n",
- "reads 0-length : 18,624,005\n",
- "CPU times: user 1.49 s, sys: 308 ms, total: 1.8 s\n",
- "Wall time: 1min 38s\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"%%time\n",
"!for x in `cat ../list_of_accessionIDS.txt`; do fasterq-dump -f -O raw_fastq -e 8 -m 4G raw_fastq/$x; done"
@@ -547,7 +464,7 @@
"id": "ea152fd7",
"metadata": {},
"source": [
- "### Step 7) Copy Files to a Bucket"
+ "### Copy Files to a Bucket"
]
},
{
@@ -560,60 +477,45 @@
},
{
"cell_type": "code",
- "execution_count": 22,
+ "execution_count": null,
"id": "ad73308f",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "upload: raw_fastq/SRR3617061/SRR3617061.sra to s3://sra-data-athena/raw_fastq/SRR3617061/SRR3617061.sra\n",
- "upload: raw_fastq/SRR8435252/SRR8435252.sra to s3://sra-data-athena/raw_fastq/SRR8435252/SRR8435252.sra\n",
- "upload: raw_fastq/SRR3617061_2.fastq to s3://sra-data-athena/raw_fastq/SRR3617061_2.fastq\n",
- "upload: raw_fastq/SRR3617061_1.fastq to s3://sra-data-athena/raw_fastq/SRR3617061_1.fastq\n",
- "upload: raw_fastq/SRR8435254/SRR8435254.sra to s3://sra-data-athena/raw_fastq/SRR8435254/SRR8435254.sra\n",
- "upload: raw_fastq/SRR8435252.fastq to s3://sra-data-athena/raw_fastq/SRR8435252.fastq\n",
- "upload: raw_fastq/SRR8435254.fastq to s3://sra-data-athena/raw_fastq/SRR8435254.fastq\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"!aws s3 cp raw_fastq/*.fastq s3://sra-data-athena/raw_fastq --recursive"
]
},
{
"cell_type": "code",
- "execution_count": 25,
+ "execution_count": null,
"id": "072ebc9a",
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- " PRE SRR3617061/\n",
- " PRE SRR8435252/\n",
- " PRE SRR8435254/\n",
- "2022-08-30 15:53:41 722868342 SRR3617061_1.fastq\n",
- "2022-08-30 15:53:41 722868342 SRR3617061_2.fastq\n",
- "2022-08-30 15:53:42 3903844648 SRR8435252.fastq\n",
- "2022-08-30 15:53:56 5411343576 SRR8435254.fastq\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"!aws s3 ls s3://sra-data-athena/raw_fastq/"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusions\n",
+ "You learned here how to bring the SRA metadata into Athena and query Athena DB to find target accession numbers, then use SRA tools to download sequence data locally."
+ ]
+ },
{
"cell_type": "markdown",
"id": "a4026566",
"metadata": {},
"source": [
- "### Step 8) Clean up\n",
+ "## Clean up\n",
"Make sure you shut down this VM, or delete it if you don't plan to use if further. You can also [delete the buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html) if you don't want to pay for the data: `aws s3 rb s3://bucket-name --force`"
]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
}
],
"metadata": {
diff --git a/tutorials/notebooks/SpleenLiverSegmentation/README.md b/notebooks/SpleenLiverSegmentation/README.md
similarity index 100%
rename from tutorials/notebooks/SpleenLiverSegmentation/README.md
rename to notebooks/SpleenLiverSegmentation/README.md
diff --git a/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb b/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb
new file mode 100644
index 0000000..cf8b3fe
--- /dev/null
+++ b/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb
@@ -0,0 +1,1017 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "1452463e",
+ "metadata": {},
+ "source": [
+ "# Spleen Model With NVIDIA Pretrain"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Overview\n",
+ "This notebook conducts image segmentation of spleen images using an NVIDIA pretrained model. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have provisioned a compute environment in Azure ML Studio **with a GPU**! A T4 GPU will work fine."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Learning objectives\n",
+ "+ Learn how to use NVIDIA pre-trained models for image segmentation within Azure ML Studio"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Get started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Install packages"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f59ba435",
+ "metadata": {},
+ "source": [
+ "Uncomment below to install all dependencies."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "82db674f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#!pip install 'monai[all]'\n",
+ "#!pip install matplotlib "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bb1228b3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "540e5d47",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# MONAI version: 0.6.0+38.gf6ad4ba5\n",
+ "# Numpy version: 1.21.1\n",
+ "# Pytorch version: 1.9.0\n",
+ "# Pytorch Ignite version: 0.4.5\n",
+ "# Nibabel version: 3.2.1\n",
+ "# scikit-image version: 0.18.2\n",
+ "# Pillow version: 8.3.1\n",
+ "# Tensorboard version: 2.5.0\n",
+ "# gdown version: 3.13.0\n",
+ "# TorchVision version: 0.10.0+cu111\n",
+ "# tqdm version: 4.61.2\n",
+ "# lmdb version: 1.2.1\n",
+ "# psutil version: 5.8.0\n",
+ "# pandas version: 1.3.0\n",
+ "# einops version: 0.3.0"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "07510582",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import tempfile\n",
+ "import glob\n",
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "#import plotly.graph_objects as go\n",
+ "import torch\n",
+ "import numpy as np\n",
+ "\n",
+ "from monai.apps import download_and_extract\n",
+ "from monai.networks.nets import UNet\n",
+ "from monai.networks.layers import Norm\n",
+ "from monai.losses import DiceFocalLoss\n",
+ "from monai.metrics import DiceMetric\n",
+ "from monai.inferers import sliding_window_inference\n",
+ "from monai.data import (\n",
+ " LMDBDataset,\n",
+ " DataLoader,\n",
+ " decollate_batch,\n",
+ " ImageDataset,\n",
+ " Dataset\n",
+ ")\n",
+ "from monai.apps import load_from_mmar\n",
+ "from monai.transforms import (\n",
+ " AsDiscrete,\n",
+ " EnsureChannelFirstd,\n",
+ " Compose,\n",
+ " LoadImaged,\n",
+ " ScaleIntensityRanged,\n",
+ " Spacingd,\n",
+ " Orientationd,\n",
+ " CropForegroundd,\n",
+ " RandCropByPosNegLabeld,\n",
+ " RandAffined,\n",
+ " RandRotated,\n",
+ " EnsureType,\n",
+ " EnsureTyped,\n",
+ ")\n",
+ "from monai.utils import first, set_determinism\n",
+ "from monai.apps.mmars import RemoteMMARKeys\n",
+ "from monai.config import print_config\n",
+ "\n",
+ "print_config()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f523cbf",
+ "metadata": {},
+ "source": [
+ "### Running a pretrained model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0be7401d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "PRETRAINED = True"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e9f3e5f3",
+ "metadata": {},
+ "source": [
+ "Create the directory for storing data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "311c3282",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "directory = \"monai_data/\"\n",
+ "root_dir = tempfile.mkdtemp() if directory is None else directory\n",
+ "print(root_dir)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "38463a18",
+ "metadata": {},
+ "source": [
+ "### Download the public dataset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "da7cfede",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "resource = \"https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar\"\n",
+ "md5 = \"410d4a301da4e5b2f6f86ec3ddba524e\"\n",
+ "\n",
+ "compressed_file = os.path.join(root_dir, \"Task09_Spleen.tar\")\n",
+ "download_and_extract(resource, compressed_file, root_dir, md5)\n",
+ "data_dir = os.path.join(root_dir, \"Task09_Spleen\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fae7c51b",
+ "metadata": {},
+ "source": [
+ "### Create data dictionaries and separate files for training and validation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2515b177",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_images = sorted(\n",
+ " glob.glob(os.path.join(data_dir, \"imagesTr\", \"*.nii.gz\")))\n",
+ "train_labels = sorted(\n",
+ " glob.glob(os.path.join(data_dir, \"labelsTr\", \"*.nii.gz\")))\n",
+ "data_dicts = [\n",
+ " {\"image\": image_name, \"label\": label_name}\n",
+ " for image_name, label_name in zip(train_images, train_labels)\n",
+ "]\n",
+ "train_files, val_files = data_dicts[:-9], data_dicts[-9:]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "974fc5aa",
+ "metadata": {},
+ "source": [
+ "### Define your transformations for training and validation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2357d35d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_transforms = Compose( #Transformations for training dataset\n",
+ " [\n",
+ " LoadImaged(keys=[\"image\", \"label\"]), #Load dictionary based images and labels\n",
+ " EnsureChannelFirstd(keys=[\"image\", \"label\"]), #Ensures the first dimension of each image is the channel dimension\n",
+ " Spacingd(keys=[\"image\", \"label\"], pixdim=( #Change spacing of voxels to be same across images\n",
+ " 1.5, 1.5, 2.0), mode=(\"bilinear\", \"nearest\")),\n",
+ " Orientationd(keys=[\"image\", \"label\"], axcodes=\"RAS\"), #Correct the orientation of images (Right, Anterior, Superior)\n",
+ " ScaleIntensityRanged( #Scale intensity of all images (For images only and not labels)\n",
+ " keys=[\"image\"], a_min=-57, a_max=164,\n",
+ " b_min=0.0, b_max=1.0, clip=True,\n",
+ " ),\n",
+ " CropForegroundd(keys=[\"image\", \"label\"], source_key=\"image\"), #Crop foreground of image\n",
+ " RandCropByPosNegLabeld( #Randomly crop fixed sized region\n",
+ " keys=[\"image\", \"label\"],\n",
+ " label_key=\"label\",\n",
+ " spatial_size=(96, 96, 96),\n",
+ " pos=1,\n",
+ " neg=1,\n",
+ " num_samples=4,\n",
+ " image_key=\"image\",\n",
+ " image_threshold=0,\n",
+ " ),\n",
+ " RandAffined( #Do a random affine transformation with some probability\n",
+ " keys=['image', 'label'],\n",
+ " mode=('bilinear', 'nearest'),\n",
+ " prob=0.5,\n",
+ " spatial_size=(96, 96, 96),\n",
+ " rotate_range=(np.pi/18, np.pi/18, np.pi/5),\n",
+ " scale_range=(0.05, 0.05, 0.05)\n",
+ " ),\n",
+ " EnsureTyped(keys=[\"image\", \"label\"]),\n",
+ " ]\n",
+ ")\n",
+ "val_transforms = Compose( #Transformations for testing dataset\n",
+ " [\n",
+ " LoadImaged(keys=[\"image\", \"label\"]),\n",
+ " EnsureChannelFirstd(keys=[\"image\", \"label\"]),\n",
+ " Spacingd(keys=[\"image\", \"label\"], pixdim=(\n",
+ " 1.5, 1.5, 2.0), mode=(\"bilinear\", \"nearest\")),\n",
+ " Orientationd(keys=[\"image\", \"label\"], axcodes=\"RAS\"),\n",
+ " ScaleIntensityRanged(\n",
+ " keys=[\"image\"], a_min=-57, a_max=164,\n",
+ " b_min=0.0, b_max=1.0, clip=True,\n",
+ " ),\n",
+ " RandRotated(\n",
+ " keys=['image', 'label'],\n",
+ " mode=('bilinear', 'nearest'),\n",
+ " range_x=np.pi/18,\n",
+ " range_y=np.pi/18,\n",
+ " range_z=np.pi/5,\n",
+ " prob=1.0,\n",
+ " padding_mode=('reflection', 'reflection'),\n",
+ " ),\n",
+ " CropForegroundd(keys=[\"image\", \"label\"], source_key=\"image\"),\n",
+ " EnsureTyped(keys=[\"image\", \"label\"]),\n",
+ " ]\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ada5757a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "val_files"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ba3c7695",
+ "metadata": {},
+ "source": [
+ "### Visualize Image and Label (example)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "689eea4e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "check_ds = Dataset(data=val_files, transform=val_transforms)\n",
+ "check_loader = DataLoader(check_ds, batch_size=1)\n",
+ "check_data = first(check_loader)\n",
+ "image, label = (check_data[\"image\"][0][0], check_data[\"label\"][0][0])\n",
+ "print(f\"image shape: {image.shape}, label shape: {label.shape}\")\n",
+ "# plot the slice [:, :, 80]\n",
+ "plt.figure(\"check\", (12, 6))\n",
+ "plt.subplot(1, 2, 1)\n",
+ "plt.title(\"image\")\n",
+ "plt.imshow(image[:, :, 80], cmap=\"gray\")\n",
+ "plt.subplot(1, 2, 2)\n",
+ "plt.title(\"label\")\n",
+ "plt.imshow(label[:, :, 80])\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f45ba707",
+ "metadata": {},
+ "source": [
+ "### Use a dataloader to load files\n",
+ "We load the data with `LMDBDataset`, which uses LMDB (Lightning Memory-Mapped Database) as its cache backend. This is where the transforms are applied, to both images and labels."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fe3285d0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_ds = LMDBDataset(data=train_files, transform=train_transforms, cache_dir=root_dir)\n",
+ "# initialize cache and print meta information\n",
+ "print(train_ds.info())\n",
+ "\n",
+ "# use batch_size=2 to load images and use RandCropByPosNegLabeld\n",
+ "# to generate 2 x 4 images for network training\n",
+ "train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=2)\n",
+ "\n",
+ "# the validation data loader will be created on the fly to ensure \n",
+ "# a deterministic validation set for demo purpose.\n",
+ "val_ds = LMDBDataset(data=val_files, transform=val_transforms, cache_dir=root_dir)\n",
+ "# initialize cache and print meta information\n",
+ "print(val_ds.info())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "455cbcdc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(train_ds.info())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a77e7856",
+ "metadata": {},
+ "source": [
+ "### Download the pretrained model from NVIDIA"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8539fb7d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mmar = {\n",
+ " RemoteMMARKeys.ID: \"clara_pt_spleen_ct_segmentation_1\",\n",
+ " RemoteMMARKeys.NAME: \"clara_pt_spleen_ct_segmentation\",\n",
+ " RemoteMMARKeys.FILE_TYPE: \"zip\",\n",
+ " RemoteMMARKeys.HASH_TYPE: \"md5\",\n",
+ " RemoteMMARKeys.HASH_VAL: None,\n",
+ " RemoteMMARKeys.MODEL_FILE: os.path.join(\"models\", \"model.pt\"),\n",
+ " RemoteMMARKeys.CONFIG_FILE: os.path.join(\"config\", \"config_train.json\"),\n",
+ " RemoteMMARKeys.VERSION: 2,\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "de7fb262",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mmar['name']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bf96f9f9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\") #torch.device(\"cpu\")\n",
+ "if PRETRAINED:\n",
+ " print(\"using a pretrained model.\")\n",
+ " try: #MONAI=0.8\n",
+ " unet_model = load_from_mmar(\n",
+ " item = mmar['name'], \n",
+ " mmar_dir=root_dir,\n",
+ " map_location=device,\n",
+ " version=mmar['version'],\n",
+ " pretrained=True)\n",
+ " except Exception: #MONAI<0.8\n",
+ " unet_model = load_from_mmar(\n",
+ " mmar, \n",
+ " mmar_dir=root_dir,\n",
+ " map_location=device,\n",
+ " pretrained=True)\n",
+ " model = unet_model\n",
+ "else: \n",
+ " print(\"using a randomly init. model.\")\n",
+ " model = UNet(\n",
+ " dimensions=3,\n",
+ " in_channels=1,\n",
+ " out_channels=2,\n",
+ " channels=(16, 32, 64, 128, 256),\n",
+ " strides=(2, 2, 2, 2),\n",
+ " num_res_units=2,\n",
+ " norm=Norm.BATCH,\n",
+ " )\n",
+ "\n",
+ "model = model.to(device)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "39910557",
+ "metadata": {},
+ "source": [
+ "This will be our test file we will view for reference. Here we see how our initial model appears to perform."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4be7eb8f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "test_file = data_dicts[20:21]\n",
+ "test_ds = LMDBDataset(data=test_file, transform=None, cache_dir=root_dir)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2544a774",
+ "metadata": {},
+ "source": [
+ "We use sliding-window inference to run the model over the whole image, one window at a time."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "16fd4e94",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_classes = 2\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
+ "model.eval()\n",
+ "with torch.no_grad():\n",
+ "    for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
+ "        test_inputs, test_labels = (\n",
+ "            data[\"image\"].to(device),\n",
+ "            data[\"label\"].to(device),\n",
+ "        )\n",
+ "        roi_size = (160, 160, 160)\n",
+ "        sw_batch_size = 4\n",
+ "        test_outputs = sliding_window_inference(\n",
+ "            test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ "        test_outputspre = [post_pred(i) for i in decollate_batch(test_outputs)] # Decollate our results\n",
+ "        test_labelspre = [post_label(i) for i in decollate_batch(test_labels)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9782ec96",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Actual Spleen')\n",
+ "plt.imshow(test_labelspre[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Actual spleen"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "76cd38e6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Pretrained Calculated Spleen')\n",
+ "plt.imshow(test_outputspre[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Pretrained model spleen"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "65c68242",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Differences Between Actual and Model')\n",
+ "pretraineddif = test_labelspre[0].cpu().numpy()[1][:,:,200] - test_outputspre[0].cpu().numpy()[1][:,:,200]\n",
+ "plt.imshow(pretraineddif, cmap='Greys_r') #Differences"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2f60e5b5",
+ "metadata": {},
+ "source": [
+ "Using just the pretrained model, it appears we are performing pretty well! We can now continue training on our data, starting from the NVIDIA model's initial weights."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c3e40010",
+ "metadata": {},
+ "source": [
+ "## Training\n",
+ "Without a GPU, training can take a while. In that case, we recommend skipping the next three cells and loading the saved model instead."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a8ad6aee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loss_function = DiceFocalLoss(to_onehot_y=True, softmax=True)\n",
+ "optimizer = torch.optim.Adam(model.parameters(), 5e-4)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d91d340c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "max_epochs = 25\n",
+ "val_interval = 2\n",
+ "num_classes = 2\n",
+ "best_metric = -1\n",
+ "best_metric_epoch = -1\n",
+ "epoch_loss_values = []\n",
+ "metric_values = []\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
+ "dice_metric = DiceMetric(include_background=False, reduction=\"mean\", get_not_nans=False)\n",
+ "\n",
+ "for epoch in range(max_epochs):\n",
+ "    print(\"-\" * 10)\n",
+ "    print(f\"epoch {epoch + 1}/{max_epochs}\")\n",
+ "    model.train()\n",
+ "    epoch_loss = 0\n",
+ "    step = 0\n",
+ "    set_determinism(seed=42)\n",
+ "    for batch_data in train_loader:\n",
+ "        step += 1\n",
+ "        inputs, labels = (\n",
+ "            batch_data[\"image\"].to(device),\n",
+ "            batch_data[\"label\"].to(device),\n",
+ "        )\n",
+ "        optimizer.zero_grad()\n",
+ "        outputs = model(inputs)\n",
+ "        loss = loss_function(outputs, labels)\n",
+ "        loss.backward()\n",
+ "        optimizer.step()\n",
+ "        epoch_loss += loss.item()\n",
+ "        print(\n",
+ "            f\"{step}/{len(train_ds) // train_loader.batch_size}, \"\n",
+ "            f\"train_loss: {loss.item():.4f}\")\n",
+ "    epoch_loss /= step\n",
+ "    epoch_loss_values.append(epoch_loss)\n",
+ "    print(f\"epoch {epoch + 1} average loss: {epoch_loss:.4f}\")\n",
+ "\n",
+ "    if (epoch + 1) % val_interval == 0:\n",
+ "        model.eval()\n",
+ "        with torch.no_grad():\n",
+ "            set_determinism(seed=42)\n",
+ "            for val_data in DataLoader(val_ds, batch_size=1, num_workers=2):\n",
+ "                val_inputs, val_labels = (\n",
+ "                    val_data[\"image\"].to(device),\n",
+ "                    val_data[\"label\"].to(device),\n",
+ "                )\n",
+ "                roi_size = (160, 160, 160)\n",
+ "                sw_batch_size = 4\n",
+ "                val_outputs = sliding_window_inference(\n",
+ "                    val_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ "                val_outputs = [post_pred(i) for i in decollate_batch(val_outputs)]\n",
+ "                val_labels = [post_label(i) for i in decollate_batch(val_labels)]\n",
+ "                dice_metric(y_pred=val_outputs, y=val_labels)\n",
+ "            metric = dice_metric.aggregate().item()\n",
+ "            dice_metric.reset()\n",
+ "            metric_values.append(metric)\n",
+ "            if metric > best_metric:\n",
+ "                best_metric = metric\n",
+ "                best_metric_epoch = epoch + 1\n",
+ "                torch.save(model.state_dict(), os.path.join(\n",
+ "                    root_dir, \"Spleen_best_metric_model_pretrained.pth\"))\n",
+ "                print(\"saved new best metric model\")\n",
+ "            print(\n",
+ "                f\"current epoch: {epoch + 1} current mean dice: {metric:.4f}\"\n",
+ "                f\"\\nbest mean dice: {best_metric:.4f} \"\n",
+ "                f\"at epoch: {best_metric_epoch}\"\n",
+ "            )\n",
+ "print(\n",
+ "    f\"train completed, best_metric: {best_metric:.4f} \"\n",
+ "    f\"at epoch: {best_metric_epoch}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5cf1fd04",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.figure(\"train\", (12, 6))\n",
+ "plt.subplot(1, 2, 1)\n",
+ "plt.title(\"Epoch Average Loss\")\n",
+ "x = [i + 1 for i in range(len(epoch_loss_values))]\n",
+ "y = epoch_loss_values\n",
+ "plt.xlabel(\"epoch\")\n",
+ "plt.ylim([0.1, 0.7])\n",
+ "plt.plot(x, y)\n",
+ "plt.subplot(1, 2, 2)\n",
+ "plt.title(\"Val Mean Dice\")\n",
+ "x = [val_interval * (i + 1) for i in range(len(metric_values))]\n",
+ "y = metric_values\n",
+ "plt.xlabel(\"epoch\")\n",
+ "plt.ylim([0, 1.0])\n",
+ "plt.plot(x, y)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4ff0035d",
+ "metadata": {},
+ "source": [
+ "The plots show that the model improves fairly quickly over just 25 epochs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0499fa93",
+ "metadata": {},
+ "source": [
+ "## Inference\n",
+ "Without a GPU, skip to this section and load the previously trained best model, since training on a CPU takes a long time."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "29441405",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model.load_state_dict(torch.load('monai_data/Spleen_best_metric_model_pretrained.pth', map_location=device))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fab5b4b9",
+ "metadata": {},
+ "source": [
+ "With the model loaded, let's see whether much has changed for our example image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "94615f38",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_classes = 2\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
+ "model.eval()\n",
+ "with torch.no_grad():\n",
+ "    for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
+ "        test_inputs, test_labels = (\n",
+ "            data[\"image\"].to(device),\n",
+ "            data[\"label\"].to(device),\n",
+ "        )\n",
+ "        roi_size = (160, 160, 160)\n",
+ "        sw_batch_size = 4\n",
+ "        test_outputs = sliding_window_inference(\n",
+ "            test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ "        test_outputsSpl = [post_pred(i) for i in decollate_batch(test_outputs)]\n",
+ "        test_labelsSpl = [post_label(i) for i in decollate_batch(test_labels)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a3f78dd4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Trained Calculated Spleen')\n",
+ "plt.imshow(test_outputsSpl[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Trained model spleen"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a67f89f2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Differences Between Actual and Model')\n",
+ "traineddif = test_labelsSpl[0].cpu().numpy()[1][:,:,200] - test_outputsSpl[0].cpu().numpy()[1][:,:,200]\n",
+ "plt.imshow(traineddif, cmap='Greys_r') #Differences"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "382c7285",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Differences Between The Models')\n",
+ "modelsdif = test_outputspre[0].cpu().numpy()[1][:,:,200] - test_outputsSpl[0].cpu().numpy()[1][:,:,200]\n",
+ "plt.imshow(modelsdif, cmap='Greys_r') #Differences between the pretrained and trained models"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6606bce2",
+ "metadata": {},
+ "source": [
+ "We see not much has changed, which is a good sign for how well the NVIDIA model performs out of the box."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5cfd20c6",
+ "metadata": {},
+ "source": [
+ "Here is the final image of our Spleen!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "91e83d40",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,:,200] == 0, test_outputsSpl[0].cpu().numpy()[1][:,:,200])\n",
+ "fig = plt.figure(frameon=False, figsize=(10,10))\n",
+ "plt.imshow(np.rot90(test_ds[0]['image'][0][:,:,200]), cmap='Greys_r')\n",
+ "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=1.0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6030d210",
+ "metadata": {},
+ "source": [
+ "Feel free to play around in this notebook or download it and use it where a GPU is accessible."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "896388a1",
+ "metadata": {},
+ "source": [
+ "## Additional Exercise: Use liver segmentation in addition to spleen\n",
+ "Here we load a liver segmentation model from NVIDIA. We can't train this model because we don't have liver training data, but we can use it as a rough estimate."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "657e44a0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mmarliver = {\n",
+ " RemoteMMARKeys.ID: \"clara_pt_liver_and_tumor_ct_segmentation_1\",\n",
+ " RemoteMMARKeys.NAME: \"clara_pt_liver_and_tumor_ct_segmentation\",\n",
+ " RemoteMMARKeys.FILE_TYPE: \"zip\",\n",
+ " RemoteMMARKeys.HASH_TYPE: \"md5\",\n",
+ " RemoteMMARKeys.HASH_VAL: None,\n",
+ " RemoteMMARKeys.MODEL_FILE: os.path.join(\"models\", \"model.pt\"),\n",
+ " RemoteMMARKeys.CONFIG_FILE: os.path.join(\"config\", \"config_train.json\"),\n",
+ " RemoteMMARKeys.VERSION: 1,\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a6fb0da7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "try:  # MONAI 0.8: pass the model name and version\n",
+ "    unet_model = load_from_mmar(\n",
+ "        item=mmarliver['name'],\n",
+ "        mmar_dir=root_dir,\n",
+ "        map_location=device,\n",
+ "        version=mmarliver['version'],\n",
+ "        pretrained=True)\n",
+ "except Exception:  # MONAI < 0.8: pass the MMAR dictionary itself\n",
+ "    unet_model = load_from_mmar(\n",
+ "        mmarliver,\n",
+ "        mmar_dir=root_dir,\n",
+ "        map_location=device,\n",
+ "        pretrained=True)\n",
+ "model = unet_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "55034354",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+ "\n",
+ "print(\"using a pretrained model.\")\n",
+ "try:  # MONAI 0.8: pass the model name and version\n",
+ "    unet_model = load_from_mmar(\n",
+ "        item=mmarliver['name'],\n",
+ "        mmar_dir=root_dir,\n",
+ "        map_location=device,\n",
+ "        version=mmarliver['version'],\n",
+ "        pretrained=True)\n",
+ "except Exception:  # MONAI < 0.8: pass the MMAR dictionary itself\n",
+ "    unet_model = load_from_mmar(\n",
+ "        mmarliver,\n",
+ "        mmar_dir=root_dir,\n",
+ "        map_location=device,\n",
+ "        pretrained=True)\n",
+ "model = unet_model.to(device)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a79c1731",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_classesP = 3  # channels predicted by the liver-and-tumor model\n",
+ "num_classesL = 2  # channels in the spleen labels\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classesP)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classesL)])\n",
+ "model.eval()\n",
+ "with torch.no_grad():\n",
+ "    for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
+ "        test_inputs, test_labels = (\n",
+ "            data[\"image\"].to(device),\n",
+ "            data[\"label\"].to(device),\n",
+ "        )\n",
+ "        roi_size = (160, 160, 160)\n",
+ "        sw_batch_size = 4\n",
+ "        test_outputs = sliding_window_inference(\n",
+ "            test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ "        test_outputsliv = [post_pred(i) for i in decollate_batch(test_outputs)] # Decollate our results\n",
+ "        test_labelsliv = [post_label(i) for i in decollate_batch(test_labels)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c0956706",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sliceval = 215\n",
+ "maskedliv = np.ma.masked_where(test_outputsliv[0].cpu().numpy()[1][:,:,sliceval] == 0, test_outputsliv[0].cpu().numpy()[1][:,:,sliceval])\n",
+ "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,:,sliceval] == 0, test_outputsSpl[0].cpu().numpy()[1][:,:,sliceval])\n",
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Pretrained Calculated Liver and Spleen')\n",
+ "plt.imshow(np.rot90(test_ds[0]['image'][0][:,:,sliceval]), cmap='Greys_r')\n",
+ "plt.imshow(np.rot90(maskedliv), cmap='cividis', alpha=0.75)\n",
+ "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=0.75)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5bdfdbe9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sliceval = 110\n",
+ "maskedliv = np.ma.masked_where(test_outputsliv[0].cpu().numpy()[1][:,sliceval,:] == 0, test_outputsliv[0].cpu().numpy()[1][:,sliceval,:])\n",
+ "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,sliceval,:] == 0, test_outputsSpl[0].cpu().numpy()[1][:,sliceval,:])\n",
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Pretrained Calculated Liver and Spleen')\n",
+ "plt.imshow(np.rot90(test_ds[0]['image'][0][:,sliceval,:]), cmap='Greys_r')\n",
+ "plt.imshow(np.rot90(maskedliv), cmap='cividis', alpha=0.75)\n",
+ "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=0.75)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "af1169b6",
+ "metadata": {},
+ "source": [
+ "To go further, try adding more models from the NGC Catalog: https://catalog.ngc.nvidia.com/models. We recommend filtering by 'CT'."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusions\n",
+ "Here you learned how to use NVIDIA pre-trained models for image segmentation"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Clean up\n",
+ "Shut down your compute environment and delete any resource groups associated with this notebook."
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "name": "pytorch-gpu.1-9.m75",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/pytorch-gpu.1-9:m75"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/SpleenLiverSegmentation/monai_data/Spleen_best_metric_model_pretrained.pth b/notebooks/SpleenLiverSegmentation/monai_data/Spleen_best_metric_model_pretrained.pth
similarity index 100%
rename from tutorials/notebooks/SpleenLiverSegmentation/monai_data/Spleen_best_metric_model_pretrained.pth
rename to notebooks/SpleenLiverSegmentation/monai_data/Spleen_best_metric_model_pretrained.pth
diff --git a/notebooks/pangolin/pangolin_pipeline.ipynb b/notebooks/pangolin/pangolin_pipeline.ipynb
new file mode 100644
index 0000000..453aa95
--- /dev/null
+++ b/notebooks/pangolin/pangolin_pipeline.ipynb
@@ -0,0 +1,361 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "31e8c3cd",
+ "metadata": {},
+ "source": [
+ "# Pangolin SARS-CoV-2 Pipeline Notebook"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Overview \n",
+ "SARS-CoV-2 sequences are commonly analyzed using a bioinformatics pipeline called Pangolin, which assigns each genome to a lineage. Here we will download some genomic data and run Pangolin following the [standard instructions](https://cov-lineages.org/resources/pangolin/usage.html). "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have provisioned a compute environment in Azure ML Studio."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Learning objectives\n",
+ "+ Download genomic data from NCBI from the command line\n",
+ "+ Run pangolin to identify viral lineages\n",
+ "+ Generate a phylogeny to visualize lineage identity"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Get started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "03541941",
+ "metadata": {},
+ "source": [
+ "### Install packages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f994b990",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#change this depending on how many threads are available in your notebook\n",
+ "CPU=4"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a19b662e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+ "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a40f7ebc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f421805e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#install biopython to import packages below\n",
+ "! pip install biopython"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fd936fd6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! mamba install -y -c conda-forge -c bioconda pangolin ipyrad iqtree"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5a99cf0d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#import libraries\n",
+ "import os\n",
+ "from Bio import SeqIO\n",
+ "from Bio import Entrez\n",
+ "import ipyrad.analysis as ipa\n",
+ "import toytree"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up directory structure"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8f831fca",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "if not os.path.exists('pangolin_analysis'):\n",
+ "    os.mkdir('pangolin_analysis')\n",
+ "os.chdir('pangolin_analysis')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6423ca5d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "if os.path.exists('sarscov2_sequences.fasta'):\n",
+ "    os.remove('sarscov2_sequences.fasta')\n",
+ "!rm -f sarscov2_*\n",
+ "!rm -f lineage_report.csv"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9d7015e6",
+ "metadata": {},
+ "source": [
+ "### Fetch viral sequences using a list of accession IDs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "16824bcf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#give a list of accession numbers for covid sequences\n",
+ "acc_nums=['NC_045512','LR757995','LR757996','OL698718','OL677199','OL672836','MZ914912','MZ916499','MZ908464','MW580573','MW580574','MW580576','MW991906','MW931310','MW932027','MW424864','MW453109','MW453110']\n",
+ "print('the number of sequences we will analyze = ',len(acc_nums))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9e382d33",
+ "metadata": {},
+ "source": [
+ "Let this block finish before running the next one; otherwise you may get an error about too many requests. If that happens, restart your kernel and rerun everything except the software installation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a28a7122",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#use the Bio.Entrez toolkit within biopython to download the accession numbers\n",
+ "#save those sequences to a single fasta file\n",
+ "Entrez.email = \"email@example.com\" # Always tell NCBI who you are\n",
+ "filename = \"sarscov2_seqs.fasta\"\n",
+ "if not os.path.isfile(filename):\n",
+ "    # Downloading...\n",
+ "    for acc in acc_nums:\n",
+ "        net_handle = Entrez.efetch(\n",
+ "            db=\"nucleotide\", id=acc, rettype=\"fasta\", retmode=\"text\"\n",
+ "        )\n",
+ "        out_handle = open(filename, \"a\")\n",
+ "        out_handle.write(net_handle.read())\n",
+ "        out_handle.close()\n",
+ "        net_handle.close()\n",
+ "        print(\"Saved\",acc)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "56acb7cc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#make sure our fasta file has the same number of seqs as the acc_nums list\n",
+ "print('the number of seqs in our fasta file: ')\n",
+ "! grep '>' sarscov2_seqs.fasta | wc -l"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8606c352",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#let's peek at our new fasta file\n",
+ "! head sarscov2_seqs.fasta"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2db37b4e",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Run pangolin to identify lineages and output alignment\n",
+ "Here we call pangolin, give it our input sequences and the number of threads. We also tell it to output the alignment. The full list of pangolin parameters can be found in the [docs](https://cov-lineages.org/resources/pangolin/usage.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f1a17a74",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! pangolin sarscov2_seqs.fasta --alignment --threads $CPU"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0e56a4b",
+ "metadata": {},
+ "source": [
+ "You can view the output file from pangolin called lineage_report.csv (within pangolin_analysis folder) by double clicking on the file, or by right clicking and downloading. What lineages are present in the dataset? Is Omicron in there?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "37e6efbe",
+ "metadata": {},
+ "source": [
+ "### Run iqtree to estimate maximum likelihood tree for our sequences\n",
+ "iqtree can find the best nucleotide model for the data, but here we are going to assign a model to save time (HKY) and just estimate the phylogeny without any bootstrap support values. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f2782855",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#run iqtree with threads = $CPU variable, if you exclude the -m it will do a phylogenetic model search before tree search\n",
+ "! iqtree -s sequences.aln.fasta -nt $CPU -m HKY --prefix sarscov2_tree --redo-tree"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c7197dd4",
+ "metadata": {},
+ "source": [
+ "### Visualize the tree with toytree"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cef2ba18",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Define the tree file\n",
+ "tre = toytree.tree('sarscov2_tree.treefile')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "842af165",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#draw the tree\n",
+ "rtre = tre.root(wildcard=\"OL\")\n",
+ "rtre.draw(tip_labels_align=True);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "52d9389f",
+ "metadata": {},
+ "source": [
+ "You can also visualize the tree by downloading it and opening it in FigTree."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusions\n",
+ "Here you learned how to use Azure ML Studio to conduct a basic phylogenetic analysis"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Clean Up\n",
+ "Make sure you stop your compute instance and if desired, delete the resource group associated with this tutorial."
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "r-cpu.4-1.m87",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/r-cpu.4-1:m87"
+ },
+ "kernelspec": {
+ "display_name": "conda_amazonei_mxnet_p36",
+ "language": "python",
+ "name": "conda_amazonei_mxnet_p36"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/rnaseq-myco-tutorial-main/LICENSE b/notebooks/rnaseq-myco-tutorial-main/LICENSE
similarity index 100%
rename from tutorials/notebooks/rnaseq-myco-tutorial-main/LICENSE
rename to notebooks/rnaseq-myco-tutorial-main/LICENSE
diff --git a/tutorials/notebooks/rnaseq-myco-tutorial-main/README.md b/notebooks/rnaseq-myco-tutorial-main/README.md
similarity index 100%
rename from tutorials/notebooks/rnaseq-myco-tutorial-main/README.md
rename to notebooks/rnaseq-myco-tutorial-main/README.md
diff --git a/notebooks/rnaseq-myco-tutorial-main/RNAseq_pipeline.ipynb b/notebooks/rnaseq-myco-tutorial-main/RNAseq_pipeline.ipynb
new file mode 100644
index 0000000..fe594c4
--- /dev/null
+++ b/notebooks/rnaseq-myco-tutorial-main/RNAseq_pipeline.ipynb
@@ -0,0 +1,493 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# RNA-Seq Analysis Training Demo on Azure"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Overview"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This short tutorial demonstrates how to run an RNA-Seq workflow using a prokaryotic data set. Steps in the workflow include read trimming, read QC, read mapping, and counting mapped reads per gene to quantify gene expression."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "We assume you have provisioned a compute environment in Azure ML Studio"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Learning objectives\n",
+ "+ Learn how to copy data to and from Blob storage\n",
+ "+ Learn how to run and visualize basic RNAseq analysis"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Get started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Install packages"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Note that within Jupyter you can run a bash command either by using the magic '!' in front of your command, or by adding %%bash to the top of your cell."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For example\n",
+ "```\n",
+ "%%bash\n",
+ "example command\n",
+ "```\n",
+ "Or\n",
+ "```\n",
+ "!example command\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The first step is to install Mambaforge, which provides mamba, a faster drop-in replacement for the conda package manager."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+ "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1682515170386
+ }
+ },
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! mamba info --envs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next, we will install the necessary packages into the current environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! mamba install -c conda-forge -c bioconda -c defaults -y sra-tools pigz pbzip2 fastp fastqc multiqc salmon"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Create a set of directories to store the reads, reference sequence files, and output files.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bash\n",
+ "mkdir -p data\n",
+ "mkdir -p data/raw_fastq\n",
+ "mkdir -p data/trimmed\n",
+ "mkdir -p data/fastqc\n",
+ "mkdir -p data/aligned\n",
+ "mkdir -p data/reference\n",
+ "mkdir -p data/quants"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Copy FASTQ Files\n",
+ "So that this tutorial runs quickly, we will analyze only 50,000 reads from one sample in each of the two sample groups, instead of all the reads from all six samples. These files have been posted in an Azure Blob storage container that we made publicly accessible."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349122_1.fastq --output data/raw_fastq/SRR13349122_1.fastq\n",
+ "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349122_2.fastq --output data/raw_fastq/SRR13349122_2.fastq\n",
+ "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349128_1.fastq --output data/raw_fastq/SRR13349128_1.fastq\n",
+ "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349128_2.fastq --output data/raw_fastq/SRR13349128_2.fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Copy reference transcriptome files that will be used by Salmon\n",
+ "Salmon is a tool that aligns RNA-Seq reads to a set of transcripts rather than the entire genome."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/reference/M_chelonae_transcripts.fasta --output data/reference/M_chelonae_transcripts.fasta\n",
+ "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/reference/decoys.txt --output data/reference/decoys.txt"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1682517580413
+ },
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ls data/raw_fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Trim our data with Fastp"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [],
+ "source": [
+ "! fastp -i data/raw_fastq/SRR13349122_1.fastq -I data/raw_fastq/SRR13349122_2.fastq -o data/trimmed/SRR13349122_1_trimmed.fastq -O data/trimmed/SRR13349122_2_trimmed.fastq\n",
+ "! fastp -i data/raw_fastq/SRR13349128_1.fastq -I data/raw_fastq/SRR13349128_2.fastq -o data/trimmed/SRR13349128_1_trimmed.fastq -O data/trimmed/SRR13349128_2_trimmed.fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Run FastQC\n",
+ "FastQC is an invaluable tool that allows you to evaluate whether there are problems with a set of reads. For example, it will provide a report of whether there is any bias in the sequence composition of the reads."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once FastQC is done running, look at the outputs in data/fastqc. What can you say about the quality of the two samples we are looking at here? "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bash\n",
+ "fastqc -o data/fastqc data/trimmed/SRR13349122_1_trimmed.fastq\n",
+ "fastqc -o data/fastqc data/trimmed/SRR13349128_1_trimmed.fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Run MultiQC\n",
+ "MultiQC reads in the FastQC reports and generates a compiled report for all the analyzed FASTQ files.\n",
+ "Just as with FastQC, we can look at the MultiQC results after it finishes: the command below writes `multiqc_report.html` and the `multiqc_data` directory to the current working directory."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1682517201690
+ }
+ },
+ "outputs": [],
+ "source": [
+ "! multiqc -f data/fastqc\n",
+ "#! mv multiqc_data/ data/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Index the Transcriptome so that Trimmed Reads Can Be Mapped Using Salmon"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! salmon index -t data/reference/M_chelonae_transcripts.fasta -p 8 -i data/reference/transcriptome_index --decoys data/reference/decoys.txt -k 31 --keepDuplicates"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Run Salmon to Map Reads to Transcripts and Quantify Expression Levels\n",
+ "Salmon aligns the trimmed reads to the reference transcriptome and generates the read counts per transcript. In this analysis, each gene has a single transcript."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "%%bash\n",
+ "salmon quant -i data/reference/transcriptome_index -l SR -r data/trimmed/SRR13349122_1_trimmed.fastq -p 8 --validateMappings -o data/quants/SRR13349122_quant\n",
+ "salmon quant -i data/reference/transcriptome_index -l SR -r data/trimmed/SRR13349128_1_trimmed.fastq -p 8 --validateMappings -o data/quants/SRR13349128_quant"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1682518630201
+ },
+ "jupyter": {
+ "outputs_hidden": false,
+ "source_hidden": false
+ },
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ls data/quants/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Report the top 10 most highly expressed genes in the samples"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Top 10 most highly expressed genes in the wild-type sample.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! sort -nrk 4,4 data/quants/SRR13349122_quant/quant.sf | head -10"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Top 10 most highly expressed genes in the double lysogen sample.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!sort -nrk 4,4 data/quants/SRR13349128_quant/quant.sf | head -10"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Report the expression of a putative acyl-ACP desaturase (BB28_RS16545) that was downregulated in the double lysogen relative to wild-type\n",
+ "This gene was reported to be downregulated in the double lysogen, as shown in the table of the top 20 upregulated and downregulated genes from the paper describing the study."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Use `grep` to report the expression in the wild-type sample. The level of expression is reported in the Transcripts Per Million (`TPM`) and number of reads (`NumReads`) fields. The fields in the Salmon `quant.sf` file are as follows: \n",
+ "`Name Length EffectiveLength TPM NumReads`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!grep 'BB28_RS16545' data/quants/SRR13349122_quant/quant.sf"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Use `grep` to report the expression in the double lysogen sample. The level of expression is reported in the Transcripts Per Million (`TPM`) and number of reads (`NumReads`) fields. The fields in the Salmon `quant.sf` file are as follows: \n",
+ "`Name Length EffectiveLength TPM NumReads`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!grep 'BB28_RS16545' data/quants/SRR13349128_quant/quant.sf"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "Here you learned how to move data to and from a Blob storage container and then use FASTQ files to run a basic RNA-seq analysis!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Clean Up\n",
+ "Make sure you stop your compute instance and, if desired, delete the resource group associated with this tutorial."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernel_info": {
+ "name": "python3"
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.13"
+ },
+ "microsoft": {
+ "ms_spell_check": {
+ "ms_spell_check_language": "en"
+ }
+ },
+ "nteract": {
+ "version": "nteract-front-end@1.0.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/tutorials/notebooks/rnaseq-myco-tutorial-main/images/count-workflow.png b/notebooks/rnaseq-myco-tutorial-main/images/count-workflow.png
similarity index 100%
rename from tutorials/notebooks/rnaseq-myco-tutorial-main/images/count-workflow.png
rename to notebooks/rnaseq-myco-tutorial-main/images/count-workflow.png
diff --git a/tutorials/notebooks/rnaseq-myco-tutorial-main/images/rnaseq-workflow.png b/notebooks/rnaseq-myco-tutorial-main/images/rnaseq-workflow.png
similarity index 100%
rename from tutorials/notebooks/rnaseq-myco-tutorial-main/images/rnaseq-workflow.png
rename to notebooks/rnaseq-myco-tutorial-main/images/rnaseq-workflow.png
diff --git a/tutorials/notebooks/rnaseq-myco-tutorial-main/images/table-cushman.png b/notebooks/rnaseq-myco-tutorial-main/images/table-cushman.png
similarity index 100%
rename from tutorials/notebooks/rnaseq-myco-tutorial-main/images/table-cushman.png
rename to notebooks/rnaseq-myco-tutorial-main/images/table-cushman.png
diff --git a/tutorials/README.md b/tutorials/README.md
index 357a135..fe21dc9 100644
--- a/tutorials/README.md
+++ b/tutorials/README.md
@@ -83,3 +83,4 @@ These publicly available datasets can save you time on data discovery and prepar
+ The [COVID-19 Data Lake](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-covid-19-data-lake) contains COVID-19 related datasets from various sources. It covers testing and patient outcome tracking data, social distancing policy, hospital capacity and mobility.
+ In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the [COVID-19 Open Research Dataset (CORD-19)](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-covid-19-open-research?tabs=azure-storage). This dataset is a free resource of over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community. This dataset mobilizes researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease.
+ [The Genomics Data Lake](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-genomics-data-lake) provides various public datasets that you can access for free and integrate into your genomics analysis workflows and applications. The datasets include genome sequences, variant info, and subject/sample metadata in BAM, FASTA, VCF, CSV file formats: [Illumina Platinum Genomes](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-illumina-platinum-genomes), [Human Reference Genomes](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-human-reference-genomes), [ClinVar Annotations](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-clinvar-annotations), [SnpEff](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-snpeff), [Genome Aggregation Database (gnomAD)](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-gnomad), [1000 Genomes](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-1000-genomes), [OpenCravat](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-open-cravat), [ENCODE](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-encode), [GATK Resource Bundle](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-gatk-resource-bundle).
+
diff --git a/tutorials/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb b/tutorials/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb
deleted file mode 100644
index 48b8141..0000000
--- a/tutorials/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb
+++ /dev/null
@@ -1,2002 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "1452463e",
- "metadata": {},
- "source": [
- "## Spleen Model With NVIDIA Pretrain\n",
- "- Uses Unet architecture\n",
- "- Pretrained model at: https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_spleen_ct_segmentation"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f59ba435",
- "metadata": {},
- "source": [
- "##### Uncomment below to install all dependencies"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "82db674f",
- "metadata": {},
- "outputs": [],
- "source": [
- "#!pip install 'monai[all]'\n",
- "#!pip install matplotlib "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "bb1228b3",
- "metadata": {},
- "outputs": [],
- "source": [
- "%matplotlib inline"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "540e5d47",
- "metadata": {},
- "outputs": [],
- "source": [
- "# MONAI version: 0.6.0+38.gf6ad4ba5\n",
- "# Numpy version: 1.21.1\n",
- "# Pytorch version: 1.9.0\n",
- "# Pytorch Ignite version: 0.4.5\n",
- "# Nibabel version: 3.2.1\n",
- "# scikit-image version: 0.18.2\n",
- "# Pillow version: 8.3.1\n",
- "# Tensorboard version: 2.5.0\n",
- "# gdown version: 3.13.0\n",
- "# TorchVision version: 0.10.0+cu111\n",
- "# tqdm version: 4.61.2\n",
- "# lmdb version: 1.2.1\n",
- "# psutil version: 5.8.0\n",
- "# pandas version: 1.3.0\n",
- "# einops version: 0.3.0"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "07510582",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "MONAI version: 0.8.1\n",
- "Numpy version: 1.21.1\n",
- "Pytorch version: 1.9.0\n",
- "MONAI flags: HAS_EXT = False, USE_COMPILED = False\n",
- "MONAI rev id: 71ff399a3ea07aef667b23653620a290364095b1\n",
- "\n",
- "Optional dependencies:\n",
- "Pytorch Ignite version: 0.4.8\n",
- "Nibabel version: 3.2.1\n",
- "scikit-image version: 0.18.2\n",
- "Pillow version: 8.3.1\n",
- "Tensorboard version: 2.5.0\n",
- "gdown version: 3.13.0\n",
- "TorchVision version: 0.10.0+cu111\n",
- "tqdm version: 4.61.2\n",
- "lmdb version: 1.2.1\n",
- "psutil version: 5.8.0\n",
- "pandas version: 1.3.0\n",
- "einops version: 0.3.0\n",
- "transformers version: 4.18.0\n",
- "mlflow version: 1.25.1\n",
- "\n",
- "For details about installing the optional dependencies, please visit:\n",
- " https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies\n",
- "\n"
- ]
- }
- ],
- "source": [
- "import os\n",
- "import tempfile\n",
- "import glob\n",
- "\n",
- "import matplotlib.pyplot as plt\n",
- "#import plotly.graph_objects as go\n",
- "import torch\n",
- "import numpy as np\n",
- "\n",
- "from monai.apps import download_and_extract\n",
- "from monai.networks.nets import UNet\n",
- "from monai.networks.layers import Norm\n",
- "from monai.losses import DiceFocalLoss\n",
- "from monai.metrics import DiceMetric\n",
- "from monai.inferers import sliding_window_inference\n",
- "from monai.data import (\n",
- " LMDBDataset,\n",
- " DataLoader,\n",
- " decollate_batch,\n",
- " ImageDataset,\n",
- " Dataset\n",
- ")\n",
- "from monai.apps import load_from_mmar\n",
- "from monai.transforms import (\n",
- " AsDiscrete,\n",
- " EnsureChannelFirstd,\n",
- " Compose,\n",
- " LoadImaged,\n",
- " ScaleIntensityRanged,\n",
- " Spacingd,\n",
- " Orientationd,\n",
- " CropForegroundd,\n",
- " RandCropByPosNegLabeld,\n",
- " RandAffined,\n",
- " RandRotated,\n",
- " EnsureType,\n",
- " EnsureTyped,\n",
- ")\n",
- "from monai.utils import first, set_determinism\n",
- "from monai.apps.mmars import RemoteMMARKeys\n",
- "from monai.config import print_config\n",
- "\n",
- "print_config()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6f523cbf",
- "metadata": {},
- "source": [
- "#### Running a pretrained model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "0be7401d",
- "metadata": {},
- "outputs": [],
- "source": [
- "PRETRAINED = True"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e9f3e5f3",
- "metadata": {},
- "source": [
- "#### Create the directory for storing data"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "id": "311c3282",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "monai_data/\n"
- ]
- }
- ],
- "source": [
- "directory = \"monai_data/\"\n",
- "root_dir = tempfile.mkdtemp() if directory is None else directory\n",
- "print(root_dir)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "38463a18",
- "metadata": {},
- "source": [
- "#### Download the public dataset"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "id": "da7cfede",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "2022-04-27 14:49:41,401 - INFO - Verified 'Task09_Spleen.tar', md5: 410d4a301da4e5b2f6f86ec3ddba524e.\n",
- "2022-04-27 14:49:41,402 - INFO - File exists: monai_data/Task09_Spleen.tar, skipped downloading.\n",
- "2022-04-27 14:49:41,403 - INFO - Non-empty folder exists in monai_data/Task09_Spleen, skipped extracting.\n"
- ]
- }
- ],
- "source": [
- "resource = \"https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar\"\n",
- "md5 = \"410d4a301da4e5b2f6f86ec3ddba524e\"\n",
- "\n",
- "compressed_file = os.path.join(root_dir, \"Task09_Spleen.tar\")\n",
- "download_and_extract(resource, compressed_file, root_dir, md5)\n",
- "data_dir = os.path.join(root_dir, \"Task09_Spleen\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "fae7c51b",
- "metadata": {},
- "source": [
- "#### Create Data Dictionaries and separate files for training and validation"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "2515b177",
- "metadata": {},
- "outputs": [],
- "source": [
- "train_images = sorted(\n",
- " glob.glob(os.path.join(data_dir, \"imagesTr\", \"*.nii.gz\")))\n",
- "train_labels = sorted(\n",
- " glob.glob(os.path.join(data_dir, \"labelsTr\", \"*.nii.gz\")))\n",
- "data_dicts = [\n",
- " {\"image\": image_name, \"label\": label_name}\n",
- " for image_name, label_name in zip(train_images, train_labels)\n",
- "]\n",
- "train_files, val_files = data_dicts[:-9], data_dicts[-9:]"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "974fc5aa",
- "metadata": {},
- "source": [
- "#### Define your transformations for training and validation"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "id": "2357d35d",
- "metadata": {},
- "outputs": [],
- "source": [
- "train_transforms = Compose( #Transformations for training dataset\n",
- " [\n",
- " LoadImaged(keys=[\"image\", \"label\"]), #Load dictionary based images and labels\n",
- " EnsureChannelFirstd(keys=[\"image\", \"label\"]), #Ensures the first dimension of each image is the channel dimension\n",
- " Spacingd(keys=[\"image\", \"label\"], pixdim=( #Change spacing of voxels to be same across images\n",
- " 1.5, 1.5, 2.0), mode=(\"bilinear\", \"nearest\")),\n",
- " Orientationd(keys=[\"image\", \"label\"], axcodes=\"RAS\"), #Correct the orientation of images (Right, Anterior, Superior)\n",
- " ScaleIntensityRanged( #Scale intensity of all images (For images only and not labels)\n",
- " keys=[\"image\"], a_min=-57, a_max=164,\n",
- " b_min=0.0, b_max=1.0, clip=True,\n",
- " ),\n",
- " CropForegroundd(keys=[\"image\", \"label\"], source_key=\"image\"), #Crop foreground of image\n",
- " RandCropByPosNegLabeld( #Randomly crop fixed sized region\n",
- " keys=[\"image\", \"label\"],\n",
- " label_key=\"label\",\n",
- " spatial_size=(96, 96, 96),\n",
- " pos=1,\n",
- " neg=1,\n",
- " num_samples=4,\n",
- " image_key=\"image\",\n",
- " image_threshold=0,\n",
- " ),\n",
- " RandAffined( #Do a random affine transformation with some probability\n",
- " keys=['image', 'label'],\n",
- " mode=('bilinear', 'nearest'),\n",
- " prob=0.5,\n",
- " spatial_size=(96, 96, 96),\n",
- " rotate_range=(np.pi/18, np.pi/18, np.pi/5),\n",
- " scale_range=(0.05, 0.05, 0.05)\n",
- " ),\n",
- " EnsureTyped(keys=[\"image\", \"label\"]),\n",
- " ]\n",
- ")\n",
- "val_transforms = Compose( #Transformations for testing dataset\n",
- " [\n",
- " LoadImaged(keys=[\"image\", \"label\"]),\n",
- " EnsureChannelFirstd(keys=[\"image\", \"label\"]),\n",
- " Spacingd(keys=[\"image\", \"label\"], pixdim=(\n",
- " 1.5, 1.5, 2.0), mode=(\"bilinear\", \"nearest\")),\n",
- " Orientationd(keys=[\"image\", \"label\"], axcodes=\"RAS\"),\n",
- " ScaleIntensityRanged(\n",
- " keys=[\"image\"], a_min=-57, a_max=164,\n",
- " b_min=0.0, b_max=1.0, clip=True,\n",
- " ),\n",
- " RandRotated(\n",
- " keys=['image', 'label'],\n",
- " mode=('bilinear', 'nearest'),\n",
- " range_x=np.pi/18,\n",
- " range_y=np.pi/18,\n",
- " range_z=np.pi/5,\n",
- " prob=1.0,\n",
- " padding_mode=('reflection', 'reflection'),\n",
- " ),\n",
- " CropForegroundd(keys=[\"image\", \"label\"], source_key=\"image\"),\n",
- " EnsureTyped(keys=[\"image\", \"label\"]),\n",
- " ]\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "id": "ada5757a",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "[{'image': 'monai_data/Task09_Spleen/imagesTr/spleen_56.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_56.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_59.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_59.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_6.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_6.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_60.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_60.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_61.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_61.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_62.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_62.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_63.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_63.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_8.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_8.nii.gz'},\n",
- " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_9.nii.gz',\n",
- " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_9.nii.gz'}]"
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "val_files"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "ba3c7695",
- "metadata": {},
- "source": [
- "#### Visualize Image and Label (example)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "id": "689eea4e",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "image shape: torch.Size([239, 239, 113]), label shape: torch.Size([239, 239, 113])\n"
- ]
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "check_ds = Dataset(data=val_files, transform=val_transforms)\n",
- "check_loader = DataLoader(check_ds, batch_size=1)\n",
- "check_data = first(check_loader)\n",
- "image, label = (check_data[\"image\"][0][0], check_data[\"label\"][0][0])\n",
- "print(f\"image shape: {image.shape}, label shape: {label.shape}\")\n",
- "# plot the slice [:, :, 80]\n",
- "plt.figure(\"check\", (12, 6))\n",
- "plt.subplot(1, 2, 1)\n",
- "plt.title(\"image\")\n",
- "plt.imshow(image[:, :, 80], cmap=\"gray\")\n",
- "plt.subplot(1, 2, 2)\n",
- "plt.title(\"label\")\n",
- "plt.imshow(label[:, :, 80])\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f45ba707",
- "metadata": {},
- "source": [
- "#### Use a dataloader to load files\n",
- " - Ability to use LMDB (Lightning Memory-Mapped Database)\n",
- " - Here is where transforms take place and they happen on both images and labels"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "id": "fe3285d0",
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "100%|██████████| 32/32 [00:00<00:00, 57113.93it/s]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Accessing lmdb file: /home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "100%|██████████| 32/32 [00:00<00:00, 47679.48it/s]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "{'map_addr': 0, 'map_size': 1099511627776, 'last_pgno': 941102, 'last_txnid': 100, 'max_readers': 126, 'num_readers': 0, 'size': 32, 'filename': '/home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb'}\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "100%|██████████| 9/9 [00:00<00:00, 10999.05it/s]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Accessing lmdb file: /home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "100%|██████████| 9/9 [00:00<00:00, 17739.07it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "{'map_addr': 0, 'map_size': 1099511627776, 'last_pgno': 941102, 'last_txnid': 100, 'max_readers': 126, 'num_readers': 0, 'size': 9, 'filename': '/home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb'}\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "train_ds = LMDBDataset(data=train_files, transform=train_transforms, cache_dir=root_dir)\n",
- "# initialize cache and print meta information\n",
- "print(train_ds.info())\n",
- "\n",
- "# use batch_size=2 to load images and use RandCropByPosNegLabeld\n",
- "# to generate 2 x 4 images for network training\n",
- "train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=2)\n",
- "\n",
- "# the validation data loader will be created on the fly to ensure \n",
- "# a deterministic validation set for demo purpose.\n",
- "val_ds = LMDBDataset(data=val_files, transform=val_transforms, cache_dir=root_dir)\n",
- "# initialize cache and print meta information\n",
- "print(val_ds.info())"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "id": "455cbcdc",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "{'map_addr': 0, 'map_size': 1099511627776, 'last_pgno': 941102, 'last_txnid': 100, 'max_readers': 126, 'num_readers': 0, 'size': 32, 'filename': '/home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb'}\n"
- ]
- }
- ],
- "source": [
- "print(train_ds.info())"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a77e7856",
- "metadata": {},
- "source": [
- "#### Now we want to download the pretrained model from NVIDIA"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "id": "8539fb7d",
- "metadata": {},
- "outputs": [],
- "source": [
- "mmar = {\n",
- " RemoteMMARKeys.ID: \"clara_pt_spleen_ct_segmentation_1\",\n",
- " RemoteMMARKeys.NAME: \"clara_pt_spleen_ct_segmentation\",\n",
- " RemoteMMARKeys.FILE_TYPE: \"zip\",\n",
- " RemoteMMARKeys.HASH_TYPE: \"md5\",\n",
- " RemoteMMARKeys.HASH_VAL: None,\n",
- " RemoteMMARKeys.MODEL_FILE: os.path.join(\"models\", \"model.pt\"),\n",
- " RemoteMMARKeys.CONFIG_FILE: os.path.join(\"config\", \"config_train.json\"),\n",
- " RemoteMMARKeys.VERSION: 2,\n",
- "}"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "id": "de7fb262",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "'clara_pt_spleen_ct_segmentation'"
- ]
- },
- "execution_count": 15,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "mmar['name']"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "id": "bf96f9f9",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "using a pretrained model.\n",
- "2022-04-27 14:49:45,704 - INFO - Expected md5 is None, skip md5 check for file monai_data/clara_pt_spleen_ct_segmentation_2.zip.\n",
- "2022-04-27 14:49:45,705 - INFO - File exists: monai_data/clara_pt_spleen_ct_segmentation_2.zip, skipped downloading.\n",
- "2022-04-27 14:49:45,706 - INFO - Non-empty folder exists in monai_data/clara_pt_spleen_ct_segmentation, skipped extracting.\n",
- "2022-04-27 14:49:45,707 - INFO - \n",
- "*** \"clara_pt_spleen_ct_segmentation\" available at monai_data/clara_pt_spleen_ct_segmentation.\n",
- "2022-04-27 14:49:49,353 - INFO - *** Model: \n",
- "2022-04-27 14:49:49,400 - INFO - *** Model params: {'dimensions': 3, 'in_channels': 1, 'out_channels': 2, 'channels': [16, 32, 64, 128, 256], 'strides': [2, 2, 2, 2], 'num_res_units': 2, 'norm': 'batch'}\n",
- "2022-04-27 14:49:49,411 - INFO - \n",
- "---\n",
- "2022-04-27 14:49:49,412 - INFO - For more information, please visit https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_spleen_ct_segmentation\n",
- "\n"
- ]
- }
- ],
- "source": [
- "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\") #torch.device(\"cpu\")\n",
- "if PRETRAINED:\n",
- " print(\"using a pretrained model.\")\n",
- " try: #MONAI=0.8\n",
- " unet_model = load_from_mmar(\n",
- " item = mmar['name'], \n",
- " mmar_dir=root_dir,\n",
- " map_location=device,\n",
- " version=mmar['version'],\n",
- " pretrained=True)\n",
- " except: #MONAI<0.8\n",
- " unet_model = load_from_mmar(\n",
- " mmar, \n",
- " mmar_dir=root_dir,\n",
- " map_location=device,\n",
- " pretrained=True)\n",
- " model = unet_model\n",
- "else: \n",
- " print(\"using a randomly init. model.\")\n",
- " model = UNet(\n",
- " dimensions=3,\n",
- " in_channels=1,\n",
- " out_channels=2,\n",
- " channels=(16, 32, 64, 128, 256),\n",
- " strides=(2, 2, 2, 2),\n",
- " num_res_units=2,\n",
- " norm=Norm.BATCH,\n",
- " )\n",
- "\n",
- "model = model.to(device)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "39910557",
- "metadata": {},
- "source": [
- "### This will be our test file we will view for reference\n",
- " - Here we see how our initial model appears to perform"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 17,
- "id": "4be7eb8f",
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "100%|██████████| 1/1 [00:00<00:00, 4639.72it/s]"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Accessing lmdb file: /home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n"
- ]
- }
- ],
- "source": [
- "test_file = data_dicts[20:21]\n",
- "test_ds = LMDBDataset(data=test_file, transform=None, cache_dir=root_dir)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2544a774",
- "metadata": {},
- "source": [
- "#### We use a sliding window technique to search the image"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 18,
- "id": "16fd4e94",
- "metadata": {},
- "outputs": [],
- "source": [
- "num_classes=2\n",
- "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
- "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
- "model.eval()\n",
- "with torch.no_grad():\n",
- " for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
- " test_inputs, test_labels = (\n",
- " data[\"image\"].to(device),\n",
- " data[\"label\"].to(device),\n",
- " )\n",
- " roi_size = (160, 160, 160)\n",
- " sw_batch_size = 4\n",
- " test_outputs = sliding_window_inference(\n",
- " test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
- " test_outputspre = [post_pred(i) for i in decollate_batch(test_outputs)] # Decollate our results\n",
- " test_labelspre = [post_label(i) for i in decollate_batch(test_labels)]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 19,
- "id": "9782ec96",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 19,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVcAAAGrCAYAAAB0YdR6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAapElEQVR4nO3df7SVdb3g8fcHUBh+JDCkAVL+uDCprRUlaaNpv9F+zIB3pYNON+4ylt1ZNpM54whac81mSu+6dZeTM93Bm3MpTK5TuWJVN38l481SxFJTGYTUhECIDEFUEPjMH/vBNngO58A53/Psvc/7tdZeZ59n//rsvTZvnv3sZ+8TmYkkqX8NqXsASepExlWSCjCuklSAcZWkAoyrJBVgXCWpAOOqjhERV0XE4gLXe0xEZEQM6+/rVucyruo3EbEsIv4QEcN7ef4/j4iflp6r6fauiIinIuKFiFgXEf8wULetwce4ql9ExDHAGUAC/7reaV4rIuYCfwZ8IDNHAzOAu+qdSp3MuKq/fAK4D/h7YG7zCRExJSK+FxG/i4jfR8T1EXEC8LfAv6zWJLdU510WEfOaLrvP2m1EXBcRayNia0Q8GBFn9HK+dwC3ZeavATLz2cxc2HS9yyLiyxGxPCKej4jvR8T4rq4oIo6IiG9ExIaI+G1E/NeIGNp0+oURsbJai78tIt7UdFpGxF9ExOrq9P8REdHL+6A2YlzVXz4B3FQdzoqIowCq6PwA+A1wDDAZWJKZK4G/AH6emaMzc2wvb+cBYDowHvg28H8iYkQvLncf8ImIuCwiZjTHcL/7cCEwCdgF/PdurmtRdfqfAG8DZgLzACJiNnAF8KfA64F/Am7e7/IfpRH7twLnAWf1Yn61GeOqPouIdwFvAm7JzAeBXwMXVCefQiNWl2Xm9sx8OTMPeTtrZi7OzN9n5q7M/AowHPgXvbkc8O9phOz/ApsiYv5+Z/tWZj6amduBzwPn7R/h6j+NDwGXVPdnE/A3wJzqLJ8CvpyZKzNzF/AlYHrz2itwTWZuycxngLtp/GehDmNc1R/mArdn5ubq92/zx00DU4DfVKHps4j4j9VL7uerTQlHABN6c9nMvCkzPwCMpbHWfHVENK81rm06/hvgsC6u+03V8g0RsaWa4X8BRzadfl3Tac8BQWONfa9nm46/CIzuzfxqL+5aoj6JiH9G46Xt0IjYG43hwNiIeCuNYL0xIoZ1EdiuvpJtOzCy6fc3NN3WGcDlwPuBxzJzT0T8gUa8ei0zX6GxOeFy4C3AbdVJU5rO9kbgFWDzfsvXAjuACd38h7EW+G+ZedPBzKTO45qr+mo2sBs4kcbL2+nACTS2NX4CWA5sAK6JiFERMSIiTq8uuxE4OiIOb7q+h4A/jYiREfEnwCebThtDY1vn74BhEfFfgNf1ZsjqjbGPRMSYiBgSER8CTgLubzrbxyPixIgYCVwNfCczdzdfT2ZuAG4HvhIRr6uu6/iIeHd1lr8FFkTESdXtHhER5/ZmRnUW46q+mgv878x8pnoH/tnMfBa4Hvi3NNYq/xWNN3+eAdYB/6a67E+Ax4BnI2LvJoW/AXbSCO8iGm+Q7XUb8I/AEzRetr/Mvi/lD2QrjTeangG2AH8F/Lv9tv9+i8beDs8CI4D/0M11fQI4HHgc+APwHWAiQGbeClwLLImIrcCjNLbRapAJvyxbauyKBSzOzL+rexZ1BtdcJakA4ypJBRTbLBARZwPXAUOBv8vMa4rckCS1oCJxrXa8fgL4II03MB4Azs/Mx/v9xiSpBZXaz/UUYE1mPgkQEUuAWTTeXX2NiPBdNUltKTO73M+61DbXyey7i8w69v2EChFxUUSsiIgVhWaQpNqUWnPtquT7rJ1W30i0EFxzldR5Sq25rmPfjwweDawvdFuS1HJKxfUBYGpEHFt9tHEOsLTQbUlSyymyWSAzd0XEp2l8XHEocGNmPlbitiSpFbXEx1/d5iqpXQ303gKS
NKgZV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpgGF1DyD1xXXXXccFF1zQp+tYvHgxn/3sZ/tpIqnBuKrljRs3jmuvvbbL02bOnMmECRP6dP3nnHMOo0aNAuDzn/88Gzdu7NP1SQCRmXXPQETUP4Razpvf/GZGjx7N8ccfz5IlSwbkNqdPn87DDz88ILelzpCZ0dVy11zVsm6//XamTJkyoLcZ0eW/E+mg+YaWajd69Gi2bNnC1q1b9zlMmjRpwGf56U9/ype//OUBv111HtdcVZsrr7ySd7zjHQwfPpzXve51LbHWOGrUKI444oi6x1AHMK4acBHBBz7wAebOncvUqVPrHkcqwrhqQAwdOvTV4yNGjOCHP/whhx12WI0TSWX1Ka4R8TSwDdgN7MrMGRExHvgH4BjgaeC8zPxD38ZUOxsyZAibN29mxIgRry4zrOp0/bHm+t7M3Nz0+3zgrsy8JiLmV79f3g+3ozZz6aWXMnv2bCKCMWPG7LP2KnW6EpsFZgHvqY4vApZhXAel0047jTPOOKPuMaRa9HVXrARuj4gHI+KiatlRmbkBoPp5ZB9vQ20kIhg5ciQjR470pb8Gtb6uuZ6emesj4kjgjoj4f729YBXji3o8o9rK9OnTWb58OdDY1ioNVn2Ka2aur35uiohbgVOAjRExMTM3RMREYFM3l10ILAQ//toJfvzjHzNp0iRGjhzJsGHtuxPK/PnzWbp0ad1jqAMc8ncLRMQoYEhmbquO3wFcDbwf+H3TG1rjM/M/93BdxrXN/e53v+vzF6i0Ar9bQAerxHcLHAXcWn2qZhjw7cz8cUQ8ANwSEZ8EngHO7cNtqIWNGDGCkSNHAu2/CSAz2bFjB7t37657FHUIvxVLh+xLX/oSl1/e2BGk3eO6detWxo8fb1x10PxWLPWrO++8kxkzZrR9VAHuvfderrjiCsOqftX+/zJUi2nTpnXEF5z8/Oc/Z9GiRdxzzz11j6IO45qrem3UqFGvvmnVznsEQGMb6wsvvMCll17KfffdV/c46kDt/S9EA+riiy/u9s+ttJuXXnqJsWPHsmfPnrpHUYdys4AGnQceeIDTTjvNsKoo11zVK5dccgnnntv+e9UtW7aMxYsXuy+rinNXLPXKqlWrmDZtWt1jHJKdO3eydetWAC644ALuuOOOmidSJ3FXLA1av/zlL3nnO99Z9xgaZNzmqm59/OMfZ8uWLWzZsoXjjjuu7nEOyYIFC5g1a1bdY2gQcs1V3Ro5cmTb78u6adMmNm7cWPcYGoSMqzrOjh07ePbZZwF47rnnap5Gg5VxVZcioi3/LEtmsmbNGt7ylrfUPYoGOeOqLi1btoxTTz217jEO2he+8AWuu+66useQfENLXRs9ejTDhw+ve4yDtm3bNrZs2VL3GJJrrtpXRHDOOecwduzYukc5aKtXr2b9+vV1jyEBfohATSKCI444gk2bNrXVHxfMTHbt2sWkSZPYvHlzzxeQ+lF3HyJws4Be9bGPfaztwgqwbt06xo0bZ1jVUoyrXjV06NC2CyvAnj172L59e91jSPswrgLgfe97H+9973vrHuOgrV+/nrvvvrvuMaTXyszaD0B6qPewbNmybEdXX3117Y+dh8F9yG665pqrJBXgrliD3JAhQ3jwwQeZOnVq3aNIHcW4DmITJ07kwgsv5KSTTmrLN7LuueceHnnkkbrHkLrkfq6D2Ac/+EFuv/32usc4aHv27OGll17ihBNOYO3atXWPo0Eu/bJsdYrN
mzfzhje8gVZYMZC64xtaajv5x71MpJblmusgdd5553H++efXPcZBW7NmDd/5znfqHkPqkWuug9T555/P7Nmz6x7joN12220sWLCg7jGkHhlXSSrAzQJqG+eddx533XVX3WNIvWJc1fJefvllrr/+en7yk5/4N7HUNozrIHTyySfz+te/vu4xeuXFF19k9erVXHbZZXWPIh0UP0QwCG3dupUxY8bUPUav3HzzzVxwwQV1jyF1yw8RqO3MmTOHH/3oR3WPIR0S9xZQy9m5cydXXnkly5YtY9u2bXWPIx0SNwsMQq28WWD79u2sWrWKGTNm+CkstYXuNgu45qqW8rOf/YyTTz7ZsKrtGVdJKsC4SlIBxnUQWr58OVu2bOnTdaxevZqNGzf2z0BNJk+ezNy5c5k7dy7jxo3r9+uXBkx3f1xrIA+0wB8ZG2yHW2+9tcc//rd79+7cuXNnl4fTTz89r7766ld/37Nnz0H+acGefeQjH8nDDz+89sfKg4cDHbrtWm8DWPJQ94MzGA+9iet9992Xhx9+eJeHiMihQ4e++vvmzZsPvaLd2LVrV37zm9+s/bHy4OFAh+665q5Yg9Rb3/pWpk2bdsDzPPPMM9x///29ur6zzz6bBQsWcOaZZ/bHeK96/vnnefjhh3n3u9/dr9cr9ZfsZlcs46p+M2/evFe/gPuMM87otz96+NJLL/G5z32OG2+8sc/biqX+Zlw1oDZs2MCECRMYNqz/PmE9c+ZM7r33Xl588cV+u06pr7qLq3sLqIhJkyaxdOnSfr3O2267jWuvvbZfr1MqxbiqiMzksssu4/3vfz8zZ85kx44dfb7OiCCiy5UEqeX4rVgq5sknn+TJJ58kIli8eDGjR4/mmGOO4dRTT617NKk446riMpN58+YB8NGPfpSbbrqJMWPGuBaqjuZmAQ2oH/zgBxx11FHs2rWr7lGkooyrBtyOHTuYPn06K1eurHsUqRg3C2jAZSaPP/44X/va1zjhhBMYMWIE8+bN63Yzwcsvv8wNN9wAwJ133jmQo0qHzP1cVbtRo0bxxBNPMGTIEEaNGvWaL/J+/vnnGTt2bD3DST1wP1e1rO3btzN58mQmTpzIF7/4xbrHkfqFa65qKaNGjWLChAn7LHvllVdYv359TRNJB+bHXyWpADcLSNIA6jGuEXFjRGyKiEeblo2PiDsiYnX1c1zTaQsiYk1ErIqIs0oNLkmtrDdrrn8PnL3fsvnAXZk5Fbir+p2IOBGYA5xUXeZ/RsTQfptWktpEj3HNzHuA5/ZbPAtYVB1fBMxuWr4kM3dk5lPAGuCU/hlVktrHoW5zPSozNwBUP4+slk8G1jadb1217DUi4qKIWBERKw5xBklqWf39Ca2u3jXrck+AzFwILAT3FpDUeQ51zXVjREwEqH5uqpavA6Y0ne9owB0UJQ06hxrXpcDc6vhc4PtNy+dExPCIOBaYCizv24iS1H563CwQETcD7wEmRMQ64C+Ba4BbIuKTwDPAuQCZ+VhE3AI8DuwCLs7M3YVml6SW5Se0JKkP/ISWJA0g4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqYAe4xoRN0bEpoh4tGnZVRHx24h4qDp8uOm0BRGxJiJWRcRZpQaXpFYWmXngM0ScCbwAfDMz31Ituwp4ITP/er/zngjcDJwCTALuBKZl5u4ebuPAQ0hSi8rM6Gp5j2uu
mXkP8Fwvb2cWsCQzd2TmU8AaGqGVpEGlL9tcPx0Rj1SbDcZVyyYDa5vOs65a9hoRcVFErIiIFX2YQZJa0qHG9evA8cB0YAPwlWp5V6vHXb7kz8yFmTkjM2cc4gyS1LIOKa6ZuTEzd2fmHuAG/vjSfx0wpemsRwPr+zaiJLWfQ4prRExs+vUcYO+eBEuBORExPCKOBaYCy/s2oiS1n2E9nSEibgbeA0yIiHXAXwLviYjpNF7yPw18CiAzH4uIW4DHgV3AxT3tKSBJnajHXbEGZAh3xZLUpg55VyxJ0sEzrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBPcY1IqZExN0RsTIiHouIz1TLx0fEHRGxuvo5rukyCyJiTUSsioizSt4BSWpFkZkHPkPERGBiZv4iIsYADwKzgT8HnsvMayJiPjAuMy+PiBOBm4FTgEnAncC0zNx9gNs48BCS1KIyM7pa3uOaa2ZuyMxfVMe3ASuBycAsYFF1tkU0gku1fElm7sjMp4A1NEIrSYPGQW1zjYhjgLcB9wNHZeYGaAQYOLI622RgbdPF1lXL9r+uiyJiRUSsOIS5JamlDevtGSNiNPBd4JLM3BrR5ZowQFcnvOZlf2YuBBZW1+1mAUkdpVdrrhFxGI2w3pSZ36sWb6y2x+7dLrupWr4OmNJ08aOB9f0zriS1h97sLRDAN4CVmfnVppOWAnOr43OB7zctnxMRwyPiWGAqsLz/Rpak1tebvQXeBfwT8CtgT7X4ChrbXW8B3gg8A5ybmc9Vl7kSuBDYRWMzwj/2cBtuFpDUlrrbW6DHuA4E4yqpXR3yrliSpINnXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFdBjXCNiSkTcHRErI+KxiPhMtfyqiPhtRDxUHT7cdJkFEbEmIlZFxFkl74AktaLIzAOfIWIiMDEzfxERY4AHgdnAecALmfnX+53/ROBm4BRgEnAnMC0zdx/gNg48hCS1qMyMrpb3uOaamRsy8xfV8W3ASmDyAS4yC1iSmTsy8ylgDY3QStKgcVDbXCPiGOBtwP3Vok9HxCMRcWNEjKuWTQbWNl1sHV3EOCIuiogVEbHi4MeWpNbW67hGxGjgu8AlmbkV+DpwPDAd2AB8Ze9Zu7j4a172Z+bCzJyRmTMOdmhJanW9imtEHEYjrDdl5vcAMnNjZu7OzD3ADfzxpf86YErTxY8G1vffyJLU+nqzt0AA3wBWZuZXm5ZPbDrbOcCj1fGlwJyIGB4RxwJTgeX9N7Iktb5hvTjP6cCfAb+KiIeqZVcA50fEdBov+Z8GPgWQmY9FxC3A48Au4OID7SkgSZ2ox12xBmQId8WS1KYOeVcsSdLBM66SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQV
YFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUYV0kqwLhKUgHGVZIKMK6SVIBxlaQCjKskFWBcJakA4ypJBRhXSSrAuEpSAcZVkgowrpJUgHGVpAKMqyQVYFwlqQDjKkkFGFdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBVgXCWpAOMqSQUMq3uAymZge/VzsJuAj4OPQYOPQ+s/Bm/q7oTIzIEcpFsRsSIzZ9Q9R918HHwM9vJxaO/HwM0CklSAcZWkAloprgvrHqBF+Dj4GOzl49DGj0HLbHOVpE7SSmuuktQxjKskFVB7XCPi7IhYFRFrImJ+3fMMpIh4OiJ+FREPRcSKatn4iLgjIlZXP8fVPWd/i4gbI2JTRDzatKzb+x0RC6rnx6qIOKueqftXN4/BVRHx2+r58FBEfLjptI57DAAiYkpE3B0RKyPisYj4TLW8/Z8PmVnbARgK/Bo4DjgceBg4sc6ZBvj+Pw1M2G/ZXwHzq+PzgWvrnrPA/T4TeDvwaE/3Gzixel4MB46tni9D674PhR6Dq4D/1MV5O/IxqO7bRODt1fExwBPV/W3750Pda66nAGsy88nM3AksAWbVPFPdZgGLquOLgNn1jVJGZt4DPLff4u7u9yxgSWbuyMyngDU0njdtrZvHoDsd+RgAZOaGzPxFdXwbsBKYTAc8H+qO62RgbdPv66plg0UCt0fEgxFxUbXsqMzcAI0nHnBkbdMNrO7u92B7jnw6Ih6pNhvsfSk8KB6DiDgGeBtwPx3wfKg7rtHFssG0b9jpmfl24EPAxRFxZt0DtaDB9Bz5OnA8MB3YAHylWt7xj0FEjAa+C1ySmVsPdNYulrXkY1F3XNcBU5p+PxpYX9MsAy4z11c/NwG30nh5szEiJgJUPzfVN+GA6u5+D5rnSGZuzMzdmbkHuIE/vtzt6McgIg6jEdabMvN71eK2fz7UHdcHgKkRcWxEHA7MAZbWPNOAiIhRETFm73FgJvAojfs/tzrbXOD79Uw44Lq730uBORExPCKOBaYCy2uYr7i9MamcQ+P5AB38GEREAN8AVmbmV5tOav/nQ93vqAEfpvEO4a+BK+ueZwDv93E03vV8GHhs730H/jlwF7C6+jm+7lkL3PebabzsfYXGmsgnD3S/gSur58cq4EN1z1/wMfgW8CvgERoRmdjJj0F1v95F42X9I8BD1eHDnfB88OOvklRA3ZsFJKkjGVdJKsC4SlIBxlWSCjCuklSAcZWkAoyrJBXw/wF4rwmBW83i7AAAAABJRU5ErkJggg==\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Actual Spleen')\n",
- "plt.imshow(test_labelspre[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Actual spleen"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 20,
- "id": "76cd38e6",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 20,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Pretrained Calculated Spleen')\n",
- "plt.imshow(test_outputspre[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Pretrained model spleen"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 21,
- "id": "65c68242",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 21,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Differences Between Actual and Model')\n",
- "pretraineddif = test_labelspre[0].cpu().numpy()[1][:,:,200] - test_outputspre[0].cpu().numpy()[1][:,:,200]\n",
- "plt.imshow(pretraineddif, cmap='Greys_r') #Differences"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2f60e5b5",
- "metadata": {},
- "source": [
- "#### Using just the pretrained model, it appears we are performing pretty well\n",
- " - We can now continue to train with our data using the NVIDIA model's initial weights"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c3e40010",
- "metadata": {},
- "source": [
- "## Training\n",
- "#### Without a GPU, training can take a while\n",
- "#### We recommend skipping the next three cells and loading the saved model instead"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 22,
- "id": "a8ad6aee",
- "metadata": {},
- "outputs": [],
- "source": [
- "loss_function = DiceFocalLoss(to_onehot_y=True, softmax=True)\n",
- "optimizer = torch.optim.Adam(model.parameters(), 5e-4)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "id": "d91d340c",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "----------\n",
- "epoch 1/25\n",
- "1/16, train_loss: 0.8680\n",
- "2/16, train_loss: 0.3699\n",
- "3/16, train_loss: 0.3849\n",
- "4/16, train_loss: 0.1306\n",
- "5/16, train_loss: 0.2781\n",
- "6/16, train_loss: 0.3628\n",
- "7/16, train_loss: 0.3609\n",
- "8/16, train_loss: 0.1828\n",
- "9/16, train_loss: 0.1493\n",
- "10/16, train_loss: 0.5063\n",
- "11/16, train_loss: 0.2929\n",
- "12/16, train_loss: 0.2826\n",
- "13/16, train_loss: 0.2017\n",
- "14/16, train_loss: 0.2591\n",
- "15/16, train_loss: 0.2568\n",
- "16/16, train_loss: 0.2385\n",
- "epoch 1 average loss: 0.3203\n",
- "----------\n",
- "epoch 2/25\n",
- "1/16, train_loss: 0.3457\n",
- "2/16, train_loss: 0.2234\n",
- "3/16, train_loss: 0.3443\n",
- "4/16, train_loss: 0.0816\n",
- "5/16, train_loss: 0.2259\n",
- "6/16, train_loss: 0.1580\n",
- "7/16, train_loss: 0.2593\n",
- "8/16, train_loss: 0.1651\n",
- "9/16, train_loss: 0.1124\n",
- "10/16, train_loss: 0.4822\n",
- "11/16, train_loss: 0.2900\n",
- "12/16, train_loss: 0.2571\n",
- "13/16, train_loss: 0.1799\n",
- "14/16, train_loss: 0.1984\n",
- "15/16, train_loss: 0.2286\n",
- "16/16, train_loss: 0.2216\n",
- "epoch 2 average loss: 0.2359\n",
- "saved new best metric model\n",
- "current epoch: 2 current mean dice: 0.8615\n",
- "best mean dice: 0.8615 at epoch: 2\n",
- "----------\n",
- "epoch 3/25\n",
- "1/16, train_loss: 0.3400\n",
- "2/16, train_loss: 0.2297\n",
- "3/16, train_loss: 0.3453\n",
- "4/16, train_loss: 0.0822\n",
- "5/16, train_loss: 0.2285\n",
- "6/16, train_loss: 0.1213\n",
- "7/16, train_loss: 0.2370\n",
- "8/16, train_loss: 0.1607\n",
- "9/16, train_loss: 0.1065\n",
- "10/16, train_loss: 0.4543\n",
- "11/16, train_loss: 0.2848\n",
- "12/16, train_loss: 0.2848\n",
- "13/16, train_loss: 0.1763\n",
- "14/16, train_loss: 0.1748\n",
- "15/16, train_loss: 0.4361\n",
- "16/16, train_loss: 0.2234\n",
- "epoch 3 average loss: 0.2429\n",
- "----------\n",
- "epoch 4/25\n",
- "1/16, train_loss: 0.3328\n",
- "2/16, train_loss: 0.2447\n",
- "3/16, train_loss: 0.3436\n",
- "4/16, train_loss: 0.0723\n",
- "5/16, train_loss: 0.2213\n",
- "6/16, train_loss: 0.1676\n",
- "7/16, train_loss: 0.2672\n",
- "8/16, train_loss: 0.2121\n",
- "9/16, train_loss: 0.1122\n",
- "10/16, train_loss: 0.5265\n",
- "11/16, train_loss: 0.2810\n",
- "12/16, train_loss: 0.2688\n",
- "13/16, train_loss: 0.1795\n",
- "14/16, train_loss: 0.1853\n",
- "15/16, train_loss: 0.2458\n",
- "16/16, train_loss: 0.2314\n",
- "epoch 4 average loss: 0.2433\n",
- "saved new best metric model\n",
- "current epoch: 4 current mean dice: 0.8744\n",
- "best mean dice: 0.8744 at epoch: 4\n",
- "----------\n",
- "epoch 5/25\n",
- "1/16, train_loss: 0.3378\n",
- "2/16, train_loss: 0.2047\n",
- "3/16, train_loss: 0.3350\n",
- "4/16, train_loss: 0.0583\n",
- "5/16, train_loss: 0.2161\n",
- "6/16, train_loss: 0.1008\n",
- "7/16, train_loss: 0.2325\n",
- "8/16, train_loss: 0.1629\n",
- "9/16, train_loss: 0.1037\n",
- "10/16, train_loss: 0.4499\n",
- "11/16, train_loss: 0.2763\n",
- "12/16, train_loss: 0.2321\n",
- "13/16, train_loss: 0.1702\n",
- "14/16, train_loss: 0.1652\n",
- "15/16, train_loss: 0.2206\n",
- "16/16, train_loss: 0.2169\n",
- "epoch 5 average loss: 0.2177\n",
- "----------\n",
- "epoch 6/25\n",
- "1/16, train_loss: 0.3303\n",
- "2/16, train_loss: 0.1888\n",
- "3/16, train_loss: 0.3331\n",
- "4/16, train_loss: 0.0535\n",
- "5/16, train_loss: 0.2149\n",
- "6/16, train_loss: 0.0962\n",
- "7/16, train_loss: 0.2267\n",
- "8/16, train_loss: 0.1555\n",
- "9/16, train_loss: 0.0995\n",
- "10/16, train_loss: 0.4476\n",
- "11/16, train_loss: 0.2751\n",
- "12/16, train_loss: 0.2215\n",
- "13/16, train_loss: 0.1644\n",
- "14/16, train_loss: 0.1603\n",
- "15/16, train_loss: 0.2159\n",
- "16/16, train_loss: 0.2141\n",
- "epoch 6 average loss: 0.2123\n",
- "saved new best metric model\n",
- "current epoch: 6 current mean dice: 0.8952\n",
- "best mean dice: 0.8952 at epoch: 6\n",
- "----------\n",
- "epoch 7/25\n",
- "1/16, train_loss: 0.3286\n",
- "2/16, train_loss: 0.1815\n",
- "3/16, train_loss: 0.3317\n",
- "4/16, train_loss: 0.0487\n",
- "5/16, train_loss: 0.2127\n",
- "6/16, train_loss: 0.0926\n",
- "7/16, train_loss: 0.2236\n",
- "8/16, train_loss: 0.1536\n",
- "9/16, train_loss: 0.0955\n",
- "10/16, train_loss: 0.4468\n",
- "11/16, train_loss: 0.2730\n",
- "12/16, train_loss: 0.2171\n",
- "13/16, train_loss: 0.1616\n",
- "14/16, train_loss: 0.1565\n",
- "15/16, train_loss: 0.2147\n",
- "16/16, train_loss: 0.2123\n",
- "epoch 7 average loss: 0.2094\n",
- "----------\n",
- "epoch 8/25\n",
- "1/16, train_loss: 0.3276\n",
- "2/16, train_loss: 0.1800\n",
- "3/16, train_loss: 0.3311\n",
- "4/16, train_loss: 0.0459\n",
- "5/16, train_loss: 0.2114\n",
- "6/16, train_loss: 0.0853\n",
- "7/16, train_loss: 0.2206\n",
- "8/16, train_loss: 0.1529\n",
- "9/16, train_loss: 0.0939\n",
- "10/16, train_loss: 0.4467\n",
- "11/16, train_loss: 0.2725\n",
- "12/16, train_loss: 0.2171\n",
- "13/16, train_loss: 0.1600\n",
- "14/16, train_loss: 0.1502\n",
- "15/16, train_loss: 0.2140\n",
- "16/16, train_loss: 0.2115\n",
- "epoch 8 average loss: 0.2075\n",
- "saved new best metric model\n",
- "current epoch: 8 current mean dice: 0.8957\n",
- "best mean dice: 0.8957 at epoch: 8\n",
- "----------\n",
- "epoch 9/25\n",
- "1/16, train_loss: 0.3275\n",
- "2/16, train_loss: 0.1822\n",
- "3/16, train_loss: 0.3309\n",
- "4/16, train_loss: 0.0455\n",
- "5/16, train_loss: 0.2110\n",
- "6/16, train_loss: 0.0818\n",
- "7/16, train_loss: 0.2194\n",
- "8/16, train_loss: 0.1520\n",
- "9/16, train_loss: 0.0917\n",
- "10/16, train_loss: 0.4467\n",
- "11/16, train_loss: 0.2723\n",
- "12/16, train_loss: 0.2165\n",
- "13/16, train_loss: 0.1593\n",
- "14/16, train_loss: 0.1236\n",
- "15/16, train_loss: 0.2136\n",
- "16/16, train_loss: 0.2107\n",
- "epoch 9 average loss: 0.2053\n",
- "----------\n",
- "epoch 10/25\n",
- "1/16, train_loss: 0.3271\n",
- "2/16, train_loss: 0.1726\n",
- "3/16, train_loss: 0.3308\n",
- "4/16, train_loss: 0.0439\n",
- "5/16, train_loss: 0.2106\n",
- "6/16, train_loss: 0.0886\n",
- "7/16, train_loss: 0.2209\n",
- "8/16, train_loss: 0.1518\n",
- "9/16, train_loss: 0.0860\n",
- "10/16, train_loss: 0.4452\n",
- "11/16, train_loss: 0.2715\n",
- "12/16, train_loss: 0.2150\n",
- "13/16, train_loss: 0.1589\n",
- "14/16, train_loss: 0.1150\n",
- "15/16, train_loss: 0.2142\n",
- "16/16, train_loss: 0.2095\n",
- "epoch 10 average loss: 0.2038\n",
- "saved new best metric model\n",
- "current epoch: 10 current mean dice: 0.8958\n",
- "best mean dice: 0.8958 at epoch: 10\n",
- "----------\n",
- "epoch 11/25\n",
- "1/16, train_loss: 0.3271\n",
- "2/16, train_loss: 0.1735\n",
- "3/16, train_loss: 0.3314\n",
- "4/16, train_loss: 0.0430\n",
- "5/16, train_loss: 0.2099\n",
- "6/16, train_loss: 0.0801\n",
- "7/16, train_loss: 0.2201\n",
- "8/16, train_loss: 0.1508\n",
- "9/16, train_loss: 0.0721\n",
- "10/16, train_loss: 0.4451\n",
- "11/16, train_loss: 0.2714\n",
- "12/16, train_loss: 0.2155\n",
- "13/16, train_loss: 0.1592\n",
- "14/16, train_loss: 0.1247\n",
- "15/16, train_loss: 0.2139\n",
- "16/16, train_loss: 0.2107\n",
- "epoch 11 average loss: 0.2030\n",
- "----------\n",
- "epoch 12/25\n",
- "1/16, train_loss: 0.3268\n",
- "2/16, train_loss: 0.1712\n",
- "3/16, train_loss: 0.3305\n",
- "4/16, train_loss: 0.0453\n",
- "5/16, train_loss: 0.2103\n",
- "6/16, train_loss: 0.0783\n",
- "7/16, train_loss: 0.2179\n",
- "8/16, train_loss: 0.1529\n",
- "9/16, train_loss: 0.0912\n",
- "10/16, train_loss: 0.4469\n",
- "11/16, train_loss: 0.2724\n",
- "12/16, train_loss: 0.2162\n",
- "13/16, train_loss: 0.1588\n",
- "14/16, train_loss: 0.1072\n",
- "15/16, train_loss: 0.2129\n",
- "16/16, train_loss: 0.2091\n",
- "epoch 12 average loss: 0.2030\n",
- "saved new best metric model\n",
- "current epoch: 12 current mean dice: 0.9008\n",
- "best mean dice: 0.9008 at epoch: 12\n",
- "----------\n",
- "epoch 13/25\n",
- "1/16, train_loss: 0.3266\n",
- "2/16, train_loss: 0.1666\n",
- "3/16, train_loss: 0.3304\n",
- "4/16, train_loss: 0.0419\n",
- "5/16, train_loss: 0.2105\n",
- "6/16, train_loss: 0.0826\n",
- "7/16, train_loss: 0.2195\n",
- "8/16, train_loss: 0.1506\n",
- "9/16, train_loss: 0.0553\n",
- "10/16, train_loss: 0.4447\n",
- "11/16, train_loss: 0.2715\n",
- "12/16, train_loss: 0.2125\n",
- "13/16, train_loss: 0.1575\n",
- "14/16, train_loss: 0.1083\n",
- "15/16, train_loss: 0.2135\n",
- "16/16, train_loss: 0.2085\n",
- "epoch 13 average loss: 0.2000\n",
- "----------\n",
- "epoch 14/25\n",
- "1/16, train_loss: 0.3270\n",
- "2/16, train_loss: 0.1647\n",
- "3/16, train_loss: 0.3316\n",
- "4/16, train_loss: 0.0405\n",
- "5/16, train_loss: 0.2091\n",
- "6/16, train_loss: 0.0686\n",
- "7/16, train_loss: 0.2185\n",
- "8/16, train_loss: 0.1499\n",
- "9/16, train_loss: 0.0482\n",
- "10/16, train_loss: 0.4443\n",
- "11/16, train_loss: 0.2708\n",
- "12/16, train_loss: 0.2106\n",
- "13/16, train_loss: 0.1568\n",
- "14/16, train_loss: 0.1043\n",
- "15/16, train_loss: 0.2121\n",
- "16/16, train_loss: 0.2079\n",
- "epoch 14 average loss: 0.1978\n",
- "saved new best metric model\n",
- "current epoch: 14 current mean dice: 0.9015\n",
- "best mean dice: 0.9015 at epoch: 14\n",
- "----------\n",
- "epoch 15/25\n",
- "1/16, train_loss: 0.3259\n",
- "2/16, train_loss: 0.1630\n",
- "3/16, train_loss: 0.3303\n",
- "4/16, train_loss: 0.0399\n",
- "5/16, train_loss: 0.2085\n",
- "6/16, train_loss: 0.0579\n",
- "7/16, train_loss: 0.2165\n",
- "8/16, train_loss: 0.1509\n",
- "9/16, train_loss: 0.0487\n",
- "10/16, train_loss: 0.4449\n",
- "11/16, train_loss: 0.2704\n",
- "12/16, train_loss: 0.2090\n",
- "13/16, train_loss: 0.1557\n",
- "14/16, train_loss: 0.1021\n",
- "15/16, train_loss: 0.2118\n",
- "16/16, train_loss: 0.2084\n",
- "epoch 15 average loss: 0.1965\n",
- "----------\n",
- "epoch 16/25\n",
- "1/16, train_loss: 0.3258\n",
- "2/16, train_loss: 0.1620\n",
- "3/16, train_loss: 0.3307\n",
- "4/16, train_loss: 0.0394\n",
- "5/16, train_loss: 0.2086\n",
- "6/16, train_loss: 0.0699\n",
- "7/16, train_loss: 0.2170\n",
- "8/16, train_loss: 0.1516\n",
- "9/16, train_loss: 0.0540\n",
- "10/16, train_loss: 0.4444\n",
- "11/16, train_loss: 0.2698\n",
- "12/16, train_loss: 0.2102\n",
- "13/16, train_loss: 0.1548\n",
- "14/16, train_loss: 0.1016\n",
- "15/16, train_loss: 0.2114\n",
- "16/16, train_loss: 0.2078\n",
- "epoch 16 average loss: 0.1974\n",
- "current epoch: 16 current mean dice: 0.8994\n",
- "best mean dice: 0.9015 at epoch: 14\n",
- "----------\n",
- "epoch 17/25\n",
- "1/16, train_loss: 0.3255\n",
- "2/16, train_loss: 0.1636\n",
- "3/16, train_loss: 0.3300\n",
- "4/16, train_loss: 0.0399\n",
- "5/16, train_loss: 0.2085\n",
- "6/16, train_loss: 0.0483\n",
- "7/16, train_loss: 0.2150\n",
- "8/16, train_loss: 0.1506\n",
- "9/16, train_loss: 0.0446\n",
- "10/16, train_loss: 0.4445\n",
- "11/16, train_loss: 0.2692\n",
- "12/16, train_loss: 0.2077\n",
- "13/16, train_loss: 0.1515\n",
- "14/16, train_loss: 0.0980\n",
- "15/16, train_loss: 0.2110\n",
- "16/16, train_loss: 0.2076\n",
- "epoch 17 average loss: 0.1947\n",
- "----------\n",
- "epoch 18/25\n",
- "1/16, train_loss: 0.3255\n",
- "2/16, train_loss: 0.1614\n",
- "3/16, train_loss: 0.3297\n",
- "4/16, train_loss: 0.0381\n",
- "5/16, train_loss: 0.2081\n",
- "6/16, train_loss: 0.0422\n",
- "7/16, train_loss: 0.2152\n",
- "8/16, train_loss: 0.1485\n",
- "9/16, train_loss: 0.0415\n",
- "10/16, train_loss: 0.4442\n",
- "11/16, train_loss: 0.2690\n",
- "12/16, train_loss: 0.2070\n",
- "13/16, train_loss: 0.1515\n",
- "14/16, train_loss: 0.0980\n",
- "15/16, train_loss: 0.2112\n",
- "16/16, train_loss: 0.2068\n",
- "epoch 18 average loss: 0.1936\n",
- "current epoch: 18 current mean dice: 0.8991\n",
- "best mean dice: 0.9015 at epoch: 14\n",
- "----------\n",
- "epoch 19/25\n",
- "1/16, train_loss: 0.3254\n",
- "2/16, train_loss: 0.1635\n",
- "3/16, train_loss: 0.3297\n",
- "4/16, train_loss: 0.0372\n",
- "5/16, train_loss: 0.2078\n",
- "6/16, train_loss: 0.0424\n",
- "7/16, train_loss: 0.2145\n",
- "8/16, train_loss: 0.1483\n",
- "9/16, train_loss: 0.0402\n",
- "10/16, train_loss: 0.4436\n",
- "11/16, train_loss: 0.2695\n",
- "12/16, train_loss: 0.2076\n",
- "13/16, train_loss: 0.1514\n",
- "14/16, train_loss: 0.1009\n",
- "15/16, train_loss: 0.2116\n",
- "16/16, train_loss: 0.2071\n",
- "epoch 19 average loss: 0.1938\n",
- "----------\n",
- "epoch 20/25\n",
- "1/16, train_loss: 0.3256\n",
- "2/16, train_loss: 0.1616\n",
- "3/16, train_loss: 0.3302\n",
- "4/16, train_loss: 0.0376\n",
- "5/16, train_loss: 0.2080\n",
- "6/16, train_loss: 0.0756\n",
- "7/16, train_loss: 0.2150\n",
- "8/16, train_loss: 0.1476\n",
- "9/16, train_loss: 0.0400\n",
- "10/16, train_loss: 0.4440\n",
- "11/16, train_loss: 0.2686\n",
- "12/16, train_loss: 0.2071\n",
- "13/16, train_loss: 0.1512\n",
- "14/16, train_loss: 0.0990\n",
- "15/16, train_loss: 0.2103\n",
- "16/16, train_loss: 0.2066\n",
- "epoch 20 average loss: 0.1955\n",
- "current epoch: 20 current mean dice: 0.8984\n",
- "best mean dice: 0.9015 at epoch: 14\n",
- "----------\n",
- "epoch 21/25\n",
- "1/16, train_loss: 0.3253\n",
- "2/16, train_loss: 0.1599\n",
- "3/16, train_loss: 0.3295\n",
- "4/16, train_loss: 0.0370\n",
- "5/16, train_loss: 0.2074\n",
- "6/16, train_loss: 0.0587\n",
- "7/16, train_loss: 0.2138\n",
- "8/16, train_loss: 0.1483\n",
- "9/16, train_loss: 0.0479\n",
- "10/16, train_loss: 0.4449\n",
- "11/16, train_loss: 0.2684\n",
- "12/16, train_loss: 0.2082\n",
- "13/16, train_loss: 0.1520\n",
- "14/16, train_loss: 0.1122\n",
- "15/16, train_loss: 0.2110\n",
- "16/16, train_loss: 0.2088\n",
- "epoch 21 average loss: 0.1958\n",
- "----------\n",
- "epoch 22/25\n",
- "1/16, train_loss: 0.3258\n",
- "2/16, train_loss: 0.1628\n",
- "3/16, train_loss: 0.3298\n",
- "4/16, train_loss: 0.0395\n",
- "5/16, train_loss: 0.2082\n",
- "6/16, train_loss: 0.0614\n",
- "7/16, train_loss: 0.2181\n",
- "8/16, train_loss: 0.1566\n",
- "9/16, train_loss: 0.0650\n",
- "10/16, train_loss: 0.4442\n",
- "11/16, train_loss: 0.2693\n",
- "12/16, train_loss: 0.2118\n",
- "13/16, train_loss: 0.1532\n",
- "14/16, train_loss: 0.0998\n",
- "15/16, train_loss: 0.2121\n",
- "16/16, train_loss: 0.2076\n",
- "epoch 22 average loss: 0.1978\n",
- "saved new best metric model\n",
- "current epoch: 22 current mean dice: 0.9054\n",
- "best mean dice: 0.9054 at epoch: 22\n",
- "----------\n",
- "epoch 23/25\n",
- "1/16, train_loss: 0.3266\n",
- "2/16, train_loss: 0.1723\n",
- "3/16, train_loss: 0.3315\n",
- "4/16, train_loss: 0.0413\n",
- "5/16, train_loss: 0.2091\n",
- "6/16, train_loss: 0.0807\n",
- "7/16, train_loss: 0.2143\n",
- "8/16, train_loss: 0.1514\n",
- "9/16, train_loss: 0.0432\n",
- "10/16, train_loss: 0.4441\n",
- "11/16, train_loss: 0.2704\n",
- "12/16, train_loss: 0.2081\n",
- "13/16, train_loss: 0.1532\n",
- "14/16, train_loss: 0.0983\n",
- "15/16, train_loss: 0.2106\n",
- "16/16, train_loss: 0.2072\n",
- "epoch 23 average loss: 0.1976\n",
- "----------\n",
- "epoch 24/25\n",
- "1/16, train_loss: 0.3257\n",
- "2/16, train_loss: 0.1711\n",
- "3/16, train_loss: 0.3307\n",
- "4/16, train_loss: 0.0376\n",
- "5/16, train_loss: 0.2077\n",
- "6/16, train_loss: 0.0705\n",
- "7/16, train_loss: 0.2141\n",
- "8/16, train_loss: 0.1482\n",
- "9/16, train_loss: 0.0392\n",
- "10/16, train_loss: 0.4439\n",
- "11/16, train_loss: 0.2688\n",
- "12/16, train_loss: 0.2070\n",
- "13/16, train_loss: 0.1512\n",
- "14/16, train_loss: 0.0969\n",
- "15/16, train_loss: 0.2098\n",
- "16/16, train_loss: 0.2062\n",
- "epoch 24 average loss: 0.1955\n",
- "saved new best metric model\n",
- "current epoch: 24 current mean dice: 0.9060\n",
- "best mean dice: 0.9060 at epoch: 24\n",
- "----------\n",
- "epoch 25/25\n",
- "1/16, train_loss: 0.3251\n",
- "2/16, train_loss: 0.1621\n",
- "3/16, train_loss: 0.3298\n",
- "4/16, train_loss: 0.0367\n",
- "5/16, train_loss: 0.2075\n",
- "6/16, train_loss: 0.0430\n",
- "7/16, train_loss: 0.2132\n",
- "8/16, train_loss: 0.1490\n",
- "9/16, train_loss: 0.0390\n",
- "10/16, train_loss: 0.4432\n",
- "11/16, train_loss: 0.2699\n",
- "12/16, train_loss: 0.2080\n",
- "13/16, train_loss: 0.1520\n",
- "14/16, train_loss: 0.0959\n",
- "15/16, train_loss: 0.2101\n",
- "16/16, train_loss: 0.2057\n",
- "epoch 25 average loss: 0.1931\n",
- "train completed, best_metric: 0.9060 at epoch: 24\n"
- ]
- }
- ],
- "source": [
- "max_epochs = 25\n",
- "val_interval = 2\n",
- "num_classes = 2\n",
- "best_metric = -1\n",
- "best_metric_epoch = -1\n",
- "epoch_loss_values = []\n",
- "metric_values = []\n",
- "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
- "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
- "dice_metric = DiceMetric(include_background=False, reduction=\"mean\", get_not_nans=False)\n",
- "\n",
- "for epoch in range(max_epochs):\n",
- " print(\"-\" * 10)\n",
- " print(f\"epoch {epoch + 1}/{max_epochs}\")\n",
- " model.train()\n",
- " epoch_loss = 0\n",
- " step = 0\n",
- " set_determinism(seed=42)\n",
- " for batch_data in train_loader:\n",
- " step += 1\n",
- " inputs, labels = (\n",
- " batch_data[\"image\"].to(device),\n",
- " batch_data[\"label\"].to(device),\n",
- " )\n",
- " optimizer.zero_grad()\n",
- " outputs = model(inputs)\n",
- " loss = loss_function(outputs, labels)\n",
- " loss.backward()\n",
- " optimizer.step()\n",
- " epoch_loss += loss.item()\n",
- " print(\n",
- " f\"{step}/{len(train_ds) // train_loader.batch_size}, \"\n",
- " f\"train_loss: {loss.item():.4f}\")\n",
- " epoch_loss /= step\n",
- " epoch_loss_values.append(epoch_loss)\n",
- " print(f\"epoch {epoch + 1} average loss: {epoch_loss:.4f}\")\n",
- "\n",
- " if (epoch + 1) % val_interval == 0:\n",
- " model.eval()\n",
- " with torch.no_grad():\n",
- " set_determinism(seed=42)\n",
- " for val_data in DataLoader(val_ds, batch_size=1, num_workers=2):\n",
- " val_inputs, val_labels = (\n",
- " val_data[\"image\"].to(device),\n",
- " val_data[\"label\"].to(device),\n",
- " )\n",
- " roi_size = (160, 160, 160)\n",
- " sw_batch_size = 4\n",
- " val_outputs = sliding_window_inference(\n",
- " val_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
- " val_outputs = [post_pred(i) for i in decollate_batch(val_outputs)]\n",
- " val_labels = [post_label(i) for i in decollate_batch(val_labels)]\n",
- " dice_metric(y_pred=val_outputs, y=val_labels)\n",
- " metric = dice_metric.aggregate().item()\n",
- " dice_metric.reset()\n",
- " metric_values.append(metric)\n",
- " if metric > best_metric:\n",
- " best_metric = metric\n",
- " best_metric_epoch = epoch + 1\n",
- " torch.save(model.state_dict(), os.path.join(\n",
- " root_dir, \"Spleen_best_metric_model_pretrained.pth\"))\n",
- " print(\"saved new best metric model\")\n",
- " print(\n",
- " f\"current epoch: {epoch + 1} current mean dice: {metric:.4f}\"\n",
- " f\"\\nbest mean dice: {best_metric:.4f} \"\n",
- " f\"at epoch: {best_metric_epoch}\"\n",
- " )\n",
- "print(\n",
- " f\"train completed, best_metric: {best_metric:.4f} \"\n",
- " f\"at epoch: {best_metric_epoch}\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 24,
- "id": "5cf1fd04",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "plt.figure(\"train\", (12, 6))\n",
- "plt.subplot(1, 2, 1)\n",
- "plt.title(\"Epoch Average Loss\")\n",
- "x = [i + 1 for i in range(len(epoch_loss_values))]\n",
- "y = epoch_loss_values\n",
- "plt.xlabel(\"epoch\")\n",
- "plt.ylim([0.1, 0.7])\n",
- "plt.plot(x, y)\n",
- "plt.subplot(1, 2, 2)\n",
- "plt.title(\"Val Mean Dice\")\n",
- "x = [val_interval * (i + 1) for i in range(len(metric_values))]\n",
- "y = metric_values\n",
- "plt.xlabel(\"epoch\")\n",
- "plt.ylim([0, 1.0])\n",
- "plt.plot(x, y)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "4ff0035d",
- "metadata": {},
- "source": [
- "#### The model shows that it has improved fairly quickly over just 25 epochs"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "0499fa93",
- "metadata": {},
- "source": [
- "## Inference\n",
- "#### Without a GPU, skip to here to load the previously trained best model (training without a GPU will take a while)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 25,
- "id": "29441405",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 25,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "model.load_state_dict(torch.load('monai_data/best_metric_model_pretrained.pth'))"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "fab5b4b9",
- "metadata": {},
- "source": [
- "#### With the model loaded let's see if much has changed for our example image"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 26,
- "id": "94615f38",
- "metadata": {},
- "outputs": [],
- "source": [
- "num_classes = 2\n",
- "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
- "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
- "model.eval()\n",
- "with torch.no_grad():\n",
- " for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
- " test_inputs, test_labels = (\n",
- " data[\"image\"].to(device),\n",
- " data[\"label\"].to(device),\n",
- " )\n",
- " roi_size = (160, 160, 160)\n",
- " sw_batch_size = 4\n",
- " test_outputs = sliding_window_inference(\n",
- " test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
- " test_outputsSpl = [post_pred(i) for i in decollate_batch(test_outputs)]\n",
- " test_labelsSpl = [post_label(i) for i in decollate_batch(test_labels)]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 27,
- "id": "a3f78dd4",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 27,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Trained Calculated Spleen')\n",
- "plt.imshow(test_outputsSpl[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Fine-tuned model spleen"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 28,
- "id": "a67f89f2",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 28,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Differences Between Actual and Model')\n",
- "traineddif = test_labelsSpl[0].cpu().numpy()[1][:,:,200] - test_outputsSpl[0].cpu().numpy()[1][:,:,200]\n",
- "plt.imshow(traineddif, cmap='Greys_r') #Differences"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 29,
- "id": "382c7285",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 29,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Differences Between The Models')\n",
- "modelsdif = test_outputspre[0].cpu().numpy()[1][:,:,200] - test_outputsSpl[0].cpu().numpy()[1][:,:,200]\n",
- "plt.imshow(modelsdif, cmap='Greys_r') #Differences between pretrained and fine-tuned outputs"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6606bce2",
- "metadata": {},
- "source": [
- "#### We see not much has changed, which is a good sign for how well the NVIDIA model performs out of the box."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5cfd20c6",
- "metadata": {},
- "source": [
- "#### Here is the final image of our Spleen"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 30,
- "id": "91e83d40",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 30,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,:,200] == 0, test_outputsSpl[0].cpu().numpy()[1][:,:,200])\n",
- "fig = plt.figure(frameon=False, figsize=(10,10))\n",
- "plt.imshow(np.rot90(test_ds[0]['image'][0][:,:,200]), cmap='Greys_r')\n",
- "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=1.0)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6030d210",
- "metadata": {},
- "source": [
- "#### Feel free to experiment with this notebook, or download it and run it wherever a GPU is available"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "896388a1",
- "metadata": {},
- "source": [
- "## Additional Exercise: Use liver segmentation in addition to spleen\n",
- " - We only need to load the pretrained liver segmentation model from NVIDIA\n",
- " - We can't train this model because we don't have training data, but we can use its predictions as a rough estimate"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 31,
- "id": "657e44a0",
- "metadata": {},
- "outputs": [],
- "source": [
- "mmarliver = {\n",
- " RemoteMMARKeys.ID: \"clara_pt_liver_and_tumor_ct_segmentation_1\",\n",
- " RemoteMMARKeys.NAME: \"clara_pt_liver_and_tumor_ct_segmentation\",\n",
- " RemoteMMARKeys.FILE_TYPE: \"zip\",\n",
- " RemoteMMARKeys.HASH_TYPE: \"md5\",\n",
- " RemoteMMARKeys.HASH_VAL: None,\n",
- " RemoteMMARKeys.MODEL_FILE: os.path.join(\"models\", \"model.pt\"),\n",
- " RemoteMMARKeys.CONFIG_FILE: os.path.join(\"config\", \"config_train.json\"),\n",
- " RemoteMMARKeys.VERSION: 1,\n",
- "}"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 32,
- "id": "a6fb0da7",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "2022-04-27 15:06:54,404 - INFO - Expected md5 is None, skip md5 check for file monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip.\n",
- "2022-04-27 15:06:54,405 - INFO - File exists: monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip, skipped downloading.\n",
- "2022-04-27 15:06:54,425 - INFO - Non-empty folder exists in monai_data/clara_pt_liver_and_tumor_ct_segmentation, skipped extracting.\n",
- "2022-04-27 15:06:54,426 - INFO - \n",
- "*** \"clara_pt_liver_and_tumor_ct_segmentation\" available at monai_data/clara_pt_liver_and_tumor_ct_segmentation.\n",
- "2022-04-27 15:06:54,889 - INFO - *** Model: \n",
- "2022-04-27 15:06:54,938 - INFO - *** Model params: {'dimensions': 3, 'in_channels': 1, 'out_channels': 3, 'channels': [16, 32, 64, 128, 256], 'strides': [2, 2, 2, 2], 'num_res_units': 2, 'norm': 'batch'}\n",
- "2022-04-27 15:06:54,950 - INFO - \n",
- "---\n",
- "2022-04-27 15:06:54,951 - INFO - For more information, please visit https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_liver_and_tumor_ct_segmentation\n",
- "\n"
- ]
- }
- ],
- "source": [
- " try: #MONAI=0.8\n",
- " unet_model = load_from_mmar(\n",
- " item = mmarliver['name'], \n",
- " mmar_dir=root_dir,\n",
- " map_location=device,\n",
- " version=mmarliver['version'],\n",
- " pretrained=True)\n",
- " except Exception: #MONAI<0.8\n",
- " unet_model = load_from_mmar(\n",
- " mmarliver, \n",
- " mmar_dir=root_dir,\n",
- " map_location=device,\n",
- " pretrained=True)\n",
- " model = unet_model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 33,
- "id": "55034354",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "using a pretrained model.\n",
- "2022-04-27 15:06:55,931 - INFO - Expected md5 is None, skip md5 check for file monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip.\n",
- "2022-04-27 15:06:55,931 - INFO - File exists: monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip, skipped downloading.\n",
- "2022-04-27 15:06:55,932 - INFO - Non-empty folder exists in monai_data/clara_pt_liver_and_tumor_ct_segmentation, skipped extracting.\n",
- "2022-04-27 15:06:55,933 - INFO - \n",
- "*** \"clara_pt_liver_and_tumor_ct_segmentation\" available at monai_data/clara_pt_liver_and_tumor_ct_segmentation.\n",
- "2022-04-27 15:06:55,962 - INFO - *** Model: \n",
- "2022-04-27 15:06:56,010 - INFO - *** Model params: {'dimensions': 3, 'in_channels': 1, 'out_channels': 3, 'channels': [16, 32, 64, 128, 256], 'strides': [2, 2, 2, 2], 'num_res_units': 2, 'norm': 'batch'}\n",
- "2022-04-27 15:06:56,023 - INFO - \n",
- "---\n",
- "2022-04-27 15:06:56,024 - INFO - For more information, please visit https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_liver_and_tumor_ct_segmentation\n",
- "\n"
- ]
- }
- ],
- "source": [
- "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
- "\n",
- "print(\"using a pretrained model.\")\n",
- "try: #MONAI=0.8\n",
- " unet_model = load_from_mmar(\n",
- " item = mmarliver['name'], \n",
- " mmar_dir=root_dir,\n",
- " map_location=device,\n",
- " version=mmarliver['version'],\n",
- " pretrained=True)\n",
- "except Exception: #MONAI<0.8\n",
- " unet_model = load_from_mmar(\n",
- " mmarliver, \n",
- " mmar_dir=root_dir,\n",
- " map_location=device,\n",
- " pretrained=True)\n",
- "model = unet_model.to(device)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 34,
- "id": "a79c1731",
- "metadata": {},
- "outputs": [],
- "source": [
- "num_classesP=3\n",
- "num_classesL=2\n",
- "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classesP)])\n",
- "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classesL)])\n",
- "model.eval()\n",
- "with torch.no_grad():\n",
- " for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
- " test_inputs, test_labels = (\n",
- " data[\"image\"].to(device),\n",
- " data[\"label\"].to(device),\n",
- " )\n",
- " roi_size = (160, 160, 160)\n",
- " sw_batch_size = 4\n",
- " test_outputs = sliding_window_inference(\n",
- " test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
- " test_outputsliv = [post_pred(i) for i in decollate_batch(test_outputs)] # Decollate our results\n",
- " test_labelsliv = [post_label(i) for i in decollate_batch(test_labels)]"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 35,
- "id": "c0956706",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 35,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "sliceval = 215\n",
- "maskedliv = np.ma.masked_where(test_outputsliv[0].cpu().numpy()[1][:,:,sliceval] == 0, test_outputsliv[0].cpu().numpy()[1][:,:,sliceval])\n",
- "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,:,sliceval] == 0, test_outputsSpl[0].cpu().numpy()[1][:,:,sliceval])\n",
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Pretrained Calculated Liver and Spleen')\n",
- "plt.imshow(np.rot90(test_ds[0]['image'][0][:,:,sliceval]), cmap='Greys_r')\n",
- "plt.imshow(np.rot90(maskedliv), cmap='cividis', alpha=0.75)\n",
- "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=0.75)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 36,
- "id": "5bdfdbe9",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- ""
- ]
- },
- "execution_count": 36,
- "metadata": {},
- "output_type": "execute_result"
- },
- {
- "data": {
- "image/png": "\n",
- "text/plain": [
- ""
- ]
- },
- "metadata": {
- "needs_background": "light"
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "sliceval = 110\n",
- "maskedliv = np.ma.masked_where(test_outputsliv[0].cpu().numpy()[1][:,sliceval,:] == 0, test_outputsliv[0].cpu().numpy()[1][:,sliceval,:])\n",
- "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,sliceval,:] == 0, test_outputsSpl[0].cpu().numpy()[1][:,sliceval,:])\n",
- "fig = plt.figure(frameon=False, figsize=(7,7))\n",
- "plt.title('Pretrained Calculated Liver and Spleen')\n",
- "plt.imshow(np.rot90(test_ds[0]['image'][0][:,sliceval,:]), cmap='Greys_r')\n",
- "plt.imshow(np.rot90(maskedliv), cmap='cividis', alpha=0.75)\n",
- "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=0.75)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "af1169b6",
- "metadata": {},
- "source": [
- "#### To continue, try more models from the NGC Catalog:\n",
- "#### https://catalog.ngc.nvidia.com/models\n",
- "##### - We recommend filtering by 'CT'"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "0dce4d55",
- "metadata": {},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "e17e6228",
- "metadata": {},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "7034135a",
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "environment": {
- "name": "pytorch-gpu.1-9.m75",
- "type": "gcloud",
- "uri": "gcr.io/deeplearning-platform-release/pytorch-gpu.1-9:m75"
- },
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.10"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/tutorials/notebooks/pangolin/pangolin.yaml b/tutorials/notebooks/pangolin/pangolin.yaml
deleted file mode 100644
index 4a235b0..0000000
--- a/tutorials/notebooks/pangolin/pangolin.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-name: pangolin
-channels:
- - bioconda
- - conda-forge
- - defaults
- - eaton-lab
-
-dependencies:
- - sra-tools
- - ipyrad
- - toytree
- - pangolin
- - iqtree
diff --git a/tutorials/notebooks/pangolin/pangolin_pipeline.ipynb b/tutorials/notebooks/pangolin/pangolin_pipeline.ipynb
deleted file mode 100644
index 862c45a..0000000
--- a/tutorials/notebooks/pangolin/pangolin_pipeline.ipynb
+++ /dev/null
@@ -1,1333 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "31e8c3cd",
- "metadata": {},
- "source": [
- "# Pangolin SARS-CoV-2 Pipeline Notebook "
- ]
- },
- {
- "cell_type": "markdown",
- "id": "56a29212",
- "metadata": {},
- "source": [
- "We are going to run a standard SARS-CoV-2 bioinformatics pipeline using the Pangolin workflow: https://cov-lineages.org/resources/pangolin/usage.html"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "03541941",
- "metadata": {},
- "source": [
- "### Required software"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "f994b990",
- "metadata": {},
- "outputs": [],
- "source": [
- "#change this depending on how many threads are available in your notebook\n",
- "CPU=4"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "a19b662e",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "PREFIX=/home/ec2-user/mambaforge\n",
- "Unpacking payload ...\n",
- "Extracting \"libmambapy-0.19.0-py39h8bfa403_0.tar.bz2\"\n",
- "Extracting \"zstd-1.5.0-ha95c52a_0.tar.bz2\"\n",
- "Extracting \"readline-8.1-h46c0cb4_0.tar.bz2\"\n",
- "Extracting \"yaml-cpp-0.6.3-he1b5a44_4.tar.bz2\"\n",
- "Extracting \"libgcc-ng-11.2.0-h1d223b6_11.tar.bz2\"\n",
- "Extracting \"cffi-1.15.0-py39h4bc2ebd_0.tar.bz2\"\n",
- "Extracting \"wheel-0.37.0-pyhd8ed1ab_1.tar.bz2\"\n",
- "Extracting \"colorama-0.4.4-pyh9f0ad1d_0.tar.bz2\"\n",
- "Extracting \"_openmp_mutex-4.5-1_gnu.tar.bz2\"\n",
- "Extracting \"tzdata-2021e-he74cb21_0.tar.bz2\"\n",
- "Extracting \"reproc-cpp-14.2.3-h9c3ff4c_0.tar.bz2\"\n",
- "Extracting \"lz4-c-1.9.3-h9c3ff4c_1.tar.bz2\"\n",
- "Extracting \"libarchive-3.5.2-hccf745f_1.tar.bz2\"\n",
- "Extracting \"libedit-3.1.20191231-he28a2e2_2.tar.bz2\"\n",
- "Extracting \"icu-69.1-h9c3ff4c_0.tar.bz2\"\n",
- "Extracting \"ld_impl_linux-64-2.36.1-hea4e1c9_2.tar.bz2\"\n",
- "Extracting \"libzlib-1.2.11-h36c2ea0_1013.tar.bz2\"\n",
- "Extracting \"tqdm-4.62.3-pyhd8ed1ab_0.tar.bz2\"\n",
- "Extracting \"ncurses-6.2-h58526e2_4.tar.bz2\"\n",
- "Extracting \"openssl-1.1.1l-h7f98852_0.tar.bz2\"\n",
- "Extracting \"cryptography-36.0.0-py39h95dcef6_0.tar.bz2\"\n",
- "Extracting \"zlib-1.2.11-h36c2ea0_1013.tar.bz2\"\n",
- "Extracting \"lzo-2.10-h516909a_1000.tar.bz2\"\n",
- "Extracting \"c-ares-1.18.1-h7f98852_0.tar.bz2\"\n",
- "Extracting \"pyopenssl-21.0.0-pyhd8ed1ab_0.tar.bz2\"\n",
- "Extracting \"conda-package-handling-1.7.3-py39h3811e60_1.tar.bz2\"\n",
- "Extracting \"idna-3.1-pyhd3deb0d_0.tar.bz2\"\n",
- "Extracting \"libmamba-0.19.0-h3985d26_0.tar.bz2\"\n",
- "Extracting \"reproc-14.2.3-h7f98852_0.tar.bz2\"\n",
- "Extracting \"pip-21.3.1-pyhd8ed1ab_0.tar.bz2\"\n",
- "Extracting \"tk-8.6.11-h27826a3_1.tar.bz2\"\n",
- "Extracting \"conda-4.11.0-py39hf3d152e_0.tar.bz2\"\n",
- "Extracting \"requests-2.26.0-pyhd8ed1ab_1.tar.bz2\"\n",
- "Extracting \"_libgcc_mutex-0.1-conda_forge.tar.bz2\"\n",
- "Extracting \"brotlipy-0.7.0-py39h3811e60_1003.tar.bz2\"\n",
- "Extracting \"python-3.9.7-hb7a2778_3_cpython.tar.bz2\"\n",
- "Extracting \"yaml-0.2.5-h516909a_0.tar.bz2\"\n",
- "Extracting \"bzip2-1.0.8-h7f98852_4.tar.bz2\"\n",
- "Extracting \"libffi-3.4.2-h7f98852_5.tar.bz2\"\n",
- "Extracting \"krb5-1.19.2-hcc1bbae_3.tar.bz2\"\n",
- "Extracting \"charset-normalizer-2.0.9-pyhd8ed1ab_0.tar.bz2\"\n",
- "Extracting \"pysocks-1.7.1-py39hf3d152e_4.tar.bz2\"\n",
- "Extracting \"libgomp-11.2.0-h1d223b6_11.tar.bz2\"\n",
- "Extracting \"pybind11-abi-4-hd8ed1ab_3.tar.bz2\"\n",
- "Extracting \"python_abi-3.9-2_cp39.tar.bz2\"\n",
- "Extracting \"libiconv-1.16-h516909a_0.tar.bz2\"\n",
- "Extracting \"libcurl-7.80.0-h2574ce0_0.tar.bz2\"\n",
- "Extracting \"libxml2-2.9.12-h885dcf4_1.tar.bz2\"\n",
- "Extracting \"pycosat-0.6.3-py39h3811e60_1009.tar.bz2\"\n",
- "Extracting \"certifi-2021.10.8-py39hf3d152e_1.tar.bz2\"\n",
- "Extracting \"libssh2-1.10.0-ha56f1ee_2.tar.bz2\"\n",
- "Extracting \"libnghttp2-1.43.0-h812cca2_1.tar.bz2\"\n",
- "Extracting \"mamba-0.19.0-py39hfa8f2c8_0.tar.bz2\"\n",
- "Extracting \"ruamel_yaml-0.15.80-py39h3811e60_1006.tar.bz2\"\n",
- "Extracting \"xz-5.2.5-h516909a_1.tar.bz2\"\n",
- "Extracting \"setuptools-59.4.0-py39hf3d152e_0.tar.bz2\"\n",
- "Extracting \"six-1.16.0-pyh6c4a22f_0.tar.bz2\"\n",
- "Extracting \"urllib3-1.26.7-pyhd8ed1ab_0.tar.bz2\"\n",
- "Extracting \"libstdcxx-ng-11.2.0-he4da1e4_11.tar.bz2\"\n",
- "Extracting \"libsolv-0.7.19-h780b84a_5.tar.bz2\"\n",
- "Extracting \"pycparser-2.21-pyhd8ed1ab_0.tar.bz2\"\n",
- "Extracting \"ca-certificates-2021.10.8-ha878542_0.tar.bz2\"\n",
- "Extracting \"sqlite-3.37.0-h9cd32fc_0.tar.bz2\"\n",
- "Extracting \"libev-4.33-h516909a_1.tar.bz2\"\n",
- "\n",
- " __\n",
- " __ ______ ___ ____ _____ ___ / /_ ____ _\n",
- " / / / / __ `__ \\/ __ `/ __ `__ \\/ __ \\/ __ `/\n",
- " / /_/ / / / / / / /_/ / / / / / / /_/ / /_/ /\n",
- " / .___/_/ /_/ /_/\\__,_/_/ /_/ /_/_.___/\\__,_/\n",
- " /_/\n",
- "\n",
- "\n",
- "Transaction\n",
- "\n",
- " Prefix: /home/ec2-user/mambaforge\n",
- "\n",
- " Updating specs:\n",
- "\n",
- " - python==3.9.7=hb7a2778_3_cpython\n",
- " - _libgcc_mutex==0.1=conda_forge\n",
- " - ca-certificates==2021.10.8=ha878542_0\n",
- " - ld_impl_linux-64==2.36.1=hea4e1c9_2\n",
- " - libstdcxx-ng==11.2.0=he4da1e4_11\n",
- " - pybind11-abi==4=hd8ed1ab_3\n",
- " - tzdata==2021e=he74cb21_0\n",
- " - libgomp==11.2.0=h1d223b6_11\n",
- " - _openmp_mutex==4.5=1_gnu\n",
- " - libgcc-ng==11.2.0=h1d223b6_11\n",
- " - bzip2==1.0.8=h7f98852_4\n",
- " - c-ares==1.18.1=h7f98852_0\n",
- " - icu==69.1=h9c3ff4c_0\n",
- " - libev==4.33=h516909a_1\n",
- " - libffi==3.4.2=h7f98852_5\n",
- " - libiconv==1.16=h516909a_0\n",
- " - libzlib==1.2.11=h36c2ea0_1013\n",
- " - lz4-c==1.9.3=h9c3ff4c_1\n",
- " - lzo==2.10=h516909a_1000\n",
- " - ncurses==6.2=h58526e2_4\n",
- " - openssl==1.1.1l=h7f98852_0\n",
- " - reproc==14.2.3=h7f98852_0\n",
- " - xz==5.2.5=h516909a_1\n",
- " - yaml==0.2.5=h516909a_0\n",
- " - yaml-cpp==0.6.3=he1b5a44_4\n",
- " - libedit==3.1.20191231=he28a2e2_2\n",
- " - readline==8.1=h46c0cb4_0\n",
- " - reproc-cpp==14.2.3=h9c3ff4c_0\n",
- " - zlib==1.2.11=h36c2ea0_1013\n",
- " - libnghttp2==1.43.0=h812cca2_1\n",
- " - libsolv==0.7.19=h780b84a_5\n",
- " - libssh2==1.10.0=ha56f1ee_2\n",
- " - libxml2==2.9.12=h885dcf4_1\n",
- " - sqlite==3.37.0=h9cd32fc_0\n",
- " - tk==8.6.11=h27826a3_1\n",
- " - zstd==1.5.0=ha95c52a_0\n",
- " - krb5==1.19.2=hcc1bbae_3\n",
- " - libarchive==3.5.2=hccf745f_1\n",
- " - charset-normalizer==2.0.9=pyhd8ed1ab_0\n",
- " - colorama==0.4.4=pyh9f0ad1d_0\n",
- " - idna==3.1=pyhd3deb0d_0\n",
- " - libcurl==7.80.0=h2574ce0_0\n",
- " - pycparser==2.21=pyhd8ed1ab_0\n",
- " - python_abi==3.9=2_cp39\n",
- " - six==1.16.0=pyh6c4a22f_0\n",
- " - wheel==0.37.0=pyhd8ed1ab_1\n",
- " - certifi==2021.10.8=py39hf3d152e_1\n",
- " - cffi==1.15.0=py39h4bc2ebd_0\n",
- " - libmamba==0.19.0=h3985d26_0\n",
- " - pycosat==0.6.3=py39h3811e60_1009\n",
- " - pysocks==1.7.1=py39hf3d152e_4\n",
- " - ruamel_yaml==0.15.80=py39h3811e60_1006\n",
- " - setuptools==59.4.0=py39hf3d152e_0\n",
- " - tqdm==4.62.3=pyhd8ed1ab_0\n",
- " - brotlipy==0.7.0=py39h3811e60_1003\n",
- " - conda-package-handling==1.7.3=py39h3811e60_1\n",
- " - cryptography==36.0.0=py39h95dcef6_0\n",
- " - libmambapy==0.19.0=py39h8bfa403_0\n",
- " - pip==21.3.1=pyhd8ed1ab_0\n",
- " - pyopenssl==21.0.0=pyhd8ed1ab_0\n",
- " - urllib3==1.26.7=pyhd8ed1ab_0\n",
- " - requests==2.26.0=pyhd8ed1ab_1\n",
- " - conda==4.11.0=py39hf3d152e_0\n",
- " - mamba==0.19.0=py39hfa8f2c8_0\n",
- "\n",
- "\n",
- " Package Version Build Channel Size\n",
- "───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n",
- " Install:\n",
- "───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n",
- "\n",
- " + _libgcc_mutex 0.1 conda_forge conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2 Cached\n",
- " + _openmp_mutex 4.5 1_gnu conda-forge/linux-64/_openmp_mutex-4.5-1_gnu.tar.bz2 Cached\n",
- " + brotlipy 0.7.0 py39h3811e60_1003 conda-forge/linux-64/brotlipy-0.7.0-py39h3811e60_1003.tar.bz2 Cached\n",
- " + bzip2 1.0.8 h7f98852_4 conda-forge/linux-64/bzip2-1.0.8-h7f98852_4.tar.bz2 Cached\n",
- " + c-ares 1.18.1 h7f98852_0 conda-forge/linux-64/c-ares-1.18.1-h7f98852_0.tar.bz2 Cached\n",
- " + ca-certificates 2021.10.8 ha878542_0 conda-forge/linux-64/ca-certificates-2021.10.8-ha878542_0.tar.bz2 Cached\n",
- " + certifi 2021.10.8 py39hf3d152e_1 conda-forge/linux-64/certifi-2021.10.8-py39hf3d152e_1.tar.bz2 Cached\n",
- " + cffi 1.15.0 py39h4bc2ebd_0 conda-forge/linux-64/cffi-1.15.0-py39h4bc2ebd_0.tar.bz2 Cached\n",
- " + charset-normalizer 2.0.9 pyhd8ed1ab_0 conda-forge/noarch/charset-normalizer-2.0.9-pyhd8ed1ab_0.tar.bz2 Cached\n",
- " + colorama 0.4.4 pyh9f0ad1d_0 conda-forge/noarch/colorama-0.4.4-pyh9f0ad1d_0.tar.bz2 Cached\n",
- " + conda 4.11.0 py39hf3d152e_0 conda-forge/linux-64/conda-4.11.0-py39hf3d152e_0.tar.bz2 Cached\n",
- " + conda-package-handling 1.7.3 py39h3811e60_1 conda-forge/linux-64/conda-package-handling-1.7.3-py39h3811e60_1.tar.bz2 Cached\n",
- " + cryptography 36.0.0 py39h95dcef6_0 conda-forge/linux-64/cryptography-36.0.0-py39h95dcef6_0.tar.bz2 Cached\n",
- " + icu 69.1 h9c3ff4c_0 conda-forge/linux-64/icu-69.1-h9c3ff4c_0.tar.bz2 Cached\n",
- " + idna 3.1 pyhd3deb0d_0 conda-forge/noarch/idna-3.1-pyhd3deb0d_0.tar.bz2 Cached\n",
- " + krb5 1.19.2 hcc1bbae_3 conda-forge/linux-64/krb5-1.19.2-hcc1bbae_3.tar.bz2 Cached\n",
- " + ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge/linux-64/ld_impl_linux-64-2.36.1-hea4e1c9_2.tar.bz2 Cached\n",
- " + libarchive 3.5.2 hccf745f_1 conda-forge/linux-64/libarchive-3.5.2-hccf745f_1.tar.bz2 Cached\n",
- " + libcurl 7.80.0 h2574ce0_0 conda-forge/linux-64/libcurl-7.80.0-h2574ce0_0.tar.bz2 Cached\n",
- " + libedit 3.1.20191231 he28a2e2_2 conda-forge/linux-64/libedit-3.1.20191231-he28a2e2_2.tar.bz2 Cached\n",
- " + libev 4.33 h516909a_1 conda-forge/linux-64/libev-4.33-h516909a_1.tar.bz2 Cached\n",
- " + libffi 3.4.2 h7f98852_5 conda-forge/linux-64/libffi-3.4.2-h7f98852_5.tar.bz2 Cached\n",
- " + libgcc-ng 11.2.0 h1d223b6_11 conda-forge/linux-64/libgcc-ng-11.2.0-h1d223b6_11.tar.bz2 Cached\n",
- " + libgomp 11.2.0 h1d223b6_11 conda-forge/linux-64/libgomp-11.2.0-h1d223b6_11.tar.bz2 Cached\n",
- " + libiconv 1.16 h516909a_0 conda-forge/linux-64/libiconv-1.16-h516909a_0.tar.bz2 Cached\n",
- " + libmamba 0.19.0 h3985d26_0 conda-forge/linux-64/libmamba-0.19.0-h3985d26_0.tar.bz2 Cached\n",
- " + libmambapy 0.19.0 py39h8bfa403_0 conda-forge/linux-64/libmambapy-0.19.0-py39h8bfa403_0.tar.bz2 Cached\n",
- " + libnghttp2 1.43.0 h812cca2_1 conda-forge/linux-64/libnghttp2-1.43.0-h812cca2_1.tar.bz2 Cached\n",
- " + libsolv 0.7.19 h780b84a_5 conda-forge/linux-64/libsolv-0.7.19-h780b84a_5.tar.bz2 Cached\n",
- " + libssh2 1.10.0 ha56f1ee_2 conda-forge/linux-64/libssh2-1.10.0-ha56f1ee_2.tar.bz2 Cached\n",
- " + libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge/linux-64/libstdcxx-ng-11.2.0-he4da1e4_11.tar.bz2 Cached\n",
- " + libxml2 2.9.12 h885dcf4_1 conda-forge/linux-64/libxml2-2.9.12-h885dcf4_1.tar.bz2 Cached\n",
- " + libzlib 1.2.11 h36c2ea0_1013 conda-forge/linux-64/libzlib-1.2.11-h36c2ea0_1013.tar.bz2 Cached\n",
- " + lz4-c 1.9.3 h9c3ff4c_1 conda-forge/linux-64/lz4-c-1.9.3-h9c3ff4c_1.tar.bz2 Cached\n",
- " + lzo 2.10 h516909a_1000 conda-forge/linux-64/lzo-2.10-h516909a_1000.tar.bz2 Cached\n",
- " + mamba 0.19.0 py39hfa8f2c8_0 conda-forge/linux-64/mamba-0.19.0-py39hfa8f2c8_0.tar.bz2 Cached\n",
- " + ncurses 6.2 h58526e2_4 conda-forge/linux-64/ncurses-6.2-h58526e2_4.tar.bz2 Cached\n",
- " + openssl 1.1.1l h7f98852_0 conda-forge/linux-64/openssl-1.1.1l-h7f98852_0.tar.bz2 Cached\n",
- " + pip 21.3.1 pyhd8ed1ab_0 conda-forge/noarch/pip-21.3.1-pyhd8ed1ab_0.tar.bz2 Cached\n",
- " + pybind11-abi 4 hd8ed1ab_3 conda-forge/noarch/pybind11-abi-4-hd8ed1ab_3.tar.bz2 Cached\n",
- " + pycosat 0.6.3 py39h3811e60_1009 conda-forge/linux-64/pycosat-0.6.3-py39h3811e60_1009.tar.bz2 Cached\n",
- " + pycparser 2.21 pyhd8ed1ab_0 conda-forge/noarch/pycparser-2.21-pyhd8ed1ab_0.tar.bz2 Cached\n",
- " + pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge/noarch/pyopenssl-21.0.0-pyhd8ed1ab_0.tar.bz2 Cached\n",
- " + pysocks 1.7.1 py39hf3d152e_4 conda-forge/linux-64/pysocks-1.7.1-py39hf3d152e_4.tar.bz2 Cached\n",
- " + python 3.9.7 hb7a2778_3_cpython conda-forge/linux-64/python-3.9.7-hb7a2778_3_cpython.tar.bz2 Cached\n",
- " + python_abi 3.9 2_cp39 conda-forge/linux-64/python_abi-3.9-2_cp39.tar.bz2 Cached\n",
- " + readline 8.1 h46c0cb4_0 conda-forge/linux-64/readline-8.1-h46c0cb4_0.tar.bz2 Cached\n",
- " + reproc 14.2.3 h7f98852_0 conda-forge/linux-64/reproc-14.2.3-h7f98852_0.tar.bz2 Cached\n",
- " + reproc-cpp 14.2.3 h9c3ff4c_0 conda-forge/linux-64/reproc-cpp-14.2.3-h9c3ff4c_0.tar.bz2 Cached\n",
- " + requests 2.26.0 pyhd8ed1ab_1 conda-forge/noarch/requests-2.26.0-pyhd8ed1ab_1.tar.bz2 Cached\n",
- " + ruamel_yaml 0.15.80 py39h3811e60_1006 conda-forge/linux-64/ruamel_yaml-0.15.80-py39h3811e60_1006.tar.bz2 Cached\n",
- " + setuptools 59.4.0 py39hf3d152e_0 conda-forge/linux-64/setuptools-59.4.0-py39hf3d152e_0.tar.bz2 Cached\n",
- " + six 1.16.0 pyh6c4a22f_0 conda-forge/noarch/six-1.16.0-pyh6c4a22f_0.tar.bz2 Cached\n",
- " + sqlite 3.37.0 h9cd32fc_0 conda-forge/linux-64/sqlite-3.37.0-h9cd32fc_0.tar.bz2 Cached\n",
- " + tk 8.6.11 h27826a3_1 conda-forge/linux-64/tk-8.6.11-h27826a3_1.tar.bz2 Cached\n",
- " + tqdm 4.62.3 pyhd8ed1ab_0 conda-forge/noarch/tqdm-4.62.3-pyhd8ed1ab_0.tar.bz2 Cached\n",
- " + tzdata 2021e he74cb21_0 conda-forge/noarch/tzdata-2021e-he74cb21_0.tar.bz2 Cached\n",
- " + urllib3 1.26.7 pyhd8ed1ab_0 conda-forge/noarch/urllib3-1.26.7-pyhd8ed1ab_0.tar.bz2 Cached\n",
- " + wheel 0.37.0 pyhd8ed1ab_1 conda-forge/noarch/wheel-0.37.0-pyhd8ed1ab_1.tar.bz2 Cached\n",
- " + xz 5.2.5 h516909a_1 conda-forge/linux-64/xz-5.2.5-h516909a_1.tar.bz2 Cached\n",
- " + yaml 0.2.5 h516909a_0 conda-forge/linux-64/yaml-0.2.5-h516909a_0.tar.bz2 Cached\n",
- " + yaml-cpp 0.6.3 he1b5a44_4 conda-forge/linux-64/yaml-cpp-0.6.3-he1b5a44_4.tar.bz2 Cached\n",
- " + zlib 1.2.11 h36c2ea0_1013 conda-forge/linux-64/zlib-1.2.11-h36c2ea0_1013.tar.bz2 Cached\n",
- " + zstd 1.5.0 ha95c52a_0 conda-forge/linux-64/zstd-1.5.0-ha95c52a_0.tar.bz2 Cached\n",
- "\n",
- " Summary:\n",
- "\n",
- " Install: 64 packages\n",
- "\n",
- " Total download: 0 B\n",
- "\n",
- "───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n",
- "\n",
- "\n",
- "\n",
- "Transaction starting\n",
- "Linking ca-certificates-2021.10.8-ha878542_0\n",
- "Linking ld_impl_linux-64-2.36.1-hea4e1c9_2\n",
- "Linking libstdcxx-ng-11.2.0-he4da1e4_11\n",
- "Linking pybind11-abi-4-hd8ed1ab_3\n",
- "Linking _libgcc_mutex-0.1-conda_forge\n",
- "Linking tzdata-2021e-he74cb21_0\n",
- "Linking libgomp-11.2.0-h1d223b6_11\n",
- "Linking _openmp_mutex-4.5-1_gnu\n",
- "Linking libgcc-ng-11.2.0-h1d223b6_11\n",
- "Linking ncurses-6.2-h58526e2_4\n",
- "Linking libzlib-1.2.11-h36c2ea0_1013\n",
- "Linking libiconv-1.16-h516909a_0\n",
- "Linking libev-4.33-h516909a_1\n",
- "Linking yaml-cpp-0.6.3-he1b5a44_4\n",
- "Linking yaml-0.2.5-h516909a_0\n",
- "Linking xz-5.2.5-h516909a_1\n",
- "Linking reproc-14.2.3-h7f98852_0\n",
- "Linking openssl-1.1.1l-h7f98852_0\n",
- "Linking lzo-2.10-h516909a_1000\n",
- "Linking lz4-c-1.9.3-h9c3ff4c_1\n",
- "Linking libffi-3.4.2-h7f98852_5\n",
- "Linking icu-69.1-h9c3ff4c_0\n",
- "Linking c-ares-1.18.1-h7f98852_0\n",
- "Linking bzip2-1.0.8-h7f98852_4\n",
- "Linking readline-8.1-h46c0cb4_0\n",
- "Linking libedit-3.1.20191231-he28a2e2_2\n",
- "Linking zlib-1.2.11-h36c2ea0_1013\n",
- "Linking reproc-cpp-14.2.3-h9c3ff4c_0\n",
- "Linking tk-8.6.11-h27826a3_1\n",
- "Linking zstd-1.5.0-ha95c52a_0\n",
- "Linking sqlite-3.37.0-h9cd32fc_0\n",
- "Linking libxml2-2.9.12-h885dcf4_1\n",
- "Linking libssh2-1.10.0-ha56f1ee_2\n",
- "Linking libsolv-0.7.19-h780b84a_5\n",
- "Linking libnghttp2-1.43.0-h812cca2_1\n",
- "Linking krb5-1.19.2-hcc1bbae_3\n",
- "Linking python-3.9.7-hb7a2778_3_cpython\n",
- "Linking libarchive-3.5.2-hccf745f_1\n",
- "Linking libcurl-7.80.0-h2574ce0_0\n",
- "Linking python_abi-3.9-2_cp39\n",
- "Linking wheel-0.37.0-pyhd8ed1ab_1\n",
- "Linking libmamba-0.19.0-h3985d26_0\n",
- "Linking setuptools-59.4.0-py39hf3d152e_0\n",
- "Linking pip-21.3.1-pyhd8ed1ab_0\n",
- "Linking six-1.16.0-pyh6c4a22f_0\n",
- "Linking idna-3.1-pyhd3deb0d_0\n",
- "Linking libmambapy-0.19.0-py39h8bfa403_0\n",
- "Linking ruamel_yaml-0.15.80-py39h3811e60_1006\n",
- "Linking pysocks-1.7.1-py39hf3d152e_4\n",
- "Linking pycosat-0.6.3-py39h3811e60_1009\n",
- "Linking certifi-2021.10.8-py39hf3d152e_1\n",
- "Linking pycparser-2.21-pyhd8ed1ab_0\n",
- "Linking colorama-0.4.4-pyh9f0ad1d_0\n",
- "Linking charset-normalizer-2.0.9-pyhd8ed1ab_0\n",
- "Linking cffi-1.15.0-py39h4bc2ebd_0\n",
- "Linking tqdm-4.62.3-pyhd8ed1ab_0\n",
- "Linking cryptography-36.0.0-py39h95dcef6_0\n",
- "Linking brotlipy-0.7.0-py39h3811e60_1003\n",
- "Linking conda-package-handling-1.7.3-py39h3811e60_1\n",
- "Linking pyopenssl-21.0.0-pyhd8ed1ab_0\n",
- "Linking urllib3-1.26.7-pyhd8ed1ab_0\n",
- "Linking requests-2.26.0-pyhd8ed1ab_1\n",
- "Linking conda-4.11.0-py39hf3d152e_0\n",
- "Linking mamba-0.19.0-py39hfa8f2c8_0\n",
- "Transaction finished\n",
- "installation finished.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- " % Total % Received % Xferd Average Speed Time Time Time Current\n",
- " Dload Upload Total Spent Left Speed\n",
- "100 160 100 160 0 0 969 0 --:--:-- --:--:-- --:--:-- 969\n",
- "100 665 100 665 0 0 2224 0 --:--:-- --:--:-- --:--:-- 2224\n",
- "100 102M 100 102M 0 0 14.3M 0 0:00:07 0:00:07 --:--:-- 14.4M\n",
- "bash: line 7: !cp: command not found\n"
- ]
- }
- ],
- "source": [
- "%%bash\n",
- "\n",
- "#install Mamba\n",
- "curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
- "bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge\n",
- "rm Mamba*"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 20,
- "id": "a40f7ebc",
- "metadata": {},
- "outputs": [],
- "source": [
- "#move mamba executable to your path\n",
- "!cp ~/mambaforge/bin/mamba /home/ec2-user/anaconda3/condabin"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 21,
- "id": "f421805e",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Requirement already satisfied: biopython in /home/ec2-user/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages (1.79)\n",
- "Requirement already satisfied: numpy in /home/ec2-user/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages (from biopython) (1.19.5)\n",
- "\u001b[33mWARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.\n",
- "You should consider upgrading via the '/home/ec2-user/anaconda3/envs/amazonei_mxnet_p36/bin/python -m pip install --upgrade pip' command.\u001b[0m\n"
- ]
- }
- ],
- "source": [
- "#install biopython to import packages below\n",
- "!pip install biopython"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 41,
- "id": "fd936fd6",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Collecting package metadata (current_repodata.json): done\n",
- "Solving environment: - \n",
- "The environment is inconsistent, please check the package plan carefully\n",
- "The following packages are causing the inconsistency:\n",
- "\n",
- " - conda-forge/noarch::seaborn-base==0.11.1=pyhd8ed1ab_1\n",
- " - conda-forge/noarch::nbclassic==0.2.6=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::typing-extensions==3.7.4.3=0\n",
- " - conda-forge/linux-64::pluggy==0.13.1=py36h5fab9bb_4\n",
- " - conda-forge/linux-64::blaze==0.11.3=py36_0\n",
- " - conda-forge/linux-64::matplotlib==3.3.4=py36h5fab9bb_0\n",
- " - conda-forge/noarch::python-language-server==0.36.2=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::jupyterlab_server==2.3.0=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::pyls-black==0.4.6=pyh9f0ad1d_0\n",
- " - conda-forge/linux-64::scikit-image==0.16.2=py36hb3f55d8_0\n",
- " - conda-forge/noarch::path.py==12.5.0=0\n",
- " - conda-forge/noarch::qdarkstyle==2.8.1=pyhd8ed1ab_2\n",
- " - conda-forge/noarch::ipywidgets==7.6.3=pyhd3deb0d_0\n",
- " - conda-forge/noarch::black==20.8b1=py_1\n",
- " - conda-forge/linux-64::anyio==2.1.0=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::jupyter_server==1.4.1=py36h5fab9bb_0\n",
- " - conda-forge/noarch::nbclient==0.5.2=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::widgetsnbextension==3.5.1=py36h5fab9bb_4\n",
- " - conda-forge/linux-64::bokeh==2.2.3=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::keyring==22.0.1=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::nbconvert==6.0.7=py36h5fab9bb_3\n",
- " - conda-forge/noarch::numpydoc==1.1.0=py_1\n",
- " - conda-forge/linux-64::spyder==4.2.0=py36h5fab9bb_0\n",
- " - conda-forge/noarch::flake8==3.8.4=py_0\n",
- " - conda-forge/noarch::pyls-spyder==0.3.2=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::nbformat==5.1.2=pyhd8ed1ab_1\n",
- " - conda-forge/noarch::importlib_metadata==3.6.0=hd8ed1ab_0\n",
- " - conda-forge/noarch::aioitertools==0.7.1=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::jupyterlab_launcher==0.13.1=py_2\n",
- " - conda-forge/noarch::odo==0.5.1=py_1\n",
- " - conda-forge/noarch::imageio==2.9.0=py_0\n",
- " - conda-forge/noarch::helpdev==0.7.1=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::path==15.1.2=py36h5fab9bb_0\n",
- " - conda-forge/noarch::jsonschema==3.2.0=py_2\n",
- " - conda-forge/linux-64::yarl==1.6.3=py36h8f6f2f9_1\n",
- " - conda-forge/noarch::sphinx==3.5.1=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::seaborn==0.11.1=hd8ed1ab_1\n",
- " - conda-forge/linux-64::jupyter==1.0.0=py36h5fab9bb_6\n",
- " - conda-forge/linux-64::nb_conda==2.2.1=py36h5fab9bb_4\n",
- " - conda-forge/noarch::dask==2021.2.0=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::matplotlib-base==3.3.4=py36hd391965_0\n",
- " - conda-forge/noarch::anaconda-client==1.7.2=py_0\n",
- " - conda-forge/noarch::anaconda-project==0.9.1=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::importlib-metadata==3.6.0=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::pytest==6.2.2=py36h5fab9bb_0\n",
- "failed with initial frozen solve. Retrying with flexible solve.\n",
- "Collecting package metadata (repodata.json): done\n",
- "Solving environment: - \n",
- "The environment is inconsistent, please check the package plan carefully\n",
- "The following packages are causing the inconsistency:\n",
- "\n",
- " - conda-forge/noarch::seaborn-base==0.11.1=pyhd8ed1ab_1\n",
- " - conda-forge/noarch::nbclassic==0.2.6=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::typing-extensions==3.7.4.3=0\n",
- " - conda-forge/linux-64::pluggy==0.13.1=py36h5fab9bb_4\n",
- " - conda-forge/linux-64::blaze==0.11.3=py36_0\n",
- " - conda-forge/linux-64::matplotlib==3.3.4=py36h5fab9bb_0\n",
- " - conda-forge/noarch::python-language-server==0.36.2=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::jupyterlab_server==2.3.0=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::pyls-black==0.4.6=pyh9f0ad1d_0\n",
- " - conda-forge/linux-64::scikit-image==0.16.2=py36hb3f55d8_0\n",
- " - conda-forge/noarch::path.py==12.5.0=0\n",
- " - conda-forge/noarch::qdarkstyle==2.8.1=pyhd8ed1ab_2\n",
- " - conda-forge/noarch::ipywidgets==7.6.3=pyhd3deb0d_0\n",
- " - conda-forge/noarch::black==20.8b1=py_1\n",
- " - conda-forge/linux-64::anyio==2.1.0=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::jupyter_server==1.4.1=py36h5fab9bb_0\n",
- " - conda-forge/noarch::nbclient==0.5.2=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::widgetsnbextension==3.5.1=py36h5fab9bb_4\n",
- " - conda-forge/linux-64::bokeh==2.2.3=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::keyring==22.0.1=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::nbconvert==6.0.7=py36h5fab9bb_3\n",
- " - conda-forge/noarch::numpydoc==1.1.0=py_1\n",
- " - conda-forge/linux-64::spyder==4.2.0=py36h5fab9bb_0\n",
- " - conda-forge/noarch::flake8==3.8.4=py_0\n",
- " - conda-forge/noarch::pyls-spyder==0.3.2=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::nbformat==5.1.2=pyhd8ed1ab_1\n",
- " - conda-forge/noarch::importlib_metadata==3.6.0=hd8ed1ab_0\n",
- " - conda-forge/noarch::aioitertools==0.7.1=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::jupyterlab_launcher==0.13.1=py_2\n",
- " - conda-forge/noarch::odo==0.5.1=py_1\n",
- " - conda-forge/noarch::imageio==2.9.0=py_0\n",
- " - conda-forge/noarch::helpdev==0.7.1=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::path==15.1.2=py36h5fab9bb_0\n",
- " - conda-forge/noarch::jsonschema==3.2.0=py_2\n",
- " - conda-forge/linux-64::yarl==1.6.3=py36h8f6f2f9_1\n",
- " - conda-forge/noarch::sphinx==3.5.1=pyhd8ed1ab_0\n",
- " - conda-forge/noarch::seaborn==0.11.1=hd8ed1ab_1\n",
- " - conda-forge/linux-64::jupyter==1.0.0=py36h5fab9bb_6\n",
- " - conda-forge/linux-64::nb_conda==2.2.1=py36h5fab9bb_4\n",
- " - conda-forge/noarch::dask==2021.2.0=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::matplotlib-base==3.3.4=py36hd391965_0\n",
- " - conda-forge/noarch::anaconda-client==1.7.2=py_0\n",
- " - conda-forge/noarch::anaconda-project==0.9.1=pyhd8ed1ab_0\n",
- " - conda-forge/linux-64::importlib-metadata==3.6.0=py36h5fab9bb_0\n",
- " - conda-forge/linux-64::pytest==6.2.2=py36h5fab9bb_0\n",
- "done\n",
- "\n",
- "\n",
- "==> WARNING: A newer version of conda exists. <==\n",
- " current version: 4.8.4\n",
- " latest version: 4.11.0\n",
- "\n",
- "Please update conda by running\n",
- "\n",
- " $ conda update -n base -c defaults conda\n",
- "\n",
- "\n",
- "\n",
- "## Package Plan ##\n",
- "\n",
- " environment location: /home/ec2-user/anaconda3/envs/amazonei_mxnet_p36\n",
- "\n",
- " added / updated specs:\n",
- " - ipyrad\n",
- "\n",
- "\n",
- "The following packages will be downloaded:\n",
- "\n",
- " package | build\n",
- " ---------------------------|-----------------\n",
- " astroid-2.7.3 | py36h5fab9bb_0 330 KB conda-forge\n",
- " bedtools-2.30.0 | h7d7f7ad_1 17.9 MB bioconda\n",
- " cutadapt-3.4 | py36hc5360cc_1 197 KB bioconda\n",
- " dataclasses-0.8 | pyh787bdff_2 22 KB conda-forge\n",
- " dnaio-0.5.1 | py36hc5360cc_0 137 KB bioconda\n",
- " flask-cors-3.0.10 | pyhd8ed1ab_0 15 KB conda-forge\n",
- " fsspec-2021.11.1 | pyhd8ed1ab_0 91 KB conda-forge\n",
- " htslib-1.11 | hd3b49d5_1 1.7 MB bioconda\n",
- " jupyter_console-5.2.0 | py36_1 34 KB conda-forge\n",
- " libdeflate-1.6 | h516909a_0 60 KB conda-forge\n",
- " mpi4py-3.0.3 | py36he1a1962_7 696 KB conda-forge\n",
- " notebook-6.3.0 | py36h5fab9bb_0 6.3 MB conda-forge\n",
- " perl-5.32.1 | 0_h7f98852_perl5 14.5 MB conda-forge\n",
- " pillow-8.2.0 | py36ha6010c0_1 688 KB conda-forge\n",
- " platformdirs-2.3.0 | pyhd8ed1ab_0 14 KB conda-forge\n",
- " pylint-2.10.2 | pyhd8ed1ab_0 255 KB conda-forge\n",
- " pysam-0.16.0.1 | py36h4c34d4e_1 2.5 MB bioconda\n",
- " python-isal-0.11.0 | py36h8f6f2f9_0 136 KB conda-forge\n",
- " reportlab-3.5.68 | py36h3e18861_0 2.4 MB conda-forge\n",
- " samtools-1.11 | h6270b1f_0 383 KB bioconda\n",
- " typing_extensions-3.7.4.3 | py_0 25 KB conda-forge\n",
- " vsearch-2.17.1 | h95f258a_0 1.4 MB bioconda\n",
- " xopen-1.2.0 | py36h5fab9bb_0 22 KB conda-forge\n",
- " ------------------------------------------------------------\n",
- " Total: 49.7 MB\n",
- "\n",
- "The following NEW packages will be INSTALLED:\n",
- "\n",
- " arrow conda-forge/noarch::arrow-1.2.1-pyhd8ed1ab_0\n",
- " astroid conda-forge/linux-64::astroid-2.7.3-py36h5fab9bb_0\n",
- " bedtools bioconda/linux-64::bedtools-2.30.0-h7d7f7ad_1\n",
- " bwa bioconda/linux-64::bwa-0.7.17-h5bf99c6_8\n",
- " charset-normalizer conda-forge/noarch::charset-normalizer-2.0.9-pyhd8ed1ab_0\n",
- " colorama conda-forge/noarch::colorama-0.4.4-pyh9f0ad1d_0\n",
- " custom-inherit conda-forge/noarch::custom-inherit-2.4.0-pyhd8ed1ab_0\n",
- " cutadapt bioconda/linux-64::cutadapt-3.4-py36hc5360cc_1\n",
- " dataclasses conda-forge/noarch::dataclasses-0.8-pyh787bdff_2\n",
- " dnaio bioconda/linux-64::dnaio-0.5.1-py36hc5360cc_0\n",
- " docutils conda-forge/linux-64::docutils-0.16-py36h5fab9bb_3\n",
- " flask-cors conda-forge/noarch::flask-cors-3.0.10-pyhd8ed1ab_0\n",
- " fsspec conda-forge/noarch::fsspec-2021.11.1-pyhd8ed1ab_0\n",
- " htslib bioconda/linux-64::htslib-1.11-hd3b49d5_1\n",
- " ipyrad bioconda/noarch::ipyrad-0.9.81-pyh5e36f6f_0\n",
- " isa-l conda-forge/linux-64::isa-l-2.30.0-ha770c72_4\n",
- " jupyter_console conda-forge/linux-64::jupyter_console-5.2.0-py36_1\n",
- " libdeflate conda-forge/linux-64::libdeflate-1.6-h516909a_0\n",
- " mpi conda-forge/linux-64::mpi-1.0-openmpi\n",
- " mpi4py conda-forge/linux-64::mpi4py-3.0.3-py36he1a1962_7\n",
- " muscle bioconda/linux-64::muscle-3.8.1551-h7d875b9_6\n",
- " notebook conda-forge/linux-64::notebook-6.3.0-py36h5fab9bb_0\n",
- " openjpeg conda-forge/linux-64::openjpeg-2.4.0-hb52868f_1\n",
- " openmpi conda-forge/linux-64::openmpi-4.1.1-hbfc84c5_0\n",
- " pbzip2 conda-forge/linux-64::pbzip2-1.1.13-0\n",
- " perl conda-forge/linux-64::perl-5.32.1-0_h7f98852_perl5\n",
- " pigz conda-forge/linux-64::pigz-2.6-h27826a3_0\n",
- " pillow conda-forge/linux-64::pillow-8.2.0-py36ha6010c0_1\n",
- " platformdirs conda-forge/noarch::platformdirs-2.3.0-pyhd8ed1ab_0\n",
- " pylint conda-forge/noarch::pylint-2.10.2-pyhd8ed1ab_0\n",
- " pypng conda-forge/noarch::pypng-0.0.20-py_0\n",
- " pysam bioconda/linux-64::pysam-0.16.0.1-py36h4c34d4e_1\n",
- " python-isal conda-forge/linux-64::python-isal-0.11.0-py36h8f6f2f9_0\n",
- " reportlab conda-forge/linux-64::reportlab-3.5.68-py36h3e18861_0\n",
- " requests conda-forge/noarch::requests-2.26.0-pyhd8ed1ab_1\n",
- " samtools bioconda/linux-64::samtools-1.11-h6270b1f_0\n",
- " toyplot conda-forge/noarch::toyplot-0.19.0-pyh9f0ad1d_0\n",
- " typing_extensions conda-forge/noarch::typing_extensions-3.7.4.3-py_0\n",
- " urllib3 conda-forge/noarch::urllib3-1.26.7-pyhd8ed1ab_0\n",
- " vsearch bioconda/linux-64::vsearch-2.17.1-h95f258a_0\n",
- " xopen conda-forge/linux-64::xopen-1.2.0-py36h5fab9bb_0\n",
- "\n",
- "The following packages will be DOWNGRADED:\n",
- "\n",
- " libgcc-ng 11.2.0-h1d223b6_11 --> 9.3.0-h2828fa1_18\n",
- " libgomp 11.2.0-h1d223b6_11 --> 9.3.0-h2828fa1_18\n",
- " openssl 1.1.1l-h7f98852_0 --> 1.1.1k-h7f98852_0\n",
- "\n",
- "\n",
- "\n",
- "Downloading and Extracting Packages\n",
- "reportlab-3.5.68 | 2.4 MB | ##################################### | 100% \n",
- "dnaio-0.5.1 | 137 KB | ##################################### | 100% \n",
- "htslib-1.11 | 1.7 MB | ##################################### | 100% \n",
- "cutadapt-3.4 | 197 KB | ##################################### | 100% \n",
- "libdeflate-1.6 | 60 KB | ##################################### | 100% \n",
- "flask-cors-3.0.10 | 15 KB | ##################################### | 100% \n",
- "typing_extensions-3. | 25 KB | ##################################### | 100% \n",
- "samtools-1.11 | 383 KB | ##################################### | 100% \n",
- "fsspec-2021.11.1 | 91 KB | ##################################### | 100% \n",
- "bedtools-2.30.0 | 17.9 MB | ##################################### | 100% \n",
- "perl-5.32.1 | 14.5 MB | ##################################### | 100% \n",
- "python-isal-0.11.0 | 136 KB | ##################################### | 100% \n",
- "dataclasses-0.8 | 22 KB | ##################################### | 100% \n",
- "pillow-8.2.0 | 688 KB | ##################################### | 100% \n",
- "astroid-2.7.3 | 330 KB | ##################################### | 100% \n",
- "pylint-2.10.2 | 255 KB | ##################################### | 100% \n",
- "pysam-0.16.0.1 | 2.5 MB | ##################################### | 100% \n",
- "vsearch-2.17.1 | 1.4 MB | ##################################### | 100% \n",
- "jupyter_console-5.2. | 34 KB | ##################################### | 100% \n",
- "xopen-1.2.0 | 22 KB | ##################################### | 100% \n",
- "mpi4py-3.0.3 | 696 KB | ##################################### | 100% \n",
- "platformdirs-2.3.0 | 14 KB | ##################################### | 100% \n",
- "notebook-6.3.0 | 6.3 MB | ##################################### | 100% \n",
- "Preparing transaction: done\n",
- "Verifying transaction: done\n",
- "Executing transaction: - \n",
- "For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.\n",
- "To enable it, please set the environment variable OMPI_MCA_opal_cuda_support=true before\n",
- "launching your MPI processes. Equivalently, you can set the MCA parameter in the command line:\n",
- "mpiexec --mca opal_cuda_support 1 ...\n",
- " \n",
- "In addition, the UCX support is also built but disabled by default.\n",
- "To enable it, first install UCX (conda install -c conda-forge ucx). Then, set the environment\n",
- "variables OMPI_MCA_pml=\"ucx\" OMPI_MCA_osc=\"ucx\" before launching your MPI processes.\n",
- "Equivalently, you can set the MCA parameters in the command line:\n",
- "mpiexec --mca pml ucx --mca osc ucx ...\n",
- "Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX.\n",
- "Please consult UCX's documentation for detail.\n",
- " \n",
- "\n",
- "done\n"
- ]
- }
- ],
- "source": [
- "!conda install ipyrad -y -c conda-forge -c bioconda"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2d0f27ee",
- "metadata": {},
- "source": [
- "Now we create a conda/mamba environment that contains all of the dependencies we need."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "id": "4ba6fae7",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "name: pangolin\n",
- "channels:\n",
- " - bioconda\n",
- " - conda-forge\n",
- " - defaults\n",
- " - eaton-lab\n",
- " \n",
- "dependencies:\n",
- " - sra-tools\n",
- " - ipyrad\n",
- " - toytree\n",
- " - pangolin\n",
- " - iqtree\n"
- ]
- }
- ],
- "source": [
- "#you can look at the yaml file that specifies which programs we want to install\n",
- "#you can also specify specific versions, here we just use the latest conda version\n",
- "#for example, - sra-tools=2.11.0\n",
- "!cat pangolin.yaml"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 25,
- "id": "49a20dc5",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "usage: mamba [-h] {create,export,list,remove,update,config} ...\n",
- "mamba: error: unrecognized arguments: -y\n"
- ]
- }
- ],
- "source": [
- "#create the environment. Here we use mamba because it is faster than conda\n",
- "!mamba env create -f pangolin.yaml -y"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 33,
- "id": "fd23abbd",
- "metadata": {},
- "outputs": [],
- "source": [
- "#give it the whole path to the env because otherwise it can't find the env\n",
- "#if you want to play with it add a cell and type 'conda activate pangolin' \n",
- "#or 'source activate pangolin'\n",
- "!source activate /home/ec2-user/mambaforge/envs/pangolin\n",
- "#!mamba info --envs"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 35,
- "id": "96dd7966",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- " __ __ __ __\n",
- " / \\ / \\ / \\ / \\\n",
- " / \\/ \\/ \\/ \\\n",
- "███████████████/ /██/ /██/ /██/ /████████████████████████\n",
- " / / \\ / \\ / \\ / \\ \\____\n",
- " / / \\_/ \\_/ \\_/ \\ o \\__,\n",
- " / _/ \\_____/ `\n",
- " |/\n",
- " ███╗ ███╗ █████╗ ███╗ ███╗██████╗ █████╗\n",
- " ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗\n",
- " ██╔████╔██║███████║██╔████╔██║██████╔╝███████║\n",
- " ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║\n",
- " ██║ ╚═╝ ██║██║ ██║██║ ╚═╝ ██║██████╔╝██║ ██║\n",
- " ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝\n",
- "\n",
- " mamba (0.19.0) supported by @QuantStack\n",
- "\n",
- " GitHub: https://github.com/mamba-org/mamba\n",
- " Twitter: https://twitter.com/QuantStack\n",
- "\n",
- "█████████████████████████████████████████████████████████████\n",
- "\n",
- "\n",
- "Looking for: ['iqtree']\n",
- "\n",
- "bioconda/linux-64 Using cache\n",
- "bioconda/noarch Using cache\n",
- "conda-forge/linux-64 Using cache\n",
- "conda-forge/noarch Using cache\n",
- "pkgs/main/linux-64 Using cache\n",
- "pkgs/main/noarch Using cache\n",
- "pkgs/r/linux-64 Using cache\n",
- "pkgs/r/noarch Using cache\n",
- "\n",
- "Pinned packages:\n",
- " - python 3.6.*\n",
- "\n",
- "\n",
- "Transaction\n",
- "\n",
- " Prefix: /home/ec2-user/anaconda3/envs/amazonei_mxnet_p36\n",
- "\n",
- " All requested packages already installed\n",
- "\n"
- ]
- }
- ],
- "source": [
- "!mamba install -c bioconda iqtree -y"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 40,
- "id": "5a99cf0d",
- "metadata": {},
- "outputs": [
- {
- "ename": "ModuleNotFoundError",
- "evalue": "No module named 'ipyrad'",
- "output_type": "error",
- "traceback": [
- "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
- "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mBio\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mSeqIO\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mBio\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mEntrez\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mipyrad\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0manalysis\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mipa\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mtoytree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
- "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'ipyrad'"
- ]
- }
- ],
- "source": [
- "#import libraries\n",
- "import os\n",
- "from Bio import SeqIO\n",
- "from Bio import Entrez\n",
- "import ipyrad.analysis as ipa\n",
- "import toytree"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "dc694629",
- "metadata": {},
- "source": [
- "### Set up your directory structure and remove files from previous runs if they exist"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 28,
- "id": "0f0e81f3",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "/home/jupyter/cloud-lab-training/GCP/notebooks/pangolin\n"
- ]
- }
- ],
- "source": [
- "cd /home/jupyter/cloud-lab-training/GCP/notebooks/pangolin/"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 29,
- "id": "8f831fca",
- "metadata": {},
- "outputs": [],
- "source": [
- "if not os.path.exists('pangolin_analysis'):\n",
- " os.mkdir('pangolin_analysis')\n",
- "os.chdir('pangolin_analysis')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 30,
- "id": "6423ca5d",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "rm: cannot remove 'sarscov2_*': No such file or directory\n",
- "rm: cannot remove 'lineage_report.csv': No such file or directory\n"
- ]
- }
- ],
- "source": [
- "if os.path.exists('sarscov2_sequences.fasta'):\n",
- " os.remove('sarscov2_sequences.fasta')\n",
- "!rm sarscov2_*\n",
- "!rm lineage_report.csv"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "9d7015e6",
- "metadata": {},
- "source": [
- "### Fetch viral sequences using a list of accession IDs"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 31,
- "id": "16824bcf",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "the number of sequences we will analyze = 18\n"
- ]
- }
- ],
- "source": [
- "#give a list of accession numbers for covid sequences\n",
- "acc_nums=['NC_045512','LR757995','LR757996','OL698718','OL677199','OL672836','MZ914912','MZ916499','MZ908464','MW580573','MW580574','MW580576','MW991906','MW931310','MW932027','MW424864','MW453109','MW453110']\n",
- "print('the number of sequences we will analyze = ',len(acc_nums))"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "9e382d33",
- "metadata": {},
- "source": [
- "Let this block finish running before moving on to the next one; otherwise you may get an error about too many requests. If that happens, restart your kernel and rerun everything (except the software installation)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 32,
- "id": "a28a7122",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Saved NC_045512\n",
- "Saved LR757995\n",
- "Saved LR757996\n",
- "Saved OL698718\n",
- "Saved OL677199\n",
- "Saved OL672836\n",
- "Saved MZ914912\n",
- "Saved MZ916499\n",
- "Saved MZ908464\n",
- "Saved MW580573\n",
- "Saved MW580574\n",
- "Saved MW580576\n",
- "Saved MW991906\n",
- "Saved MW931310\n",
- "Saved MW932027\n",
- "Saved MW424864\n",
- "Saved MW453109\n",
- "Saved MW453110\n"
- ]
- }
- ],
- "source": [
- "#use the Bio.Entrez module within Biopython to download the sequences for the accession numbers\n",
- "#save those sequences to a single fasta file\n",
- "Entrez.email = \"email@example.com\" # Always tell NCBI who you are\n",
- "filename = \"sarscov2_seqs.fasta\"\n",
- "if not os.path.isfile(filename):\n",
- " # Downloading...\n",
- " for acc in acc_nums:\n",
- " net_handle = Entrez.efetch(\n",
- " db=\"nucleotide\", id=acc, rettype=\"fasta\", retmode=\"text\"\n",
- " )\n",
- " out_handle = open(filename, \"a\")\n",
- " out_handle.write(net_handle.read())\n",
- " out_handle.close()\n",
- " net_handle.close()\n",
- " print(\"Saved\",acc)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "id": "56acb7cc",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "the number of seqs in our fasta file: \n",
- "18\n"
- ]
- }
- ],
- "source": [
- "#make sure our fasta file has the same number of seqs as the acc_nums list\n",
- "print('the number of seqs in our fasta file: ')\n",
- "!grep '>' sarscov2_seqs.fasta | wc -l"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 16,
- "id": "8606c352",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- ">NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome\n",
- "ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA\n",
- "CGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAAC\n",
- "TAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTG\n",
- "TTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTC\n",
- "CCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTAC\n",
- "GTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG\n",
- "CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAAACGTTCGGAT\n",
- "GCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTC\n",
- "GTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCT\n"
- ]
- }
- ],
- "source": [
- "#let's peek at our new fasta file\n",
- "!head sarscov2_seqs.fasta"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "2db37b4e",
- "metadata": {
- "tags": []
- },
- "source": [
- "### Run pangolin to identify lineages and output alignment\n",
- "Here we call pangolin, give it our input sequences and the number of threads. We also tell it to output the alignment. The full list of pangolin parameters can be found in the [docs](https://cov-lineages.org/resources/pangolin/usage.html)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 33,
- "id": "f1a17a74",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\u001b[32mAll dependencies satisfied.\u001b[0m\n",
- "\u001b[32mThe query file is:\u001b[0m/home/jupyter/cloud-lab-training/GCP/notebooks/pangolin/pangolin_analysis/sarscov2_seqs.fasta\n",
- "\u001b[32m** Running sequence QC **\u001b[0m\n",
- "\u001b[32mNumber of sequences detected: \u001b[0m18\n",
- "\u001b[32mTotal passing QC: \u001b[0m18\n",
- "\u001b[32m\n",
- "Data files found:\u001b[0m\n",
- "Trained model:\t/opt/conda/lib/python3.7/site-packages/pangoLEARN/data/decisionTree_v1.joblib\n",
- "Header file:\t/opt/conda/lib/python3.7/site-packages/pangoLEARN/data/decisionTreeHeaders_v1.joblib\n",
- "Designated hash:\t/opt/conda/lib/python3.7/site-packages/pangoLEARN/data/lineages.hash.csv\n",
- "\u001b[33mJob stats:\n",
- "job count min threads max threads\n",
- "-------------------- ------- ------------- -------------\n",
- "add_failed_seqs 1 1 1\n",
- "align_to_reference 1 1 1\n",
- "all 1 1 1\n",
- "generate_report 1 1 1\n",
- "get_constellations 1 1 1\n",
- "hash_sequence_assign 1 1 1\n",
- "pangolearn 1 1 1\n",
- "scorpio 1 4 4\n",
- "total 8 1 4\n",
- "\u001b[0m\n",
- "loading model 12/04/2021, 00:00:50\n",
- "/opt/conda/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.24.2 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.\n",
- " UserWarning)\n",
- "processing block of 6 sequences 12/04/2021, 00:00:51\n",
- "complete 12/04/2021, 00:00:51\n",
- "\u001b[32mOutput file written to: \u001b[0m/home/jupyter/cloud-lab-training/GCP/notebooks/pangolin/pangolin_analysis/lineage_report.csv\n",
- "\u001b[32mOutput alignment written to: \u001b[0m/home/jupyter/cloud-lab-training/GCP/notebooks/pangolin/pangolin_analysis/sequences.aln.fasta\n"
- ]
- }
- ],
- "source": [
- "!pangolin sarscov2_seqs.fasta --alignment --threads $CPU"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b0e56a4b",
- "metadata": {},
- "source": [
- "Pangolin writes its results to lineage_report.csv in the pangolin_analysis folder. Double-click the file to view it, or right-click it to download it. What lineages are present in the dataset? Is Omicron in there?"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "37e6efbe",
- "metadata": {},
- "source": [
- "### Run iqtree to estimate maximum likelihood tree for our sequences\n",
- "iqtree can select the best-fitting nucleotide substitution model for the data, but to save time we assign a model here (HKY) and estimate the phylogeny without bootstrap support values."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 18,
- "id": "f2782855",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "IQ-TREE multicore version 2.1.4-beta COVID-edition for Linux 64-bit built Jun 24 2021\n",
- "Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,\n",
- "Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.\n",
- "\n",
- "Host: cloud-lab-notebook (AVX2, FMA3, 14 GB RAM)\n",
- "Command: iqtree -s sequences.aln.fasta -nt 4 -m HKY --prefix sarscov2_tree --redo-tree\n",
- "Seed: 719057 (Using SPRNG - Scalable Parallel Random Number Generator)\n",
- "Time: Fri Dec 3 23:53:05 2021\n",
- "Kernel: AVX+FMA - 4 threads (4 CPU cores detected)\n",
- "\n",
- "Reading alignment file sequences.aln.fasta ... Fasta format detected\n",
- "Alignment most likely contains DNA/RNA sequences\n",
- "WARNING: 494 sites contain only gaps or ambiguous characters.\n",
- "Alignment has 18 sequences with 29903 columns, 193 distinct patterns\n",
- "109 parsimony-informative, 33 singleton sites, 29761 constant sites\n",
- "WARNING: Some sequence names are changed as follows:\n",
- "LR757995.1_Severe_acute_respiratory_syndrome_coronavirus_2_genome_assembly__chromosome:_whole_genome -> LR757995.1_Severe_acute_respiratory_syndrome_coronavirus_2_genome_assembly__chromosome__whole_genome\n",
- "LR757996.1_Severe_acute_respiratory_syndrome_coronavirus_2_genome_assembly__chromosome:_whole_genome -> LR757996.1_Severe_acute_respiratory_syndrome_coronavirus_2_genome_assembly__chromosome__whole_genome\n",
- "OL698718.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MN-MDH-18236/2021_ORF1ab_polyprotein_(ORF1ab)__ORF1a_polyprotein_(ORF1ab)__surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__ORF6_protein_(ORF6)__ORF7a_protein_(ORF7a)__ORF7b_(ORF7b)__ORF8_protein_(ORF8)__nucleocapsid_phosphoprotein_(N)__and_ORF10_protein_(ORF10)_genes__complete_cds -> OL698718.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MN-MDH-18236/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds\n",
- "OL677199.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/CAN/ON-NML-249359/2021_ORF1ab_polyprotein_(ORF1ab)__ORF1a_polyprotein_(ORF1ab)__surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__ORF6_protein_(ORF6)__and_ORF7a_protein_(ORF7a)_genes__complete_cds;_ORF7b_gene__complete_sequence;_and_ORF8_protein_(ORF8)__nucleocapsid_phosphoprotein_(N)__and_ORF10_protein_(ORF10)_genes__complete_cds -> OL677199.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/CAN/ON-NML-249359/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___and_ORF7a_protein__ORF7a__genes__complete_cds__ORF7b_gene__complete_sequence__and_ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds\n",
- "MZ914912.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG796484/2020_ORF1ab_polyprotein_(ORF1ab)__ORF1a_polyprotein_(ORF1ab)__surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__ORF6_protein_(ORF6)__ORF7a_protein_(ORF7a)__ORF7b_(ORF7b)__ORF8_protein_(ORF8)__nucleocapsid_phosphoprotein_(N)__and_ORF10_protein_(ORF10)_genes__complete_cds -> MZ914912.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG796484/2020_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds\n",
- "MZ916499.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG841289/2020_ORF1ab_polyprotein_(ORF1ab)_and_ORF1a_polyprotein_(ORF1ab)_genes__partial_cds;_and_surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__ORF6_protein_(ORF6)__ORF7a_protein_(ORF7a)__ORF7b_(ORF7b)__ORF8_protein_(ORF8)__nucleocapsid_phosphoprotein_(N)__and_ORF10_protein_(ORF10)_genes__complete_cds -> MZ916499.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG841289/2020_ORF1ab_polyprotein__ORF1ab__and_ORF1a_polyprotein__ORF1ab__genes__partial_cds__and_surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds\n",
- "MZ908464.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG769681/2020_ORF1ab_polyprotein_(ORF1ab)__ORF1a_polyprotein_(ORF1ab)__surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__and_ORF6_protein_(ORF6)_genes__complete_cds;_ORF7a_protein_(ORF7a)_and_ORF7b_(ORF7b)_genes__partial_cds;_and_ORF8_protein_(ORF8)__nucleocapsid_phosphoprotein_(N)__and_ORF10_protein_(ORF10)_genes__complete_cds -> MZ908464.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG769681/2020_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___and_ORF6_protein__ORF6__genes__complete_cds__ORF7a_protein__ORF7a__and_ORF7b__ORF7b__genes__partial_cds__and_ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds\n",
- "MW580573.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MD-MDH-0830/2021_ORF1ab_polyprotein_(ORF1ab)__ORF1a_polyprotein_(ORF1ab)__surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__ORF6_protein_(ORF6)__ORF7a_protein_(ORF7a)__ORF7b_(ORF7b)__ORF8_protein_(ORF8)__nucleocapsid_phosphoprotein_(N)__and_ORF10_protein_(ORF10)_genes__complete_cds -> MW580573.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MD-MDH-0830/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds\n",
- "MW991906.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/CA-CDC-FG-021330/2021_ORF1ab_polyprotein_(ORF1ab)_and_ORF1a_polyprotein_(ORF1ab)_genes__partial_cds;_surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__ORF6_protein_(ORF6)__ORF7a_protein_(ORF7a)__and_ORF7b_(ORF7b)_genes__complete_cds;_ORF8_gene__complete_sequence;_and_nucleocapsid_phosphoprotein_(N)_and_ORF10_protein_(ORF10)_genes__complete_cds -> MW991906.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/CA-CDC-FG-021330/2021_ORF1ab_polyprotein__ORF1ab__and_ORF1a_polyprotein__ORF1ab__genes__partial_cds__surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___and_ORF7b__ORF7b__genes__complete_cds__ORF8_gene__complete_sequence__and_nucleocapsid_phosphoprotein__N__and_ORF10_protein__ORF10__genes__complete_cds\n",
- "MW932027.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MA-CDC-STM-000044850/2021_ORF1ab_polyprotein_(ORF1ab)__ORF1a_polyprotein_(ORF1ab)__surface_glycoprotein_(S)__ORF3a_protein_(ORF3a)__envelope_protein_(E)__membrane_glycoprotein_(M)__ORF6_protein_(ORF6)__ORF7a_protein_(ORF7a)__and_ORF7b_(ORF7b)_genes__complete_cds;_ORF8_gene__complete_sequence;_and_nucleocapsid_phosphoprotein_(N)_and_ORF10_protein_(ORF10)_genes__complete_cds -> MW932027.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MA-CDC-STM-000044850/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___and_ORF7b__ORF7b__genes__complete_cds__ORF8_gene__complete_sequence__and_nucleocapsid_phosphoprotein__N__and_ORF10_protein__ORF10__genes__complete_cds\n",
- "\n",
- " Gap/Ambiguity Composition p-value\n",
- " 1 NC_045512.2_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_Wuhan-Hu-1__complete_genome 1.65% passed 99.98%\n",
- " 2 LR757995.1_Severe_acute_respiratory_syndrome_coronavirus_2_genome_assembly__chromosome__whole_genome 1.65% passed 99.98%\n",
- " 3 LR757996.1_Severe_acute_respiratory_syndrome_coronavirus_2_genome_assembly__chromosome__whole_genome 1.65% passed 99.98%\n",
- " 4 OL698718.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MN-MDH-18236/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds 3.28% passed 99.65%\n",
- " 5 OL677199.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/CAN/ON-NML-249359/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___and_ORF7a_protein__ORF7a__genes__complete_cds__ORF7b_gene__complete_sequence__and_ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds 1.82% passed 99.91%\n",
- " 6 OL672836.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/BEL/rega-20174/2021__complete_genome 1.78% passed 99.96%\n",
- " 7 MZ914912.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG796484/2020_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds 2.69% passed 99.99%\n",
- " 8 MZ916499.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG841289/2020_ORF1ab_polyprotein__ORF1ab__and_ORF1a_polyprotein__ORF1ab__genes__partial_cds__and_surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds 7.75% passed 98.28%\n",
- " 9 MZ908464.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/TG769681/2020_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___and_ORF6_protein__ORF6__genes__complete_cds__ORF7a_protein__ORF7a__and_ORF7b__ORF7b__genes__partial_cds__and_ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds 5.93% passed 96.00%\n",
- " 10 MW580573.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MD-MDH-0830/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___ORF7b__ORF7b___ORF8_protein__ORF8___nucleocapsid_phosphoprotein__N___and_ORF10_protein__ORF10__genes__complete_cds 2.26% passed 99.99%\n",
- " 11 MW580574.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MD-MDH-0831/2021__complete_genome 2.02% passed 99.95%\n",
- " 12 MW580576.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MD-MDH-0833/2021__complete_genome 1.98% passed 99.93%\n",
- " 13 MW991906.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/CA-CDC-FG-021330/2021_ORF1ab_polyprotein__ORF1ab__and_ORF1a_polyprotein__ORF1ab__genes__partial_cds__surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___and_ORF7b__ORF7b__genes__complete_cds__ORF8_gene__complete_sequence__and_nucleocapsid_phosphoprotein__N__and_ORF10_protein__ORF10__genes__complete_cds 2.19% passed 99.82%\n",
- " 14 MW931310.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/IN-CDC-STM-000045992/2021__complete_genome 1.68% passed 100.00%\n",
- " 15 MW932027.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/MA-CDC-STM-000044850/2021_ORF1ab_polyprotein__ORF1ab___ORF1a_polyprotein__ORF1ab___surface_glycoprotein__S___ORF3a_protein__ORF3a___envelope_protein__E___membrane_glycoprotein__M___ORF6_protein__ORF6___ORF7a_protein__ORF7a___and_ORF7b__ORF7b__genes__complete_cds__ORF8_gene__complete_sequence__and_nucleocapsid_phosphoprotein__N__and_ORF10_protein__ORF10__genes__complete_cds 1.70% passed 99.98%\n",
- " 16 MW424864.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/CA-LACPHL-AF00051/2020__complete_genome 1.91% passed 99.99%\n",
- " 17 MW453109.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/CA-LACPHL-AF00094/2020__complete_genome 2.13% passed 99.99%\n",
- " 18 MW453110.1_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_SARS-CoV-2/human/USA/CA-LACPHL-AF00093/2020__complete_genome 1.93% passed 99.98%\n",
- "**** TOTAL 2.56% 0 sequences failed composition chi2 test (p-value<5%; df=3)\n",
- "NOTE: LR757996.1_Severe_acute_respiratory_syndrome_coronavirus_2_genome_assembly__chromosome__whole_genome is identical to NC_045512.2_Severe_acute_respiratory_syndrome_coronavirus_2_isolate_Wuhan-Hu-1__complete_genome but kept for subsequent analysis\n",
- "Creating fast initial parsimony tree by random order stepwise addition...\n",
- "0.003 seconds, parsimony score: 154 (based on 142 sites)\n",
- "\n",
- "NOTE: 0 MB RAM (0 GB) is required!\n",
- "WARNING: Number of threads seems too high for short alignments. Use -T AUTO to determine best number of threads.\n",
- "Estimate model parameters (epsilon = 0.100)\n",
- "1. Initial log-likelihood: -41361.570\n",
- "2. Current log-likelihood: -41335.547\n",
- "3. Current log-likelihood: -41330.199\n",
- "4. Current log-likelihood: -41321.132\n",
- "5. Current log-likelihood: -41320.973\n",
- "Optimal log-likelihood: -41320.963\n",
- "Rate parameters: A-C: 1.00000 A-G: 4.58202 A-T: 1.00000 C-G: 1.00000 C-T: 4.58202 G-T: 1.00000\n",
- "Base frequencies: A: 0.299 C: 0.183 G: 0.196 T: 0.322\n",
- "Parameters optimization took 5 rounds (0.015 sec)\n",
- "Computing ML distances based on estimated model parameters...\n",
- "Computing ML distances took 0.001753 sec (of wall-clock time) 0.005914 sec(of CPU time)\n",
- "Computing RapidNJ tree took 0.009550 sec (of wall-clock time) 0.009811 sec (of CPU time)\n",
- "Log-likelihood of RapidNJ tree: -41382.805\n",
- "--------------------------------------------------------------------\n",
- "| INITIALIZING CANDIDATE TREE SET |\n",
- "--------------------------------------------------------------------\n",
- "Generating 98 parsimony trees... 0.079 second\n",
- "Computing log-likelihood of 98 initial trees ... 0.073 seconds\n",
- "Current best score: -41316.721\n",
- "\n",
- "Do NNI search on 20 best initial trees\n",
- "Estimate model parameters (epsilon = 0.100)\n",
- "BETTER TREE FOUND at iteration 1: -41316.682\n",
- "Iteration 10 / LogL: -41316.682 / Time: 0h:0m:0s\n",
- "Iteration 20 / LogL: -41316.682 / Time: 0h:0m:0s\n",
- "Finish initializing candidate tree set (1)\n",
- "Current best tree score: -41316.682 / CPU time: 0.246\n",
- "Number of iterations: 20\n",
- "--------------------------------------------------------------------\n",
- "| OPTIMIZING CANDIDATE TREE SET |\n",
- "--------------------------------------------------------------------\n",
- "UPDATE BEST LOG-LIKELIHOOD: -41316.682\n",
- "Iteration 30 / LogL: -41316.732 / Time: 0h:0m:0s (0h:0m:0s left)\n",
- "Iteration 40 / LogL: -41316.719 / Time: 0h:0m:0s (0h:0m:0s left)\n",
- "Iteration 50 / LogL: -41316.716 / Time: 0h:0m:0s (0h:0m:0s left)\n",
- "Iteration 60 / LogL: -41341.534 / Time: 0h:0m:0s (0h:0m:0s left)\n",
- "Iteration 70 / LogL: -41316.803 / Time: 0h:0m:0s (0h:0m:0s left)\n",
- "UPDATE BEST LOG-LIKELIHOOD: -41316.682\n",
- "Iteration 80 / LogL: -41327.750 / Time: 0h:0m:0s (0h:0m:0s left)\n",
- "Iteration 90 / LogL: -41316.734 / Time: 0h:0m:1s (0h:0m:0s left)\n",
- "UPDATE BEST LOG-LIKELIHOOD: -41316.682\n",
- "Iteration 100 / LogL: -41316.803 / Time: 0h:0m:1s (0h:0m:0s left)\n",
- "TREE SEARCH COMPLETED AFTER 102 ITERATIONS / Time: 0h:0m:1s\n",
- "\n",
- "--------------------------------------------------------------------\n",
- "| FINALIZING TREE SEARCH |\n",
- "--------------------------------------------------------------------\n",
- "Performs final model parameters optimization\n",
- "Estimate model parameters (epsilon = 0.010)\n",
- "1. Initial log-likelihood: -41316.682\n",
- "Optimal log-likelihood: -41316.677\n",
- "Rate parameters: A-C: 1.00000 A-G: 4.46795 A-T: 1.00000 C-G: 1.00000 C-T: 4.46795 G-T: 1.00000\n",
- "Base frequencies: A: 0.299 C: 0.183 G: 0.196 T: 0.322\n",
- "Parameters optimization took 1 rounds (0.002 sec)\n",
- "BEST SCORE FOUND : -41316.677\n",
- "Total tree length: 0.005\n",
- "\n",
- "Total number of iterations: 102\n",
- "CPU time used for tree search: 4.260 sec (0h:0m:4s)\n",
- "Wall-clock time used for tree search: 1.142 sec (0h:0m:1s)\n",
- "Total CPU time used: 4.401 sec (0h:0m:4s)\n",
- "Total wall-clock time used: 1.190 sec (0h:0m:1s)\n",
- "\n",
- "Analysis results written to: \n",
- " IQ-TREE report: sarscov2_tree.iqtree\n",
- " Maximum-likelihood tree: sarscov2_tree.treefile\n",
- " Likelihood distances: sarscov2_tree.mldist\n",
- " Screen log file: sarscov2_tree.log\n",
- "\n",
- "Date and Time: Fri Dec 3 23:53:06 2021\n"
- ]
- }
- ],
- "source": [
- "#Run IQ-TREE with the number of threads set by the $CPU variable; if you omit -m, IQ-TREE runs a phylogenetic model search before the tree search\n",
- "!iqtree -s sequences.aln.fasta -nt $CPU -m HKY --prefix sarscov2_tree --redo-tree"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c7197dd4",
- "metadata": {},
- "source": [
- "### Visualize the tree with toytree"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 19,
- "id": "cef2ba18",
- "metadata": {},
- "outputs": [],
- "source": [
- "#Define the tree file\n",
- "tre = toytree.tree('sarscov2_tree.treefile')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 23,
- "id": "842af165",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "[toytree figure: rooted tree of the 18 SARS-CoV-2 sequences with aligned tip labels]"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "#draw the tree\n",
- "rtre = tre.root(wildcard=\"OL\")\n",
- "rtre.draw(tip_labels_align=True);"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "52d9389f",
- "metadata": {},
- "source": [
- "You can also visualize the tree by downloading the tree file and opening it in FigTree."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "88457512",
- "metadata": {},
- "source": [
- "And that is all! You now know how to run workflows in notebooks in Cloud Lab."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "e417cb1a",
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "environment": {
- "kernel": "python3",
- "name": "r-cpu.4-1.m87",
- "type": "gcloud",
- "uri": "gcr.io/deeplearning-platform-release/r-cpu.4-1:m87"
- },
- "kernelspec": {
- "display_name": "conda_amazonei_mxnet_p36",
- "language": "python",
- "name": "conda_amazonei_mxnet_p36"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.13"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/tutorials/notebooks/rnaseq-myco-tutorial-main/RNAseq_pipeline.ipynb b/tutorials/notebooks/rnaseq-myco-tutorial-main/RNAseq_pipeline.ipynb
deleted file mode 100644
index efaa963..0000000
--- a/tutorials/notebooks/rnaseq-myco-tutorial-main/RNAseq_pipeline.ipynb
+++ /dev/null
@@ -1,556 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "source": [
- "# RNA-Seq Analysis Training Demo on Azure"
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Overview"
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "This short tutorial demonstrates how to run an RNA-Seq workflow using a prokaryotic data set. Steps in the workflow include read trimming, read QC, read mapping, and counting mapped reads per gene to quantify gene expression."
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 1: Set Up Environment"
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "Note that within Jupyter you can run a bash command either by putting the magic '!' in front of your command or by adding %%bash to the top of your cell."
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "For example\n",
- "```\n",
- "%%bash\n",
- "example command\n",
- "```\n",
- "Or\n",
- "```\n",
- "!example command\n",
- "```"
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "The first step is to install Mambaforge, which provides mamba, a faster drop-in replacement for the conda package manager."
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "!curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
- "!bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\n 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\n100 82.9M 100 82.9M 0 0 115M 0 --:--:-- --:--:-- --:--:-- 198M\nERROR: File or directory already exists: '/home/azureuser/mambaforge'\nIf you want to update an existing installation, use the -u option.\n"
- }
- ],
- "execution_count": 1,
- "metadata": {
- "tags": []
- }
- },
- {
- "cell_type": "code",
- "source": [
- "#add to your path\n",
- "import os\n",
- "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
- ],
- "outputs": [],
- "execution_count": 2,
- "metadata": {
- "gather": {
- "logged": 1682515170386
- }
- }
- },
- {
- "cell_type": "code",
- "source": [
- "! mamba info --envs"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "\n __ __ __ __\n / \\ / \\ / \\ / \\\n / \\/ \\/ \\/ \\\n███████████████/ /██/ /██/ /██/ /████████████████████████\n / / \\ / \\ / \\ / \\ \\____\n / / \\_/ \\_/ \\_/ \\ o \\__,\n / _/ \\_____/ `\n |/\n ███╗ ███╗ █████╗ ███╗ ███╗██████╗ █████╗\n ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗\n ██╔████╔██║███████║██╔████╔██║██████╔╝███████║\n ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║\n ██║ ╚═╝ ██║██║ ██║██║ ╚═╝ ██║██████╔╝██║ ██║\n ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝\n\n mamba (1.1.0) supported by @QuantStack\n\n GitHub: https://github.com/mamba-org/mamba\n Twitter: https://twitter.com/QuantStack\n\n█████████████████████████████████████████████████████████████\n\n# conda environments:\n#\n /anaconda\nbase /home/azureuser/mambaforge\n\n"
- }
- ],
- "execution_count": 3,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "Next, we will install the necessary packages into the current environment."
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "! mamba install -c conda-forge -c bioconda -c defaults -y sra-tools pigz pbzip2 fastp fastqc multiqc salmon"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "\n __ __ __ __\n / \\ / \\ / \\ / \\\n / \\/ \\/ \\/ \\\n███████████████/ /██/ /██/ /██/ /████████████████████████\n / / \\ / \\ / \\ / \\ \\____\n / / \\_/ \\_/ \\_/ \\ o \\__,\n / _/ \\_____/ `\n |/\n ███╗ ███╗ █████╗ ███╗ ███╗██████╗ █████╗\n ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗\n ██╔████╔██║███████║██╔████╔██║██████╔╝███████║\n ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║\n ██║ ╚═╝ ██║██║ ██║██║ ╚═╝ ██║██████╔╝██║ ██║\n ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝\n\n mamba (1.1.0) supported by @QuantStack\n\n GitHub: https://github.com/mamba-org/mamba\n Twitter: https://twitter.com/QuantStack\n\n█████████████████████████████████████████████████████████████\n\n\nLooking for: ['sra-tools', 'pigz=2.6', 'pbzip2=1.1', 'fastp=0.23.2', 'fastqc=0.11.9', 'multiqc', 'salmon=1.5.1']\n\n\u001b[?25l\u001b[2K\u001b[0G[+] 0.0s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\nconda-forge/noarch \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\nbioconda/linux-64 \u001b[33m━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\nbioconda/noarch \u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\npkgs/main/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0Gpkgs/main/linux-64 No change\nbioconda/linux-64 No change\npkgs/r/noarch No change\nbioconda/noarch No change\npkgs/main/noarch No change\npkgs/r/linux-64 No change\n[+] 0.2s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 123.0kB / ??.?MB @ 756.3kB/s 0.2s\nconda-forge/noarch \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 182.2kB / ??.?MB @ 1.1MB/s 
0.2s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.3s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 539.1kB / ??.?MB @ 2.0MB/s 0.3s\nconda-forge/noarch \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 734.4kB / ??.?MB @ 2.8MB/s 0.3s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.4s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 1.0MB / ??.?MB @ 2.8MB/s 0.4s\nconda-forge/noarch \u001b[90m━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━\u001b[0m 1.3MB / ??.?MB @ 3.6MB/s 0.4s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.5s\nconda-forge/linux-64 \u001b[90m━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━\u001b[0m 1.5MB / ??.?MB @ 3.1MB/s 0.5s\nconda-forge/noarch \u001b[33m━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━\u001b[0m 1.9MB / ??.?MB @ 4.1MB/s 0.5s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.6s\nconda-forge/linux-64 \u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━\u001b[0m 2.0MB / ??.?MB @ 3.5MB/s 0.6s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 2.4MB / ??.?MB @ 4.2MB/s 0.6s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.7s\nconda-forge/linux-64 \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 2.4MB / ??.?MB @ 3.6MB/s 0.7s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 2.9MB / ??.?MB @ 4.3MB/s 0.7s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.8s\nconda-forge/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 2.9MB / ??.?MB @ 3.8MB/s 0.8s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 3.4MB / ??.?MB @ 4.4MB/s 0.8s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.9s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 3.4MB / ??.?MB @ 3.9MB/s 0.9s\nconda-forge/noarch 
\u001b[90m━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━\u001b[0m 3.9MB / ??.?MB @ 4.5MB/s 0.9s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.0s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━\u001b[0m 3.9MB / ??.?MB @ 4.0MB/s 1.0s\nconda-forge/noarch \u001b[90m━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━\u001b[0m 4.4MB / ??.?MB @ 4.6MB/s 1.0s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━\u001b[0m 4.3MB / ??.?MB @ 4.0MB/s 1.1s\nconda-forge/noarch \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 4.8MB / ??.?MB @ 4.5MB/s 1.1s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.2s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 4.8MB / ??.?MB @ 4.1MB/s 1.2s\nconda-forge/noarch \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 5.1MB / ??.?MB @ 4.3MB/s 1.2s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.3s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 5.2MB / ??.?MB @ 4.1MB/s 1.3s\nconda-forge/noarch \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 5.5MB / ??.?MB @ 4.3MB/s 1.3s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.4s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 5.7MB / ??.?MB @ 4.1MB/s 1.4s\nconda-forge/noarch \u001b[90m━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━\u001b[0m 6.0MB / ??.?MB @ 4.4MB/s 1.4s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.5s\nconda-forge/linux-64 \u001b[90m━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━\u001b[0m 6.2MB / ??.?MB @ 4.2MB/s 1.5s\nconda-forge/noarch \u001b[33m━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━\u001b[0m 6.5MB / ??.?MB @ 4.4MB/s 1.5s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.6s\nconda-forge/linux-64 
\u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━\u001b[0m 6.7MB / ??.?MB @ 4.2MB/s 1.6s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 7.0MB / ??.?MB @ 4.4MB/s 1.6s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.7s\nconda-forge/linux-64 \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 7.1MB / ??.?MB @ 4.3MB/s 1.7s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 7.5MB / ??.?MB @ 4.4MB/s 1.7s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.8s\nconda-forge/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 7.7MB / ??.?MB @ 4.3MB/s 1.8s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 7.8MB / ??.?MB @ 4.4MB/s 1.8s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 1.9s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 8.1MB / ??.?MB @ 4.3MB/s 1.9s\nconda-forge/noarch \u001b[90m━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━\u001b[0m 8.3MB / ??.?MB @ 4.4MB/s 1.9s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.0s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━\u001b[0m 8.6MB / ??.?MB @ 4.3MB/s 2.0s\nconda-forge/noarch \u001b[90m━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━\u001b[0m 8.7MB / ??.?MB @ 4.4MB/s 2.0s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━\u001b[0m 9.1MB / ??.?MB @ 4.4MB/s 2.1s\nconda-forge/noarch \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 9.1MB / ??.?MB @ 4.4MB/s 2.1s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.2s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 9.6MB / ??.?MB @ 4.4MB/s 2.2s\nconda-forge/noarch \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 9.4MB / ??.?MB @ 4.3MB/s 
2.2s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.3s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 10.1MB / ??.?MB @ 4.4MB/s 2.3s\nconda-forge/noarch \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 9.9MB / ??.?MB @ 4.3MB/s 2.3s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.4s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 10.6MB / ??.?MB @ 4.5MB/s 2.4s\nconda-forge/noarch \u001b[90m━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━\u001b[0m 10.4MB / ??.?MB @ 4.3MB/s 2.4s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.5s\nconda-forge/linux-64 \u001b[90m━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━\u001b[0m 11.1MB / ??.?MB @ 4.5MB/s 2.5s\nconda-forge/noarch \u001b[33m━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━\u001b[0m 10.8MB / ??.?MB @ 4.4MB/s 2.5s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.6s\nconda-forge/linux-64 \u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━\u001b[0m 11.6MB / ??.?MB @ 4.5MB/s 2.6s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 11.3MB / ??.?MB @ 4.4MB/s 2.6s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.7s\nconda-forge/linux-64 \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 12.1MB / ??.?MB @ 4.5MB/s 2.7s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 11.8MB / ??.?MB @ 4.4MB/s 2.7s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.8s\nconda-forge/linux-64 \u001b[90m━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 12.4MB / ??.?MB @ 4.5MB/s 2.8s\nconda-forge/noarch \u001b[33m━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━\u001b[0m 12.0MB / ??.?MB @ 4.4MB/s 2.8s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 2.9s\nconda-forge/linux-64 \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 12.4MB @ 4.5MB/s 2.9s\nconda-forge/noarch ━━━━━━━━━━━━━━━━━━━━━━ 12.0MB 
@ 4.4MB/s Finalizing 2.9s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.0s\nconda-forge/linux-64 \u001b[90m━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 12.4MB / ??.?MB @ 4.5MB/s 3.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.1s\nconda-forge/linux-64 \u001b[90m━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 12.4MB / ??.?MB @ 4.5MB/s 3.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.2s\nconda-forge/linux-64 \u001b[90m━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 12.4MB / ??.?MB @ 4.5MB/s 3.2s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.3s\nconda-forge/linux-64 \u001b[90m━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 12.4MB / ??.?MB @ 4.5MB/s 3.3s\u001b[2K\u001b[1A\u001b[2K\u001b[0Gconda-forge/noarch @ 4.4MB/s 2.9s\n[+] 3.4s\nconda-forge/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 12.4MB / ??.?MB @ 3.7MB/s 3.4s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.5s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 13.7MB / ??.?MB @ 4.0MB/s 3.5s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.6s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━\u001b[0m 14.3MB / ??.?MB @ 4.0MB/s 3.6s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.7s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━\u001b[0m 14.8MB / ??.?MB @ 4.0MB/s 3.7s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.8s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 15.3MB / ??.?MB @ 4.1MB/s 3.8s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 3.9s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 15.8MB / ??.?MB @ 4.1MB/s 3.9s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.0s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 16.4MB / ??.?MB @ 4.1MB/s 4.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.1s\nconda-forge/linux-64 \u001b[90m━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━\u001b[0m 16.9MB / ??.?MB @ 
4.2MB/s 4.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.2s\nconda-forge/linux-64 \u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━\u001b[0m 17.4MB / ??.?MB @ 4.2MB/s 4.2s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.3s\nconda-forge/linux-64 \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 17.9MB / ??.?MB @ 4.2MB/s 4.3s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.4s\nconda-forge/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 18.4MB / ??.?MB @ 4.2MB/s 4.4s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.5s\nconda-forge/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 18.4MB / ??.?MB @ 4.2MB/s 4.5s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.6s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 19.4MB / ??.?MB @ 4.3MB/s 4.6s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.7s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━\u001b[0m 19.9MB / ??.?MB @ 4.3MB/s 4.7s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.8s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━\u001b[0m 20.4MB / ??.?MB @ 4.3MB/s 4.8s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 4.9s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 20.9MB / ??.?MB @ 4.3MB/s 4.9s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.0s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 21.4MB / ??.?MB @ 4.3MB/s 5.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 21.9MB / ??.?MB @ 4.3MB/s 5.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.2s\nconda-forge/linux-64 \u001b[90m━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━\u001b[0m 22.3MB / ??.?MB @ 4.3MB/s 5.2s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.3s\nconda-forge/linux-64 \u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━\u001b[0m 22.8MB / ??.?MB @ 4.3MB/s 
5.3s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.4s\nconda-forge/linux-64 \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━━\u001b[0m 23.3MB / ??.?MB @ 4.3MB/s 5.4s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.5s\nconda-forge/linux-64 \u001b[90m━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━\u001b[0m 23.8MB / ??.?MB @ 4.3MB/s 5.5s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.6s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 24.3MB / ??.?MB @ 4.4MB/s 5.6s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.7s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━\u001b[0m 24.5MB / ??.?MB @ 4.3MB/s 5.7s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.8s\nconda-forge/linux-64 \u001b[33m━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━━\u001b[0m 25.0MB / ??.?MB @ 4.3MB/s 5.8s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 5.9s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━\u001b[0m 25.4MB / ??.?MB @ 4.3MB/s 5.9s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.0s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━\u001b[0m 25.9MB / ??.?MB @ 4.3MB/s 6.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━\u001b[0m 26.4MB / ??.?MB @ 4.3MB/s 6.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.2s\nconda-forge/linux-64 \u001b[90m╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━\u001b[0m 26.9MB / ??.?MB @ 4.3MB/s 6.2s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.3s\nconda-forge/linux-64 \u001b[90m━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━\u001b[0m 27.3MB / ??.?MB @ 4.3MB/s 6.3s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.4s\nconda-forge/linux-64 \u001b[90m━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━\u001b[0m 27.8MB / ??.?MB @ 4.4MB/s 6.4s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.5s\nconda-forge/linux-64 \u001b[90m━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 28.2MB / ??.?MB @ 4.3MB/s 
6.5s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.6s\nconda-forge/linux-64 \u001b[90m━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━\u001b[0m 28.2MB / ??.?MB @ 4.3MB/s 6.6s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.7s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━\u001b[0m 28.9MB / ??.?MB @ 4.3MB/s 6.7s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.8s\nconda-forge/linux-64 \u001b[90m━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━\u001b[0m 29.3MB / ??.?MB @ 4.3MB/s 6.8s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 6.9s\nconda-forge/linux-64 \u001b[33m━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━━\u001b[0m 29.8MB / ??.?MB @ 4.3MB/s 6.9s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.0s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━\u001b[0m 30.3MB / ??.?MB @ 4.3MB/s 7.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━\u001b[0m 30.7MB / ??.?MB @ 4.3MB/s 7.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.2s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.2s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.3s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.3s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.4s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.4s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.5s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.5s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.6s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.6s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.7s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.7s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.8s\nconda-forge/linux-64 
\u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.8s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 7.9s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 7.9s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.0s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.2s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.2s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.3s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.3s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.4s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.4s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.5s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.5s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.6s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.6s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.7s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.7s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.8s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.8s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 8.9s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 8.9s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.0s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 
9.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.1s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 9.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.2s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 9.2s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.3s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 9.3s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.4s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 9.4s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.5s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 9.5s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.6s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 9.6s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.7s\nconda-forge/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 30.9MB / ??.?MB @ 4.3MB/s 9.7s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.8s\nconda-forge/linux-64 ━━━━━━━━━━━━━━━━━━━━━━ 31.1MB @ 4.3MB/s Finalizing 9.8s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 9.9s\nconda-forge/linux-64 ━━━━━━━━━━━━━━━━━━━━━━ 31.1MB @ 4.3MB/s Finalizing 9.9s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.0s\nconda-forge/linux-64 ━━━━━━━━━━━━━━━━━━━━━━ 31.1MB @ 4.3MB/s Finalizing 10.0s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.1s\nconda-forge/linux-64 ━━━━━━━━━━━━━━━━━━━━━━ 31.1MB @ 4.3MB/s Finalizing 10.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.2s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.3s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.4s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.5s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.6s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.7s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 10.8s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 
10.9s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 11.0s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 11.1s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 11.2s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 11.3s\n\u001b[2K\u001b[1A\u001b[2K\u001b[0Gconda-forge/linux-64 @ 4.3MB/s 10.1s\n\u001b[?25h\nPinned packages:\n - python 3.10.*\n\n\nTransaction\n\n Prefix: /home/azureuser/mambaforge\n\n All requested packages already installed\n\n\u001b[?25l\u001b[2K\u001b[0G\u001b[?25h"
- }
- ],
- "execution_count": 17,
- "metadata": {
- "scrolled": true,
- "tags": []
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "Create a set of directories to store the reads, reference sequence files, and output files.\n"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "%%bash\n",
- "mkdir -p data\n",
- "mkdir -p data/raw_fastq\n",
- "mkdir -p data/trimmed\n",
- "mkdir -p data/fastqc\n",
- "mkdir -p data/aligned\n",
- "mkdir -p data/reference\n",
- "mkdir -p data/quants"
- ],
- "outputs": [],
- "execution_count": 33,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 2: Copy FASTQ Files\n",
- "To keep this tutorial quick to run, we will analyze only 50,000 reads from one sample in each of the two sample groups instead of all the reads from all six samples. These files have been posted in an Azure Blob Storage container that we made publicly accessible."
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349122_1.fastq --output data/raw_fastq/SRR13349122_1.fastq\n",
- "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349122_2.fastq --output data/raw_fastq/SRR13349122_2.fastq\n",
- "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349128_1.fastq --output data/raw_fastq/SRR13349128_1.fastq\n",
- "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/raw_fastq/SRR13349128_2.fastq --output data/raw_fastq/SRR13349128_2.fastq"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 8452k 100 8452k 0 0 10.4M 0 --:--:-- --:--:-- --:--:-- 10.4M\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 8452k 100 8452k 0 0 9328k 0 --:--:-- --:--:-- --:--:-- 9319k\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 8452k 100 8452k 0 0 11.1M 0 --:--:-- --:--:-- --:--:-- 11.1M\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 8452k 100 8452k 0 0 12.7M 0 --:--:-- --:--:-- --:--:-- 12.7M\n"
- }
- ],
- "execution_count": 6,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 3: Copy reference transcriptome files that will be used by Salmon\n",
- "Salmon is a tool that aligns RNA-Seq reads to a set of transcripts rather than the entire genome."
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/reference/M_chelonae_transcripts.fasta --output data/reference/M_chelonae_transcripts.fasta\n",
- "!curl https://storeshare.blob.core.windows.net/publicdata/testsample/RNAseq/reference/decoys.txt --output data/reference/decoys.txt"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 9599k 100 9599k 0 0 12.3M 0 --:--:-- --:--:-- --:--:-- 12.3M\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n100 14 100 14 0 0 76 0 --:--:-- --:--:-- --:--:-- 76\n"
- }
- ],
- "execution_count": 27,
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "ls data/raw_fastq"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "\u001b[0m\u001b[01;32mSRR13349122_1.fastq\u001b[0m* \u001b[01;32mSRR13349128_1.fastq\u001b[0m*\r\n\u001b[01;32mSRR13349122_2.fastq\u001b[0m* \u001b[01;32mSRR13349128_2.fastq\u001b[0m*\r\n"
- }
- ],
- "execution_count": 38,
- "metadata": {
- "jupyter": {
- "source_hidden": false,
- "outputs_hidden": false
- },
- "nteract": {
- "transient": {
- "deleting": false
- }
- },
- "gather": {
- "logged": 1682517580413
- }
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 4: Trim our data with Fastp"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "! fastp -i data/raw_fastq/SRR13349122_1.fastq -I data/raw_fastq/SRR13349122_2.fastq -o data/trimmed/SRR13349122_1_trimmed.fastq -O data/trimmed/SRR13349122_2_trimmed.fastq\n",
- "! fastp -i data/raw_fastq/SRR13349128_1.fastq -I data/raw_fastq/SRR13349128_2.fastq -o data/trimmed/SRR13349128_1_trimmed.fastq -O data/trimmed/SRR13349128_2_trimmed.fastq"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "Read1 before filtering:\ntotal reads: 50000\ntotal bases: 2550000\nQ20 bases: 2451900(96.1529%)\nQ30 bases: 2370275(92.952%)\n\nRead2 before filtering:\ntotal reads: 50000\ntotal bases: 2550000\nQ20 bases: 2376817(93.2085%)\nQ30 bases: 2255260(88.4416%)\n\nRead1 after filtering:\ntotal reads: 49849\ntotal bases: 2542226\nQ20 bases: 2444408(96.1523%)\nQ30 bases: 2363088(92.9535%)\n\nRead2 after filtering:\ntotal reads: 49849\ntotal bases: 2542226\nQ20 bases: 2374927(93.4192%)\nQ30 bases: 2253977(88.6616%)\n\nFiltering result:\nreads passed filter: 99698\nreads failed due to low quality: 246\nreads failed due to too many N: 56\nreads failed due to too short: 0\nreads with adapter trimmed: 18\nbases trimmed due to adapters: 146\n\nDuplication rate: 23.57%\n\nInsert size peak (evaluated by paired-end reads): 33\n\nJSON report: fastp.json\nHTML report: fastp.html\n\nfastp -i data/raw_fastq/SRR13349122_1.fastq -I data/raw_fastq/SRR13349122_2.fastq -o data/trimmed/SRR13349122_1_trimmed.fastq -O data/trimmed/SRR13349122_2_trimmed.fastq \nfastp v0.23.2, time used: 2 seconds\nRead1 before filtering:\ntotal reads: 50000\ntotal bases: 2550000\nQ20 bases: 2447617(95.985%)\nQ30 bases: 2363073(92.6695%)\n\nRead2 before filtering:\ntotal reads: 50000\ntotal bases: 2550000\nQ20 bases: 2379063(93.2966%)\nQ30 bases: 2258898(88.5842%)\n\nRead1 after filtering:\ntotal reads: 49831\ntotal bases: 2541263\nQ20 bases: 2439163(95.9823%)\nQ30 bases: 2354964(92.669%)\n\nRead2 after filtering:\ntotal reads: 49831\ntotal bases: 2541263\nQ20 bases: 2377253(93.5461%)\nQ30 bases: 2257594(88.8375%)\n\nFiltering result:\nreads passed filter: 99662\nreads failed due to low quality: 284\nreads failed due to too many N: 54\nreads failed due to too short: 0\nreads with adapter trimmed: 26\nbases trimmed due to adapters: 236\n\nDuplication rate: 24.244%\n\nInsert size peak (evaluated by paired-end reads): 70\n\nJSON report: fastp.json\nHTML report: fastp.html\n\nfastp -i 
data/raw_fastq/SRR13349128_1.fastq -I data/raw_fastq/SRR13349128_2.fastq -o data/trimmed/SRR13349128_1_trimmed.fastq -O data/trimmed/SRR13349128_2_trimmed.fastq \nfastp v0.23.2, time used: 1 seconds\n"
- }
- ],
- "execution_count": 39,
- "metadata": {
- "jupyter": {
- "source_hidden": false,
- "outputs_hidden": false
- },
- "nteract": {
- "transient": {
- "deleting": false
- }
- }
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 6: Run FastQC\n",
- "FastQC is an invaluable tool that allows you to evaluate whether there are problems with a set of reads. For example, it will provide a report of whether there is any bias in the sequence composition of the reads."
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "Once FastQC is done running, look at the outputs in data/fastqc. What can you say about the quality of the two samples we are looking at here? "
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "%%bash\n",
- "fastqc -o data/fastqc data/trimmed/SRR13349122_1_trimmed.fastq\n",
- "fastqc -o data/fastqc data/trimmed/SRR13349128_1_trimmed.fastq"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stderr",
- "text": "Started analysis of SRR13349122_1_trimmed.fastq\nApprox 5% complete for SRR13349122_1_trimmed.fastq\nApprox 10% complete for SRR13349122_1_trimmed.fastq\nApprox 15% complete for SRR13349122_1_trimmed.fastq\nApprox 20% complete for SRR13349122_1_trimmed.fastq\nApprox 25% complete for SRR13349122_1_trimmed.fastq\nApprox 30% complete for SRR13349122_1_trimmed.fastq\nApprox 35% complete for SRR13349122_1_trimmed.fastq\nApprox 40% complete for SRR13349122_1_trimmed.fastq\nApprox 45% complete for SRR13349122_1_trimmed.fastq\nApprox 50% complete for SRR13349122_1_trimmed.fastq\nApprox 55% complete for SRR13349122_1_trimmed.fastq\nApprox 60% complete for SRR13349122_1_trimmed.fastq\nApprox 65% complete for SRR13349122_1_trimmed.fastq\nApprox 70% complete for SRR13349122_1_trimmed.fastq\nApprox 75% complete for SRR13349122_1_trimmed.fastq\nApprox 80% complete for SRR13349122_1_trimmed.fastq\nApprox 85% complete for SRR13349122_1_trimmed.fastq\nApprox 90% complete for SRR13349122_1_trimmed.fastq\nApprox 95% complete for SRR13349122_1_trimmed.fastq\nSkipping 'data/trimmed/SRR13349128_1_trimmed.fastq' which didn't exist, or couldn't be read\n"
- },
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "Analysis complete for SRR13349122_1_trimmed.fastq\n"
- }
- ],
- "execution_count": 15,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 7: Run MultiQC\n",
- "MultiQC reads in the FastQC reports and generates a compiled report for all the analyzed FASTQ files.\n",
- "Just as with FastQC, we can look at the MultiQC results after it finishes at data/multiqc_data"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "! multiqc -f data/fastqc\n",
- "#! mv multiqc_data/ data/"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "\u001b[1;30m[WARNING]\u001b[0m multiqc : \u001b[33mMultiQC Version v1.14 now available!\u001b[0m\n\u001b[1;30m[INFO ]\u001b[0m multiqc : This is MultiQC v1.10.1\n\u001b[1;30m[INFO ]\u001b[0m multiqc : Template : default\n\u001b[1;30m[INFO ]\u001b[0m multiqc : Searching : /mnt/batch/tasks/shared/LS_root/mounts/clusters/cloud-lab-notebooks/code/Users/oconnellka/NIHCloudLabAzure-main 2/tutorials/notebooks/rnaseq-myco-tutorial-main/data/fastqc\n\u001b[2KSearching \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[35m100%\u001b[0m \u001b[32m2/2\u001b[0m [2mdata/fastqc/SRR13349122_1_trimmed_fastqc.html\u001b[0m\n\u001b[?25h\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'custom_content' MultiQC module broke... \n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule custom_content raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/custom_content/custom_content.py\", line 87, in custom_module_classes\n bm = BaseMultiqcModule()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 
'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'ngsderive' MultiQC module broke... \n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule ngsderive raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/ngsderive/ngsderive.py\", line 29, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'purple' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule purple raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/purple/purple.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'conpair' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule conpair raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/conpair/conpair.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'peddy' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule peddy raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/peddy/peddy.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'somalier' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule somalier raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/somalier/somalier.py\", line 29, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'methylQA' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule methylQA raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/methylQA/methylQA.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'mosdepth' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule mosdepth raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/mosdepth/mosdepth.py\", line 74, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'phantompeakqualtools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule phantompeakqualtools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/phantompeakqualtools/phantompeakqualtools.py\", line 20, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'qualimap' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule qualimap raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/qualimap/qualimap.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'preseq' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule preseq raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/preseq/preseq.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'quast' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule quast raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/quast/quast.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'qorts' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule qorts raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/qorts/qorts.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'rna_seqc' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule rna_seqc raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/rna_seqc/rna_seqc.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'rockhopper' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule rockhopper raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/rockhopper/rockhopper.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'rsem' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule rsem raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/rsem/rsem.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'rseqc' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule rseqc raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/rseqc/rseqc.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'busco' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule busco raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/busco/busco.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'goleft_indexcov' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule goleft_indexcov raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/goleft_indexcov/goleft_indexcov.py\", line 19, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'disambiguate' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule disambiguate raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/disambiguate/disambiguate.py\", line 16, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'supernova' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule supernova raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/supernova/supernova.py\", line 19, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'deeptools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule deeptools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/deeptools/deeptools.py\", line 38, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'sargasso' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule sargasso raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/sargasso/sargasso.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'verifybamid' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule verifybamid raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/verifybamid/verifybamid.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'mirtrace' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule mirtrace raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/mirtrace/mirtrace.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'happy' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule happy raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/happy/happy.py\", line 32, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'mirtop' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule mirtop raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/mirtop/mirtop.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'homer' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule homer raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/homer/homer.py\", line 26, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'hops' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule hops raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/hops/hops.py\", line 18, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'macs2' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule macs2 raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/macs2/macs2.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'theta2' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule theta2 raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/theta2/theta2.py\", line 20, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'snpeff' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule snpeff raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/snpeff/snpeff.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'gatk' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule gatk raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/gatk/gatk.py\", line 27, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'htseq' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule htseq raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/htseq/htseq.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'bcftools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule bcftools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/bcftools/bcftools.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'featureCounts' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule featureCounts raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/featureCounts/feature_counts.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'fgbio' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule fgbio raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/fgbio/fgbio.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'dragen' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule dragen raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/dragen/dragen.py\", line 42, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'dedup' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule dedup raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/dedup/dedup.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'damageprofiler' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule damageprofiler raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/damageprofiler/damageprofiler.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'biobambam2' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule biobambam2 raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/biobambam2/biobambam2.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'jcvi' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule jcvi raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/jcvi/jcvi.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'mtnucratio' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule mtnucratio raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/mtnucratio/mtnucratio.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'picard' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule picard raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/picard/picard.py\", line 44, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'sentieon' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule sentieon raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/sentieon/sentieon.py\", line 30, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'prokka' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule prokka raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/prokka/prokka.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'qc3C' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule qc3C raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/qc3C/qc3C.py\", line 137, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'samblaster' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule samblaster raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/samblaster/samblaster.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'samtools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule samtools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/samtools/samtools.py\", line 28, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'sexdeterrmine' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule sexdeterrmine raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/sexdeterrmine/sexdeterrmine.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'eigenstratdatabasetools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule eigenstratdatabasetools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/eigenstratdatabasetools/eigenstratdatabasetools.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'bamtools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule bamtools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/bamtools/bamtools.py\", line 26, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'jellyfish' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule jellyfish raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/jellyfish/jellyfish.py\", line 18, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'vcftools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule vcftools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/vcftools/vcftools.py\", line 19, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'longranger' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule longranger raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/longranger/longranger.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'stacks' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule stacks raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/stacks/stacks.py\", line 19, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'varscan2' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule varscan2 raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/varscan2/varscan2.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'bbmap' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule bbmap raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/bbmap/bbmap.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'bismark' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule bismark raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/bismark/bismark.py\", line 67, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'biscuit' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule biscuit raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/biscuit/biscuit.py\", line 30, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'hicexplorer' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule hicexplorer raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/hicexplorer/hicexplorer.py\", line 17, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'hicup' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule hicup raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/hicup/hicup.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'hicpro' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule hicpro raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/hicpro/hicpro.py\", line 26, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'salmon' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule salmon raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/salmon/salmon.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'kallisto' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule kallisto raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/kallisto/kallisto.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'slamdunk' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule slamdunk raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/slamdunk/slamdunk.py\", line 26, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'star' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule star raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/star/star.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'hisat2' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule hisat2 raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/hisat2/hisat2.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'tophat' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule tophat raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/tophat/tophat.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'bowtie2' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule bowtie2 raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/bowtie2/bowtie2.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'bowtie1' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule bowtie1 raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/bowtie1/bowtie1.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'snpsplit' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule snpsplit raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/snpsplit/snpsplit.py\", line 18, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'kat' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule kat raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/kat/kat.py\", line 18, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'leehom' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule leehom raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/leehom/leehom.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'adapterRemoval' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule adapterRemoval raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/adapterRemoval/adapterRemoval.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'clipandmerge' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule clipandmerge raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/clipandmerge/clipandmerge.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'cutadapt' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule cutadapt raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/cutadapt/cutadapt.py\", line 28, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'flexbar' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule flexbar raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/flexbar/flexbar.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'kaiju' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule kaiju raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/kaiju/kaiju.py\", line 20, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'kraken' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule kraken raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/kraken/kraken.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'malt' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule malt raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/malt/malt.py\", line 20, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'trimmomatic' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule trimmomatic raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/trimmomatic/trimmomatic.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'sickle' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule sickle raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/sickle/sickle.py\", line 17, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'skewer' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule skewer raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/skewer/skewer.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'sortmerna' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule sortmerna raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/sortmerna/sortmerna.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'biobloomtools' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule biobloomtools raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/biobloomtools/biobloomtools.py\", line 20, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'fastq_screen' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule fastq_screen raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/fastq_screen/fastq_screen.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'afterqc' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule afterqc raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/afterqc/afterqc.py\", line 26, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'fastp' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule fastp raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/fastp/fastp.py\", line 26, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'fastqc' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule fastqc raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/fastqc/fastqc.py\", line 36, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'pychopper' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule pychopper raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/pychopper/pychopper.py\", line 21, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'pycoqc' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule pycoqc raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/pycoqc/pycoqc.py\", line 19, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'minionqc' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule minionqc raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/minionqc/minionqc.py\", line 22, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'multivcfanalyzer' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule multivcfanalyzer raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/multivcfanalyzer/multivcfanalyzer.py\", line 25, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'clusterflow' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule clusterflow raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/clusterflow/clusterflow.py\", line 28, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'bcl2fastq' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule bcl2fastq raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/bcl2fastq/bcl2fastq.py\", line 18, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'interop' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule interop raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/interop/interop.py\", line 14, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'ivar' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule ivar raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/ivar/ivar.py\", line 23, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'flash' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule flash raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/flash/flash.py\", line 26, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'seqyclean' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule seqyclean raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/seqyclean/seqyclean.py\", line 18, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[ERROR ]\u001b[0m multiqc : \u001b[31mOops! The 'optitype' MultiQC module broke... 
\n Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues \n If possible, please include a log file that triggers the error - the last file found was:\n None\n============================================================\nModule optitype raised an exception: Traceback (most recent call last):\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/multiqc.py\", line 594, in run\n output = mod()\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/optitype/optitype.py\", line 24, in __init__\n super(MultiqcModule, self).__init__(\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/modules/base_module.py\", line 45, in __init__\n config.update({anchor: mod_cust_config.get(\"custom_config\", {})})\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 250, in update\n return update_dict(globals(), u)\n File \"/home/azureuser/mambaforge/lib/python3.10/site-packages/multiqc/utils/config.py\", line 256, in update_dict\n if isinstance(val, collections.Mapping):\nAttributeError: module 'collections' has no attribute 'Mapping'\n============================================================\u001b[0m\n\u001b[1;30m[WARNING]\u001b[0m multiqc : \u001b[33mNo analysis results found. Cleaning up..\u001b[0m\n\u001b[1;30m[INFO ]\u001b[0m multiqc : MultiQC complete\n"
- }
- ],
- "execution_count": 25,
- "metadata": {
- "gather": {
- "logged": 1682517201690
- }
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 8: Index the Transcriptome so that Trimmed Reads Can Be Mapped Using Salmon"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "! salmon index -t data/reference/M_chelonae_transcripts.fasta -p 8 -i data/reference/transcriptome_index --decoys data/reference/decoys.txt -k 31 --keepDuplicates"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "Version Info: ### PLEASE UPGRADE SALMON ###\n### A newer version of salmon with important bug fixes and improvements is available. ####\n###\nThe newest version, available at https://github.com/COMBINE-lab/salmon/releases\ncontains new features, improvements, and bug fixes; please upgrade at your\nearliest convenience.\n###\nSign up for the salmon mailing list to hear about new versions, features and updates at:\nhttps://oceangenomics.com/subscribe\n###index [\"data/reference/transcriptome_index\"] did not previously exist . . . creating it\n[2023-04-26 13:54:40.001] [jLog] [info] building index\nout : data/reference/transcriptome_index\n\u001b[00m[2023-04-26 13:54:40.023] [puff::index::jointLog] [info] Running fixFasta\n\u001b[00m\n[Step 1 of 4] : counting k-mers\n\n\u001b[35m[2023-04-26 13:54:40.424] [puff::index::jointLog] [warning] There were 2 transcripts that would need to be removed to avoid duplicates.\n\u001b[00m\u001b[00m[2023-04-26 13:54:40.454] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides\n\u001b[00m\u001b[00m[2023-04-26 13:54:40.454] [puff::index::jointLog] [info] Clipped poly-A tails from 0 transcripts\n\u001b[00mwrote 4868 cleaned references\n\u001b[00m[2023-04-26 13:54:40.706] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers\n\u001b[00m\u001b[00m[2023-04-26 13:54:40.820] [puff::index::jointLog] [info] ntHll estimated 4966944 distinct k-mers, setting filter size to 2^27\n\u001b[00mThreads = 8\nVertex length = 31\nHash functions = 5\nFilter size = 134217728\nCapacity = 2\nFiles: \ndata/reference/transcriptome_index/ref_k31_fixed.fa\n--------------------------------------------------------------------------------\nRound 0, 0:134217728\nPass\tFilling\tFiltering\n1\t1\t3\t\n2\t0\t0\nTrue junctions count = 10131\nFalse junctions count = 7124\nHash table size = 17255\nCandidate marks count = 
34884\n--------------------------------------------------------------------------------\nReallocating bifurcations time: 0\nTrue marks count: 21319\nEdges construction time: 22\n--------------------------------------------------------------------------------\nDistinct junctions = 10131\n\nallowedIn: 13\nMax Junction ID: 10174\nseen.size():81401 kmerInfo.size():10175\napproximateContigTotalLength: 4947048\ncounters for complex kmers:\n(prec>1 & succ>1)=8 | (succ>1 & isStart)=0 | (prec>1 & isEnd)=0 | (isStart & isEnd)=2\ncontig count: 10353 element count: 5328123 complex nodes: 10\n# of ones in rank vector: 10352\n\u001b[00m[2023-04-26 13:55:08.167] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file.\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.167] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory data/reference/transcriptome_index\n\u001b[00msize = 5328123\n-----------------------------------------\n| Loading contigs | Time = 35.212 ms\n-----------------------------------------\nsize = 5328123\n-----------------------------------------\n| Loading contig boundaries | Time = 24.882 ms\n-----------------------------------------\nNumber of ones: 10352\nNumber of ones per inventory item: 512\nInventory entries filled: 21\n10352\n\u001b[00m[2023-04-26 13:55:08.237] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.237] [puff::index::jointLog] [info] contig count for validation: 10,352\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.261] [puff::index::jointLog] [info] Total # of Contigs : 10,352\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.261] [puff::index::jointLog] [info] Total # of numerical Contigs : 10,352\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.261] [puff::index::jointLog] [info] Total # of contig vec entries: 16,484\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.261] [puff::index::jointLog] [info] bits per offset entry 
15\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.261] [puff::index::jointLog] [info] Done constructing the contig vector. 10353\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.389] [puff::index::jointLog] [info] # segments = 10,352\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.389] [puff::index::jointLog] [info] total length = 5,328,123\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.390] [puff::index::jointLog] [info] Reading the reference files ...\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.579] [puff::index::jointLog] [info] positional integer width = 23\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.579] [puff::index::jointLog] [info] seqSize = 5,328,123\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.579] [puff::index::jointLog] [info] rankSize = 5,328,123\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.579] [puff::index::jointLog] [info] edgeVecSize = 0\n\u001b[00m\u001b[00m[2023-04-26 13:55:08.579] [puff::index::jointLog] [info] num keys = 5,017,563\n\u001b[00mfor info, total work write each : 2.331 total work inram from level 3 : 4.322 total work raw : 25.000 \n[Building BooPHF] 100 % elapsed: 0 min 3 sec remaining: 0 min 0 sec\nBitarray 26296128 bits (100.00 %) (array + ranks )\nfinal hash 0 bits (0.00 %) (nb in final hash 0)\n\u001b[00m[2023-04-26 13:55:11.448] [puff::index::jointLog] [info] mphf size = 3.13474 MB\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk size = 666,016\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk 0 = [0, 666,016)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk 1 = [666,016, 1,332,032)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk 2 = [1,332,032, 1,998,048)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk 3 = [1,998,048, 2,664,064)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk 4 = [2,664,064, 3,330,080)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] 
[puff::index::jointLog] [info] chunk 5 = [3,330,080, 3,996,096)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk 6 = [3,996,096, 4,662,112)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.453] [puff::index::jointLog] [info] chunk 7 = [4,662,112, 5,328,093)\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.635] [puff::index::jointLog] [info] finished populating pos vector\n\u001b[00m\u001b[00m[2023-04-26 13:55:11.635] [puff::index::jointLog] [info] writing index components\n\u001b[00m\u001b[00m[2023-04-26 13:55:12.061] [puff::index::jointLog] [info] finished writing dense pufferfish index\n\u001b[00m[2023-04-26 13:55:12.184] [jLog] [info] done building index\n"
- }
- ],
- "execution_count": 28,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 9: Run Salmon to Map Reads to Transcripts and Quantify Expression Levels\n",
- "Salmon aligns the trimmed reads to the reference transcriptome and generates the read counts per transcript. In this analysis, each gene has a single transcript."
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "%%bash\n",
- "salmon quant -i data/reference/transcriptome_index -l SR -r data/trimmed/SRR13349122_1_trimmed.fastq -p 8 --validateMappings -o data/quants/SRR13349122_quant\n",
- "salmon quant -i data/reference/transcriptome_index -l SR -r data/trimmed/SRR13349128_1_trimmed.fastq -p 8 --validateMappings -o data/quants/SRR13349128_quant"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stderr",
- "text": "Version Info: ### PLEASE UPGRADE SALMON ###\n### A newer version of salmon with important bug fixes and improvements is available. ####\n###\nThe newest version, available at https://github.com/COMBINE-lab/salmon/releases\ncontains new features, improvements, and bug fixes; please upgrade at your\nearliest convenience.\n###\nSign up for the salmon mailing list to hear about new versions, features and updates at:\nhttps://oceangenomics.com/subscribe\n###### salmon (selective-alignment-based) v1.5.1\n### [ program ] => salmon \n### [ command ] => quant \n### [ index ] => { data/reference/transcriptome_index }\n### [ libType ] => { SR }\n### [ unmatedReads ] => { data/trimmed/SRR13349122_1_trimmed.fastq }\n### [ threads ] => { 8 }\n### [ validateMappings ] => { }\n### [ output ] => { data/quants/SRR13349122_quant }\nLogs will be written to data/quants/SRR13349122_quant/logs\n[2023-04-26 14:00:23.857] [jointLog] [info] setting maxHashResizeThreads to 8\n[2023-04-26 14:00:23.857] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.\n[2023-04-26 14:00:23.857] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. 
Since not explicitly specified, it is being set to 0.65\n[2023-04-26 14:00:23.857] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.\n[2023-04-26 14:00:23.857] [jointLog] [info] parsing read library format\n[2023-04-26 14:00:23.857] [jointLog] [info] There is 1 library.\n[2023-04-26 14:00:24.001] [jointLog] [info] Loading pufferfish index\n[2023-04-26 14:00:24.029] [jointLog] [info] Loading dense pufferfish index.\n-----------------------------------------\n| Loading contig table | Time = 49.107 ms\n-----------------------------------------\nsize = 10353\n-----------------------------------------\n| Loading contig offsets | Time = 25.831 ms\n-----------------------------------------\n-----------------------------------------\n| Loading reference lengths | Time = 4.4126 ms\n-----------------------------------------\n-----------------------------------------\n| Loading mphf table | Time = 124.39 ms\n-----------------------------------------\nsize = 5328123\nNumber of ones: 10352\nNumber of ones per inventory item: 512\nInventory entries filled: 21\n-----------------------------------------\n| Loading contig boundaries | Time = 56.826 ms\n-----------------------------------------\nsize = 5328123\n-----------------------------------------\n| Loading sequence | Time = 86.304 ms\n-----------------------------------------\nsize = 5017563\n-----------------------------------------\n| Loading positions | Time = 283.15 ms\n-----------------------------------------\nsize = 9684800\n-----------------------------------------\n| Loading reference sequence | Time = 126.8 ms\n-----------------------------------------\n-----------------------------------------\n| Loading reference accumulative lengths | Time = 8.7525 ms\n-----------------------------------------\n\n\n\n\n\n\n\n\n\n\n\n\n[2023-04-26 14:00:24.865] [jointLog] [info] done\n[2023-04-26 14:00:24.865] [jointLog] [info] Index contained 4,868 targets\n[2023-04-26 14:00:24.867] [jointLog] 
[info] Number of decoys : 1\n[2023-04-26 14:00:24.867] [jointLog] [info] First decoy index : 4,867 \n[2023-04-26 14:00:25.119] [jointLog] [info] Thread saw mini-batch with a maximum of 0.28% zero probability fragments\n[2023-04-26 14:00:25.119] [jointLog] [info] Thread saw mini-batch with a maximum of 0.30% zero probability fragments\n[2023-04-26 14:00:25.120] [jointLog] [info] Thread saw mini-batch with a maximum of 0.10% zero probability fragments\n[2023-04-26 14:00:25.120] [jointLog] [info] Thread saw mini-batch with a maximum of 0.32% zero probability fragments\n[2023-04-26 14:00:25.126] [jointLog] [info] Thread saw mini-batch with a maximum of 0.41% zero probability fragments\n[2023-04-26 14:00:25.138] [jointLog] [info] Computed 1,145 rich equivalence classes for further processing\n[2023-04-26 14:00:25.138] [jointLog] [info] Counted 2,816 total reads in the equivalence classes \n[2023-04-26 14:00:25.141] [jointLog] [info] Number of mappings discarded because of alignment score : 191\n[2023-04-26 14:00:25.141] [jointLog] [info] Number of fragments entirely discarded because of alignment score : 262\n[2023-04-26 14:00:25.141] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 227\n[2023-04-26 14:00:25.141] [jointLog] [info] Number of fragments discarded because they have only dovetail (discordant) mappings to valid targets : 0\n[2023-04-26 14:00:25.142] [jointLog] [warning] Only 2816 fragments were mapped, but the number of burn-in fragments was set to 5000000.\nThe effective lengths have been computed using the observed mappings.\n\n[2023-04-26 14:00:25.142] [jointLog] [info] Mapping rate = 5.64906%\n\n[2023-04-26 14:00:25.142] [jointLog] [info] finished quantifyLibrary()\n[2023-04-26 14:00:25.197] [jointLog] [info] Starting optimizer\n[2023-04-26 14:00:25.201] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate\n[2023-04-26 14:00:25.201] [jointLog] [info] iteration = 0 | max rel diff. 
= 0.989424\n[2023-04-26 14:00:25.254] [jointLog] [info] iteration = 100 | max rel diff. = 0\n[2023-04-26 14:00:25.255] [jointLog] [info] Finished optimizer\n[2023-04-26 14:00:25.255] [jointLog] [info] writing output \n\nVersion Info: ### PLEASE UPGRADE SALMON ###\n### A newer version of salmon with important bug fixes and improvements is available. ####\n###\nThe newest version, available at https://github.com/COMBINE-lab/salmon/releases\ncontains new features, improvements, and bug fixes; please upgrade at your\nearliest convenience.\n###\nSign up for the salmon mailing list to hear about new versions, features and updates at:\nhttps://oceangenomics.com/subscribe\n###### salmon (selective-alignment-based) v1.5.1\n### [ program ] => salmon \n### [ command ] => quant \n### [ index ] => { data/reference/transcriptome_index }\n### [ libType ] => { SR }\n### [ unmatedReads ] => { data/trimmed/SRR13349128_1_trimmed.fastq }\n### [ threads ] => { 8 }\n### [ validateMappings ] => { }\n### [ output ] => { data/quants/SRR13349128_quant }\nLogs will be written to data/quants/SRR13349128_quant/logs\n[2023-04-26 14:00:26.693] [jointLog] [info] setting maxHashResizeThreads to 8\n[2023-04-26 14:00:26.693] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.\n[2023-04-26 14:00:26.693] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. 
Since not explicitly specified, it is being set to 0.65\n[2023-04-26 14:00:26.693] [jointLog] [info] Setting consensusSlack to selective-alignment default of 0.35.\n[2023-04-26 14:00:26.693] [jointLog] [info] parsing read library format\n[2023-04-26 14:00:26.694] [jointLog] [info] There is 1 library.\n-----------------------------------------\n| Loading contig table | Time = 1.626 ms\n-----------------------------------------\nsize = 10353\n-----------------------------------------\n| Loading contig offsets | Time = 137.1 us\n-----------------------------------------\n-----------------------------------------\n| Loading reference lengths | Time = 18.9 us\n-----------------------------------------\n-----------------------------------------\n| Loading mphf table | Time = 1.0612 ms\n-----------------------------------------\nsize = 5328123\nNumber of ones: 10352\nNumber of ones per inventory item: 512\n[2023-04-26 14:00:26.761] [jointLog] [info] Loading pufferfish index\n[2023-04-26 14:00:26.761] [jointLog] [info] Loading dense pufferfish index.\nInventory entries filled: 21\n-----------------------------------------\n| Loading contig boundaries | Time = 9.9375 ms\n-----------------------------------------\nsize = 5328123\n-----------------------------------------\n| Loading sequence | Time = 1.0925 ms\n-----------------------------------------\nsize = 5017563\n-----------------------------------------\n| Loading positions | Time = 11.603 ms\n-----------------------------------------\nsize = 9684800\n-----------------------------------------\n| Loading reference sequence | Time = 2.755 ms\n-----------------------------------------\n-----------------------------------------\n| Loading reference accumulative lengths | Time = 34.3 us\n-----------------------------------------\n[2023-04-26 14:00:26.790] [jointLog] [info] done\n[2023-04-26 14:00:26.790] [jointLog] [info] Index contained 4,868 targets\n[2023-04-26 14:00:26.791] [jointLog] [info] Number of decoys : 
1\n[2023-04-26 14:00:26.791] [jointLog] [info] First decoy index : 4,867 \n\n\n\n\n\n\n\n\n\n\n\n\n[2023-04-26 14:00:27.121] [jointLog] [info] Thread saw mini-batch with a maximum of 0.22% zero probability fragments\n[2023-04-26 14:00:27.121] [jointLog] [info] Thread saw mini-batch with a maximum of 0.18% zero probability fragments\n[2023-04-26 14:00:27.121] [jointLog] [info] Thread saw mini-batch with a maximum of 0.14% zero probability fragments\n[2023-04-26 14:00:27.121] [jointLog] [info] Thread saw mini-batch with a maximum of 0.26% zero probability fragments\n[2023-04-26 14:00:27.123] [jointLog] [info] Thread saw mini-batch with a maximum of 0.20% zero probability fragments\n[2023-04-26 14:00:27.128] [jointLog] [info] Thread saw mini-batch with a maximum of 0.14% zero probability fragments\n[2023-04-26 14:00:27.138] [jointLog] [info] Computed 850 rich equivalence classes for further processing\n[2023-04-26 14:00:27.138] [jointLog] [info] Counted 1,906 total reads in the equivalence classes \n[2023-04-26 14:00:27.142] [jointLog] [info] Number of mappings discarded because of alignment score : 81\n[2023-04-26 14:00:27.142] [jointLog] [info] Number of fragments entirely discarded because of alignment score : 160\n[2023-04-26 14:00:27.142] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 151\n[2023-04-26 14:00:27.142] [jointLog] [info] Number of fragments discarded because they have only dovetail (discordant) mappings to valid targets : 0\n[2023-04-26 14:00:27.142] [jointLog] [warning] Only 1906 fragments were mapped, but the number of burn-in fragments was set to 5000000.\nThe effective lengths have been computed using the observed mappings.\n\n[2023-04-26 14:00:27.142] [jointLog] [info] Mapping rate = 3.82493%\n\n[2023-04-26 14:00:27.142] [jointLog] [info] finished quantifyLibrary()\n[2023-04-26 14:00:27.182] [jointLog] [info] Starting optimizer\n[2023-04-26 14:00:27.187] [jointLog] [info] Marked 0 weighted equivalence 
classes as degenerate\n[2023-04-26 14:00:27.187] [jointLog] [info] iteration = 0 | max rel diff. = 0.996302\n[2023-04-26 14:00:27.234] [jointLog] [info] iteration = 100 | max rel diff. = 0\n[2023-04-26 14:00:27.235] [jointLog] [info] Finished optimizer\n[2023-04-26 14:00:27.235] [jointLog] [info] writing output \n\n"
- }
- ],
- "execution_count": 40,
- "metadata": {
- "scrolled": true,
- "tags": []
- }
- },
- {
- "cell_type": "code",
- "source": [
- "ls data/quants/"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "\u001b[0m\u001b[34;42mSRR13349122_quant\u001b[0m/ \u001b[34;42mSRR13349128_quant\u001b[0m/\r\n"
- }
- ],
- "execution_count": 41,
- "metadata": {
- "jupyter": {
- "source_hidden": false,
- "outputs_hidden": false
- },
- "nteract": {
- "transient": {
- "deleting": false
- }
- },
- "gather": {
- "logged": 1682518630201
- }
- }
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 10: Report the top 10 most highly expressed genes in the samples"
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "Top 10 most highly expressed genes in the wild-type sample.\n"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "! sort -nrk 4,4 data/quants/SRR13349122_quant/quant.sf | head -10"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "BB28_RS23830\t213\t10.625\t48612.291220\t5.000\r\nBB28_RS02220\t204\t9.377\t33047.563397\t3.000\r\nBB28_RS05530\t180\t6.996\t29531.286140\t2.000\r\nBB28_RS18945\t222\t12.150\t25504.663975\t3.000\r\nBB28_RS11370\t195\t8.348\t24748.475090\t2.000\r\nBB28_RS12480\t207\t9.766\t21154.305555\t2.000\r\nBB28_RS18745\t300\t51.326\t20125.718383\t10.000\r\nBB28_RS20695\t231\t14.032\t14723.212476\t2.000\r\nBB28_RS19155\t282\t36.744\t14056.208165\t5.000\r\nBB28_RS18020\t189\t7.759\t13312.711241\t1.000\r\nsort: write failed: 'standard output': Broken pipe\r\nsort: write error\r\n"
- }
- ],
- "execution_count": 42,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "Top 10 most highly expressed genes in the double lysogen sample.\n"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "!sort -nrk 4,4 data/quants/SRR13349128_quant/quant.sf | head -10"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "BB28_RS18025\t177\t6.769\t47953.929601\t2.000\r\nBB28_RS02220\t204\t9.377\t34613.921846\t2.000\r\nBB28_RS13585\t243\t17.264\t28200.832626\t3.000\r\nBB28_RS01170\t225\t12.734\t25489.885138\t2.000\r\nBB28_RS20695\t231\t14.032\t23131.574929\t2.000\r\nBB28_RS19045\t183\t7.236\t22428.250651\t1.000\r\nBB28_RS04995\t192\t8.045\t20173.388438\t1.000\r\nBB28_RS14885\t195\t8.348\t19441.110656\t1.000\r\nBB28_RS18745\t300\t51.326\t18971.657043\t6.000\r\nBB28_RS23535\t201\t9.012\t18007.533576\t1.000\r\nsort: write failed: 'standard output': Broken pipe\r\nsort: write error\r\n"
- }
- ],
- "execution_count": 43,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "### STEP 11: Report the expression of a putative acyl-ACP desaturase (BB28_RS16545) that was downregulated in the double lysogen relative to wild-type\n",
- "An acyl-transferase was reported to be downregulated in the double lysogen as shown in the table of the top 20 upregulated and downregulated genes from the paper describing the study."
- ],
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "Use `grep` to report the expression in the wild-type sample. The fields in the Salmon `quant.sf` file are as follows. The level of expression is reported in the Transcripts Per Million (`TPM`) and number of reads (`NumReads`) fields: \n",
- "`Name Length EffectiveLength TPM NumReads`"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "!grep 'BB28_RS16545' data/quants/SRR13349122_quant/quant.sf"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "BB28_RS16545\t987\t737.000\t560.631139\t4.000\r\n"
- }
- ],
- "execution_count": 44,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "Use `grep` to report the expression in the double lysogen sample. The fields in the Salmon `quant.sf` file are as follows. The level of expression is reported in the Transcripts Per Million (`TPM`) and number of reads (`NumReads`) fields: \n",
- "`Name Length EffectiveLength TPM NumReads`"
- ],
- "metadata": {}
- },
- {
- "cell_type": "code",
- "source": [
- "!grep 'BB28_RS16545' data/quants/SRR13349128_quant/quant.sf"
- ],
- "outputs": [
- {
- "output_type": "stream",
- "name": "stdout",
- "text": "BB28_RS16545\t987\t737.000\t220.201284\t1.000\r\n"
- }
- ],
- "execution_count": 45,
- "metadata": {}
- },
- {
- "cell_type": "markdown",
- "source": [
- "### That's it! "
- ],
- "metadata": {}
- }
- ],
- "metadata": {
- "kernelspec": {
- "name": "python3",
- "language": "python",
- "display_name": "Python 3 (ipykernel)"
- },
- "language_info": {
- "name": "python",
- "version": "3.8.13",
- "mimetype": "text/x-python",
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "pygments_lexer": "ipython3",
- "nbconvert_exporter": "python",
- "file_extension": ".py"
- },
- "microsoft": {
- "ms_spell_check": {
- "ms_spell_check_language": "en"
- }
- },
- "kernel_info": {
- "name": "python3"
- },
- "nteract": {
- "version": "nteract-front-end@1.0.0"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
\ No newline at end of file