Missing information in the tutorial dataset. #4

ZixiangPAN · 2023-12-07T09:04:43Z

Hi, I found that the example dataset in the tutorial (linked below) does not seem to have gene names in the h5ad object. This causes a problem when running:

dc.tl.trajectories( adata, dc.tl.TConfig("Healthy", "AVP", "MPO", "origin", "Healthy"), dc.tl.TConfig("AML1", "AVP", "CD68", "origin", "AML1"), )

Dataset link:
https://github.com/azizilab/decipher_data/data_decipher_tutorial.h5ad

Inside ../decipher/tools/trajectory_inference.py, the function find_cluster_with_marker filters out all the cells, so the resulting adata is empty.

`def find_cluster_with_marker(
    adata,
    marker,
    subset_column=None,
    subset_value=None,
    subset_min_percent_per_cluster=0.3,
    cluster_key="decipher_clusters",
    min_cell_per_cluster=10,
):
    """Find the cluster enriched for a marker gene. Possibly subset the cells before.

    Parameters
    ----------
    adata : sc.AnnData
        The annotated data matrix.
    marker : str
        The marker gene.
    subset_column : str, optional
        The column in `adata.obs` to subset on.
    subset_value : str, optional
        The value in subset_column to subset on.
    subset_min_percent_per_cluster : float, default 0.3
        When subsetting the cells, each cluster must have at least this proportion of cells from
        the subset to not be discarded. This is useful to remove clusters with too few cells from
        the subset.
    cluster_key : str, default "decipher_clusters"
        The key in `adata.obs` where the cluster information is stored.
    min_cell_per_cluster : int, default 10
        The minimum number of cells per cluster to consider it.
    """
    if subset_column is not None:
        adata = _subset_cells_and_clusters(
            adata,
            subset_column,
            subset_value,
            subset_min_percent_per_cluster=subset_min_percent_per_cluster,
            min_cell_per_cluster=min_cell_per_cluster,
            cluster_key=cluster_key,
        )
    marker_data = pd.DataFrame(adata[:, marker].X.toarray())
    marker_data["cluster"] = adata.obs[cluster_key].values
    # get the proportion of cells in each cluster that are in the subset
    marker_data = marker_data.groupby("cluster").mean()
    marker_data = marker_data.sort_values(by=0, ascending=False)
    return marker_data.index[0]`
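
For illustration only, here is a minimal sketch of the final ranking step of the quoted function, run on plain arrays instead of an AnnData object. The name `top_cluster_by_mean` is hypothetical and not part of decipher. It shows why an empty selection is a problem: once no cells are left there is no cluster to rank, which in the quoted code would presumably surface as an IndexError at `marker_data.index[0]`.

```python
import numpy as np
import pandas as pd


def top_cluster_by_mean(expression, clusters):
    """Return the cluster label with the highest mean marker expression.

    Hypothetical stand-in for the last lines of the quoted function:
    `expression` holds one marker value per cell, `clusters` one label per cell.
    """
    marker_data = pd.DataFrame({"expr": np.asarray(expression, dtype=float)})
    marker_data["cluster"] = np.asarray(clusters)
    # Mean marker expression per cluster, highest first.
    means = marker_data.groupby("cluster")["expr"].mean().sort_values(ascending=False)
    if means.empty:
        # Nothing survived the subsetting step, so there is nothing to rank.
        raise ValueError("no cells left after subsetting; check the subset column and value")
    return means.index[0]


# Four toy cells in two clusters; cluster "b" has the higher mean expression.
best = top_cluster_by_mean([0.1, 0.2, 5.0, 4.0], ["a", "a", "b", "b"])
print(best)
```

Raising an explicit error on an empty selection is a design choice of this sketch, not something the library is known to do.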

Please take a look, thank you.

Best


ANazaret · 2024-04-18T15:33:47Z

Hello, can you give more details about the problem?
The example dataset does have gene names (in the attribute data.var_names).
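
To help narrow this down, a quick check along the following lines might be useful. It is only a sketch: `check_trajectory_inputs` is a hypothetical helper, not part of decipher, and it assumes that the last two arguments of each `TConfig` in the call above are the `adata.obs` column and the value to subset on. The toy object stands in for the real dataset, which one would load with `anndata.read_h5ad` and pass in its place.

```python
from types import SimpleNamespace

import pandas as pd


def check_trajectory_inputs(adata, markers, subset_column, subset_values):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    var_names = set(adata.var_names)
    for gene in markers:
        if gene not in var_names:
            problems.append(f"marker {gene!r} not found in adata.var_names")
    if subset_column not in adata.obs.columns:
        problems.append(f"column {subset_column!r} not found in adata.obs")
    else:
        present = set(adata.obs[subset_column])
        for value in subset_values:
            if value not in present:
                problems.append(f"value {value!r} not found in adata.obs[{subset_column!r}]")
    return problems


# Stand-in exposing only the two attributes the helper reads (toy values).
toy = SimpleNamespace(
    var_names=pd.Index(["AVP", "MPO", "CD68"]),
    obs=pd.DataFrame({"origin": ["Healthy", "Healthy", "AML1"]}),
)
problems = check_trajectory_inputs(toy, ["AVP", "MPO", "CD34"], "origin", ["Healthy", "AML2"])
for problem in problems:
    print(problem)
```

If the list comes back empty on the real data, the gene names and subset values are probably not the cause, and the cluster-level filtering would be the next thing to look at.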

Thanks!
