Missing information in the tutorial dataset. #4

Open
ZixiangPAN opened this issue Dec 7, 2023 · 1 comment

Comments

@ZixiangPAN

Hi, I found that the example dataset in the tutorial does not have gene names in the h5ad object.

Dataset link:
https://github.com/azizilab/decipher_data/data_decipher_tutorial.h5ad

As a result, when running

    dc.tl.trajectories(
        adata,
        dc.tl.TConfig("Healthy", "AVP", "MPO", "origin", "Healthy"),
        dc.tl.TConfig("AML1", "AVP", "CD68", "origin", "AML1"),
    )

the function find_cluster_with_marker in ../decipher/tools/trajectory_inference.py filters out all the cells, so the adata ends up empty. The function is:

```python
def find_cluster_with_marker(
    adata,
    marker,
    subset_column=None,
    subset_value=None,
    subset_min_percent_per_cluster=0.3,
    cluster_key="decipher_clusters",
    min_cell_per_cluster=10,
):
    """Find the cluster enriched for a marker gene. Possibly subset the cells before.

    Parameters
    ----------
    adata : sc.AnnData
        The annotated data matrix.
    marker : str
        The marker gene.
    subset_column : str, optional
        The column in `adata.obs` to subset on.
    subset_value : str, optional
        The value in subset_column to subset on.
    subset_min_percent_per_cluster : float, default 0.3
        When subsetting the cells, each cluster must have at least this proportion of cells from
        the subset to not be discarded. This is useful to remove clusters with too few cells from
        the subset.
    cluster_key : str, default "decipher_clusters"
        The key in `adata.obs` where the cluster information is stored.
    min_cell_per_cluster : int, default 10
        The minimum number of cells per cluster to consider it.
    """
    if subset_column is not None:
        adata = _subset_cells_and_clusters(
            adata,
            subset_column,
            subset_value,
            subset_min_percent_per_cluster=subset_min_percent_per_cluster,
            min_cell_per_cluster=min_cell_per_cluster,
            cluster_key=cluster_key,
        )
    marker_data = pd.DataFrame(adata[:, marker].X.toarray())
    marker_data["cluster"] = adata.obs[cluster_key].values
    # get the proportion of cells in each cluster that are in the subset
    marker_data = marker_data.groupby("cluster").mean()
    marker_data = marker_data.sort_values(by=0, ascending=False)
    return marker_data.index[0]
```
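
To see how the subsetting step could leave nothing behind, one can tabulate, per cluster, how many cells belong to the requested subset. The sketch below is not decipher's `_subset_cells_and_clusters` (that helper is not shown in this issue); it only applies the two thresholds as the docstring describes them. The helper name `clusters_surviving_subset` and the toy labels are hypothetical, and whether `min_cell_per_cluster` counts subset cells or all cells is an assumption.

```python
from collections import Counter


def clusters_surviving_subset(
    clusters,
    subset_labels,
    subset_value,
    subset_min_percent_per_cluster=0.3,
    min_cell_per_cluster=10,
):
    """Return {cluster: (n_subset_cells, fraction_in_subset)} for clusters passing both thresholds.

    Hypothetical helper: it mirrors the docstring, not decipher's actual code.
    """
    total = Counter(clusters)
    in_subset = Counter(
        c for c, s in zip(clusters, subset_labels) if s == subset_value
    )
    kept = {}
    for cluster, n_total in total.items():
        n_sub = in_subset.get(cluster, 0)
        frac = n_sub / n_total
        # Assumption: the cell-count threshold applies to cells from the subset.
        if frac >= subset_min_percent_per_cluster and n_sub >= min_cell_per_cluster:
            kept[cluster] = (n_sub, frac)
    return kept


# Toy data: cluster "0" is mostly Healthy, cluster "1" is mostly AML1.
clusters = ["0"] * 20 + ["1"] * 20
origin = ["Healthy"] * 18 + ["AML1"] * 2 + ["Healthy"] * 2 + ["AML1"] * 18

# Only cluster "0" has enough Healthy cells to be kept.
print(clusters_surviving_subset(clusters, origin, "Healthy"))
# A value that never occurs in the column keeps no cluster at all.
print(clusters_surviving_subset(clusters, origin, "healthy"))
```

With a real object, `clusters` would be `adata.obs["decipher_clusters"]` and `subset_labels` would be `adata.obs["origin"]`. An empty result might suggest that the subset value or the thresholds, rather than the gene names, are what empties the data.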

Could you please take a look? Thank you.

Best

@ANazaret
Contributor

Hello, can you give more details about the problem?
The example dataset does have gene names (in the attribute data.var_names).

Thanks!
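
One way to gather those details is to check whether the marker genes passed to `dc.tl.TConfig` appear among the gene names at all. The following is a minimal sketch: `missing_markers` is a hypothetical helper, not part of decipher, and the gene list is a toy stand-in for `adata.var_names`.

```python
def missing_markers(var_names, markers):
    """Return the markers absent from var_names, preserving their order."""
    known = set(var_names)
    return [m for m in markers if m not in known]


# Toy stand-in for list(adata.var_names); the real tutorial dataset is not loaded here.
toy_var_names = ["AVP", "MPO", "GATA1"]

print(missing_markers(toy_var_names, ["AVP", "MPO", "CD68"]))  # ['CD68']
```

On the real object, `missing_markers(adata.var_names, ["AVP", "MPO", "CD68"])` returning an empty list would mean all three markers used in the call are present.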
