commot.tl.communication_deg_detection

commot.tl.communication_deg_detection(adata, n_var_genes=None, var_genes=None, database_name=None, pathway_name=None, summary='receiver', lr_pair=('total', 'total'), nknots=6, n_deg_genes=None, n_points=50, deg_pvalue_cutoff=0.05)

Identify signaling-dependent genes.

This function depends on tradeSeq [Van_den_Berge2020]. Currently, tradeSeq version 1.0.1 with R version 3.6.3 has been tested to work. For the R-python interface, rpy2==3.4.2 and anndata2ri==1.0.6 have been tested to work.

Here, the total signal received or sent by each spot is treated as a “gene expression”, and tradeSeq is used to find the genes correlated with it.
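As a rough intuition for what "genes correlated with the signal" means, the sketch below ranks toy genes by their Pearson correlation with a per-spot signal vector. This is a deliberately simplified stand-in and not what tradeSeq does (tradeSeq fits a GAM per gene and applies a Wald test). All data, gene names, and the helper `rank_genes_by_signal` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spots = 200

# Hypothetical per-spot received signal (stand-in for one column of the COMMOT summary)
signal = rng.gamma(shape=2.0, scale=1.0, size=n_spots)

# Three toy genes: geneA is driven by the signal, geneB and geneC are pure noise
expr = np.column_stack([
    2.0 * signal + rng.normal(0.0, 0.5, n_spots),
    rng.normal(0.0, 1.0, n_spots),
    rng.normal(0.0, 1.0, n_spots),
])
gene_names = np.array(["geneA", "geneB", "geneC"])


def rank_genes_by_signal(signal, expr, gene_names):
    """Sort genes by |Pearson r| with the signal, strongest first."""
    s = (signal - signal.mean()) / signal.std()
    e = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    r = (e * s[:, None]).mean(axis=0)   # Pearson correlation per gene
    order = np.argsort(-np.abs(r))
    return list(gene_names[order]), r[order]


ranked, corr = rank_genes_by_signal(signal, expr, gene_names)
print(ranked[0])  # the gene most strongly associated with the signal
```

A plain correlation only captures linear trends; a spline-based model can also pick up non-monotonic responses, which is one reason to prefer the GAM approach.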

Parameters

adata (AnnData) – The data matrix of shape n_obs × n_var. Rows correspond to cells or positions and columns to genes. The count data should be available through adata.layers['count']. For example, when examining the received signal through the ligand-receptor pair “ligA” and “recA” inferred with the LR database “databaseX”, the signaling inference result should be available in adata.obsm['commot-databaseX-sum-receiver']['r-ligA-recA'].
n_var_genes (Optional[int]) – The number of most variable genes to test.
var_genes – The genes to test. If given, n_var_genes is ignored.
n_deg_genes (Optional[int]) – The number of top DEG genes for which to evaluate yhat.
database_name (Optional[str]) – Name of the ligand-receptor database used for the signaling inference, i.e. the “databaseX” part of keys such as adata.obsm['commot-databaseX-sum-receiver'].
pathway_name (Optional[str]) – Name of the signaling pathway (choose from the third column of .uns['commot-databaseX-info']['df_ligrec']). If pathway_name is specified, lr_pair will be ignored.
summary (str) – Either 'sender' or 'receiver'; selects whether the sent or the received signal is used.
lr_pair (tuple) – A tuple of the ligand-receptor pair. If pathway_name is specified, lr_pair will be ignored.
nknots (int) – Number of knots in spline when constructing GAM.
n_points (int) – Number of points on which to evaluate the fitted GAM for downstream clustering and visualization.
deg_pvalue_cutoff (float) – The p-value cutoff of genes for obtaining the fitted gene expression patterns.
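The parameters above can be tied together with a short sketch. The helper `signal_location` is hypothetical (it is not part of COMMOT) and only spells out where the function is expected to look for the signal. The receiver pattern `'r-ligA-recA'` comes from the description of adata; the `'s-'` prefix for senders and the `'<prefix>-<pathway_name>'` column for pathways are assumptions made by analogy. The wrapper `run_deg_detection` imports commot lazily, so defining it does not require R or tradeSeq to be installed.

```python
def signal_location(database_name, summary="receiver",
                    lr_pair=("total", "total"), pathway_name=None):
    """Return (obsm_key, column) where the signal summary is expected (hypothetical helper)."""
    if summary not in ("sender", "receiver"):
        raise ValueError("summary must be 'sender' or 'receiver'")
    prefix = summary[0]  # 'r' is documented; 's' for sender is an assumption
    obsm_key = f"commot-{database_name}-sum-{summary}"
    if pathway_name is not None:
        # pathway_name takes precedence over lr_pair; column format is an assumption
        column = f"{prefix}-{pathway_name}"
    else:
        column = f"{prefix}-{lr_pair[0]}-{lr_pair[1]}"
    return obsm_key, column


def run_deg_detection(adata, database_name, lr_pair=("total", "total")):
    """Call the documented function after checking its inputs (sketch only)."""
    import commot as ct  # needs rpy2, anndata2ri, R and tradeSeq at call time

    obsm_key, column = signal_location(database_name, "receiver", lr_pair)
    if "count" not in adata.layers:
        raise KeyError("adata.layers['count'] is required")
    if obsm_key not in adata.obsm or column not in adata.obsm[obsm_key]:
        raise KeyError(f"missing adata.obsm[{obsm_key!r}][{column!r}]")

    return ct.tl.communication_deg_detection(
        adata,
        database_name=database_name,
        summary="receiver",
        lr_pair=lr_pair,
        n_var_genes=3000,        # illustrative value, not a recommendation
        nknots=6,
        n_points=50,
        deg_pvalue_cutoff=0.05,
    )


print(signal_location("databaseX", "receiver", ("ligA", "recA")))
```

For an AnnData object on which the signaling inference has already been run, `df_deg, df_yhat = run_deg_detection(adata, "databaseX", ("ligA", "recA"))` would then return the two data frames described under Returns.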

Returns

df_deg (pd.DataFrame) – A data frame of DEG analysis results, including Wald statistics, degrees of freedom, and p-values.
df_yhat (pd.DataFrame) – A data frame of smoothed gene expression values.
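A typical next step is to keep only the significant genes from df_deg. The sketch below uses a small made-up table; the column names `waldStat`, `df`, and `pvalue` are assumptions based on the description above and should be checked against the actual output. The helper `top_deg_genes` is hypothetical.

```python
import pandas as pd

# Made-up stand-in for df_deg; values and column names are illustrative only
df_deg = pd.DataFrame(
    {
        "waldStat": [35.2, 4.1, 18.7, 0.9],
        "df": [6, 6, 6, 6],
        "pvalue": [1e-5, 0.66, 0.006, 0.99],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)


def top_deg_genes(df_deg, pvalue_cutoff=0.05, n_genes=None, pvalue_col="pvalue"):
    """Return gene names below the p-value cutoff, most significant first."""
    significant = df_deg[df_deg[pvalue_col] < pvalue_cutoff].sort_values(pvalue_col)
    if n_genes is not None:
        significant = significant.head(n_genes)
    return list(significant.index)


selected = top_deg_genes(df_deg, pvalue_cutoff=0.05, n_genes=2)
print(selected)
```

The p-values here are unadjusted; when many genes are tested, applying a multiple-testing correction before thresholding is usually advisable.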

References

Van_den_Berge2020: Van den Berge, K., Roux de Bézieux, H., Street, K., Saelens, W., Cannoodt, R., Saeys, Y., … & Clement, L. (2020). Trajectory-based differential expression analysis for single-cell sequencing data. Nature Communications, 11(1), 1-13.