This R script identifies statistically significant spatial features, i.e. positive or negative cell-cell colocalizations, using colocation quotient (CLQ) analysis. Here we describe how to calculate the CLQ, create a null distribution of CLQ values, and normalize the data. The normalization process considers the number of cells within each subpopulation. Subpopulations with a low cell count were more likely to yield a broader distribution of CLQ values during the permutation analysis, because random label sampling has a substantial impact on the CLQ calculation when few cells are involved.
- get_CLQ(): The colocation quotient (CLQ) quantifies how one cell subpopulation colocates spatially with another among a set of nearest neighbors, defined here as 20. We calculated the CLQ for each pair of cell types identified with CELESTA (Zhang et al., 2022, Nature Methods) under naïve and treatment conditions using the following equation: $CLQ_{b \to a} = \frac{C_{b \to a} / N_a}{N_b / (N - 1)}$, where $C_{b \to a}$ is the number of cells of cell type b among the defined nearest neighbors of cells of type a, $N$ is the total number of cells, and $N_a$ and $N_b$ are the numbers of cells of cell type a and cell type b.
- KNN_neighbors: Finds the N nearest neighboring cells.
- find_cell_type_neighbors: Finds the cell types of the neighboring cells.
- CLQ_permutated_matrix_gen1: Assesses the significance of the CLQ values by randomly permuting the cell labels (cell types) 500 times while preserving the subpopulation proportions.
- write_counts: Counts the number of cells in each subpopulation. It generates a summary table with the cell type number, the corresponding name, and the cell count in the sample.
- CLQ_matrix_gen: Reads the CELESTA cell assignment file and generates the original CLQ matrix.
- CLQ_permutated_matrix_gen2: Retrieves the permuted matrix output for each sample.
- read_counts: Retrieves the subpopulation counts output for each sample.
- significance_matrix_gen: Identifies statistically significant CLQ values. CLQ values falling outside or at the tail of the distribution generated by the permutation analysis are considered significant, whereas values within the distribution are deemed non-significant, as they can be reproduced after spatial randomization. Percentile values < 0.05 or > 0.95 are considered significant. The normalization achieved through the permutation analysis facilitates spatial feature comparisons and also enables the comparison of different conditions from the same or independent experiments. CLQs were normalized according to the following formula: (Observed CLQ - Mean CLQ) / (Max CLQ - Mean CLQ).
- plot_gen: Plots the distribution of all the permutation CLQ values for each cell pair. The blue bar is the normalized CLQ value and the red bar is the original CLQ value.
- CLQ_normalization_by_sample: This function requires (1) a named vector with the original CLQ values for one sample before normalization, in which each element is named after the two cell types of the cell pair joined by "_"; (2) a cell count file with the number of cells for each cell type in that sample; (3) the number of nearest neighbors used in the CLQ calculation; (4) a cell count threshold for rare cell populations (default 5); (5) the CELESTA input prior cell type signature matrix; and (6) the clipping parameters (default 0.05; a warning message will suggest clipping more as needed). The original CLQ distribution is bell-shaped but skewed at the tail. The clipping parameters allow for better visualization when normalizing the data.

See example.R for a full run.
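
To make the CLQ equation concrete, here is a minimal base-R sketch of the neighbor search and the pairwise CLQ. It is not the script's own code: `knn_indices()` and `clq_pair()` are hypothetical helper names, the neighbor search is brute force, and C is taken as the raw neighbor count exactly as the equation is written (whether the script additionally scales by the number of neighbors is not stated here).

```r
# Hypothetical helpers for illustration; not the functions shipped with this script.

# Brute-force k-nearest neighbours from x/y coordinates (fine for small examples).
knn_indices <- function(coords, k) {
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances
  diag(d) <- Inf                 # a cell is never its own neighbour
  n <- nrow(d)
  # Row i holds the indices of the k cells closest to cell i
  matrix(unlist(lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])),
         ncol = k, byrow = TRUE)
}

# CLQ_{b->a} = (C_{b->a} / N_a) / (N_b / (N - 1))
clq_pair <- function(labels, nn, a, b) {
  n   <- length(labels)
  n_a <- sum(labels == a)
  n_b <- sum(labels == b)
  if (n_a == 0 || n_b == 0) return(NA_real_)
  # Labels of all neighbours of type-a cells, then count those of type b
  nb_labels <- labels[as.vector(nn[labels == a, , drop = FALSE])]
  c_ba <- sum(nb_labels == b)
  (c_ba / n_a) / (n_b / (n - 1))
}

# Toy data: two tight A-B pairs far apart, one nearest neighbour per cell
coords <- cbind(x = c(0, 1, 10, 11), y = c(0, 0, 0, 0))
labels <- c("A", "B", "A", "B")
nn     <- knn_indices(coords, k = 1)
clq_ba <- clq_pair(labels, nn, a = "A", b = "B")   # 1.5
```

In the toy data every A cell has a B cell as its nearest neighbor, so the value is above 1, which is read as colocalization under this formula when a single neighbor is used.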
The spatial permutation analysis requires two inputs:
1. CELESTA cell subpopulations: a dataframe with one column named cell_types listing all the user-defined CELESTA cell subpopulations. See file example: “cell_types_celesta.csv”
2. Segmented imaging data with CELESTA cell assignment: the _cell_type_assignment.csv output dataframe from the CELESTA algorithm, which is available to download at https://github.com/plevritis-lab/CELESTA. See file example: “TAFs1_cell_type_assignment.csv”
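
As a rough illustration of how the two inputs fit together, the sketch below builds toy versions of both files, loads them, and tallies cells per subpopulation. Only the `cell_types` column name comes from the description above; the `X`, `Y` and `final_cell_type` column names, and the choice to drop cells whose label is not in the user-defined list, are assumptions made for this example, so check the header of your own CELESTA output before adapting it.

```r
# Toy stand-ins for the two input files; real column names may differ.
types_file <- tempfile(fileext = ".csv")
cells_file <- tempfile(fileext = ".csv")
write.csv(data.frame(cell_types = c("Tumor", "Fibroblast", "T cell")),
          types_file, row.names = FALSE)
write.csv(data.frame(X = c(1, 2, 5, 6),
                     Y = c(1, 1, 4, 4),
                     final_cell_type = c("Tumor", "Tumor", "Fibroblast", "Unknown")),
          cells_file, row.names = FALSE)

cell_types <- read.csv(types_file, stringsAsFactors = FALSE)
cells      <- read.csv(cells_file, stringsAsFactors = FALSE)

# Assumption: keep only cells assigned to one of the user-defined subpopulations
cells <- cells[cells$final_cell_type %in% cell_types$cell_types, ]

# Cells per subpopulation, including types with zero cells in this sample
counts    <- table(factor(cells$final_cell_type, levels = cell_types$cell_types))
counts_df <- data.frame(cell_type = names(counts), count = as.integer(counts))
```

Keeping zero-count types in the table makes it easy to spot subpopulations that are too rare for a meaningful CLQ.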
Spatial permutation outputs:
1. After running the write_counts() function, the script will output a .csv file with the number of cells for each cell subpopulation. See file example: “TAFs1_CellCounts.csv”
2. After running the CLQ_matrix_gen function, the script will output a .csv file with the original CLQ values for each cell pair. See file example: “TAFs1_CLQ.csv”
3. After running the CLQ_permutated_matrix_gen function, the script will output a .csv file of 500 CLQ values obtained by randomly permuting the cell labels (cell subpopulations) 500 times while preserving the proportions. These values are used for the distribution plots (see plot_gen). See file example: “TAFs1_CLQ_Permutated.csv”
4. After running the significance_matrix_gen function, the script will output a .csv file with the sample name, the identity and count of each cell subpopulation, the original CLQ value, the percentile, and whether the value is deemed significant. Note that an original CLQ value of zero may be caused by insufficient cell numbers of the respective cell types. These are filtered out in post-processing prior to colocatome generation. See file example: “TAFs1_CLQ_data_full.csv” and .png images in the PA_figures_TAFs1 folder. Note that the folder contains only a subset of the distribution plots.
5. After running the CLQ_normalization_by_sample function, the script will output a .csv file with normalized values. See file example: “TAFs1_CLQ_Normalized_L0_R0.05”. L0 = left clipping parameter at 0 (no clipping) and R0.05 = right clipping parameter at 0.05.
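
The permutation, significance and normalization steps described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the script's implementation: `permute_clq()` and `summarise_clq()` are hypothetical names, the percentile is taken as the empirical fraction of permuted values at or below the observed one, the mean and maximum in the normalization formula are taken from the permuted values, and clipping is left out.

```r
# Hypothetical helpers for illustration; not the functions shipped with this script.

# stat_fun: any function that maps a label vector to a single CLQ value.
permute_clq <- function(labels, stat_fun, n_perm = 500) {
  # sample() shuffles labels across cells, so every type keeps its cell count
  vapply(seq_len(n_perm), function(i) stat_fun(sample(labels)), numeric(1))
}

summarise_clq <- function(observed, null_values) {
  # Assumption: empirical percentile of the observed value in the null distribution
  pct <- mean(null_values <= observed)
  data.frame(
    observed    = observed,
    percentile  = pct,
    significant = pct < 0.05 || pct > 0.95,
    # (Observed CLQ - Mean CLQ) / (Max CLQ - Mean CLQ), using the permuted values
    normalized  = (observed - mean(null_values)) /
                  (max(null_values) - mean(null_values))
  )
}

# Usage with a stand-in statistic (not a CLQ): share of A cells followed by a B cell
set.seed(1)
labels   <- rep(c("A", "B"), times = c(30, 10))
toy_stat <- function(l) mean(l[-length(l)] == "A" & l[-1] == "B")
null     <- permute_clq(labels, toy_stat, n_perm = 500)
result   <- summarise_clq(toy_stat(labels), null)
```

If all permuted values are identical the denominator is zero and the normalized value is not defined, which is one more reason to treat very rare subpopulations with care.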
Core publications and existing software releases from the authors of the Co-Location Quotient (CLQ) can be found here: https://seg.gmu.edu/
If you encounter a bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.