Combine and Compare Correlation Data¶

This tool uses 1.0 compute credits per hour.

Overview¶

This tool combines cell-cell correlation data generated by the Compare Neural Circuit Correlations Across States tool from multiple recordings, restricted to the states the user specifies. It then calculates and compares several correlation metrics across recordings, states, and experimental groups: the average positive and average negative correlation per state, and a per-cell statistic (maximum, minimum, or mean correlation).

New version changes

Input Format Change: This tool now requires correlation data in HDF5 (.h5) format, specifically the file generated by the Compare Neural Circuit Correlations Across States tool. Previous versions used that tool's CSV outputs instead.
Benefits of H5 Input: The H5 format stores the full raw correlation matrices for each state. This enables:
- Single-cell Level Analysis: Calculation and comparison of cell-specific statistics like the maximum, minimum, or mean correlation (statistic parameter).
- Detailed Average Correlations: Separate calculation and comparison of the average positive and average negative correlations across all cell pairs within a state.
For Users of the Previous Correlation Tool: If you previously ran the Compare Neural Circuit Correlations Across States tool and saved its outputs, you can use this updated combine-and-compare tool by providing the *.h5 file generated by that tool as input. The previous CSV outputs are no longer the primary input for this tool.

Parameters¶

Parameter	Required?	Default	Description
Group 1 Correlation Data Files	True	N/A	Select correlation data from the first group to use for analysis
Group 1 Name	True	group1	Name of the first group
Group 1 Color	True	tab:red	Color of the first group
Group 2 Correlation Data Files	False	N/A	Select correlation data from the second group to use for analysis
Group 2 Name	False	N/A	Name of the second group
Group 2 Color	True	tab:orange	Color of the second group
State Names	True	N/A	Names of analyzed states
State Colors	True	N/A	Colors of analyzed states
Comparison Type	False	N/A	Type of statistical test to perform
Multiple Comparison Correction method	True	N/A	Method for correcting for multiple comparisons
Effect Size Method	True	N/A	Method for calculating the effect size
Data Pairing	False	unpaired	Indicates whether observations should be paired for statistical comparison
Subject Matching Method	False	order	Method for matching subjects between groups in paired analysis
Significance Threshold	True	0.05	p-value threshold for classifying neurons as up- or down-modulated

Input Files¶

Source Parameter	File Type	File Format
Group 1 Correlation Data Files	correlation_data	h5
Group 2 Correlation Data Files	correlation_data	h5

The input files have the following requirements:

File Type: Input files must be HDF5 files (.h5) generated by the Compare Neural Circuit Correlations Across States tool.
Group Size: If data for a group is provided, that group must contain at least two H5 files.
State Matching: The State Names parameter provided to this tool must be a comma-separated list of strings that match, ignoring case, the names of the datasets within the input H5 files that you wish to analyze.
State Consistency: All input H5 files should contain datasets for the states specified in State Names.
State Colors: The number of State Colors provided must be equal to the number of State Names provided.
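
The requirements above can be checked before any data is read. The following is a minimal sketch, not the tool's own validation code: the function names are hypothetical, and it assumes State Colors is passed as a comma-separated string in the same way as State Names.

```python
def validate_group_size(files):
    """Raise if a provided group has fewer than two H5 files."""
    if len(files) < 2:
        raise ValueError("a group must contain at least two H5 files")


def validate_states(state_names, state_colors, dataset_names):
    """Return the dataset names that match the requested states, ignoring case.

    state_names and state_colors are comma-separated strings (the format of
    state_colors is an assumption); dataset_names are the top-level dataset
    names found in one input H5 file.
    """
    states = [s.strip() for s in state_names.split(",") if s.strip()]
    colors = [c.strip() for c in state_colors.split(",") if c.strip()]
    if len(states) != len(colors):
        raise ValueError("number of State Colors must equal number of State Names")

    # Map lower-cased dataset names back to their original spelling.
    lookup = {name.lower(): name for name in dataset_names}
    missing = [s for s in states if s.lower() not in lookup]
    if missing:
        raise ValueError(f"states not found in file: {missing}")
    return [lookup[s.lower()] for s in states]
```

Running the state check once per input file also covers the State Consistency requirement, since any file lacking a requested state raises an error.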

Additionally, each input H5 file is expected to contain top-level datasets where:

The name of each dataset corresponds to a state (e.g., "immobile", "mobile", "other").
The value of each dataset is a 2D NumPy array representing the cell-cell Pearson correlation matrix for that state. The diagonal elements are expected to be zero.

Algorithm Description¶

The workflow involves reading the correlation data, calculating derived metrics, performing statistical comparisons, and generating outputs. These steps are demonstrated in the following diagram.

graph TD
    subgraph Data Preparation
        A[Read H5 files for Group 1] --> B(Filter states based on user input);
        C[Read H5 files for Group 2 Optional] --> D(Filter states based on user input);
        B --> E{Calculate Metrics};
        D --> E;
        E -- Average Correlations --> F[Calculate Avg Positive/Negative Correlations per recording/state];
        E -- Cell Statistics --> G[Calculate Cell Max/Min/Mean Correlations per recording/state/cell];
    end
    subgraph Statistical Analysis
        F --> H[Analyze Average Correlations ANOVA: RM / Mixed];
        G --> I[Analyze Cell Correlations LMM for State, ANOVA for Group];
        H --> J(Perform Pairwise Tests);
        I --> J;
    end
    subgraph Output Generation
        F --> K[Save Combined Average Correlation CSV];
        G --> L[Save Combined Cell Statistic CSV];
        J --> M[Save ANOVA & Pairwise Results CSVs];
        F & G & J --> N[Generate Preview Plots CDFs, Comparison Plots];
    end

Data Loading and Preparation¶

Read H5: For each group, the tool reads the provided H5 files (raw_correlations_h5.h5).
Filter States: It retains only the datasets (correlation matrices) corresponding to the State Names specified by the user.
Calculate Average Correlations: For each recording and state, it calculates the mean of positive off-diagonal correlations and the mean of negative off-diagonal correlations. This results in subject-level data.
Calculate Cell Statistics: For each recording, state, and cell, it calculates the user-specified statistic ("max", "min", or "mean") of that cell's correlations with all other cells using the measure_cells function. This results in cell-level data.
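
The two metric calculations can be illustrated on a small correlation matrix. This is a sketch under stated assumptions rather than the tool's implementation: the function names are hypothetical (it is not the measure_cells function), the matrix is a plain nested list instead of a NumPy array, and the diagonal is assumed to be excluded from every calculation.

```python
from statistics import mean


def average_correlations(matrix):
    """Return (mean positive, mean negative) off-diagonal correlation.

    The matrix is symmetric, so only the upper triangle is read and each
    cell pair is counted once. NaN is returned when no value has that sign.
    """
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    positive = [v for v in values if v > 0]
    negative = [v for v in values if v < 0]
    nan = float("nan")
    return (mean(positive) if positive else nan,
            mean(negative) if negative else nan)


def cell_statistic(matrix, statistic="max"):
    """Return one value per cell: the max, min, or mean of that cell's
    correlations with all other cells (diagonal excluded)."""
    func = {"max": max, "min": min, "mean": mean}[statistic]
    n = len(matrix)
    return [func([matrix[i][j] for j in range(n) if j != i]) for i in range(n)]


# Illustrative 3-cell matrix for one recording and one state.
example = [
    [0.0, 0.4, -0.2],
    [0.4, 0.0, 0.1],
    [-0.2, 0.1, 0.0],
]
avg_positive, avg_negative = average_correlations(example)
per_cell_max = cell_statistic(example, "max")
```

The first function yields one row per recording and state (subject-level data), while the second yields one row per recording, state, and cell (cell-level data).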

Statistical Comparisons¶

The tool uses functions from the pingouin package for statistical analysis.

Average Correlation Analysis:
- Single Group: Compares average positive/negative correlations across states using a one-way Repeated Measures ANOVA (pingouin.rm_anova).
- Two Groups (Paired): Compares average positive/negative correlations across states and groups using a two-way Repeated Measures ANOVA (pingouin.rm_anova).
- Two Groups (Unpaired): Compares average positive/negative correlations across states (within-subject factor) and groups (between-subject factor) using a Mixed ANOVA (pingouin.mixed_anova).
Cell-level Statistic Analysis:
- State Comparison: Uses Linear Mixed Models (LMM) (pingouin.linear_regression) to compare the cell-level statistic across states, accounting for within-subject variability and potentially group differences. This handles the nested structure (cells within subjects).
- Group Comparison: First, averages the cell-level statistic per subject/state/group. Then, compares these subject-level averages between groups using ANOVA (RM-ANOVA for paired, Mixed ANOVA for unpaired).
Pairwise Comparisons: Following significant ANOVA or LMM results, pairwise tests (pingouin.pairwise_tests) are performed to pinpoint differences between specific states or groups. The user selects the multiple comparison correction method and effect size calculation. Normality is checked, and non-parametric tests may be used if assumptions are violated.
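
The choice of test for the average correlation analysis depends only on whether a second group is supplied and on the Data Pairing parameter. The sketch below restates that decision as code; the function name is hypothetical, and the returned pingouin function names are the ones listed above.

```python
def select_average_correlation_test(has_second_group, data_pairing="unpaired"):
    """Return (description, pingouin function name) for the average
    correlation analysis, following the three cases described above."""
    if not has_second_group:
        return ("one-way repeated measures ANOVA", "pingouin.rm_anova")
    if data_pairing == "paired":
        return ("two-way repeated measures ANOVA", "pingouin.rm_anova")
    if data_pairing == "unpaired":
        return ("mixed ANOVA", "pingouin.mixed_anova")
    raise ValueError(f"unknown data pairing: {data_pairing!r}")
```

For example, two groups with the default unpaired setting map to the Mixed ANOVA, with state as the within-subject factor and group as the between-subject factor.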

Effect Size Calculation Options¶

Method	Description
Cohen's d	A measure of effect size that expresses the difference between two means in standard deviation units.
Hedges' g	Similar to Cohen's d, but includes a correction for small sample sizes, providing a more accurate estimate of effect size.
Eta-squared (η²)	A measure of the proportion of variance in a dependent variable that is associated with one or more independent variables.
Odds ratio	A measure of association between two binary variables, representing the odds of an event occurring in one group versus another.
Area under the curve (AUC)	Typically refers to the AUC of a receiver operating characteristic (ROC) curve, used to evaluate the performance of a binary classifier by measuring its ability to distinguish between classes.
Common Language Effect Size	A probability-based effect size that expresses the likelihood that a randomly chosen score from one group will be higher than a randomly chosen score from another group.
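
To make three of these definitions concrete, the sketch below computes Cohen's d, Hedges' g, and the Common Language Effect Size for two independent samples. It uses the standard textbook formulas and is not the tool's code; the tool delegates to pingouin, whose handling of paired data and ties may differ. Counting ties as half a win in the CLES is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d(x, y):
    """Difference of means in units of the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))
    return (mean(x) - mean(y)) / pooled_sd


def hedges_g(x, y):
    """Cohen's d multiplied by the usual small-sample correction factor."""
    n = len(x) + len(y)
    return cohen_d(x, y) * (1 - 3 / (4 * n - 9))


def cles(x, y):
    """Probability that a random value from x exceeds a random value from y.
    Ties count as half (an assumption of this sketch)."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))
```

With small groups, such as a handful of recordings per group, the Hedges' g correction is noticeable: for six observations in total it shrinks Cohen's d by 20%.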

Multiple Comparison Correction Options¶

Method	Description
bonf (Bonferroni)	Adjusts the p-value threshold by dividing it by the number of comparisons. Controls Family-Wise Error Rate (FWER).
sidak (Sidak)	Similar to Bonferroni but slightly less conservative, assuming tests are independent. Controls FWER.
holm (Holm-Bonferroni)	A step-down method that is uniformly more powerful than Bonferroni. Controls FWER.
fdr_bh (Benjamini-Hochberg)	Controls the False Discovery Rate (FDR) by ranking p-values. Often preferred when many tests are performed.
fdr_by (Benjamini-Yekutieli)	Controls FDR under arbitrary dependency assumptions, more conservative than Benjamini-Hochberg.
none	No correction is applied. Increases the risk of Type I errors (false positives).
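
The sketch below shows how three of these corrections adjust a list of p-values. It is an illustration of the standard procedures, not the tool's code, and the function names are hypothetical. Multiplying each p-value by the number of comparisons is equivalent to dividing the threshold by that number.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down Holm-Bonferroni adjustment."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up Benjamini-Hochberg adjustment (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the p-values 0.01, 0.04, and 0.03, Bonferroni gives 0.03, 0.12, and 0.09, Holm gives 0.03, 0.06, and 0.06, and Benjamini-Hochberg gives 0.03, 0.04, and 0.04, which shows the decreasing conservativeness of the three methods.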

Outputs¶

Combination Outputs¶

Combined Data (CSV)¶

Two types of combined data files are generated per group:

Average Correlations: A CSV file (e.g., group1_combined_average_correlation.csv) containing the calculated average positive and negative correlations for each recording and state.
- Columns: file, state, positive_correlation, negative_correlation, subject_id, group_name.
Cell Statistic Correlations: A CSV file (e.g., group1_combined_max_correlation.csv) containing the calculated cell-level statistic (max, min, or mean) for each cell, state, and recording.
- Columns: file, state, cell, {statistic}_correlation (e.g., max_correlation), subject_id, group_name.

Example Average Correlation Data:

file	state	positive_correlation	negative_correlation	subject_id	group_name
rec1_correlations.h5	immobile	0.08	-0.07	1	group1
rec1_correlations.h5	mobile	0.09	-0.08	1	group1
rec2_correlations.h5	immobile	0.07	-0.06	2	group1
rec2_correlations.h5	mobile	0.10	-0.09	2	group1

Example Cell Statistic Data (statistic='max'):

file	state	cell	max_correlation	subject_id	group_name
rec1_correlations.h5	immobile	0	0.36	1	group1
rec1_correlations.h5	immobile	1	0.45	1	group1
rec1_correlations.h5	mobile	0	0.39	1	group1
rec1_correlations.h5	mobile	1	0.48	1	group1

Combination Previews (SVG)¶

For each group, Cumulative Distribution Function (CDF) plots are generated as previews for the combined data:

One plot for average positive correlations ({group_name}_avg_positive_correlation_cdf.svg).
One plot for average negative correlations ({group_name}_avg_negative_correlation_cdf.svg).
One plot for the cell-level statistic correlations ({group_name}_{statistic}_correlation_cdf.svg).

These plots show the cumulative distribution of the respective correlation values across all recordings in the group, with separate lines colored by state.
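
The curves in these plots are empirical CDFs computed separately for each state. The following sketch shows the underlying calculation on rows shaped like the combined cell statistic CSV; the function names are hypothetical and the plotting step is omitted.

```python
from collections import defaultdict


def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def cdf_by_state(rows, column):
    """Group rows by their 'state' field and compute one CDF per state."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["state"]].append(float(row[column]))
    return {state: empirical_cdf(vals) for state, vals in grouped.items()}


# Rows taken from the example cell statistic table (statistic='max').
rows = [
    {"state": "immobile", "cell": 0, "max_correlation": 0.36},
    {"state": "immobile", "cell": 1, "max_correlation": 0.45},
    {"state": "mobile", "cell": 0, "max_correlation": 0.39},
    {"state": "mobile", "cell": 1, "max_correlation": 0.48},
]
curves = cdf_by_state(rows, "max_correlation")
```

Each entry of curves holds the points of one line in the plot, to be drawn in the color assigned to that state.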

*Example: Max Correlation CDF for Group 1*

*Example: Max Correlation CDF for Group 2*

Statistical Comparison Outputs¶

Statistical Results (CSV)¶

Two CSV files summarize the statistical tests:

ANOVA Results (ANOVA_comparisons.csv): Contains summary results from the ANOVA (RM-ANOVA, Mixed ANOVA) and LMM tests performed. The exact columns depend on the specific test (pingouin function) used for each comparison type (average vs. cell-level, state vs. group). Key columns typically include:
- Comparison: The type of data being compared (e.g., "Average Correlation", "Max Correlation").
- Measure: The specific metric column name (e.g., "average_correlation", "max_correlation").
- analysis_level: The level at which the analysis was performed ("subject" or "cell").
- comparison_type: The nature of the comparison ("state", "group", "state-group").
- Source: The factor being tested (e.g., "state", "group_name", "Interaction").
- SS, MS: Sum of Squares, Mean Squares.
- DF1, DF2: Degrees of Freedom.
- F: F-statistic (for ANOVA).
- p-unc: Uncorrected p-value.
- np2: Partial eta-squared effect size (for ANOVA).
- eps: Greenhouse-Geisser epsilon (sphericity correction).
Pairwise Comparisons (pairwise_comparisons.csv): Contains detailed results from post-hoc pairwise tests. The exact columns depend on the specific test (pingouin.pairwise_tests with different parameters). Key columns typically include:
- Comparison, Measure, analysis_level, comparison_type: As above.
- Contrast: The factor levels being contrasted (e.g., "state", "group_name", "state * group_name").
- A, B: The specific levels/groups being compared.
- Paired: Boolean indicating if the test was paired.
- Parametric: Boolean indicating if a parametric test was used.
- T / U / W-val: Test statistic (t-test, Mann-Whitney U, Wilcoxon).
- dof: Degrees of Freedom.
- alternative: Hypothesis tested (e.g., "two-sided").
- p-unc: Uncorrected p-value.
- p-corr: Corrected p-value (if applicable).
- p-adjust: Correction method used.
- BF10: Bayes Factor.
- Effect size column (e.g., Cohen-d, CLES) based on user selection.

Note: The example tables below are illustrative and may not exactly match all columns generated by every possible test configuration.

Example ANOVA Output Snippet:

Comparison	Measure	analysis_level	comparison_type	Source	SS	DF1	DF2	MS	F	p-unc	np2	eps
Average Correlation	average_correlation	subject	state-group	state	0.10	2	4	0.05	271.23	0.00	0.99	0.59
Max Correlation	max_correlation	cell	state	state	...	...	...	...	...	...	...	...
Max Correlation	max_correlation	subject	group	group_name	...	...	...	...	...	...	...	...

Example Pairwise Output Snippet:

Comparison	Measure	analysis_level	comparison_type	Contrast	A	B	Paired	Parametric	T	dof	p-unc	p-corr	p-adjust	Cohen	BF10
Average Correlation	average_correlation	subject	state-group	state	immobile	mobile	TRUE	TRUE	0.64	3	0.57	1.0	bonf	0.51	0.50
Max Correlation	max_correlation	cell	state	state	immobile	mobile	...	...	...	...	...	...	...	...	...
Max Correlation	max_correlation	subject	group	group_name	group1	group2	FALSE	TRUE	...	...	...	...	...	...	...
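
Once pairwise_comparisons.csv has been downloaded, the significant contrasts can be pulled out with a few lines of code. This is a hedged sketch using only the standard library: the function name is hypothetical, the fallback to p-unc when p-corr is empty is an assumption, and the second row of the sample text contains made-up values for illustration only.

```python
import csv
import io


def significant_rows(csv_text, threshold=0.05):
    """Return the pairwise rows whose p-value is below the threshold.

    The corrected p-value is used when present; otherwise the uncorrected
    one is used (an assumption for the case where no correction was applied).
    """
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        p_text = row.get("p-corr") or row.get("p-unc")
        if p_text and float(p_text) < threshold:
            kept.append(row)
    return kept


# First data row follows the example snippet; the second is illustrative.
sample = """Contrast,A,B,p-unc,p-corr,p-adjust
state,immobile,mobile,0.57,1.0,bonf
group_name,group1,group2,0.01,0.02,bonf
"""
hits = significant_rows(sample, threshold=0.05)
```

The threshold argument plays the role of the Significance Threshold parameter, whose default is 0.05.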

Statistical Comparison Previews (SVG)¶

Several SVG plots visualize the data and statistical results:

Average Correlation Distribution (average_correlation_distribution.svg): Shows the distribution of average positive and negative correlations across states (and groups, if applicable) in separate panels. Points represent individual recordings (subjects), connected by lines if data is paired. Mean lines are shown for each group/state. Significance annotations from pairwise tests may be included.

*Example: Average Correlation Distribution*

{Statistic} Correlation State LMM ({statistic}_correlation_state_lmm.svg): Visualizes the cell-level statistic data (e.g., max correlation) compared across states (and groups). Often uses boxplots or similar representations showing the distribution per state/group. Significance annotations from the LMM pairwise comparisons may be included.

{Statistic} Correlation Group ANOVA ({statistic}_correlation_group_rm_anova.svg or ..._mixed_anova.svg): Shows the subject-averaged statistic data compared between groups across states. Uses plots similar to the State LMM plot (e.g., boxplots) but represents subject averages rather than individual cells. Significance annotations from the group ANOVA pairwise comparisons may be included.

*Example: Max Correlation Group RM ANOVA*