Combine and Compare Correlation Data
This tool uses 1.0 compute credits per hour.
Overview
This tool combines cell-cell correlation data from multiple recordings, as generated by the Compare Neural Circuit Correlations Across States tool, and restricts the analysis to the states specified by the user. It calculates and compares several correlation metrics across recordings, states, and experimental groups.
New version changes
- Input Format Change: This tool now requires correlation data in HDF5 (`.h5`) format, specifically the file generated by the Compare Neural Circuit Correlations Across States tool. Previous versions used CSV outputs.
- Benefits of H5 Input: The H5 format stores the full raw correlation matrices for each state. This enables:
  - Single-cell Level Analysis: Calculation and comparison of cell-specific statistics such as the maximum, minimum, or mean correlation (`statistic` parameter).
  - Detailed Average Correlations: Separate calculation and comparison of the average positive and average negative correlations across all cell pairs within a state.
- For Users of the Previous Correlation Tool: If you previously used the Compare Neural Circuit Correlations Across States tool and saved its outputs, you can use this updated combine-and-compare tool by providing the `*.h5` file generated by that tool as input. The previous CSV outputs are no longer the primary input for this combined analysis tool.
Parameters
Parameter | Required? | Default | Description |
---|---|---|---|
Group 1 Correlation Data Files | True | N/A | Select correlation data from the first group to use for analysis |
Group 1 Name | True | group1 | Name of the first group |
Group 1 Color | True | tab:red | Color of the first group |
Group 2 Correlation Data Files | False | N/A | Select correlation data from the second group to use for analysis |
Group 2 Name | False | N/A | Name of the second group |
Group 2 Color | True | tab:orange | Color of the second group |
State Names | True | N/A | Names of analyzed states |
State Colors | True | N/A | Colors of analyzed states |
Comparison Type | False | N/A | Type of statistical test to perform |
Multiple Comparison Correction Method | True | N/A | Method for correcting for multiple comparisons |
Effect Size Method | True | N/A | Method for calculating the effect size |
Data Pairing | False | unpaired | Indicates whether observations should be paired for statistical comparison |
Subject Matching Method | False | order | Method for matching subjects between groups in paired analysis |
Input Files
Source Parameter | File Type | File Format |
---|---|---|
Group 1 Correlation Data Files | correlation_data | h5 |
Group 2 Correlation Data Files | correlation_data | h5 |
The input files have the following requirements:
- File Type: Input files must be HDF5 files (`.h5`) generated by the Compare Neural Circuit Correlations Across States tool.
- Group Size: If data for a group is provided, that group must contain at least two H5 files.
- State Matching: The `State Names` parameter must be a comma-separated list of strings that match, ignoring case, the names of the datasets in the input H5 files that you wish to analyze.
- State Consistency: All input H5 files should contain datasets for the states specified in `State Names`.
- State Colors: The number of `State Colors` provided must equal the number of `State Names` provided.
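The requirements above can be checked before running the tool. The following is a minimal sketch, not the tool's own validation code: `check_inputs` and its arguments are hypothetical, and it assumes that `State Colors` is given as a comma-separated string in the same way as `State Names`.

```python
def check_inputs(state_names, state_colors, datasets_per_file):
    """Resolve the requested states to the dataset names found in each file.

    state_names: comma-separated string, e.g. "immobile, mobile"
    state_colors: comma-separated string, one color per state (assumed format)
    datasets_per_file: dict mapping file name -> list of dataset names it contains
    """
    states = [s.strip() for s in state_names.split(",") if s.strip()]
    colors = [c.strip() for c in state_colors.split(",") if c.strip()]
    if len(states) != len(colors):
        raise ValueError(f"{len(states)} state names but {len(colors)} state colors")
    if len(datasets_per_file) < 2:
        raise ValueError("a group must contain at least two H5 files")

    resolved = {}
    for file_name, dataset_names in datasets_per_file.items():
        # Case-insensitive lookup from requested state to stored dataset name.
        lookup = {name.lower(): name for name in dataset_names}
        missing = [s for s in states if s.lower() not in lookup]
        if missing:
            raise ValueError(f"{file_name} is missing states: {missing}")
        resolved[file_name] = [lookup[s.lower()] for s in states]
    return resolved


files = {
    "rec1.h5": ["Immobile", "mobile", "other"],
    "rec2.h5": ["immobile", "Mobile"],
}
resolved = check_inputs("immobile, mobile", "gray, blue", files)
```

Here `resolved` maps each file to the dataset names as they are actually stored, so later steps can read them regardless of capitalization.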
Additionally, each input H5 file is expected to contain top-level datasets where:
- The name of each dataset corresponds to a state (e.g., "immobile", "mobile", "other").
- The value of each dataset is a 2D NumPy array representing the cell-cell Pearson correlation matrix for that state. The diagonal elements are expected to be zero.
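To make this layout concrete, the sketch below writes and reads back a toy file with the `h5py` package. The file name, the state names, and the random matrices are illustrative assumptions; real inputs come from the Compare Neural Circuit Correlations Across States tool.

```python
import os
import tempfile

import h5py
import numpy as np

rng = np.random.default_rng(0)


def toy_matrix(n_cells):
    """Random symmetric matrix with a zero diagonal, standing in for real data."""
    m = rng.uniform(-1, 1, size=(n_cells, n_cells))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0.0)
    return m


# Hypothetical file name; one top-level dataset per state.
path = os.path.join(tempfile.mkdtemp(), "rec1_correlations.h5")
with h5py.File(path, "w") as f:
    for state in ("immobile", "mobile"):
        f.create_dataset(state, data=toy_matrix(4))

# Read every top-level dataset back into a dict of NumPy arrays.
with h5py.File(path, "r") as f:
    matrices = {
        name: f[name][()] for name in f.keys() if isinstance(f[name], h5py.Dataset)
    }
```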
Algorithm Description
The workflow involves reading the correlation data, calculating derived metrics, performing statistical comparisons, and generating outputs. These steps are demonstrated in the following diagram.
Data Loading and Preparation
- Read H5: For each group, the tool reads the provided H5 files (`raw_correlations_h5.h5`).
- Filter States: It retains only the datasets (correlation matrices) corresponding to the `State Names` specified by the user.
- Calculate Average Correlations: For each recording and state, it calculates the mean of positive off-diagonal correlations and the mean of negative off-diagonal correlations. This results in subject-level data.
- Calculate Cell Statistics: For each recording, state, and cell, it calculates the user-specified statistic ("max", "min", or "mean") of that cell's correlations with all other cells using the `measure_cells` function. This results in cell-level data.
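The two derived metrics can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the tool's `measure_cells` implementation: the function names are hypothetical, each cell pair is counted once (upper triangle), and the diagonal is excluded from the per-cell statistic.

```python
import numpy as np


def average_correlations(corr):
    """Mean of the positive and of the negative off-diagonal correlations."""
    corr = np.asarray(corr, dtype=float)
    upper = corr[np.triu_indices_from(corr, k=1)]  # each pair once, no diagonal
    pos, neg = upper[upper > 0], upper[upper < 0]
    return (
        pos.mean() if pos.size else np.nan,
        neg.mean() if neg.size else np.nan,
    )


def cell_statistic(corr, statistic="max"):
    """Per-cell max, min, or mean of correlations with all other cells."""
    off_diag = np.array(corr, dtype=float)  # copy so the input is not modified
    np.fill_diagonal(off_diag, np.nan)      # ignore self-correlation
    reducer = {"max": np.nanmax, "min": np.nanmin, "mean": np.nanmean}[statistic]
    return reducer(off_diag, axis=1)


corr = [[0.0, 0.5, -0.2],
        [0.5, 0.0, 0.1],
        [-0.2, 0.1, 0.0]]
pos_mean, neg_mean = average_correlations(corr)  # one value pair per recording/state
cell_max = cell_statistic(corr, "max")           # one value per cell
```

Applied to every recording and state, the first function yields the subject-level rows and the second yields the cell-level rows.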
Statistical Comparisons
The tool uses functions from the pingouin package for statistical analysis.
- Average Correlation Analysis:
  - Single Group: Compares average positive/negative correlations across states using a one-way Repeated Measures ANOVA (`pingouin.rm_anova`).
  - Two Groups (Paired): Compares average positive/negative correlations across states and groups using a two-way Repeated Measures ANOVA (`pingouin.rm_anova`).
  - Two Groups (Unpaired): Compares average positive/negative correlations across states (within-subject factor) and groups (between-subject factor) using a Mixed ANOVA (`pingouin.mixed_anova`).
- Cell-level Statistic Analysis:
  - State Comparison: Uses Linear Mixed Models (LMM) (`pingouin.linear_regression`) to compare the cell-level statistic across states, accounting for within-subject variability and potentially group differences. This handles the nested structure (cells within subjects).
  - Group Comparison: First averages the cell-level statistic per subject/state/group, then compares these subject-level averages between groups using ANOVA (RM-ANOVA for paired, Mixed ANOVA for unpaired).
- Pairwise Comparisons: Following significant ANOVA or LMM results, pairwise tests (`pingouin.pairwise_tests`) are performed to pinpoint differences between specific states or groups. The user selects the multiple comparison correction method and effect size calculation. Normality is checked, and non-parametric tests may be used if assumptions are violated.
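To show what the single-group case computes, here is a from-scratch NumPy sketch of a one-way repeated-measures ANOVA. The tool itself calls `pingouin.rm_anova`; the hypothetical `rm_anova_oneway` below only reproduces the sums of squares, degrees of freedom, and F-statistic for a small table, and omits the p-value and sphericity correction.

```python
import numpy as np


def rm_anova_oneway(data):
    """One-way RM-ANOVA on an array of shape (n_subjects, n_states)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_state = n * np.sum((data.mean(axis=0) - grand) ** 2)    # between states
    ss_subject = k * np.sum((data.mean(axis=1) - grand) ** 2)  # between subjects
    ss_total = np.sum((data - grand) ** 2)
    ss_error = ss_total - ss_state - ss_subject                # residual
    df1, df2 = k - 1, (n - 1) * (k - 1)
    ms_state, ms_error = ss_state / df1, ss_error / df2
    return {"SS": ss_state, "DF1": df1, "DF2": df2, "MS": ms_state,
            "F": ms_state / ms_error}


# Toy values: three subjects (rows) measured in three states (columns).
result = rm_anova_oneway([[1, 2, 3], [2, 3, 5], [3, 5, 6]])
```

For this toy table the state effect has `DF1 = 2`, `DF2 = 4`, and an F-statistic of approximately 32, because removing the subject means leaves very little residual variance.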
Effect Size Calculation Options
Method | Description |
---|---|
Cohen's d | A measure of effect size that expresses the difference between two means in standard deviation units. |
Hedges' g | Similar to Cohen's d, but includes a correction for small sample sizes, providing a more accurate estimate of effect size. |
Eta-squared (η²) | A measure of the proportion of variance in a dependent variable that is associated with one or more independent variables. |
Odds ratio | A measure of association between two binary variables, representing the odds of an event occurring in one group versus another. |
Area under the curve (AUC) | Typically refers to the AUC of a receiver operating characteristic (ROC) curve, used to evaluate the performance of a binary classifier by measuring its ability to distinguish between classes. |
Common Language Effect Size | A probability-based effect size that expresses the likelihood that a randomly chosen score from one group will be higher than a randomly chosen score from another group. |
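
Three of these effect sizes are simple enough to write out directly. The sketch below uses the standard textbook definitions for two independent samples (pooled standard deviation for Cohen's d, the usual small-sample factor for Hedges' g, ties counted as one half for the Common Language Effect Size); the function names are hypothetical and the tool's own values come from pingouin.

```python
import numpy as np


def cohen_d(x, y):
    """Difference in means divided by the pooled standard deviation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


def hedges_g(x, y):
    """Cohen's d with a correction for small sample sizes."""
    n = len(x) + len(y)
    return cohen_d(x, y) * (1 - 3 / (4 * n - 9))


def cles(x, y):
    """Probability that a random value of x exceeds a random value of y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]  # all pairwise differences
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


x, y = [2, 4, 6], [1, 2, 3]
d, g, p_superior = cohen_d(x, y), hedges_g(x, y), cles(x, y)
```

With only three values per sample, the Hedges correction shrinks d by 20%, which illustrates why it matters for small groups of recordings.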
Multiple Comparison Correction Options
Method | Description |
---|---|
bonf (Bonferroni) | Adjusts the p-value threshold by dividing it by the number of comparisons. Controls Family-Wise Error Rate (FWER). |
sidak (Sidak) | Similar to Bonferroni but slightly less conservative, assuming tests are independent. Controls FWER. |
holm (Holm-Bonferroni) | A step-down method that is uniformly more powerful than Bonferroni. Controls FWER. |
fdr_bh (Benjamini-Hochberg) | Controls the False Discovery Rate (FDR) by ranking p-values. Often preferred when many tests are performed. |
fdr_by (Benjamini-Yekutieli) | Controls FDR under arbitrary dependency assumptions, more conservative than Benjamini-Hochberg. |
none | No correction is applied. Increases the risk of Type I errors (false positives). |
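
The following NumPy sketch shows how three of these corrections adjust a set of p-values. It is an illustration of the standard procedures rather than the tool's code: `adjust_pvalues` is a hypothetical helper, and `sidak` and `fdr_by` are left out for brevity.

```python
import numpy as np


def adjust_pvalues(pvals, method="bonf"):
    """Return adjusted p-values for 'none', 'bonf', 'holm', or 'fdr_bh'."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    if method == "none":
        return p.copy()
    if method == "bonf":
        return np.minimum(p * m, 1.0)

    order = np.argsort(p)  # indices that sort p ascending
    ranked = p[order]
    if method == "holm":
        # Step-down: multiply by (m - rank), then enforce monotonicity.
        adj = np.maximum.accumulate(ranked * (m - np.arange(m)))
    elif method == "fdr_bh":
        # Step-up: multiply by m / rank, then take running minimum from the end.
        scaled = ranked * m / np.arange(1, m + 1)
        adj = np.minimum.accumulate(scaled[::-1])[::-1]
    else:
        raise ValueError(f"unsupported method: {method}")

    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)  # restore the original order
    return out


pvals = [0.01, 0.04, 0.03]
bonf = adjust_pvalues(pvals, "bonf")
holm = adjust_pvalues(pvals, "holm")
fdr_bh = adjust_pvalues(pvals, "fdr_bh")
```

For these three p-values, Bonferroni gives 0.03, 0.12, 0.09, Holm gives 0.03, 0.06, 0.06, and Benjamini-Hochberg gives 0.03, 0.04, 0.04, which reflects the ordering from most to least conservative.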
Outputs
Combination Outputs
Combined Data (CSV)
Two types of combined data files are generated per group:
- Average Correlations: A CSV file (e.g., `group1_combined_average_correlation.csv`) containing the calculated average positive and negative correlations for each recording and state.
  - Columns: `file`, `state`, `positive_correlation`, `negative_correlation`, `subject_id`, `group_name`.
- Cell Statistic Correlations: A CSV file (e.g., `group1_combined_max_correlation.csv`) containing the calculated cell-level statistic (max, min, or mean) for each cell, state, and recording.
  - Columns: `file`, `state`, `cell`, `{statistic}_correlation` (e.g., `max_correlation`), `subject_id`, `group_name`.
Example Average Correlation Data:
file | state | positive_correlation | negative_correlation | subject_id | group_name |
---|---|---|---|---|---|
rec1_correlations.h5 | immobile | 0.08 | -0.07 | 1 | group1 |
rec1_correlations.h5 | mobile | 0.09 | -0.08 | 1 | group1 |
rec2_correlations.h5 | immobile | 0.07 | -0.06 | 2 | group1 |
rec2_correlations.h5 | mobile | 0.10 | -0.09 | 2 | group1 |
Example Cell Statistic Data (`statistic='max'`):
file | state | cell | max_correlation | subject_id | group_name |
---|---|---|---|---|---|
rec1_correlations.h5 | immobile | 0 | 0.36 | 1 | group1 |
rec1_correlations.h5 | immobile | 1 | 0.45 | 1 | group1 |
rec1_correlations.h5 | mobile | 0 | 0.39 | 1 | group1 |
rec1_correlations.h5 | mobile | 1 | 0.48 | 1 | group1 |
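
Combined files like these are straightforward to summarize downstream. The sketch below uses only the Python standard library and the rows from the example average-correlation table above; it assumes a comma-delimited file with the listed column names, and in practice you would open the generated CSV instead of the inline string.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Rows copied from the example average-correlation table (assumed CSV layout).
csv_text = """file,state,positive_correlation,negative_correlation,subject_id,group_name
rec1_correlations.h5,immobile,0.08,-0.07,1,group1
rec1_correlations.h5,mobile,0.09,-0.08,1,group1
rec2_correlations.h5,immobile,0.07,-0.06,2,group1
rec2_correlations.h5,mobile,0.10,-0.09,2,group1
"""

# Collect the positive correlations of all recordings by state.
by_state = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    by_state[row["state"]].append(float(row["positive_correlation"]))

# Mean positive correlation per state across recordings.
state_means = {state: mean(values) for state, values in by_state.items()}
```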
Combination Previews (SVG)
For each group, Cumulative Distribution Function (CDF) plots are generated as previews for the combined data:
- One plot for average positive correlations (`{group_name}_avg_positive_correlation_cdf.svg`).
- One plot for average negative correlations (`{group_name}_avg_negative_correlation_cdf.svg`).
- One plot for the cell-level statistic correlations (`{group_name}_{statistic}_correlation_cdf.svg`).
These plots show the cumulative distribution of the respective correlation values across all recordings in the group, with separate lines colored by state.
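The curve drawn for each state is an empirical CDF. A minimal sketch of that computation, using a hypothetical `ecdf` helper and the cell-level values from the example table, looks like this; the tool's plotting code may differ in details such as step style.

```python
import numpy as np


def ecdf(values):
    """Return sorted values and the cumulative fraction at or below each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


# Example max_correlation values pooled across recordings for one state.
x, y = ecdf([0.36, 0.45, 0.39, 0.48])
```

Plotting `y` against `x` as a step line, once per state and in that state's color, gives a preview of the same kind.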
Statistical Comparison Outputs
Statistical Results (CSV)
Two CSV files summarize the statistical tests:
- ANOVA Results (`ANOVA_comparisons.csv`): Contains summary results from the ANOVA (RM-ANOVA, Mixed ANOVA) and LMM tests performed. The exact columns depend on the specific test (`pingouin` function) used for each comparison type (average vs. cell-level, state vs. group). Key columns typically include:
  - `Comparison`: The type of data being compared (e.g., "Average Correlation", "Max Correlation").
  - `Measure`: The specific metric column name (e.g., "average_correlation", "max_correlation").
  - `analysis_level`: The level at which the analysis was performed ("subject" or "cell").
  - `comparison_type`: The nature of the comparison ("state", "group", "state-group").
  - `Source`: The factor being tested (e.g., "state", "group_name", "Interaction").
  - `SS`, `MS`: Sum of Squares, Mean Squares.
  - `DF1`, `DF2`: Degrees of Freedom.
  - `F`: F-statistic (for ANOVA).
  - `p-unc`: Uncorrected p-value.
  - `np2`: Partial eta-squared effect size (for ANOVA).
  - `eps`: Greenhouse-Geisser epsilon (sphericity correction).
- Pairwise Comparisons (`pairwise_comparisons.csv`): Contains detailed results from post-hoc pairwise tests. The exact columns depend on the specific test (`pingouin.pairwise_tests` with different parameters). Key columns typically include:
  - `Comparison`, `Measure`, `analysis_level`, `comparison_type`: As above.
  - `Contrast`: The factor levels being contrasted (e.g., "state", "group_name", "state * group_name").
  - `A`, `B`: The specific levels/groups being compared.
  - `Paired`: Boolean indicating if the test was paired.
  - `Parametric`: Boolean indicating if a parametric test was used.
  - `T` / `U` / `W-val`: Test statistic (t-test, Mann-Whitney U, Wilcoxon).
  - `dof`: Degrees of Freedom.
  - `alternative`: Hypothesis tested (e.g., "two-sided").
  - `p-unc`: Uncorrected p-value.
  - `p-corr`: Corrected p-value (if applicable).
  - `p-adjust`: Correction method used.
  - `BF10`: Bayes Factor.
  - Effect size column (e.g., `Cohen-d`, `CLES`) based on user selection.
Note: The example tables below are illustrative and may not exactly match all columns generated by every possible test configuration.
Example ANOVA Output Snippet:
Comparison | Measure | analysis_level | comparison_type | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Average Correlation | average_correlation | subject | state-group | state | 0.10 | 2 | 4 | 0.05 | 271.23 | 0.00 | 0.99 | 0.59 |
Max Correlation | max_correlation | cell | state | state | ... | ... | ... | ... | ... | ... | ... | ... |
Max Correlation | max_correlation | subject | group | group_name | ... | ... | ... | ... | ... | ... | ... | ... |
Example Pairwise Output Snippet:
Comparison | Measure | analysis_level | comparison_type | Contrast | A | B | Paired | Parametric | T | dof | p-unc | p-corr | p-adjust | Cohen | BF10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Average Correlation | average_correlation | subject | state-group | state | immobile | mobile | TRUE | TRUE | 0.64 | 3 | 0.57 | 1.0 | bonf | 0.51 | 0.50 |
Max Correlation | max_correlation | cell | state | state | immobile | mobile | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Max Correlation | max_correlation | subject | group | group_name | group1 | group2 | FALSE | TRUE | ... | ... | ... | ... | ... | ... | ... |
Statistical Comparison Previews (SVG)
Several SVG plots visualize the data and statistical results:
- Average Correlation Distribution (`average_correlation_distribution.svg`): Shows the distribution of average positive and negative correlations across states (and groups, if applicable) in separate panels. Points represent individual recordings (subjects), connected by lines if data is paired. Mean lines are shown for each group/state. Significance annotations from pairwise tests may be included.
- {Statistic} Correlation State LMM (`{statistic}_correlation_state_lmm.svg`): Visualizes the cell-level statistic data (e.g., max correlation) compared across states (and groups). Often uses boxplots or similar representations showing the distribution per state/group. Significance annotations from the LMM pairwise comparisons may be included.
- {Statistic} Correlation Group ANOVA (`{statistic}_correlation_group_rm_anova.svg` or `..._mixed_anova.svg`): Shows the subject-averaged statistic data compared between groups across states. Uses plots similar to the State LMM plot (e.g., boxplots) but represents subject averages rather than individual cells. Significance annotations from the group ANOVA pairwise comparisons may be included.