Combine and Compare Correlation Data

Compute Credits

This tool uses 1.0 compute credits per hour.

Overview

This tool combines cell-cell correlation data generated by the Compare Neural Circuit Correlations Across States tool from multiple recordings, restricted to the states specified by the user. It calculates several correlation metrics (average positive and negative correlations, and a per-cell statistic) and compares them across recordings, states, and experimental groups.

New version changes

  • Input Format Change: This tool now requires correlation data in HDF5 (.h5) format, specifically the file generated by the Compare Neural Circuit Correlations Across States tool. Previous versions used CSV outputs instead.

  • Benefits of H5 Input: The H5 format stores the full raw correlation matrices for each state. This enables:

    • Single-cell Level Analysis: Calculation and comparison of cell-specific statistics like the maximum, minimum, or mean correlation (statistic parameter).
    • Detailed Average Correlations: Separate calculation and comparison of the average positive and average negative correlations across all cell pairs within a state.
  • For Users of the Previous Correlation Tool: If you previously ran the Compare Neural Circuit Correlations Across States tool and saved its outputs, you can use this updated combine-and-compare tool by providing the *.h5 file generated by that tool as input. The previous CSV outputs are no longer the primary input for this tool.

Parameters

| Parameter | Required? | Default | Description |
|---|---|---|---|
| Group 1 Correlation Data Files | True | N/A | Select correlation data from the first group to use for analysis |
| Group 1 Name | True | group1 | Name of the first group |
| Group 1 Color | True | tab:red | Color of the first group |
| Group 2 Correlation Data Files | False | N/A | Select correlation data from the second group to use for analysis |
| Group 2 Name | False | N/A | Name of the second group |
| Group 2 Color | True | tab:orange | Color of the second group |
| State Names | True | N/A | Names of analyzed states |
| State Colors | True | N/A | Colors of analyzed states |
| Comparison Type | False | N/A | Type of statistical test to perform |
| Multiple Comparison Correction method | True | N/A | Method for correcting for multiple comparisons |
| Effect Size Method | True | N/A | Method for calculating the effect size |
| Data Pairing | False | unpaired | Indicates whether observations should be paired for statistical comparison |
| Subject Matching Method | False | order | Method for matching subjects between groups in paired analysis |

Input Files

| Source Parameter | File Type | File Format |
|---|---|---|
| Group 1 Correlation Data Files | correlation_data | h5 |
| Group 2 Correlation Data Files | correlation_data | h5 |

The input files have the following requirements:

  • File Type: Input files must be HDF5 files (.h5) generated by the Compare Neural Circuit Correlations Across States tool.
  • Group Size: If data for a group is provided, that group must contain at least two H5 files.
  • State Matching: The State Names parameter must be a comma-separated list of strings, each of which matches the name of a dataset in the input H5 files that you wish to analyze. Matching is case-insensitive, but the names must otherwise be identical.
  • State Consistency: All input H5 files should contain datasets for the states specified in State Names.
  • State Colors: The number of colors provided in State Colors must equal the number of names provided in State Names.
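The requirements above can be checked before any file is opened. The following is a minimal sketch in plain Python, not the tool's own validation code: the function names `parse_list` and `validate_inputs`, their arguments, and the example colors are hypothetical, and only the rules listed above are taken from this page.

```python
def parse_list(text):
    """Split a comma-separated parameter string into trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def validate_inputs(group_files, state_names, state_colors):
    """Check group size and state/color counts (hypothetical helper).

    group_files: dict mapping a group name to its list of .h5 paths.
    state_names, state_colors: comma-separated strings.
    """
    states = parse_list(state_names)
    colors = parse_list(state_colors)
    if not states:
        raise ValueError("State Names must list at least one state")
    if len(colors) != len(states):
        raise ValueError(
            f"{len(colors)} state colors given for {len(states)} state names"
        )
    for name, files in group_files.items():
        # An empty list is treated as "group not provided" (e.g. optional Group 2).
        if files and len(files) < 2:
            raise ValueError(f"group '{name}' needs at least two H5 files")
        for path in files:
            if not path.lower().endswith(".h5"):
                raise ValueError(f"'{path}' is not an .h5 file")
    return states, colors


states, colors = validate_inputs(
    {"group1": ["rec1_correlations.h5", "rec2_correlations.h5"], "group2": []},
    "immobile, mobile",
    "tab:blue, tab:green",
)
print(states, colors)
```

Failing early on a count mismatch or an undersized group avoids reading every recording only to stop at the statistics step.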

Additionally, each input H5 file is expected to contain top-level datasets where:

  • The name of each dataset corresponds to a state (e.g., "immobile", "mobile", "other").
  • The value of each dataset is a 2D NumPy array representing the cell-cell Pearson correlation matrix for that state. The diagonal elements are expected to be zero.
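A minimal sketch of how such a file might be read, assuming only the layout described above. The function name `load_state_matrices` is hypothetical, and a plain dictionary stands in for an open H5 file so that the example runs without any file on disk.

```python
import numpy as np


def load_state_matrices(h5_like, state_names):
    """Return {requested_state: 2D array} from a mapping of dataset name -> data.

    Works on any mapping; an open h5py.File is the assumed real-world input.
    """
    # Map lower-cased dataset names to their stored names for case-insensitive lookup.
    available = {name.lower(): name for name in h5_like.keys()}
    matrices = {}
    for state in state_names:
        key = available.get(state.lower())
        if key is None:
            raise KeyError(f"state '{state}' not found; available: {sorted(available)}")
        matrix = np.asarray(h5_like[key], dtype=float)
        if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
            raise ValueError(f"dataset '{key}' is not a square 2D matrix")
        matrices[state] = matrix
    return matrices


# Stand-in for an open H5 file: top-level datasets named by state.
fake_file = {
    "immobile": [[0.0, 0.3], [0.3, 0.0]],
    "mobile": [[0.0, -0.1], [-0.1, 0.0]],
    "other": [[0.0, 0.2], [0.2, 0.0]],
}
loaded = load_state_matrices(fake_file, ["Immobile", "MOBILE"])
print({state: m.shape for state, m in loaded.items()})
```

If the h5py package is available, an open file could be passed in place of the dictionary (for example inside `with h5py.File(path, "r") as f:`), since `h5py.File` exposes its top-level datasets through the same mapping interface.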

Algorithm Description

The workflow involves reading the correlation data, calculating derived metrics, performing statistical comparisons, and generating outputs. These steps are demonstrated in the following diagram.

graph TD
    subgraph Data Preparation
        A[Read H5 files for Group 1] --> B(Filter states based on user input);
        C[Read H5 files for Group 2 Optional] --> D(Filter states based on user input);
        B --> E{Calculate Metrics};
        D --> E;
        E -- Average Correlations --> F[Calculate Avg Positive/Negative Correlations per recording/state];
        E -- Cell Statistics --> G[Calculate Cell Max/Min/Mean Correlations per recording/state/cell];
    end
    subgraph Statistical Analysis
        F --> H[Analyze Average Correlations ANOVA: RM / Mixed];
        G --> I[Analyze Cell Correlations LMM for State, ANOVA for Group];
        H --> J(Perform Pairwise Tests);
        I --> J;
    end
    subgraph Output Generation
        F --> K[Save Combined Average Correlation CSV];
        G --> L[Save Combined Cell Statistic CSV];
        J --> M[Save ANOVA & Pairwise Results CSVs];
        F & G & J --> N[Generate Preview Plots CDFs, Comparison Plots];
    end

Data Loading and Preparation

  1. Read H5: For each group, the tool reads the provided H5 files (raw_correlations_h5.h5).
  2. Filter States: It retains only the datasets (correlation matrices) corresponding to the State Names specified by the user.
  3. Calculate Average Correlations: For each recording and state, it calculates the mean of positive off-diagonal correlations and the mean of negative off-diagonal correlations. This results in subject-level data.
  4. Calculate Cell Statistics: For each recording, state, and cell, it calculates the user-specified statistic ("max", "min", or "mean") of that cell's correlations with all other cells using the measure_cells function. This results in cell-level data.
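Steps 3 and 4 can be sketched with NumPy as follows. This is an illustration under the assumptions stated on this page (square, symmetric matrix, diagonal excluded), not the tool's implementation: the helper names `average_correlations` and `cell_statistic` are hypothetical and are not the tool's `measure_cells` function.

```python
import numpy as np


def average_correlations(matrix):
    """Mean of positive and mean of negative off-diagonal correlations."""
    matrix = np.asarray(matrix, dtype=float)
    # Each cell pair appears twice in a symmetric matrix; the upper triangle
    # (k=1 skips the diagonal) keeps one copy of each pair.
    pairs = matrix[np.triu_indices_from(matrix, k=1)]
    positive = pairs[pairs > 0]
    negative = pairs[pairs < 0]
    pos_mean = positive.mean() if positive.size else np.nan
    neg_mean = negative.mean() if negative.size else np.nan
    return float(pos_mean), float(neg_mean)


def cell_statistic(matrix, statistic="max"):
    """Per-cell max, min, or mean correlation with all other cells."""
    matrix = np.asarray(matrix, dtype=float).copy()
    # Mask the diagonal so a cell's self-correlation is never counted.
    np.fill_diagonal(matrix, np.nan)
    reducers = {"max": np.nanmax, "min": np.nanmin, "mean": np.nanmean}
    if statistic not in reducers:
        raise ValueError(f"unknown statistic: {statistic}")
    return reducers[statistic](matrix, axis=1)


# Small made-up correlation matrix for three cells.
corr = np.array([
    [0.0, 0.4, -0.2],
    [0.4, 0.0, 0.1],
    [-0.2, 0.1, 0.0],
])
print(average_correlations(corr))   # positive mean 0.25, negative mean -0.2
print(cell_statistic(corr, "max"))  # one value per cell
```

Masking the diagonal with NaN, rather than relying on it being zero, keeps the "max" and "min" statistics correct even when every off-diagonal value for a cell has the same sign.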

Statistical Comparisons

The tool uses functions from the pingouin package for statistical analysis.

  1. Average Correlation Analysis:

    • Single Group: Compares average positive/negative correlations across states using a one-way Repeated Measures ANOVA (pingouin.rm_anova).
    • Two Groups (Paired): Compares average positive/negative correlations across states and groups using a two-way Repeated Measures ANOVA (pingouin.rm_anova).
    • Two Groups (Unpaired): Compares average positive/negative correlations across states (within-subject factor) and groups (between-subject factor) using a Mixed ANOVA (pingouin.mixed_anova).
  2. Cell-level Statistic Analysis:

    • State Comparison: Uses Linear Mixed Models (LMM) (pingouin.linear_regression) to compare the cell-level statistic across states, accounting for within-subject variability and potentially group differences. This handles the nested structure (cells within subjects).
    • Group Comparison: First, averages the cell-level statistic per subject/state/group. Then, compares these subject-level averages between groups using ANOVA (RM-ANOVA for paired, Mixed ANOVA for unpaired).
  3. Pairwise Comparisons: Following significant ANOVA or LMM results, pairwise tests (pingouin.pairwise_tests) are performed to pinpoint differences between specific states or groups. The user selects the multiple comparison correction method and effect size calculation. Normality is checked, and non-parametric tests may be used if assumptions are violated.
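The first step of the group comparison, reducing cell-level values to one value per subject and state, can be sketched as follows. This is an illustration in plain Python rather than the tool's code: the function name `subject_averages` is hypothetical, while the column names follow the combined cell statistic CSV described under Outputs.

```python
from collections import defaultdict
from statistics import mean


def subject_averages(rows, value_key="max_correlation"):
    """Average a cell-level value per (group_name, subject_id, state)."""
    buckets = defaultdict(list)
    for row in rows:
        key = (row["group_name"], row["subject_id"], row["state"])
        buckets[key].append(row[value_key])
    return {key: mean(values) for key, values in buckets.items()}


# Cell-level rows in the shape of the combined cell statistic CSV.
cell_rows = [
    {"group_name": "group1", "subject_id": 1, "state": "immobile", "cell": 0, "max_correlation": 0.36},
    {"group_name": "group1", "subject_id": 1, "state": "immobile", "cell": 1, "max_correlation": 0.45},
    {"group_name": "group1", "subject_id": 1, "state": "mobile", "cell": 0, "max_correlation": 0.39},
    {"group_name": "group1", "subject_id": 1, "state": "mobile", "cell": 1, "max_correlation": 0.48},
]
averages = subject_averages(cell_rows)
for key, value in averages.items():
    print(key, round(value, 3))
```

Averaging within each subject first means the group-level ANOVA treats subjects, not individual cells, as the unit of observation.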

Effect Size Calculation Options

| Method | Description |
|---|---|
| Cohen's d | A measure of effect size that expresses the difference between two means in standard deviation units. |
| Hedges' g | Similar to Cohen's d, but includes a correction for small sample sizes, providing a more accurate estimate of effect size. |
| r | Pearson correlation coefficient, used as a measure of effect size. |
| eta-squared (η²) | A measure of the proportion of variance in a dependent variable that is associated with one or more independent variables. |
| Odds ratio | A measure of association between two binary variables, representing the odds of an event occurring in one group versus another. |
| Area under the curve (AUC) | Typically refers to the AUC of a receiver operating characteristic (ROC) curve, used to evaluate the performance of a binary classifier by measuring its ability to distinguish between classes. |
| Common Language Effect Size | A probability-based effect size that expresses the likelihood that a randomly chosen score from one group will be higher than a randomly chosen score from another group. |
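To make three of these definitions concrete, here is a sketch using the textbook formulas for two independent samples. It is not the tool's code: the tool obtains its effect sizes through pingouin, whose results may differ in detail (for example for paired data), and the function names and sample values below are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def hedges_g(x, y):
    """Cohen's d multiplied by the usual small-sample correction factor."""
    n = len(x) + len(y)
    return cohens_d(x, y) * (1 - 3 / (4 * n - 9))


def cles(x, y):
    """Fraction of (x, y) pairs in which x is larger; ties count as half."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))


group_a = [2, 4, 6]
group_b = [1, 3, 5]
print(cohens_d(group_a, group_b))  # 0.5
print(hedges_g(group_a, group_b))  # smaller than d because n is small
print(cles(group_a, group_b))      # 6 of 9 pairs favor group_a
```

With only three values per group, the Hedges correction shrinks the estimate by 20%, which shows why it matters for studies with few recordings.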

Multiple Comparison Correction Options

| Method | Description |
|---|---|
| bonf (Bonferroni) | Adjusts the p-value threshold by dividing it by the number of comparisons. Controls Family-Wise Error Rate (FWER). |
| sidak (Sidak) | Similar to Bonferroni but slightly less conservative, assuming tests are independent. Controls FWER. |
| holm (Holm-Bonferroni) | A step-down method that is uniformly more powerful than Bonferroni. Controls FWER. |
| fdr_bh (Benjamini-Hochberg) | Controls the False Discovery Rate (FDR) by ranking p-values. Often preferred when many tests are performed. |
| fdr_by (Benjamini-Yekutieli) | Controls FDR under arbitrary dependency assumptions, more conservative than Benjamini-Hochberg. |
| none | No correction is applied. Increases the risk of Type I errors (false positives). |
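The sketch below shows how three of these corrections turn raw p-values into adjusted p-values, which is equivalent to adjusting the threshold. It uses the standard definitions in plain Python and is not the tool's code, which delegates the correction to pingouin; the function names and the p-values are illustrative.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def holm(pvals):
    """Step-down Holm adjustment: smallest p gets factor m, next m-1, and so on."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The running maximum keeps adjusted values non-decreasing in rank order.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(running_max, 1.0)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: p * m / rank, made monotone from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))          # approx. [0.03, 0.12, 0.09]
print(holm(raw))                # approx. [0.03, 0.06, 0.06]
print(benjamini_hochberg(raw))  # approx. [0.03, 0.04, 0.04]
```

At a 0.05 threshold, only the first comparison survives Bonferroni or Holm here, while all three survive Benjamini-Hochberg, which illustrates the trade-off between FWER and FDR control.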

Outputs

Combination Outputs

Combined Data (CSV)

Two types of combined data files are generated per group:

  1. Average Correlations: A csv file (e.g., group1_combined_average_correlation.csv) containing the calculated average positive and negative correlations for each recording and state.

    • Columns: file, state, positive_correlation, negative_correlation, subject_id, group_name.
  2. Cell Statistic Correlations: A csv file (e.g., group1_combined_max_correlation.csv) containing the calculated cell-level statistic (max, min, or mean) for each cell, state, and recording.

    • Columns: file, state, cell, {statistic}_correlation (e.g., max_correlation), subject_id, group_name.

Example Average Correlation Data:

| file | state | positive_correlation | negative_correlation | subject_id | group_name |
|---|---|---|---|---|---|
| rec1_correlations.h5 | immobile | 0.08 | -0.07 | 1 | group1 |
| rec1_correlations.h5 | mobile | 0.09 | -0.08 | 1 | group1 |
| rec2_correlations.h5 | immobile | 0.07 | -0.06 | 2 | group1 |
| rec2_correlations.h5 | mobile | 0.10 | -0.09 | 2 | group1 |

Example Cell Statistic Data (statistic='max'):

| file | state | cell | max_correlation | subject_id | group_name |
|---|---|---|---|---|---|
| rec1_correlations.h5 | immobile | 0 | 0.36 | 1 | group1 |
| rec1_correlations.h5 | immobile | 1 | 0.45 | 1 | group1 |
| rec1_correlations.h5 | mobile | 0 | 0.39 | 1 | group1 |
| rec1_correlations.h5 | mobile | 1 | 0.48 | 1 | group1 |

Combination Previews (SVG)

For each group, Cumulative Distribution Function (CDF) plots are generated as previews for the combined data:

  • One plot for average positive correlations ({group_name}_avg_positive_correlation_cdf.svg).
  • One plot for average negative correlations ({group_name}_avg_negative_correlation_cdf.svg).
  • One plot for the cell-level statistic correlations ({group_name}_{statistic}_correlation_cdf.svg).

These plots show the cumulative distribution of the respective correlation values across all recordings in the group, with separate lines colored by state.
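The curve drawn for each state is an empirical CDF. A minimal sketch of that computation, assuming rows shaped like the combined average correlation CSV, is shown here; the function names `ecdf` and `ecdf_by_state` are hypothetical and the plotting itself is left out.

```python
from collections import defaultdict


def ecdf(values):
    """Return sorted values and the cumulative fraction reached at each one."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def ecdf_by_state(rows, value_key):
    """Group rows by state and compute one empirical CDF per state."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["state"]].append(row[value_key])
    return {state: ecdf(vals) for state, vals in grouped.items()}


# Rows in the shape of the combined average correlation CSV.
avg_rows = [
    {"state": "immobile", "positive_correlation": 0.08},
    {"state": "mobile", "positive_correlation": 0.09},
    {"state": "immobile", "positive_correlation": 0.07},
    {"state": "mobile", "positive_correlation": 0.10},
]
curves = ecdf_by_state(avg_rows, "positive_correlation")
for state, (xs, fractions) in curves.items():
    print(state, xs, fractions)
```

Each `(xs, fractions)` pair can be drawn as a step line, one per state, using the corresponding entry from State Colors.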

Example: Max Correlation CDF for Group 1
Example: Max Correlation CDF for Group 2

Statistical Comparison Outputs

Statistical Results (CSV)

Two CSV files summarize the statistical tests:

  1. ANOVA Results (ANOVA_comparisons.csv): Contains summary results from the ANOVA (RM-ANOVA, Mixed ANOVA) and LMM tests performed. The exact columns depend on the specific test (pingouin function) used for each comparison type (average vs. cell-level, state vs. group). Key columns typically include:

    • Comparison: The type of data being compared (e.g., "Average Correlation", "Max Correlation").
    • Measure: The specific metric column name (e.g., "average_correlation", "max_correlation").
    • analysis_level: The level at which the analysis was performed ("subject" or "cell").
    • comparison_type: The nature of the comparison ("state", "group", "state-group").
    • Source: The factor being tested (e.g., "state", "group_name", "Interaction").
    • SS, MS: Sum of Squares, Mean Squares.
    • DF1, DF2: Degrees of Freedom.
    • F: F-statistic (for ANOVA).
    • p-unc: Uncorrected p-value.
    • np2: Partial eta-squared effect size (for ANOVA).
    • eps: Greenhouse-Geisser epsilon (sphericity correction).
  2. Pairwise Comparisons (pairwise_comparisons.csv): Contains detailed results from post-hoc pairwise tests. The exact columns depend on the specific test (pingouin.pairwise_tests with different parameters). Key columns typically include:

    • Comparison, Measure, analysis_level, comparison_type: As above.
    • Contrast: The factor levels being contrasted (e.g., "state", "group_name", "state * group_name").
    • A, B: The specific levels/groups being compared.
    • Paired: Boolean indicating if the test was paired.
    • Parametric: Boolean indicating if a parametric test was used.
    • T / U / W-val: Test statistic (t-test, Mann-Whitney U, Wilcoxon).
    • dof: Degrees of Freedom.
    • alternative: Hypothesis tested (e.g., "two-sided").
    • p-unc: Uncorrected p-value.
    • p-corr: Corrected p-value (if applicable).
    • p-adjust: Correction method used.
    • BF10: Bayes Factor.
    • Effect size column (e.g., Cohen-d, CLES) based on user selection.

Note: The example tables below are illustrative and may not exactly match all columns generated by every possible test configuration.

Example ANOVA Output Snippet:

| Comparison | Measure | analysis_level | comparison_type | Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Average Correlation | average_correlation | subject | state-group | state | 0.10 | 2 | 4 | 0.05 | 271.23 | 0.00 | 0.99 | 0.59 |
| Max Correlation | max_correlation | cell | state | state | ... | ... | ... | ... | ... | ... | ... | ... |
| Max Correlation | max_correlation | subject | group | group_name | ... | ... | ... | ... | ... | ... | ... | ... |

Example Pairwise Output Snippet:

| Comparison | Measure | analysis_level | comparison_type | Contrast | A | B | Paired | Parametric | T | dof | p-unc | p-corr | p-adjust | Cohen | BF10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Average Correlation | average_correlation | subject | state-group | state | immobile | mobile | TRUE | TRUE | 0.64 | 3 | 0.57 | 1.0 | bonf | 0.51 | 0.50 |
| Max Correlation | max_correlation | cell | state | state | immobile | mobile | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Max Correlation | max_correlation | subject | group | group_name | group1 | group2 | FALSE | TRUE | ... | ... | ... | ... | ... | ... | ... |
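Once pairwise_comparisons.csv has been downloaded, the significant contrasts can be pulled out with the standard library. The sketch below is an assumption-laden illustration: the function name `significant_rows` is hypothetical, the in-memory CSV uses only a subset of the columns listed above, and its values are illustrative rather than real results.

```python
import csv
import io


def significant_rows(csv_file, alpha=0.05):
    """Return rows whose corrected (or, if missing, uncorrected) p-value is below alpha."""
    rows = []
    for row in csv.DictReader(csv_file):
        # Prefer the corrected p-value; fall back to the uncorrected one if it is empty.
        p_text = row.get("p-corr") or row.get("p-unc")
        if p_text and float(p_text) < alpha:
            rows.append(row)
    return rows


# Illustrative stand-in for pairwise_comparisons.csv (values are made up).
example_csv = """Comparison,Contrast,A,B,p-unc,p-corr
Average Correlation,state,immobile,mobile,0.57,1.0
Max Correlation,state,immobile,mobile,0.001,0.003
Max Correlation,group_name,group1,group2,0.20,
"""
hits = significant_rows(io.StringIO(example_csv))
for row in hits:
    print(row["Comparison"], row["Contrast"], row["A"], "vs", row["B"], row["p-corr"])
```

For a real file, `open("pairwise_comparisons.csv", newline="")` could be passed in place of the `io.StringIO` object.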

Statistical Comparison Previews (SVG)

Several SVG plots visualize the data and statistical results:

  1. Average Correlation Distribution (average_correlation_distribution.svg): Shows the distribution of average positive and negative correlations across states (and groups, if applicable) in separate panels. Points represent individual recordings (subjects), connected by lines if data is paired. Mean lines are shown for each group/state. Significance annotations from pairwise tests may be included.
Example: Average Correlation Distribution
  2. {Statistic} Correlation State LMM ({statistic}_correlation_state_lmm.svg): Visualizes the cell-level statistic data (e.g., max correlation) compared across states (and groups). Often uses boxplots or similar representations showing the distribution per state/group. Significance annotations from the LMM pairwise comparisons may be included.
Example: Max Correlation State LMM
  3. {Statistic} Correlation Group ANOVA ({statistic}_correlation_group_rm_anova.svg or ..._mixed_anova.svg): Shows the subject-averaged statistic data compared between groups across states. Uses plots similar to the State LMM plot (e.g., boxplots) but represents subject averages rather than individual cells. Significance annotations from the group ANOVA pairwise comparisons may be included.
Example: Max Correlation Group RM ANOVA