Compare State-Epoch Data Between Groups

Compute Credits

This tool uses 1.0 compute credits per hour.

Overview

The Compare State-Epoch Data Between Groups workflow compares state-epoch activity, correlation, and modulation metrics across two experimental groups using the CSV/H5 outputs generated by the Compare State Data Across Epochs tool. It ingests the per-group activity_per_state_epoch_data.csv, correlations_per_state_epoch_data.csv (or raw correlation H5 files), and modulation_vs_baseline_data.csv, then produces harmonized group tables, statistical summaries, and publication-ready previews. The comparison can be performed across states or epochs (one dimension per run) and supports trace-only, event-only, or dual-measure analyses.

Key capabilities:

  • Validates that both groups share the same baseline state-epoch reference, state list, epoch list, and baseline modulation metadata.
  • Supports paired or unpaired comparisons, automatic or user-specified parametric tests, and multiple correction strategies (Bonferroni, FDR, etc.).
  • Calculates per-group descriptive statistics, ANOVA-style summaries, pairwise tests, and optional linear mixed-model (LMM) analyses for cell-level metrics.
  • Reclassifies modulation significance at a new α threshold when requested, preserving the alpha/2 logic used in the baseline tool.
  • Runs either two-group comparisons or a single-group collapse; single-group mode aggregates all provided recordings and reports within-group ANOVA/pairwise statistics while retaining the same output directory layout.

Design Benefits

  • Structured inputs: By reusing the single-tool outputs (activity, correlation, modulation), each group already shares state/epoch labels, baseline metadata, and scaling, so comparison code can stay focused on higher-level logic.
  • Orthogonal flow control: Two explicit branches—trace vs event and state vs epoch—allow users to toggle modalities or comparison axes independently without touching other parameters.
  • Targeted statistics: Limiting the engine to 2-way ANOVAs (state-by-group or epoch-by-group) avoids 3-way complexity while matching how the single tool reports per-dimension summaries; multiple-comparison correction is applied within each tested context.

Potential Future Expansion

Future versions may add a third comparison dimension, enabling a full three-way ANOVA (e.g., state × epoch × group). If you’d like this feature, please contact inscopix.support@bruker.com.

Note

  • Group 2 inputs are optional; the tool can run in "single group" mode to collapse recordings and produce per-group ANOVA/pairwise outputs. Single-group runs still require at least two subjects so statistical tests remain valid.
  • When correlation data is provided in H5 format, it must include the trace and/or event groups produced by Compare State Data Across Epochs. The tool converts them to the same schema as the CSV files automatically.

Input Data

Compatibility

This tool is designed to work exclusively with outputs from the Compare State Data Across Epochs tool. It is not compatible with outputs from the Compare Neural Activity Across States tool, as the data structures differ between these workflows.

All input files must come directly from Compare State Data Across Epochs and use the same configuration (states, epochs, baseline definitions, scaling choices). Each run must include at least two activity_per_state_epoch_data.csv files for Group 1 because the statistical tests require more than one subject. You may omit Group 2 entirely to run a single-group analysis; when two groups are supplied, every modality present for Group 1 must also be provided for Group 2 with matching file counts so comparisons remain balanced.

Source Parameter | File Type | File Format
Group 1 Activity CSV Files | epoch_activity_data | csv
Group 1 Correlation Files | correlation_data, correlation_data | csv, h5
Group 1 Modulation CSV Files | modulation_data | csv
Group 2 Activity CSV Files | epoch_activity_data | csv
Group 2 Correlation Files | correlation_data, correlation_data | csv, h5
Group 2 Modulation CSV Files | modulation_data | csv

Minimum recordings & pairing rules

  • Provide at least two activity_per_state_epoch_data.csv files per group; correlation and modulation modalities (when supplied) must also contain ≥2 files so ANOVA/pairwise/LMM tests have sufficient degrees of freedom.
  • When two groups are provided, each modality must have matching file counts across groups. Paired analyses additionally require subject_matching to identify at least two matched subject pairs per modality; otherwise the run aborts with a descriptive error.
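
The counting rules above can be checked before a run is submitted. The following is a minimal sketch, not the tool's own validation code: the function name `check_file_counts`, the modality keys, and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical modality keys; the tool's internal names may differ.
MODALITIES = ("activity", "correlation", "modulation")
MIN_FILES = 2  # the documentation requires at least two files per supplied modality


def check_file_counts(
    group1: Dict[str, List[str]],
    group2: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the inputs pass."""
    problems: List[str] = []
    groups = {"Group 1": group1}
    if group2:  # a missing or empty Group 2 means single-group mode
        groups["Group 2"] = group2

    for label, files in groups.items():
        # Activity files are mandatory for every group that is supplied.
        if len(files.get("activity", [])) < MIN_FILES:
            problems.append(f"{label}: need at least {MIN_FILES} activity files")
        # Optional modalities must also reach the minimum once any file is given.
        for modality in ("correlation", "modulation"):
            n = len(files.get(modality, []))
            if 0 < n < MIN_FILES:
                problems.append(f"{label}: {modality} has {n} file(s), need {MIN_FILES}")

    if group2:
        # Two-group runs need matching file counts for every modality.
        for modality in MODALITIES:
            n1 = len(group1.get(modality, []))
            n2 = len(group2.get(modality, []))
            if n1 != n2:
                problems.append(f"{modality}: Group 1 has {n1} file(s), Group 2 has {n2}")
    return problems
```

Calling `check_file_counts(group1)` with no second argument mirrors single-group mode; passing both groups additionally enforces matching counts.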

Group consistency requirements

  • Baseline match: Both groups must share identical baseline_state and baseline_epoch values embedded in their modulation CSVs.
  • State/Epoch match: The ordered lists of state names and epoch names must match exactly; otherwise execution stops with an informative error.
  • Subject identifiers: Subject IDs (from normalized_subject_id) must follow the standardized format. When data_pairing="paired", subject_matching aligns files using one of the supported strategies: number (match digits in filenames), filename (exact basename), or order (original list order, used as the fallback). At least two matched subjects are required.
  • Modality parity: In two-group runs, correlation and modulation inputs must be present for Group 2 whenever they exist for Group 1 so the tool can build balanced statistics.
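
The three subject_matching strategies described above can be illustrated with a short sketch. This is not the tool's implementation: `match_subjects` and `_match_key` are hypothetical helpers, and treating the first run of digits in the basename as the subject number (ignoring leading zeros) is an assumption.

```python
import os
import re
from typing import Dict, List, Tuple


def _match_key(path: str, strategy: str, position: int) -> str:
    """Build the key used to pair one file with its counterpart in the other group."""
    base = os.path.basename(path)
    if strategy == "filename":
        return base  # exact basename match
    if strategy == "number":
        digits = re.findall(r"\d+", base)
        # Assumption: the first run of digits identifies the subject.
        return str(int(digits[0])) if digits else ""
    if strategy == "order":
        return str(position)  # keep the provided list order
    raise ValueError(f"unknown subject_matching strategy: {strategy!r}")


def match_subjects(
    group1: List[str], group2: List[str], strategy: str = "number"
) -> List[Tuple[str, str]]:
    """Return (group1_file, group2_file) pairs; raise if fewer than two pairs match."""
    lookup: Dict[str, str] = {}
    for i, path in enumerate(group2):
        key = _match_key(path, strategy, i)
        if key:
            lookup.setdefault(key, path)

    pairs: List[Tuple[str, str]] = []
    for i, path in enumerate(group1):
        key = _match_key(path, strategy, i)
        if key in lookup:
            pairs.append((path, lookup.pop(key)))  # each Group 2 file is used once

    if len(pairs) < 2:
        raise ValueError("paired analysis needs at least two matched subjects")
    return pairs
```

Unmatched files are simply left out of the returned pairs here; the tool itself is documented as logging dropped or unmatched subjects.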

Correlation input options

  • CSV path: correlations_per_state_epoch_data.csv files for each recording. This is the preferred (future default) format.
  • H5 path: pairwise_correlation_heatmaps.h5 files with trace/<state-epoch> and/or event/<state-epoch> datasets. These remain supported for backward compatibility; the tool automatically converts them into the same column structure as the CSV files.

If neither CSV nor H5 correlation data is provided, correlation analyses and previews are skipped.

Parameters

Parameter | Required? | Default | Description
Group 1 Activity CSV Files | True | N/A | List of activity_per_state_epoch_data.csv files for group 1 (from state_epoch_baseline_analysis outputs)
Group 1 Correlation Files | False | N/A | Optional correlation data files for group 1 (from state_epoch_baseline_analysis outputs). Provide to enable correlation analyses; accepts correlations_per_state_epoch_data.csv or pairwise_correlation_heatmaps.h5 files
Group 1 Modulation CSV Files | False | N/A | Optional modulation_vs_baseline_data.csv files for group 1 (from state_epoch_baseline_analysis outputs). Provide to enable modulation analyses
Group 1 Name | True | N/A | Name for group 1 (e.g., 'Control')
Group 1 Color | False | N/A | Color for group 1 visualizations (e.g., 'blue')
Group 2 Activity CSV Files | False | N/A | List of activity_per_state_epoch_data.csv files for group 2 (optional; provide for a two-group comparison)
Group 2 Correlation Files | False | N/A | Correlation data files for group 2 (optional; provide for a two-group comparison). Required when group 2 activity files are provided and group 1 correlation files are supplied. Accepts correlations_per_state_epoch_data.csv or pairwise_correlation_heatmaps.h5 files
Group 2 Modulation CSV Files | False | N/A | modulation_vs_baseline_data.csv files for group 2 (optional; provide for a two-group comparison). Required when group 2 activity files are provided and group 1 modulation files are supplied
Group 2 Name | False | N/A | Name for group 2 (e.g., 'Treatment'). Required if group 2 files are provided.
Group 2 Color | False | N/A | Color for group 2 visualizations (e.g., 'red')
Comparison Dimension | True | N/A | Dimension to compare: 'states' (compare across behavioral states) or 'epochs' (compare across time epochs)
Measure Source | False | N/A | Data source to analyze across activity, correlation, and modulation: 'trace' (use trace-based measures, fallback to event if missing), 'event' (use event-based measures exclusively), or 'both' (analyze both trace and event separately)
State Colors | False | N/A | Optional list of hex color codes for states (e.g., ["#FF0000", "#00FF00", "#0000FF"]). Colors will be assigned to states in the order they appear in the data. If not provided, colors will be extracted from CSV files or auto-generated.
Epoch Colors | False | N/A | Optional list of hex color codes for epochs (e.g., ["#0000FF", "#FFFF00", "#FF00FF"]). Colors will be assigned to epochs in the order they appear in the data. If not provided, colors will be extracted from CSV files or auto-generated.
Modulation Colors | False | N/A | Comma-separated list of matplotlib-compatible colors representing up-modulated, down-modulated, and non-modulated neurons, respectively (e.g., 'green,blue,black'). Default: 'green,blue,black'
Data Pairing | True | N/A | Type of data pairing: 'unpaired' (independent samples) or 'paired' (matched subjects across groups)
Subject Matching | False | N/A | Method for matching subjects between groups (for paired analysis): 'order' (match by file order), 'number' (match by numeric IDs), or 'filename' (match by filename)
Correlation Statistic | False | N/A | Type of per-cell correlation statistic to analyze: 'max' (maximum), 'min' (minimum), or 'mean' (average)
Significance Threshold | False | N/A | Significance threshold for statistical tests (default: 0.05). Leave empty to use pre-computed modulation classifications.
Multiple Comparison Correction | False | N/A | Method for multiple comparison correction
Multiple Comparison Scope | False | global | Scope for multiple comparison correction. Global: correct across ALL tests from all strata (recommended, most conservative). Within-stratum: correct only within each stratum (e.g., each epoch separately when comparing states). Global correction prevents inflation of Type I error when performing stratified analyses.
Effect Size | False | N/A | Method for calculating effect size
Group Comparison Type | True | N/A | Type of statistical test to perform. Two-tailed tests check for differences in either direction; one-tailed tests check directional hypotheses.
Parametric | False | N/A | Indicates whether to perform a parametric test. If set to 'auto', a parametric test will be used if the data follows a normal distribution and there are at least 8 observations. Otherwise, a non-parametric test will be used.
Enable LMM Analysis | False | N/A | Enable linear mixed model analysis to support unbalanced designs. Disable to skip LMM processing.
Save LMM Output Files | False | N/A | Persist linear mixed model result tables to CSV outputs. Disable to keep LMM results in memory only.

Highlights:

  • comparison_dimension: choose "states" or "epochs" to drive the aggregation and visualization axis.
  • measure_source: "trace", "event", or "both" to control whether analyses run on traces, events, or both modalities independently.
  • correlation_statistic: "max", "mean", or "min" selects the per-cell correlation metric that feeds the summaries and previews (positive/negative population averages are always added).
  • data_pairing & subject_matching: set "paired" vs "unpaired" plus the matching rule used only in paired mode (number for digits inside filenames, filename/name for exact basename matches, or order to keep the provided ordering). Paired mode requires ≥2 matched subjects.
  • multiple_correction & multiple_correction_scope: configure family-wise error control globally or within strata (within_stratum, with per_condition accepted as an alias).
  • parametric: "auto" (data-driven parametric/non-parametric selection), "True" (force parametric; raises an error if assumptions fail), or "False" (always use non-parametric tests).
  • enable_lmm_analysis: toggles the optional linear mixed-effects pipeline for cell-level measures; save_lmm_outputs adds per-comparison CSVs summarizing model fits.
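
The allowed values and aliases listed above can be captured in a small pre-flight check. This is a sketch under stated assumptions rather than the tool's parameter handling: `validate_params` and the `ALLOWED` table are hypothetical, and dropping subject_matching in unpaired mode is just one way to reflect that the rule is used only for paired data.

```python
from typing import Any, Dict

# Allowed values taken from the parameter descriptions in this document.
ALLOWED = {
    "comparison_dimension": {"states", "epochs"},
    "measure_source": {"trace", "event", "both"},
    "correlation_statistic": {"max", "min", "mean"},
    "data_pairing": {"paired", "unpaired"},
    "subject_matching": {"number", "filename", "name", "order"},
    "multiple_correction_scope": {"global", "within_stratum", "per_condition"},
    "parametric": {"auto", "True", "False"},
}

# Aliases mentioned in the highlights, mapped to a canonical spelling.
ALIASES = {
    "multiple_correction_scope": {"per_condition": "within_stratum"},
    "subject_matching": {"name": "filename"},
}


def validate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a normalized copy of params; raise ValueError on an unknown value."""
    normalized = dict(params)
    for name, allowed in ALLOWED.items():
        if name in normalized and normalized[name] not in allowed:
            raise ValueError(f"{name}={normalized[name]!r} is not one of {sorted(allowed)}")
    for name, mapping in ALIASES.items():
        if normalized.get(name) in mapping:
            normalized[name] = mapping[normalized[name]]
    # The matching rule only applies in paired mode, so drop it otherwise.
    if normalized.get("data_pairing") != "paired":
        normalized.pop("subject_matching", None)
    return normalized
```

For example, passing `multiple_correction_scope="per_condition"` returns a copy in which the value is rewritten to `"within_stratum"`.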

Workflow

graph TD
  A[Validate inputs] --> B["Load & harmonize group data\n(activity, correlation, modulation)"]
  B --> C[Match subjects & propagate metadata]
  C --> D[Compute per-group descriptive tables]
  D --> E[Generate group previews]
  E --> F["Run statistical tests\n(ANOVA / pairwise / optional LMM)"]
  F --> G[Apply multiple correction & effect sizes]
  G --> H[Compile comparison summaries]
  H --> I[Register outputs & previews]

Processing steps

  1. Validation & metadata alignment
       • Ensures required CSV/H5 files exist and share the same states, epochs, baseline, and color metadata.
       • Normalizes group names/colors (or generates defaults) and records them for output labeling.

  2. Subject matching & pairing
       • Aligns subjects across groups using the configured subject_matching rule (number, filename, or order) whenever data_pairing="paired".
       • Handles paired vs. unpaired scenarios and logs dropped or unmatched subjects.

  3. Measure selection
       • Builds unified column sets (activity, per-cell correlations, population correlations, modulation counts) based on measure_source and correlation_statistic.
       • For modulation data, can reclassify neurons at a custom significance threshold while respecting alpha/2 directionality.

  4. Statistical testing
       • Produces ANOVA-style summaries and pairwise test tables for each metric.
       • Optional LMM analysis (when enabled and data are cell-level) uses subject and cell IDs as random effects to detect subtle group-by-state/epoch interactions.
       • Multiple-comparison corrections (Bonferroni, FDR, etc.) are applied either globally or within each stratum.

  5. Visualization
       • Generates per-group boxplots/CDFs for activity and correlations, modulation distributions, and comparison-level summaries for the selected dimension.
       • Preview SVGs are stored alongside the CSV outputs in .previews/ subdirectories.
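
To make the difference between global and within-stratum correction in the statistical testing step concrete, here is a minimal sketch using Bonferroni, one of the corrections named above. The function `bonferroni_by_scope` and its input layout are assumptions for illustration; the tool's own correction code is not shown in this document.

```python
from collections import Counter
from typing import List, Tuple


def bonferroni_by_scope(
    tests: List[Tuple[str, float]], scope: str = "global"
) -> List[float]:
    """Bonferroni-correct p-values given as (stratum, uncorrected p) pairs.

    scope="global" treats all tests as one family; scope="within_stratum"
    treats each stratum (e.g., each epoch when comparing states) as its own family.
    """
    if scope == "global":
        family_sizes = [len(tests)] * len(tests)
    elif scope == "within_stratum":
        counts = Counter(stratum for stratum, _ in tests)
        family_sizes = [counts[stratum] for stratum, _ in tests]
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    # Bonferroni: multiply each p-value by its family size and cap at 1.
    return [min(1.0, p * m) for (_, p), m in zip(tests, family_sizes)]


tests = [
    ("baseline", 0.01),
    ("baseline", 0.04),
    ("training", 0.03),
    ("training", 0.20),
]
global_p = bonferroni_by_scope(tests, "global")          # each p multiplied by 4
stratum_p = bonferroni_by_scope(tests, "within_stratum")  # each p multiplied by 2
```

With four tests split across two strata, the global scope multiplies every p-value by 4 while the within-stratum scope multiplies by 2, which is why the global option is described as the more conservative choice.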

Outputs

All outputs are organized under the chosen output_dir with descriptive subfolders. Key artifacts include:

  1. Per-group combined data
       • <group>_combined_activity_data/<comparison>_<group>_combined_activity_data.csv
       • <group>_combined_trace_correlation_data/<comparison>_<group>_combined_trace_correlation_data.csv
       • <group>_combined_modulation_data/<comparison>_<group>_combined_modulation_data.csv
       • Each folder contains matching preview SVGs inside .previews/, including <group>_{trace|event}_activity_boxplot.svg, <stat>_{trace|event}_correlation_{boxplot|cdf}.svg for every requested correlation_statistic, <positive|negative>_{trace|event}_population_boxplot.svg, and <group>_{trace|event}_modulation_distribution.svg (event variants appear only when event data are analyzed).

  2. Trace statistical summaries
       • trace_aov_comparisons/<comparison>_trace_aov_comparisons.csv
       • trace_pairwise_comparisons/<comparison>_trace_pairwise_comparisons.csv

  3. Event statistical summaries (when event data are available)
       • event_aov_comparisons/<comparison>_event_aov_comparisons.csv
       • event_pairwise_comparisons/<comparison>_event_pairwise_comparisons.csv

  4. Optional LMM outputs
       • Saved only when enable_lmm_analysis and save_lmm_outputs are True; file names follow the same <comparison>_<measure>_lmm_results.csv convention.

  5. Comparison previews
       • Trace comparisons: states_comparison_trace_activity.svg, states_comparison_trace_correlation.svg, states_comparison_trace_positive_correlation.svg, states_comparison_trace_negative_correlation.svg, states_comparison_trace_modulation.svg, states_comparison_trace_up_modulated_counts.svg, and states_comparison_trace_down_modulated_counts.svg (automatically switched to the epochs_ prefix when comparison_dimension="epochs").
       • Event comparisons: the same set of filenames with _event_ instead of _trace_ are generated whenever event measures are part of the analysis.

All CSVs include metadata columns for state/epoch identifiers, group labels, subject IDs, source filenames, and the statistical annotations (test type, effect size, corrected p-values) needed for downstream reporting.
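
When scripting downstream checks, it can help to generate the comparison preview filenames listed above. The sketch below only reproduces the naming pattern documented here; `expected_comparison_previews` is a hypothetical helper, and which files a run actually writes depends on the modalities supplied (for example, 'trace' may fall back to event measures when trace data are missing).

```python
from typing import List

# Suffixes taken from the comparison preview filenames listed in this document.
PREVIEW_SUFFIXES = [
    "activity",
    "correlation",
    "positive_correlation",
    "negative_correlation",
    "modulation",
    "up_modulated_counts",
    "down_modulated_counts",
]


def expected_comparison_previews(
    comparison_dimension: str, measure_source: str
) -> List[str]:
    """List comparison preview filenames for a given dimension and measure source."""
    if comparison_dimension not in ("states", "epochs"):
        raise ValueError("comparison_dimension must be 'states' or 'epochs'")
    measures_by_source = {
        "trace": ["trace"],
        "event": ["event"],
        "both": ["trace", "event"],
    }
    if measure_source not in measures_by_source:
        raise ValueError("measure_source must be 'trace', 'event', or 'both'")
    return [
        f"{comparison_dimension}_comparison_{measure}_{suffix}.svg"
        for measure in measures_by_source[measure_source]
        for suffix in PREVIEW_SUFFIXES
    ]
```

A script could compare this list against the contents of the output directory to spot previews that were skipped because a modality was not provided.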

Example Combined Activity Output

state epoch group_name normalized_subject_id filename cell_index mean_trace_activity mean_event_rate
rest baseline Control subj_001 recording_001.isxd 0 0.125 0.018
rest baseline Control subj_001 recording_001.isxd 1 0.142 0.012
rest baseline Treatment subj_005 recording_005.isxd 0 0.163 0.020
rest training Control subj_001 recording_001.isxd 0 0.179 0.024
rest training Treatment subj_005 recording_005.isxd 0 0.211 0.029
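
As an illustration of how the combined activity table might be consumed downstream, the following sketch averages mean_trace_activity per group, state, and epoch using only the standard library. It assumes the file is comma-delimited with the column names shown in the example above; `mean_activity_by_group` is a hypothetical helper and the inline rows simply repeat the example values.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Inline copy of the example rows above, written as comma-separated text.
SAMPLE_CSV = """state,epoch,group_name,normalized_subject_id,filename,cell_index,mean_trace_activity,mean_event_rate
rest,baseline,Control,subj_001,recording_001.isxd,0,0.125,0.018
rest,baseline,Control,subj_001,recording_001.isxd,1,0.142,0.012
rest,baseline,Treatment,subj_005,recording_005.isxd,0,0.163,0.020
rest,training,Control,subj_001,recording_001.isxd,0,0.179,0.024
rest,training,Treatment,subj_005,recording_005.isxd,0,0.211,0.029
"""


def mean_activity_by_group(
    csv_text: str, column: str = "mean_trace_activity"
) -> Dict[Tuple[str, str, str], float]:
    """Average one activity column for each (group_name, state, epoch) combination."""
    values: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["group_name"], row["state"], row["epoch"])
        values[key].append(float(row[column]))
    return {key: mean(vals) for key, vals in values.items()}


summary = mean_activity_by_group(SAMPLE_CSV)
for (group, state, epoch), value in sorted(summary.items()):
    print(f"{group:<10} {state:<6} {epoch:<9} {value:.4f}")
```

To read a real output file, the same function could be given the text of the combined activity CSV in place of `SAMPLE_CSV`.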

Example Combined Correlation Output

state epoch group_name normalized_subject_id filename max_trace_correlation mean_trace_correlation positive_trace_correlation
rest baseline Control subj_001 recording_001.isxd 0.83 0.21 0.34
rest baseline Treatment subj_005 recording_005.isxd 0.87 0.24 0.36
rest training Control subj_001 recording_001.isxd 0.78 0.18 0.30
rest training Treatment subj_005 recording_005.isxd 0.81 0.22 0.33

Example Combined Modulation Output

state epoch group_name filename cell_index trace_modulation_scores trace_p_values trace_modulation trace_up_modulation_number trace_down_modulation_number
rest training Control recording_001.isxd 0 0.18 0.032 1 12 3
rest training Control recording_001.isxd 1 -0.09 0.210 0 12 3
rest training Treatment recording_005.isxd 0 0.31 0.004 1 15 1
rest training Treatment recording_005.isxd 2 -0.22 0.012 -1 15 1

Example Trace ANOVA Output

comparison state_or_epoch effect df1 df2 SS MS F p_unc p_corr effect_size
trace_activity rest group 1 10 0.024 0.024 6.42 0.030 0.060 0.39
trace_activity rest epoch 2 20 0.011 0.005 2.14 0.143 0.286 0.21
trace_activity rest interaction 2 20 0.004 0.002 0.81 0.459 0.918 0.09

Example Trace Pairwise Output

comparison contrast state_or_epoch A B paired parametric statistic dof p_unc p_corr p_adjust effect_size
trace_activity group rest Control Treatment False True -2.53 10 0.030 0.060 bonf -0.72
trace_activity epoch Control baseline training True True -1.61 5 0.164 0.328 bonf -0.37

Example LMM Output (optional)

measure fixed_effect estimate std_error df t_value p_unc group state_or_epoch
trace_activity group -0.041 0.016 96 -2.62 0.010 Treatment rest
trace_activity epoch 0.018 0.007 96 2.57 0.012 Treatment training

Previews

The preview examples below correspond to the epochs comparison mode (Group 1 vs. Group 2 across epochs) using trace metrics. When comparison_dimension="states" or when measure_source includes events ("event" or "both"), the tool generates the same family of figures with state-level layouts and/or event-specific data, following the exact styling shown here.

Additional per-group previews for positive/negative population correlations and the event-based activity/correlation/modulation plots are produced automatically when those modalities are analyzed. Similarly, comparison-level modulation prevalence figures (states/epochs_comparison_{trace|event}_modulation.svg) accompany the up/down count charts even though only the trace-based epoch examples are illustrated below.

Group-level activity and correlation

Group 1 (Control) trace activity distribution across epochs
Group 2 (Treatment) trace activity distribution across epochs
Group 1 per-cell correlation statistics (boxplot)
Group 1 per-cell correlation statistics (CDF)
Group 2 per-cell correlation statistics (boxplot)
Group 2 per-cell correlation statistics (CDF)

Modulation distributions

Group 1 trace modulation vs. baseline
Group 2 trace modulation vs. baseline

Comparison summaries

Epoch-level comparison of trace activity between groups
Epoch-level comparison of correlation statistics
Counts of significantly up-modulated neurons per epoch
Counts of significantly down-modulated neurons per epoch