Compare State-Epoch Data Between Groups

This tool uses 1.0 compute credits per hour.

Overview

The Compare State-Epoch Data Between Groups workflow compares state-epoch activity, correlation, and modulation metrics across two experimental groups using the CSV/H5 outputs generated by the Compare Neural State Data Across Epochs tool. It ingests the per-group activity_per_state_epoch_data.csv, correlations_per_state_epoch_data.csv (or raw correlation H5 files), and modulation_vs_baseline_data.csv, then produces harmonized group tables, statistical summaries, and publication-ready previews. The comparison can be performed across states or epochs (one dimension per run) and supports trace-only, event-only, or dual-measure analyses.

Key capabilities:

Validates that both groups share the same baseline state-epoch reference, state list, epoch list, and baseline modulation metadata.
Supports paired or unpaired comparisons, automatic or user-specified choice between parametric and non-parametric tests, and multiple-comparison correction strategies (Bonferroni, FDR, etc.).
Calculates per-group descriptive statistics, ANOVA-style summaries, pairwise tests, and optional linear mixed-model (LMM) analyses for cell-level metrics.
Reclassifies modulation significance at a new α threshold when requested, preserving the alpha/2 logic used in the Compare Neural State Data Across Epochs tool.
Runs either two-group comparisons or a single-group collapse; single-group mode aggregates all provided recordings and reports within-group ANOVA/pairwise statistics while retaining the same output directory layout.
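The α/2 reclassification can be sketched as follows. This is a minimal illustration, not the tool's code. It assumes each neuron's p-value is one-sided (for example, the fraction of a shuffled null distribution at or above the observed modulation score). Under that assumption, a neuron is up-modulated when p < α/2 and down-modulated when p > 1 - α/2. Both that p-value convention and the function name `reclassify_modulation` are assumptions made for illustration.

```python
from typing import List, Sequence


def reclassify_modulation(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Label each neuron +1 (up), -1 (down), or 0 (not modulated).

    Assumption: p-values are one-sided (upper tail), so the two-tailed
    threshold alpha is split as p < alpha/2 -> up, p > 1 - alpha/2 -> down.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    lower, upper = alpha / 2, 1 - alpha / 2
    labels = []
    for p in p_values:
        if p < lower:
            labels.append(1)
        elif p > upper:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


p_vals = [0.004, 0.03, 0.5, 0.99, 0.996]
print(reclassify_modulation(p_vals, alpha=0.05))  # [1, 0, 0, -1, -1]
print(reclassify_modulation(p_vals, alpha=0.01))  # [1, 0, 0, 0, -1]
```

Re-scoring the same p-values at a stricter α only removes borderline cells from the up/down categories. It never moves a cell from one direction to the other.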

Design Benefits

Structured inputs: By reusing the single-tool outputs (activity, correlation, modulation), each group already shares state/epoch labels, baseline metadata, and scaling, so comparison code can stay focused on higher-level logic.
Orthogonal flow control: Two explicit branches, trace vs. event and state vs. epoch, let users toggle modalities or comparison axes independently without touching other parameters.
Targeted statistics: Limiting the engine to two-way ANOVAs (state × group or epoch × group) avoids three-way complexity while matching how the single tool reports per-dimension summaries. Multiple-comparison correction is then applied either across all tests or within each stratum, as set by Multiple Comparison Scope.

Potential Future Expansion

Future versions may add a third comparison dimension to support a full three-way ANOVA (e.g., state × epoch × group). If you’d like this feature, please reach out to support.inscopix@bruker.com.

Note

Group 2 inputs are optional; the tool can run in "single group" mode to collapse recordings and produce per-group ANOVA/pairwise outputs. Single-group runs still require at least two subjects so statistical tests remain valid.
When correlation data is provided in H5 format, it must include the trace and/or event groups produced by the Compare Neural State Data Across Epochs tool. The tool converts them to the same schema as the CSV files automatically.

Input Data

Compatibility

This tool is designed to work exclusively with outputs from the Compare Neural State Data Across Epochs tool. It is not compatible with outputs from the Compare Neural Activity Across States tool, as the data structures differ between these workflows.

All input files must come directly from Compare Neural State Data Across Epochs and use the same configuration (states, epochs, baseline definitions, scaling choices). Each run must include at least two activity_per_state_epoch_data.csv files for Group 1 because the statistical tests require more than one subject. You may omit Group 2 entirely to run a single-group analysis; when two groups are supplied, every modality present for Group 1 must also be provided for Group 2 with matching file counts so comparisons remain balanced.

Source Parameter	File Type	File Format
Group 1 Activity CSV Files	epoch_activity_data	csv
Group 1 Correlation Files	correlation_data, correlation_data	csv, h5
Group 1 Modulation CSV Files	modulation_data	csv
Group 2 Activity CSV Files	epoch_activity_data	csv
Group 2 Correlation Files	correlation_data, correlation_data	csv, h5
Group 2 Modulation CSV Files	modulation_data	csv

Minimum recordings & pairing rules

Provide at least two activity_per_state_epoch_data.csv files per group; correlation and modulation modalities (when supplied) must also contain ≥2 files so ANOVA/pairwise/LMM tests have sufficient degrees of freedom.
When two groups are provided, each modality must have matching file counts across groups. Paired analyses additionally require subject_matching to identify at least two matched subject pairs per modality; otherwise the run aborts with a descriptive error.
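The recording-count rules above can be expressed as a small pre-flight check. This is a hypothetical sketch. `validate_file_counts` and the modality keys (`activity`, `correlation`, `modulation`) are illustrative names. The sketch does not check the paired-mode requirement of at least two matched subject pairs.

```python
from typing import Mapping, Optional, Sequence

MODALITIES = ("activity", "correlation", "modulation")


def validate_file_counts(
    group1: Mapping[str, Sequence[str]],
    group2: Optional[Mapping[str, Sequence[str]]] = None,
    min_files: int = 2,
) -> None:
    """Raise ValueError when the minimum-recording or parity rules are broken."""
    if len(group1.get("activity", [])) < min_files:
        raise ValueError(f"Group 1 needs at least {min_files} activity files")
    for modality in MODALITIES:
        n1 = len(group1.get(modality, []))
        # Optional modalities may be absent, but if supplied they need >= min_files.
        if 0 < n1 < min_files:
            raise ValueError(f"Group 1 {modality}: need >= {min_files} files, got {n1}")
        if group2 is not None:
            n2 = len(group2.get(modality, []))
            if n1 != n2:
                raise ValueError(f"{modality}: group 1 has {n1} files, group 2 has {n2}")


# Single-group run: two activity files are enough.
validate_file_counts({"activity": ["rec1.csv", "rec2.csv"]})
```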

Group consistency requirements

Baseline match: Both groups must share identical baseline_state and baseline_epoch values embedded in their modulation CSVs.
State/Epoch match: The ordered lists of state names and epoch names must match exactly; otherwise execution stops with an informative error.
Subject identifiers: Subject IDs (from normalized_subject_id) must follow the standardized format. When data_pairing="paired", subject_matching aligns files using one of the supported strategies: number (match digits in filenames), filename (exact basename), or order (original list order, used as the fallback). At least two matched subjects are required.
Modality parity: In two-group runs, correlation and modulation inputs must be present for Group 2 whenever they exist for Group 1 so the tool can build balanced statistics.
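The consistency check can be sketched as shown below. It assumes the baseline and the ordered state and epoch lists have already been read from each group's modulation CSVs into a hypothetical `GroupMetadata` record. The sketch compares tuples rather than sets, so the check is order-sensitive, mirroring the requirement that the ordered lists match exactly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GroupMetadata:
    """Hypothetical container for metadata read from a group's modulation CSVs."""
    baseline_state: str
    baseline_epoch: str
    states: Tuple[str, ...]
    epochs: Tuple[str, ...]


def check_group_consistency(g1: GroupMetadata, g2: GroupMetadata) -> None:
    """Raise ValueError naming the first field on which the groups disagree."""
    for name in ("baseline_state", "baseline_epoch", "states", "epochs"):
        v1, v2 = getattr(g1, name), getattr(g2, name)
        if v1 != v2:  # tuple comparison is order-sensitive
            raise ValueError(f"Groups disagree on {name}: {v1!r} vs {v2!r}")


control = GroupMetadata("rest", "baseline", ("rest", "active"), ("baseline", "training"))
treatment = GroupMetadata("rest", "baseline", ("rest", "active"), ("training", "baseline"))
try:
    check_group_consistency(control, treatment)
except ValueError as err:
    print(err)  # Groups disagree on epochs: ('baseline', 'training') vs ('training', 'baseline')
```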

Correlation input options

CSV path: correlations_per_state_epoch_data.csv files for each recording. This is the preferred (future default) format.
H5 path: pairwise_correlation_heatmaps.h5 files with trace/<state-epoch> and/or event/<state-epoch> datasets. These remain supported for backward compatibility; the tool automatically converts them into the same column structure as the CSV files.

If neither CSV nor H5 correlation data is provided, correlation analyses and previews are skipped.
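The sketch below shows what converting an H5 correlation matrix into per-cell columns could involve. It assumes each trace/<state-epoch> or event/<state-epoch> dataset holds a square cell-by-cell correlation matrix, already loaded as nested lists. Each row is reduced with the chosen correlation_statistic, skipping the diagonal. Positive and negative population averages are computed over unique cell pairs. Excluding self-correlations, the pair-counting rule and both function names are assumptions, not documented tool behavior.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def per_cell_correlation(corr: Matrix, statistic: str = "max") -> List[float]:
    """Reduce each cell's correlations with every other cell to one value."""
    reducers = {"max": max, "min": min, "mean": mean}
    if statistic not in reducers:
        raise ValueError(f"statistic must be one of {sorted(reducers)}")
    summarize = reducers[statistic]
    n = len(corr)
    # Assumption: the diagonal (self-correlation of 1.0) is excluded.
    return [summarize([corr[i][j] for j in range(n) if j != i]) for i in range(n)]


def population_correlation(corr: Matrix) -> Tuple[float, float]:
    """Mean positive and mean negative correlation over unique cell pairs."""
    n = len(corr)
    pairs = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
    pos = [r for r in pairs if r > 0]
    neg = [r for r in pairs if r < 0]
    nan = float("nan")
    return (mean(pos) if pos else nan, mean(neg) if neg else nan)


corr = [[1.0, 0.5, -0.2], [0.5, 1.0, 0.1], [-0.2, 0.1, 1.0]]
print(per_cell_correlation(corr, "max"))  # [0.5, 0.5, 0.1]
```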

Parameters

Parameter	Required?	Default	Description
Group 1 Activity CSV Files	True	N/A	List of activity_per_state_epoch_data.csv files for group 1 (from state_epoch_baseline_analysis outputs)
Group 1 Correlation Files	False	N/A	Optional correlation data files for group 1 (from state_epoch_baseline_analysis outputs). Provide to enable correlation analyses; accepts correlations_per_state_epoch_data.csv or pairwise_correlation_heatmaps.h5 files
Group 1 Modulation CSV Files	False	N/A	Optional modulation_vs_baseline_data.csv files for group 1 (from state_epoch_baseline_analysis outputs). Provide to enable modulation analyses
Group 1 Name	True	N/A	Name for group 1 (e.g., 'Control')
Group 1 Color	False	N/A	Color for group 1 visualizations (e.g., 'blue')
Group 2 Activity CSV Files	False	N/A	List of activity_per_state_epoch_data.csv files for group 2 (optional; provide to run a two-group comparison)
Group 2 Correlation Files	False	N/A	Correlation data files for group 2 (optional). Required when group 2 activity files are provided and group 1 correlation files are supplied. Accepts correlations_per_state_epoch_data.csv or pairwise_correlation_heatmaps.h5 files
Group 2 Modulation CSV Files	False	N/A	modulation_vs_baseline_data.csv files for group 2 (optional). Required when group 2 activity files are provided and group 1 modulation files are supplied
Group 2 Name	False	N/A	Name for group 2 (e.g., 'Treatment'). Required if group 2 files are provided.
Group 2 Color	False	N/A	Color for group 2 visualizations (e.g., 'red')
Comparison Dimension	True	N/A	Dimension to compare: 'states' (compare across behavioral states) or 'epochs' (compare across time epochs)
Measure Source	False	N/A	Data source to analyze across activity, correlation, and modulation: 'trace' (use trace-based measures, fallback to event if missing), 'event' (use event-based measures exclusively), or 'both' (analyze both trace and event separately)
State Colors	False	N/A	Optional list of hex color codes for states (e.g., ["#FF0000", "#00FF00", "#0000FF"]). Colors will be assigned to states in the order they appear in the data. If not provided, colors will be extracted from CSV files or auto-generated.
Epoch Colors	False	N/A	Optional list of hex color codes for epochs (e.g., ["#0000FF", "#FFFF00", "#FF00FF"]). Colors will be assigned to epochs in the order they appear in the data. If not provided, colors will be extracted from CSV files or auto-generated.
Modulation Colors	False	N/A	Comma-separated list of matplotlib-compatible colors representing up-modulated, down-modulated, and non-modulated neurons, respectively (e.g., 'green,blue,black'). Default: 'green,blue,black'
Data Pairing	True	N/A	Type of data pairing: 'unpaired' (independent samples) or 'paired' (matched subjects across groups)
Subject Matching	False	N/A	Method for matching subjects between groups (for paired analysis): 'order' (match by file order), 'number' (match by numeric IDs), or 'filename' (match by filename)
Correlation Statistic	False	N/A	Type of per-cell correlation statistic to analyze: 'max' (maximum), 'min' (minimum), or 'mean' (average)
Significance Threshold	False	N/A	Significance threshold for statistical tests (default: 0.05). Leave empty to use pre-computed modulation classifications.
Multiple Comparison Correction	False	N/A	Method for multiple comparison correction (e.g., Bonferroni or FDR)
Multiple Comparison Scope	False	global	Scope for multiple comparison correction. Global: correct across ALL tests from all strata (recommended, most conservative). Within-stratum: correct only within each stratum (e.g., each epoch separately when comparing states). Global correction prevents inflation of Type I error when performing stratified analyses.
Effect Size	False	N/A	Method for calculating effect size
Group Comparison Type	True	N/A	Type of statistical test to perform. Two-tailed tests for differences in either direction, one-tailed tests for directional hypotheses.
Parametric	False	N/A	Indicates whether to perform a parametric test. If set to 'auto', a parametric test will be used if the data follows a normal distribution and there are at least 8 observations. Otherwise, a non-parametric test will be used.
Enable LMM Analysis	False	N/A	Enable linear mixed model analysis to support imbalanced designs. Disable to skip LMM processing.
Save LMM Output Files	False	N/A	Persist linear mixed model result tables to CSV outputs. Disable to keep LMM results in memory only.
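The 'auto' rule described for the Parametric parameter can be sketched as below. The normality test is passed in as a callable returning `(statistic, p_value)`, for example `scipy.stats.shapiro`. Requiring every group to reach 8 observations and to pass normality at α = 0.05 is an assumption about how the rule is applied. `use_parametric` is a hypothetical name. The 'True' branch raises when the assumptions fail, as described under Highlights.

```python
from typing import Callable, Sequence, Tuple

# Any callable returning (statistic, p_value), e.g. scipy.stats.shapiro.
NormalityTest = Callable[[Sequence[float]], Tuple[float, float]]


def use_parametric(
    samples: Sequence[Sequence[float]],
    parametric: str,
    normality_test: NormalityTest,
    alpha: float = 0.05,
    min_n: int = 8,
) -> bool:
    """Return True for a parametric test, False for a non-parametric one."""
    if parametric not in ("auto", "True", "False"):
        raise ValueError("parametric must be 'auto', 'True', or 'False'")
    if parametric == "False":
        return False
    # Assumption: every sample needs >= min_n values and must not reject
    # normality at alpha; the size check runs first so tiny samples are never tested.
    assumptions_met = all(
        len(s) >= min_n and normality_test(s)[1] >= alpha for s in samples
    )
    if parametric == "True" and not assumptions_met:
        raise ValueError("parametric=True but normality/sample-size assumptions failed")
    return assumptions_met
```

With SciPy installed, `use_parametric([control, treatment], "auto", scipy.stats.shapiro)` would run a Shapiro-Wilk test on each group before choosing the test family.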

Highlights:

comparison_dimension: choose "states" or "epochs" to drive the aggregation and visualization axis.
measure_source: "trace", "event", or "both" to control whether analyses run on traces, events, or both modalities independently.
correlation_statistic: "max", "mean", or "min" selects the per-cell correlation metric that feeds the summaries and previews (positive/negative population averages are always added).
data_pairing & subject_matching: set "paired" vs "unpaired" plus the matching rule used only in paired mode (number for digits inside filenames, filename/name for exact basename matches, or order to keep the provided ordering). Paired mode requires ≥2 matched subjects.
multiple_correction & multiple_correction_scope: configure error control globally or within strata.
parametric: "auto" (data-driven parametric/non-parametric selection), "True" (force parametric; raises an error if assumptions fail), or "False" (always use non-parametric tests).
enable_lmm_analysis: toggles the optional linear mixed-effects pipeline for cell-level measures; save_lmm_outputs adds per-comparison CSVs summarizing model fits.
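The sketch below illustrates the three subject_matching strategies under stated assumptions. `number` keys each file on the first run of digits in its basename, with leading zeros dropped. `filename` keys on the exact basename, and `order` simply zips the two lists. The function names are hypothetical. The sketch does not model the documented fallback to order; it raises an error when fewer than two pairs are found.

```python
import os
import re
from typing import Callable, List, Optional, Sequence, Tuple


def _first_number(path: str) -> Optional[str]:
    """First run of digits in the basename, leading zeros dropped ('007' -> '7')."""
    match = re.search(r"\d+", os.path.basename(path))
    return str(int(match.group())) if match else None


def match_subjects(
    files1: Sequence[str], files2: Sequence[str], method: str = "number"
) -> List[Tuple[str, str]]:
    """Pair group 1 files with group 2 files by 'number', 'filename', or 'order'."""
    if method == "order":
        pairs = list(zip(files1, files2))  # extra files in the longer list are dropped
    else:
        keys = {"number": _first_number, "filename": os.path.basename}
        if method not in keys:
            raise ValueError("method must be 'number', 'filename', or 'order'")
        key: Callable[[str], Optional[str]] = keys[method]
        lookup = {key(f): f for f in files2 if key(f) is not None}
        pairs = [(f, lookup[key(f)]) for f in files1 if key(f) in lookup]
    if len(pairs) < 2:
        raise ValueError(f"paired analysis needs >= 2 matched subjects, got {len(pairs)}")
    return pairs


print(match_subjects(["ctrl/subj_001.csv", "ctrl/subj_002.csv", "ctrl/subj_009.csv"],
                     ["trt/mouse1.csv", "trt/mouse2.csv"]))
# [('ctrl/subj_001.csv', 'trt/mouse1.csv'), ('ctrl/subj_002.csv', 'trt/mouse2.csv')]
```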

Workflow

graph TD
    A[Validate inputs] --> B["Load & harmonize group data\n(activity, correlation, modulation)"]
    B --> C[Match subjects & propagate metadata]
    C --> D[Compute per-group descriptive tables]
    D --> E[Generate group previews]
    E --> F["Run statistical tests\n(ANOVA / pairwise / optional LMM)"]
    F --> G[Apply multiple correction & effect sizes]
    G --> H[Compile comparison summaries]
    H --> I[Register outputs & previews]

Processing steps

Validation & metadata alignment
- Ensures required CSV/H5 files exist and share the same states, epochs, baseline, and color metadata.
- Normalizes group names/colors (or generates defaults) and records them for output labeling.
Subject matching & pairing
- Aligns subjects across groups using the configured subject_matching rule (number, filename, or order) whenever data_pairing="paired".
- Handles paired vs. unpaired scenarios and logs dropped or unmatched subjects.
Measure selection
- Builds unified column sets (activity, per-cell correlations, population correlations, modulation counts) based on measure_source and correlation_statistic.
- For modulation data, can reclassify neurons at a custom significance threshold while respecting alpha/2 directionality.
Statistical testing
- Produces ANOVA-style summaries and pairwise test tables for each metric.
- Optional LMM analysis (when enabled and data are cell-level) uses subject and cell IDs as random effects to detect subtle group-by-state/epoch interactions.
- Multiple-comparison corrections (Bonferroni, FDR, etc.) are applied either globally or within each stratum.
Visualization
- Generates per-group boxplots/CDFs for activity and correlations, modulation distributions, and comparison-level summaries for the selected dimension.
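The two correction scopes can be illustrated with hand-written Bonferroni and Benjamini-Hochberg adjustments. In this sketch each test is a hypothetical `(stratum, p_unc)` pair. The `"global"` scope adjusts all p-values together, while `"within_stratum"` adjusts each stratum separately. The method keys `bonf` and `fdr_bh` are illustrative labels, and the tool's own correction routine may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def bonferroni(p: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    return [min(1.0, x * len(p)) for x in p]


def fdr_bh(p: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest p-value first keeps results monotone
        i = order[rank - 1]
        running_min = min(running_min, p[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def correct(
    tests: Sequence[Tuple[str, float]], method: str = "bonf", scope: str = "global"
) -> List[float]:
    """Adjust (stratum, p_unc) pairs across all tests or within each stratum."""
    adjust = {"bonf": bonferroni, "fdr_bh": fdr_bh}[method]
    if scope == "global":
        return adjust([p for _, p in tests])
    if scope != "within_stratum":
        raise ValueError("scope must be 'global' or 'within_stratum'")
    strata: Dict[str, List[int]] = defaultdict(list)
    for idx, (stratum, _) in enumerate(tests):
        strata[stratum].append(idx)
    out = [0.0] * len(tests)
    for indices in strata.values():
        for idx, q in zip(indices, adjust([tests[i][1] for i in indices])):
            out[idx] = q
    return out


tests = [("baseline", 0.01), ("baseline", 0.04), ("training", 0.03)]
print(correct(tests, "bonf", "global"))          # ≈ [0.03, 0.12, 0.09]
print(correct(tests, "bonf", "within_stratum"))  # ≈ [0.02, 0.08, 0.03]
```

The global scope multiplies by the total number of tests (3 here), so each adjusted p-value is at least as large as its within-stratum counterpart. This is why the global setting is the more conservative choice.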

Outputs

Key artifacts include:

Per-group combined data

The analysis tables surface combined activity, correlation, and modulation data for each group. Each table lists state, epoch, group name, normalized subject ID, source filename, cell index, and the associated activity/correlation/modulation metrics so users can download or filter them directly.
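One way to build a combined per-group table is to read each recording's CSV and tag every row with group and subject metadata. This sketch uses only the standard library. The rule that turns digits in the file name into an ID such as `subj_001` is a hypothetical stand-in for however the tool actually derives normalized_subject_id.

```python
import csv
import io
import os
import re
from typing import Dict, Iterable, List, TextIO, Tuple


def subject_id_from_name(filename: str) -> str:
    """Hypothetical ID rule: first digit run in the basename -> 'subj_NNN'."""
    match = re.search(r"\d+", os.path.basename(filename))
    return f"subj_{int(match.group()):03d}" if match else os.path.basename(filename)


def combine_group(
    recordings: Iterable[Tuple[str, TextIO]], group_name: str
) -> List[Dict[str, str]]:
    """Concatenate per-recording CSV rows, tagging each with group and subject metadata."""
    rows: List[Dict[str, str]] = []
    for filename, handle in recordings:
        subject = subject_id_from_name(filename)
        for row in csv.DictReader(handle):
            row.update(group_name=group_name, normalized_subject_id=subject, filename=filename)
            rows.append(row)
    return rows


csv_text = "state,epoch,cell_index,mean_trace_activity\nrest,baseline,0,0.125\nrest,baseline,1,0.142\n"
combined = combine_group([("recording_001.csv", io.StringIO(csv_text))], "Control")
print(combined[0]["normalized_subject_id"], combined[0]["mean_trace_activity"])  # subj_001 0.125
```

For files on disk, each handle would come from `open(path, newline="")`, which is how the `csv` module expects files to be opened.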

Preview figures accompany every combination, covering trace and event activity boxplots, correlation boxplots/CDFs, population correlation summaries, and modulation distributions. These previews mirror the metric selection (trace, event, or both) configured at run time.

Trace statistical summaries

Users receive ANOVA-style summaries and pairwise comparison tables for every trace-level metric. Each summary captures the comparison name, tested effect (group/state/epoch), degrees of freedom, sum of squares, F statistic, uncorrected and corrected p-values, plus the reported effect size.

Pairwise tables expose the exact contrasts (e.g., Control vs Treatment), whether each test was paired, the test statistic and its degrees of freedom, and both raw and adjusted p-values, so downstream reporting matches what the analysis tables display.
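For an unpaired, parametric group contrast, the statistic, degrees of freedom and effect size could be computed as below. This sketch assumes a pooled-variance Student's t-test and uses Hedges' g as the effect size. The tool lets users choose the effect-size method, so treat both choices as illustrative. The p-value is omitted because it requires the t distribution, for example `2 * scipy.stats.t.sf(abs(t), dof)`.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def unpaired_t_and_hedges_g(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Pooled-variance Student's t for A vs B, its degrees of freedom, and Hedges' g."""
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / dof
    diff = mean(a) - mean(b)
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    cohens_d = diff / math.sqrt(pooled_var)
    # Small-sample bias correction J = 1 - 3 / (4 * dof - 1).
    hedges_g = cohens_d * (1 - 3 / (4 * dof - 1))
    return t, dof, hedges_g


t, dof, g = unpaired_t_and_hedges_g([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), dof, round(g, 2))  # -3.674 4 -2.4
```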

Event statistical summaries (when event data are available)

When event measures are enabled, the analysis tables include parallel ANOVA and pairwise outputs for event activity, following the same column schema as the trace summaries.

Optional LMM outputs

If enable_lmm_analysis and save_lmm_outputs are True, the analysis tables list per-measure mixed-model summaries with fixed-effect estimates, standard errors, degrees of freedom, test statistics, and p-values for every requested state or epoch.

Comparison previews

Comparison-level figures (state- or epoch-focused) highlight trace activity, correlation, positive vs negative correlation trends, modulation fractions, and up/down modulated cell counts. When event data are analyzed, matching event previews are added.

All tables share a consistent schema with explicit metadata columns for states/epochs, group labels, subject IDs, and statistical annotations (test type, effect size, corrected p-values).

Example Combined Activity Output

state	epoch	group_name	normalized_subject_id	filename	cell_index	mean_trace_activity	mean_event_rate
rest	baseline	Control	subj_001	recording_001.isxd	0	0.125	0.018
rest	baseline	Control	subj_001	recording_001.isxd	1	0.142	0.012
rest	baseline	Treatment	subj_005	recording_005.isxd	0	0.163	0.020
rest	training	Control	subj_001	recording_001.isxd	0	0.179	0.024
rest	training	Treatment	subj_005	recording_005.isxd	0	0.211	0.029

Example Combined Correlation Output

state	epoch	group_name	normalized_subject_id	filename	max_trace_correlation	mean_trace_correlation	positive_trace_correlation
rest	baseline	Control	subj_001	recording_001.isxd	0.83	0.21	0.34
rest	baseline	Treatment	subj_005	recording_005.isxd	0.87	0.24	0.36
rest	training	Control	subj_001	recording_001.isxd	0.78	0.18	0.30
rest	training	Treatment	subj_005	recording_005.isxd	0.81	0.22	0.33

Example Combined Modulation Output

state	epoch	group_name	filename	cell_index	trace_modulation_scores	trace_p_values	trace_modulation	trace_up_modulation_number	trace_down_modulation_number
rest	training	Control	recording_001.isxd	0	0.18	0.032	1	12	3
rest	training	Control	recording_001.isxd	1	-0.09	0.210	0	12	3
rest	training	Treatment	recording_005.isxd	0	0.31	0.004	1	15	1
rest	training	Treatment	recording_005.isxd	2	-0.22	0.012	-1	15	1
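The per-recording up/down counts in the modulation table can be derived from the per-cell labels. This sketch assumes labels of 1, -1 and 0 in a `{prefix}_modulation` column and groups rows by state, epoch and filename. `modulation_counts` is a hypothetical helper, and the `n_cells` field is added here for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def modulation_counts(
    rows: Iterable[Mapping[str, str]], prefix: str = "trace"
) -> Dict[Tuple[str, str, str], Dict[str, int]]:
    """Count up (+1) and down (-1) modulated cells per (state, epoch, filename)."""
    label_col = f"{prefix}_modulation"
    tallies: Dict[Tuple[str, str, str], Counter] = {}
    for row in rows:
        key = (row["state"], row["epoch"], row["filename"])
        tallies.setdefault(key, Counter())[int(row[label_col])] += 1
    return {
        key: {
            f"{prefix}_up_modulation_number": c[1],
            f"{prefix}_down_modulation_number": c[-1],
            "n_cells": sum(c.values()),
        }
        for key, c in tallies.items()
    }


rows = [
    {"state": "rest", "epoch": "training", "filename": "recording_005.isxd", "trace_modulation": "1"},
    {"state": "rest", "epoch": "training", "filename": "recording_005.isxd", "trace_modulation": "-1"},
    {"state": "rest", "epoch": "training", "filename": "recording_005.isxd", "trace_modulation": "0"},
]
print(modulation_counts(rows))
```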

Example Trace ANOVA Output

comparison	state_or_epoch	effect	df1	df2	SS	MS	F	p_unc	p_corr	effect_size
trace_activity	rest	group	1	10	0.024	0.024	6.42	0.030	0.060	0.39
trace_activity	rest	epoch	2	20	0.011	0.005	2.14	0.143	0.286	0.21
trace_activity	rest	interaction	2	20	0.004	0.002	0.81	0.459	0.918	0.09

Example Trace Pairwise Output

comparison	contrast	state_or_epoch	A	B	paired	parametric	statistic	dof	p_unc	p_corr	p_adjust	effect_size
trace_activity	group	rest	Control	Treatment	False	True	-2.53	10	0.030	0.060	bonf	-0.72
trace_activity	epoch	Control	baseline	training	True	True	-1.61	5	0.164	0.328	bonf	-0.37

Example LMM Output (optional)

measure	fixed_effect	estimate	std_error	df	t_value	p_unc	group	state_or_epoch
trace_activity	group	-0.041	0.016	96	-2.62	0.010	Treatment	rest
trace_activity	epoch	0.018	0.007	96	2.57	0.012	Treatment	training

Previews

The preview examples below correspond to the epochs comparison mode (Group 1 vs. Group 2 across epochs) using trace metrics. When comparison_dimension="states" or when measure_source includes events ("event" or "both"), the tool generates the same family of figures with state-level layouts and/or event-specific data, following the exact styling shown here.

Additional per-group previews for positive/negative population correlations and the event-based activity/correlation/modulation plots are produced automatically when those modalities are analyzed. Similarly, comparison-level modulation prevalence figures (states/epochs_comparison_{trace|event}_modulation.svg) accompany the up/down count charts even though only the trace-based epoch examples are illustrated below.

Group-level activity and correlation

*Group 1 (Control) trace activity distribution across epochs*

*Group 2 (Treatment) trace activity distribution across epochs*

*Group 1 per-cell correlation statistics (boxplot)*

*Group 1 per-cell correlation statistics (CDF)*

*Group 2 per-cell correlation statistics (boxplot)*

*Group 2 per-cell correlation statistics (CDF)*

Modulation distributions

Comparison summaries

*Epoch-level comparison of trace activity between groups*

*Epoch-level comparison of correlation statistics*

*Counts of significantly up-modulated neurons per epoch*

*Counts of significantly down-modulated neurons per epoch*