Combine and Compare Population Activity Data¶

This tool uses 1.0 compute credits per hour.

Overview¶

This tool combines population activity metrics (trace activity and optionally event rates) from multiple recordings, potentially across two experimental groups. It performs statistical comparisons to identify differences between defined behavioral states and, if applicable, between the two groups.

The tool employs a multi-level statistical approach:

Cell-Level State Comparison: Utilizes a Linear Mixed Model (LMM) to analyze the activity or event rate of individual cells, accounting for repeated measures within subjects. This compares activity across different states and assesses the interaction between state and group (if two groups are provided).
Subject-Level Modulation Comparison: Performs ANOVA (Repeated Measures or Mixed, depending on pairing) on the number of significantly modulated (up or down) cells per subject per state. This compares the overall modulation counts across states and groups.
Subject-Level Activity/Event Rate Comparison: Calculates the average activity or event rate per subject for each state (and group, if applicable). It then performs ANOVA (Repeated Measures for paired data, Mixed for unpaired data) on these subject averages to compare overall activity levels between groups and states.

Key features include:
* Handling of single-group or two-group experimental designs.
* Support for paired (e.g., within-subject treatment) or unpaired (independent groups) comparisons between groups, with options for subject matching (number, filename, order).
* Separate comparisons for average trace activity and event rate (event rate files are optional).
* User-selectable options for multiple comparison correction (e.g., Bonferroni, Benjamini-Hochberg) and effect size calculation (e.g., Cohen's d, eta-squared).

Parameters¶

Parameter	Required?	Default	Description
Group 1 Population Activity Files	True	N/A	Select population activity data from the first group to use for analysis
Group 1 Population Event Rate Files	False	N/A	Select population event rate data from the first group to use for analysis
Group 1 Name	False	group1	Name of the first group
Group 1 Color	False	tab:red	Color for the first group
Group 2 Population Activity Files	False	N/A	Select population activity data from the second group to use for analysis
Group 2 Population Event Rate Files	False	N/A	Select population event rate data from the second group to use for analysis
Group 2 Name	False	N/A	Name of the second group
Group 2 Color	False	N/A	Color for the second group
State Names	False	N/A	Names of the state comparisons in the columns of the input files
State Colors	False	N/A	Colors for the state comparisons
Modulation Colors	False	N/A	Colors that represent up- and down-modulation respectively
Significance Threshold	False	N/A	p-value threshold
Multiple Comparison Correction	True	N/A	Method for correcting for multiple comparisons
Effect Size Method	True	N/A	Method for calculating the effect size
Data Pairing	False	unpaired	Indicates whether observations should be paired for statistical comparison.
Subject Matching Method	False	order	Method for matching subjects between groups in paired analysis

Input Files¶

The inputs to this tool are population activity metric files generated by the Compare Neural Activity Across States tool. Event rate files are optional but must be provided for both groups if used in a two-group comparison.

Input Requirements¶

Each group must contain at least two recording files (subjects).
All input files (activity and events) must contain data for the same set of states with identical state names. The tool will validate this consistency.
If data_pairing is set to "paired" for a two-group comparison:
- Both groups must have the same number of recording files.
- The subject_matching parameter determines how files are paired between groups. Ensure filenames or numbering allow for correct matching.
If state names are numeric, they might be processed as numbers initially (e.g., by pandas), potentially truncating characters like '+' or '.0'. Ensure consistent naming.
Event files, if provided, must correspond to the activity files (e.g., same number of files per group, matchable subjects for paired analysis).
Consistency from Upstream Tool: All input CSV files must be generated using consistent parameters in the upstream Population Activity tool:
- method: This is a CRITICAL requirement. The comparison method used in the upstream tool MUST NOT be PAIRWISE. You MUST use one of the methods that compares each state to a reference, such as NOT_STATE (state vs. not state), BASELINE (state vs. a specified baseline state), or NOT_DEFINED (state vs. timepoints not belonging to any listed state). This downstream tool requires input columns named like ... in {state}. The PAIRWISE method generates columns named ... in {s1} vs {s2}, which are incompatible.
- state_names: The exact same states must have been analyzed.
- trace_scale_method / event_scale_method: The scaling method applied to traces (or events) must be identical across all respective input files to ensure valid comparisons of the mean activity/mean event rate columns.
- column_name: The same column must have been used to define states.
- n_shuffle / alpha: While the tool can re-evaluate significance based on its own significance_threshold, using consistent parameters for the upstream statistical calculations (n_shuffle, alpha) is recommended for comparable input p-values.
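
The column-naming and state-consistency requirements above can be pre-checked before uploading files. The sketch below is illustrative only: it assumes per-state columns end with ` in {state}` and PAIRWISE columns end with ` in {s1} vs {s2}`, as described above. The helper names and the example column prefixes are hypothetical and not part of the tool.

```python
import re

# Assumption: per-state columns look like "<metric> in <state>", and
# PAIRWISE outputs look like "<metric> in <s1> vs <s2>".
STATE_COLUMN = re.compile(r"^(?P<metric>.+) in (?P<state>.+)$")


def is_pairwise(column):
    """Return True if a column header looks like a PAIRWISE comparison."""
    match = STATE_COLUMN.match(column)
    return match is not None and " vs " in match.group("state")


def extract_states(columns):
    """Return the state names found in a file's headers, in first-seen order."""
    states = []
    for column in columns:
        match = STATE_COLUMN.match(column)
        if match is None:
            continue  # e.g. the cell-name column
        if is_pairwise(column):
            raise ValueError(f"PAIRWISE-style column is not supported: {column!r}")
        state = match.group("state")
        if state not in states:
            states.append(state)
    return states


def check_consistent_states(headers_by_file):
    """Raise if any file's set of states differs from the first file's."""
    reference = None
    for filename, columns in headers_by_file.items():
        states = extract_states(columns)
        if reference is None:
            reference = states
        elif set(states) != set(reference):
            raise ValueError(f"{filename}: states {states} != {reference}")
    return reference
```

Passing a mapping of filename to header list returns the shared state names, or raises on the first mismatch.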

Source Parameter	File Type	File Format
Group 1 Population Activity Files	modulation_data	csv
Group 1 Population Event Rate Files	modulation_data	csv
Group 2 Population Activity Files	modulation_data	csv
Group 2 Population Event Rate Files	modulation_data	csv

Algorithm Description¶

The workflow involves combining data, performing cell-level and subject-level statistical analyses, and generating outputs.

graph TD
    A[Input Files Group 1] --> C
    B[Input Files Group 2]
    C[Combine Data Group 1]
    subgraph Group 2 Processing
        B --> D[Combine Data Group 2]
    end
    M[Merge Data]
    C --> M
    D -- Optional Input --> M
    E[Cell Level State Comparison LMM]
    F[Cell Level Pairwise Tests]
    G[Subject Level Modulation Analysis ANOVA]
    H[Subject Level Modulation Pairwise]
    I[Subject Level Activity Event Rate ANOVA]
    K[Subject Level Activity Event Rate Pairwise]
    L[Generate Output Files and Previews]
    M --> E
    M --> G
    M --> I
    E --> F
    G --> H
    I --> K
    F --> L
    H --> L
    K --> L

The tool executes the following steps:

Data Combination: Concatenates data from all files within each group. Adds metadata columns like file, subject_id, normalized_subject_id, and Comparison (e.g., 'trace_activity', 'event_rate'). Re-classifies neuron modulation based on the provided significance threshold if necessary.
Cell-Level Analysis (LMM): Fits a Linear Mixed Model to the individual cell activity/event rate data. The model typically includes fixed effects for state, group (if applicable), and their interaction, with random intercepts for subjects to account for repeated measures. (statsmodels.regression.mixed_linear_model.MixedLM)
Subject-Level Modulation Analysis (ANOVA): Counts the number of significantly up-modulated and down-modulated cells per subject, per state, per group. Performs ANOVA (pingouin.rm_anova or pingouin.mixed_anova) on these counts to compare modulation across states and groups.
Subject-Level Activity/Event Rate Analysis (ANOVA): Calculates the average activity/event rate per subject, per state, per group. Performs ANOVA (pingouin.rm_anova for paired, pingouin.mixed_anova for unpaired) on these averages.
Pairwise Comparisons: Performs post-hoc pairwise tests (pingouin.pairwise_tests) following LMM and ANOVA analyses to identify specific differences between states or groups. Applies the user-selected multiple comparison correction and calculates effect sizes. Normality is checked with the Shapiro-Wilk test (pingouin.normality) to choose between parametric (t-test) and non-parametric (Wilcoxon signed-rank or Mann-Whitney U) tests.
Output Generation: Saves statistical results (LMM, ANOVA, pairwise) to CSV files and generates preview plots (histograms, violin plots, scatter plots with means/connections).
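
The Data Combination step above could look roughly like the following in pandas. This is a minimal sketch, not the tool's implementation. The column names (`modulation scores in {state}`, `p-values in {state}`, `modulation in {state}`), the 1 / 0 / -1 coding of modulation, and the `subject_N` labels are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def combine_group(frames_by_file, comparison, threshold=0.05):
    """Concatenate per-recording tables and add metadata columns."""
    combined = []
    for index, (filename, frame) in enumerate(frames_by_file.items(), start=1):
        frame = frame.copy()
        frame["file"] = filename
        frame["subject_id"] = f"subject_{index}"  # hypothetical labelling scheme
        frame["Comparison"] = comparison          # e.g. "trace_activity"
        frame["total_cell_count"] = len(frame)
        combined.append(frame)
    data = pd.concat(combined, ignore_index=True)

    # Re-classify modulation from the p-values using the chosen threshold.
    # Assumed coding: 1 = up-modulated, -1 = down-modulated, 0 = not significant.
    prefix = "p-values in "
    for p_column in [c for c in data.columns if c.startswith(prefix)]:
        state = p_column[len(prefix):]
        scores = data[f"modulation scores in {state}"]
        significant = data[p_column] < threshold
        data[f"modulation in {state}"] = (
            np.sign(scores).where(significant, 0).astype(int)
        )
    return data


recording_a = pd.DataFrame({
    "name": ["C0", "C1"],
    "modulation scores in rest": [0.8, -0.5],
    "p-values in rest": [0.01, 0.20],
})
recording_b = pd.DataFrame({
    "name": ["C0"],
    "modulation scores in rest": [-0.9],
    "p-values in rest": [0.03],
})
group = combine_group({"a.csv": recording_a, "b.csv": recording_b}, "trace_activity")
print(group[["file", "subject_id", "modulation in rest"]])
```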

Statistical Methods¶

Linear Mixed Model (LMM)¶

Used for cell-level analysis of activity/event rates. LMMs are suitable for hierarchical data (cells within subjects) and repeated measures (multiple states). They model fixed effects (state, group, interaction) while accounting for random variability between subjects. The tool uses statsmodels.regression.mixed_linear_model.MixedLM.
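
As an illustration of this kind of model, the sketch below fits a random-intercept LMM with the statsmodels formula interface on simulated data. The simulated values, the column names (`activity`, `state`, `subject_id`), and the exact formula are assumptions. The tool's actual model specification may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 6 subjects x 20 cells x 2 states, with a true state effect of 1.0
# and a per-subject random offset (the "random intercept").
rng = np.random.default_rng(0)
rows = []
for subject in range(6):
    subject_offset = rng.normal(scale=0.5)
    for _cell in range(20):
        for state, shift in (("rest", 0.0), ("run", 1.0)):
            rows.append({
                "subject_id": f"s{subject}",
                "state": state,
                "activity": 2.0 + shift + subject_offset + rng.normal(scale=0.3),
            })
cells = pd.DataFrame(rows)

# Fixed effect: state. Random effect: intercept per subject.
model = smf.mixedlm("activity ~ C(state)", data=cells, groups=cells["subject_id"])
result = model.fit()
print(result.params)
```

For a two-group design, a formula such as `activity ~ C(state) * C(group)` would add the group effect and the state-by-group interaction, given a `group` column in the data.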

ANOVA (Analysis of Variance)¶

Used for subject-level comparisons (modulation counts and averaged activity/event rates).
* Repeated Measures (RM) ANOVA: Used when comparing states within a single group, or when comparing two paired groups (pingouin.rm_anova). Accounts for the dependency between measurements from the same subject.
* Mixed ANOVA: Used when comparing two unpaired groups (pingouin.mixed_anova). Includes a within-subject factor (state) and a between-subject factor (group).
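
To make the quantities behind a one-way repeated-measures ANOVA concrete, the sketch below computes them by hand with NumPy for a small subjects-by-states array. It is a worked illustration of the standard formulas, not the tool's code path (which is described as using pingouin), and it omits the p-value and any sphericity correction. The dictionary keys are chosen for readability.

```python
import numpy as np


def rm_anova_table(values):
    """One-way repeated-measures ANOVA on an (n_subjects, n_states) array."""
    values = np.asarray(values, dtype=float)
    n_subjects, n_states = values.shape
    grand_mean = values.mean()

    # Partition the total variability into state, subject, and error parts.
    ss_state = n_subjects * np.sum((values.mean(axis=0) - grand_mean) ** 2)
    ss_subject = n_states * np.sum((values.mean(axis=1) - grand_mean) ** 2)
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_error = ss_total - ss_state - ss_subject

    df1 = n_states - 1
    df2 = (n_states - 1) * (n_subjects - 1)
    mean_square = ss_state / df1
    return {
        "sum_squares": ss_state,
        "df1": df1,
        "df2": df2,
        "mean_square": mean_square,
        "F_statistic": mean_square / (ss_error / df2),
        "np2": ss_state / (ss_state + ss_error),  # partial eta-squared
    }


# Rows are subjects, columns are two states (made-up subject averages).
table = rm_anova_table([[1, 2], [2, 4], [3, 3], [4, 7]])
print(table)
```

With only two states, the F-statistic equals the square of the paired t-statistic on the same data, which is a convenient sanity check.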

Pairwise Comparisons¶

Performed after significant LMM or ANOVA results using pingouin.pairwise_tests.
* Normality Check: Data subsets are checked for normality using the Shapiro-Wilk test (pingouin.normality) before choosing the appropriate test.
* Tests: Parametric tests (t-tests) are used for normally distributed data, while non-parametric tests (Wilcoxon signed-rank for paired, Mann-Whitney U for unpaired) are used otherwise.
* Correction & Effect Size: User-specified multiple comparison correction and effect size methods are applied.
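
The normality-then-test logic for a paired comparison can be sketched with SciPy, which is swapped in here purely for illustration in place of pingouin. Running Shapiro-Wilk on the paired differences and reusing `alpha` as the normality cutoff are assumptions of this sketch. The tool may check each data subset separately.

```python
import numpy as np
from scipy import stats


def compare_paired(x, y, alpha=0.05):
    """Pick a paired test based on a Shapiro-Wilk check of the differences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, p_normal = stats.shapiro(x - y)
    if p_normal > alpha:
        # No evidence against normality: use the parametric test.
        test_name, result = "paired t-test", stats.ttest_rel(x, y)
    else:
        # Normality rejected: fall back to the non-parametric test.
        test_name, result = "Wilcoxon signed-rank", stats.wilcoxon(x, y)
    return test_name, float(result.pvalue)


# Made-up subject averages for two states.
state_a = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4]
state_b = [1.6, 2.2, 1.8, 2.1, 2.0, 1.9, 2.1, 2.3]
test_name, p_value = compare_paired(state_a, state_b)
print(test_name, p_value)
```

For unpaired groups, the analogous fallback would be `stats.mannwhitneyu` in place of `stats.wilcoxon`.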

Effect Size Calculation Options¶

Method	Description
Cohen's d	A measure of effect size that expresses the difference between two means in standard deviation units. Used in pairwise tests.
Hedges' g	Similar to Cohen's d, but includes a correction for small sample sizes. Used in pairwise tests.
eta-squared (η²) / partial η² (np2)	Measures the proportion of variance explained by a factor in ANOVA. `np2` is reported by pingouin ANOVA functions.
Odds ratio	A measure of association between two binary variables. Available in `pairwise_tests`.
Area under the curve (AUC)	Related to ROC curves, available in `pairwise_tests` (often from Mann-Whitney U).
Common Language Effect Size (CLES)	Probability that a random score from one group > random score from another. Available in `pairwise_tests`.
r (correlation coefficient)	Effect size for non-parametric tests like Wilcoxon. Available in `pairwise_tests`.
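
For reference, the first two entries in the table can be computed directly. The sketch below implements Cohen's d for two independent samples (pooled standard deviation) and Hedges' g using the common approximate small-sample correction, 1 - 3 / (4(n1 + n2) - 9). It is meant to show the formulas, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Difference in means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_variance = (
        (nx - 1) * variance(x) + (ny - 1) * variance(y)
    ) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_variance)


def hedges_g(x, y):
    """Cohen's d with an approximate correction for small samples."""
    total_n = len(x) + len(y)
    correction = 1 - 3 / (4 * total_n - 9)
    return cohens_d(x, y) * correction


d = cohens_d([2, 4, 6], [1, 2, 3])
g = hedges_g([2, 4, 6], [1, 2, 3])
print(round(d, 3), round(g, 3))  # 1.265 1.012
```

With only three values per sample the correction shrinks the estimate by 20%. It approaches 1 as the sample size grows.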

Multiple Comparison Correction Options¶

Method	Description
Bonferroni ('bonf')	Divides the significance threshold by the number of comparisons (equivalently, multiplies each p-value by that number). Controls FWER.
Sidak ('sidak')	Similar to Bonferroni but less conservative, assumes independent tests. Controls FWER.
Holm-Bonferroni ('holm')	Step-down method, more powerful than Bonferroni. Controls FWER.
Benjamini-Hochberg ('fdr_bh')	Controls the False Discovery Rate (FDR). More powerful, especially with many tests.
Benjamini-Yekutieli ('fdr_by')	Controls FDR under arbitrary dependence between tests. More conservative than Benjamini-Hochberg.
None ('none')	No correction applied. Increases risk of Type I errors.
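
To show how three of these corrections differ in practice, the sketch below computes adjusted p-values for Bonferroni, Holm-Bonferroni, and Benjamini-Hochberg with NumPy. It is a hand-written illustration of the standard procedures using the method keys from the table above, not the tool's own code.

```python
import numpy as np


def adjust_pvalues(pvals, method="fdr_bh"):
    """Return adjusted p-values, in the same order as the input."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    if method == "none":
        adjusted = p.copy()
    elif method == "bonf":
        adjusted = p * m
    elif method == "holm":
        # Step-down: the smallest p is multiplied by m, the next by m - 1, ...
        stepped = ranked * (m - np.arange(m))
        adjusted = np.empty(m)
        adjusted[order] = np.maximum.accumulate(stepped)
    elif method == "fdr_bh":
        # Step-up: p * m / rank, made monotone starting from the largest p.
        stepped = ranked * m / np.arange(1, m + 1)
        adjusted = np.empty(m)
        adjusted[order] = np.minimum.accumulate(stepped[::-1])[::-1]
    else:
        raise ValueError(f"unsupported method: {method}")
    return np.clip(adjusted, 0.0, 1.0)


raw = [0.01, 0.04, 0.03, 0.20]
for name in ("bonf", "holm", "fdr_bh"):
    print(name, adjust_pvalues(raw, name))
```

For these four p-values, Bonferroni gives 0.04, 0.16, 0.12, 0.80, Holm gives 0.04, 0.09, 0.09, 0.20, and Benjamini-Hochberg gives 0.04, 0.053, 0.053, 0.20 (rounded). This reflects the ordering from most to least conservative.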

Outputs¶

Combination Outputs¶

Combination Data¶

A csv file (e.g., population_activity_data_{group_name}.csv) containing the combined raw data from all recordings in a group, including added metadata.

Example columns: Name, Modulation Scores (State 1), P-Values (State 1), Modulation (State 1), Mean Activity (State 1), ..., File, Comparison (e.g., 'trace_activity'), subject_id, normalized_subject_id, total_cell_count.
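
From a combined table of this shape, the per-subject counts of up- and down-modulated cells used in the subject-level modulation analysis could be derived as in the sketch below. The `modulation in {state}` column names and the 1 / -1 / 0 coding are assumptions for illustration. The real file's headers and coding may differ.

```python
import pandas as pd

# Toy combined data: one row per cell, one modulation column per state.
combined = pd.DataFrame({
    "subject_id": ["s1", "s1", "s1", "s2", "s2"],
    "modulation in rest": [1, -1, 0, 1, 1],
    "modulation in run": [0, 0, -1, -1, 1],
})

# Reshape to one row per (cell, state).
long = combined.melt(id_vars="subject_id", var_name="state", value_name="modulation")
long["state"] = long["state"].str.replace("modulation in ", "", regex=False)

# Count up- and down-modulated cells per subject and state.
counts = (
    long.assign(
        up_modulated=long["modulation"].eq(1),
        down_modulated=long["modulation"].eq(-1),
    )
    .groupby(["subject_id", "state"])[["up_modulated", "down_modulated"]]
    .sum()
    .reset_index()
)
print(counts)
```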

Combination Previews¶

(Previews are associated with the combined group data file)
* Modulation Distribution: Histogram showing the distribution of modulation scores per state for the group (e.g., activity_modulation_distribution_{group_name}.preview.svg).

Statistical Comparison Outputs¶

ANOVA/LMM Comparison Data (`aov_comparisons.csv`)¶

A csv file containing the results from LMM (cell-level) and ANOVA (subject-level) analyses.

Example Output:

Source	sum_squares	df1	df2	mean_square	F_statistic	p_value	effect_size	sphericity	analysis_level	comparison_type	Comparison	Measure	status	group
state	450.67	1	4.0	450.67	1.72	0.26	0.30	NaN	subject	state	trace_activity	modulation	up_modulated	Group 1
Intercept	123.45	1	150.2	123.45	25.6	0.001	NaN	NaN	cell	state	trace_activity	activity	NaN	Group 1
state	56.78	1	150.2	56.78	11.8	0.002	NaN	NaN	cell	state	trace_activity	activity	NaN	Group 1
group	21.60	1	3.0	21.60	0.07	0.81	0.02	NaN	subject	group	trace_activity	modulation	up_modulated	combined
state:group	48.40	1	3.0	48.40	1.22	0.35	0.29	1.0	subject	interaction	trace_activity	modulation	up_modulated	combined

Column Descriptions:

Column Name	Description
Source	Source of variance (e.g., 'state', 'group', 'state:group', 'Intercept') or Coefficient name for LMM.
sum_squares (SS)	Sum of squares (ANOVA).
df1	Numerator degrees of freedom (ANOVA).
df2	Denominator degrees of freedom (ANOVA) or Residual degrees of freedom (LMM).
mean_square (MS)	Mean square (SS / df1) (ANOVA).
F_statistic (F)	F-statistic (ANOVA).
p_value	P-value (uncorrected or from LMM).
effect_size	Effect size (e.g., partial eta-squared 'np2' from ANOVA).
sphericity	Epsilon value for sphericity correction (ANOVA).
analysis_level	'cell' (LMM) or 'subject' (ANOVA).
comparison_type	'state', 'group', or 'interaction'.
Comparison	Metric compared (e.g., 'trace_activity', 'event_rate').
Measure	'activity', 'modulation'.
status	'up_modulated', 'down_modulated' (for modulation analysis).
group	Group associated with the stats (specific group name or 'combined').
LMM Specific	`coefficient`, `std_error`, `t_value`/`z_value`, `conf_int_lower`, `conf_int_upper` may also be present.
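
A results file with these columns can be filtered with pandas, for example to pull out cell-level effects below a chosen threshold. The sketch below uses a few values from the example output above, embedded as comma-separated text so that it is self-contained. Reading the real file with `pd.read_csv("aov_comparisons.csv")` assumes it is comma-delimited.

```python
import io

import pandas as pd

# A reduced copy of the example rows above (subset of columns).
csv_text = """Source,F_statistic,p_value,analysis_level,comparison_type,Measure
state,1.72,0.26,subject,state,modulation
state,11.8,0.002,cell,state,activity
group,0.07,0.81,subject,group,modulation
"""
results = pd.read_csv(io.StringIO(csv_text))

# Keep cell-level (LMM) rows whose p-value is below the threshold.
threshold = 0.05
significant_cell = results[
    (results["analysis_level"] == "cell") & (results["p_value"] < threshold)
]
print(significant_cell)
```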

Pairwise Comparison Data (`pairwise_comparisons.csv`)¶

A csv file containing the results from post-hoc pairwise comparisons following LMM and ANOVA.