Combine and Compare Population Activity Data

Compute Credits

This tool uses 1.0 compute credits per hour.

Overview

This tool combines population activity metrics (trace activity and optionally event rates) from multiple recordings, potentially across two experimental groups. It performs statistical comparisons to identify differences between defined behavioral states and, if applicable, between the two groups.

The tool employs a multi-level statistical approach:

  1. Cell-Level State Comparison: Utilizes a Linear Mixed Model (LMM) to analyze the activity or event rate of individual cells, accounting for repeated measures within subjects. This compares activity across different states and assesses the interaction between state and group (if two groups are provided).
  2. Subject-Level Modulation Comparison: Performs ANOVA (Repeated Measures or Mixed, depending on pairing) on the number of significantly modulated (up or down) cells per subject per state. This compares the overall modulation counts across states and groups.
  3. Subject-Level Activity/Event Rate Comparison: Calculates the average activity or event rate per subject for each state (and group, if applicable). It then performs ANOVA (Repeated Measures for paired data, Mixed for unpaired data) on these subject averages to compare overall activity levels between groups and states.

Key features include:

  • Handling of single-group or two-group experimental designs.
  • Support for paired (e.g., within-subject treatment) or unpaired (independent groups) comparisons between groups, with options for subject matching (number, filename, order).
  • Separate comparisons for average trace activity and event rate (event rate files are optional).
  • User-selectable options for multiple comparison correction (e.g., Bonferroni, Benjamini-Hochberg) and effect size calculation (e.g., Cohen's d, eta-squared).

Parameters

Parameter | Required? | Default | Description
Group 1 Population Activity Files | True | N/A | Select population activity data from the first group to use for analysis
Group 1 Population Event Rate Files | False | N/A | Select population event rate data from the first group to use for analysis
Group 1 Name | False | group1 | Name of the first group
Group 1 Color | False | tab:red | Color for the first group
Group 2 Population Activity Files | False | N/A | Select population activity data from the second group to use for analysis
Group 2 Population Event Rate Files | False | N/A | Select population event rate data from the second group to use for analysis
Group 2 Name | False | N/A | Name of the second group
Group 2 Color | False | N/A | Color for the second group
State Names | False | N/A | Names of the state comparisons in the columns of the input files
State Colors | False | N/A | Colors for the state comparisons
Modulation Colors | False | N/A | Colors that represent up- and down-modulation, respectively
Significance Threshold | False | N/A | p-value threshold used to determine statistical significance (e.g., when re-classifying cell modulation)
Multiple Comparison Correction | True | N/A | Method for correcting for multiple comparisons
Effect Size Method | True | N/A | Method for calculating the effect size
Data Pairing | False | unpaired | Indicates whether observations should be paired for statistical comparison
Subject Matching Method | False | order | Method for matching subjects between groups in a paired analysis (number, filename, or order)
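
The three subject-matching options can be illustrated with a small pure-Python function. This is a hypothetical sketch, not the tool's code. The behavior assumed for each option is an illustration only: 'order' pairs files by list position, 'filename' pairs identical base names, and 'number' pairs the first integer found in each base name. The function name match_subjects is also invented.

```python
import re
from pathlib import Path


def match_subjects(group1, group2, method="order"):
    """Pair recording files from two groups for a paired analysis.

    Hypothetical semantics (assumptions, not the tool's documented rules):
      'order'    -> pair files by their position in each list
      'filename' -> pair files whose base names are identical
      'number'   -> pair files by the first integer in the base name
    """
    if len(group1) != len(group2):
        raise ValueError("paired analysis requires the same number of files per group")

    if method == "order":
        return list(zip(group1, group2))

    if method == "filename":
        def key(path):
            return Path(path).name
    elif method == "number":
        def key(path):
            found = re.search(r"\d+", Path(path).name)
            if found is None:
                raise ValueError(f"no subject number found in {path!r}")
            return int(found.group())
    else:
        raise ValueError(f"unknown matching method: {method!r}")

    # Index group 2 by its matching key, rejecting ambiguous keys.
    lookup = {key(p): p for p in group2}
    if len(lookup) != len(group2):
        raise ValueError("duplicate subject keys in group 2")

    pairs = []
    for path in group1:
        k = key(path)
        if k not in lookup:
            raise ValueError(f"no match in group 2 for {path!r}")
        pairs.append((path, lookup[k]))
    return pairs
```

Consider group 1 files saline/mouse2.csv and saline/mouse1.csv and group 2 files drug/mouse1.csv and drug/mouse2.csv. The 'number' and 'filename' options both pair mouse2 with mouse2 and mouse1 with mouse1. The 'order' option simply pairs the first file of each list.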

Input Files

The inputs to this tool are population activity metric files generated by the Compare Neural Activity Across States tool. Event rate files are optional but must be provided for both groups if used in a two-group comparison.

Input Requirements

  • Each group must contain at least two recording files (subjects).
  • All input files (activity and events) must contain data for the same set of states with identical state names. The tool will validate this consistency.
  • If data_pairing is set to "paired" for a two-group comparison:
    • Both groups must have the same number of recording files.
    • The subject_matching parameter determines how files are paired between groups. Ensure filenames or numbering allow for correct matching.
  • If state names are numeric, they may be parsed as numbers (e.g., by pandas). This can drop or alter characters such as a leading '+' or a trailing '.0'. Ensure state names are written consistently across all files and parameters.
  • Event files, if provided, must correspond to the activity files (e.g., same number of files per group, matchable subjects for paired analysis).
  • Consistency from Upstream Tool: All input CSV files must be generated using consistent parameters in the upstream Population Activity tool:
    • method: This is a CRITICAL requirement. The comparison method used in the upstream tool MUST NOT be PAIRWISE. You MUST use one of the methods that compares each state to a reference, such as NOT_STATE (state vs. not state), BASELINE (state vs. a specified baseline state), or NOT_DEFINED (state vs. timepoints not belonging to any listed state). This downstream tool requires input columns named like ... in {state}. The PAIRWISE method generates columns named ... in {s1} vs {s2}, which are incompatible.
    • state_names: The exact same states must have been analyzed.
    • trace_scale_method / event_scale_method: The scaling method applied to traces (or events) must be identical across all respective input files to ensure valid comparisons of the mean activity/mean event rate columns.
    • column_name: The same column must have been used to define states.
    • n_shuffle / alpha: While the tool can re-evaluate significance based on its own significance_threshold, using consistent parameters for the upstream statistical calculations (n_shuffle, alpha) is recommended for comparable input p-values.

Source Parameter | File Type | File Format
Group 1 Population Activity Files | modulation_data | csv
Group 1 Population Event Rate Files | modulation_data | csv
Group 2 Population Activity Files | modulation_data | csv
Group 2 Population Event Rate Files | modulation_data | csv
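
The state-consistency and PAIRWISE checks in the input requirements can be approximated by parsing column headers. This is a hypothetical sketch. The text only specifies the "... in {state}" and "... in {s1} vs {s2}" patterns, so the metric prefixes in the example headers (e.g., "modulation in immobile") are invented for illustration. The helper names are also assumptions.

```python
def states_from_columns(columns):
    """Collect state names from headers shaped like '<metric> in <state>'.

    Raises ValueError for headers that look like PAIRWISE output
    ('<metric> in <s1> vs <s2>'). Headers without ' in ' are treated
    as metadata and skipped.
    """
    states = set()
    for col in columns:
        if " in " not in col:
            continue  # e.g. a cell-name column
        suffix = col.rsplit(" in ", 1)[1]
        if " vs " in suffix:
            raise ValueError(f"PAIRWISE-style column not supported: {col!r}")
        states.add(suffix)
    return states


def check_consistent_states(files_columns):
    """Verify that every file (mapped to its header list) has the same states."""
    reference = None
    for name, columns in files_columns.items():
        states = states_from_columns(columns)
        if reference is None:
            reference = states
        elif states != reference:
            raise ValueError(
                f"{name} has states {sorted(states)}, expected {sorted(reference)}"
            )
    return reference
```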

Algorithm Description

The workflow involves combining data, performing cell-level and subject-level statistical analyses, and generating outputs.

    graph TD
        A[Input Files Group 1] --> C
        B[Input Files Group 2]
        C[Combine Data Group 1]
        subgraph Group 2 Processing
            B --> D[Combine Data Group 2]
        end
        M[Merge Data]
        C --> M
        D -- Optional Input --> M
        E[Cell Level State Comparison LMM]
        F[Cell Level Pairwise Tests]
        G[Subject Level Modulation Analysis ANOVA]
        H[Subject Level Modulation Pairwise]
        I[Subject Level Activity Event Rate ANOVA]
        K[Subject Level Activity Event Rate Pairwise]
        L[Generate Output Files and Previews]
        M --> E
        M --> G
        M --> I
        E --> F
        G --> H
        I --> K
        F --> L
        H --> L
        K --> L

The tool executes the following steps:

  1. Data Combination: Concatenates data from all files within each group. Adds metadata columns like file, subject_id, normalized_subject_id, and Comparison (e.g., 'trace_activity', 'event_rate'). Re-classifies neuron modulation based on the provided significance threshold if necessary.
  2. Cell-Level Analysis (LMM): Fits a Linear Mixed Model to the individual cell activity/event rate data. The model typically includes fixed effects for state, group (if applicable), and their interaction, with random intercepts for subjects to account for repeated measures. The model is fit with statsmodels.regression.mixed_linear_model.MixedLM.
  3. Subject-Level Modulation Analysis (ANOVA): Counts the number of significantly up-modulated and down-modulated cells per subject, per state, per group. Performs ANOVA (pingouin.rm_anova or pingouin.mixed_anova) on these counts to compare modulation across states and groups.
  4. Subject-Level Activity/Event Rate Analysis (ANOVA): Calculates the average activity/event rate per subject, per state, per group. Performs ANOVA (pingouin.rm_anova for paired, pingouin.mixed_anova for unpaired) on these averages.
  5. Pairwise Comparisons: Performs post-hoc pairwise tests (pingouin.pairwise_tests) following the LMM and ANOVA analyses to identify specific differences between states or groups. Applies the user-selected multiple comparison correction and calculates effect sizes. Normality is checked with the Shapiro-Wilk test (pingouin.normality). The result determines whether a parametric test (t-test) or a non-parametric test is used: Wilcoxon signed-rank for paired data, Mann-Whitney U for unpaired data.
  6. Output Generation: Saves statistical results (LMM, ANOVA, pairwise) to CSV files and generates preview plots (histograms, violin plots, scatter plots with means/connections).
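
Steps 1, 3 and 4 (combining files, counting modulated cells, and averaging per subject) can be sketched with pandas. This is a minimal sketch under several assumptions. Each recording is taken to be a long-format table with hypothetical columns cell, state, modulation_score, p_value and activity. The subject ID scheme and the up/down classification rule are also assumptions, not the tool's documented behavior.

```python
import pandas as pd


def combine_group(frames, group_name, comparison="trace_activity", threshold=0.05):
    """Concatenate one group's recordings and add metadata columns.

    `frames` maps file name -> long-format DataFrame with the hypothetical
    columns 'cell', 'state', 'modulation_score', 'p_value', 'activity'.
    """
    parts = []
    for i, (file_name, df) in enumerate(frames.items()):
        df = df.copy()
        df["file"] = file_name
        df["subject_id"] = f"{group_name}_{i}"  # hypothetical ID scheme
        df["normalized_subject_id"] = i
        df["Comparison"] = comparison
        df["group"] = group_name
        parts.append(df)
    combined = pd.concat(parts, ignore_index=True)

    # Re-classify modulation with this tool's threshold. Assumed rule:
    # significant positive score -> up, significant negative score -> down.
    significant = combined["p_value"] < threshold
    combined["modulation"] = "none"
    combined.loc[significant & (combined["modulation_score"] > 0), "modulation"] = "up_modulated"
    combined.loc[significant & (combined["modulation_score"] < 0), "modulation"] = "down_modulated"
    return combined


def subject_summaries(combined):
    """Per group, subject and state: modulated-cell counts and mean activity."""
    keys = ["group", "subject_id", "state"]
    flags = combined.assign(
        up_modulated=combined["modulation"].eq("up_modulated"),
        down_modulated=combined["modulation"].eq("down_modulated"),
    )
    counts = flags.groupby(keys, as_index=False)[["up_modulated", "down_modulated"]].sum()
    means = combined.groupby(keys, as_index=False)["activity"].mean()
    return counts.merge(means, on=keys)
```

The resulting per-subject table has one row per group, subject and state. This is the shape the subject-level ANOVAs operate on.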

Statistical Methods

Linear Mixed Model (LMM)

Used for cell-level analysis of activity/event rates. LMMs are suitable for hierarchical data (cells within subjects) and repeated measures (multiple states). They model fixed effects (state, group, interaction) while accounting for random variability between subjects. The tool uses statsmodels.regression.mixed_linear_model.MixedLM.
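
A model with this structure can be fit through the statsmodels formula interface. The sketch below is illustrative only. The synthetic data, the column names (subject_id, group, state, activity) and the exact formula are assumptions chosen to mirror the description above: fixed effects for state, group and their interaction, plus a random intercept per subject.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic cell-level data: 6 subjects (3 per group), 20 cells each,
# measured in two states. Each subject gets its own random offset.
rng = np.random.default_rng(0)
rows = []
for subject in range(6):
    group = "Group 1" if subject < 3 else "Group 2"
    offset = rng.normal(0, 0.5)  # subject-level random intercept
    for cell in range(20):
        for state, effect in [("immobile", 0.0), ("mobile", 1.0)]:
            rows.append({
                "subject_id": f"s{subject}",
                "group": group,
                "state": state,
                "activity": 2.0 + offset + effect + rng.normal(0, 0.3),
            })
data = pd.DataFrame(rows)

# Fixed effects: state, group, state x group; random intercept per subject.
model = smf.mixedlm("activity ~ C(state) * C(group)", data, groups=data["subject_id"])
result = model.fit()
print(result.summary())
```

With treatment coding, the coefficient C(state)[T.mobile] estimates the mobile-minus-immobile difference in the reference group (Group 1). The interaction term tests whether that difference changes in Group 2.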

ANOVA (Analysis of Variance)

Used for subject-level comparisons (modulation counts and averaged activity/event rates).

  • Repeated Measures (RM) ANOVA: Used when comparing states within a single group, or when comparing two paired groups (pingouin.rm_anova). Accounts for the dependency between measurements from the same subject.
  • Mixed ANOVA: Used when comparing two unpaired groups (pingouin.mixed_anova). Includes a within-subject factor (state) and a between-subject factor (group).
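
A one-way repeated-measures ANOVA on a subjects-by-states table reduces to a few sums of squares. This is a minimal numpy/scipy sketch, not the tool's pingouin-based implementation. It omits the sphericity correction and the between-subject factor of a mixed ANOVA, and the function name rm_anova_oneway is hypothetical.

```python
import numpy as np
from scipy import stats


def rm_anova_oneway(y):
    """One-way repeated-measures ANOVA on an (n_subjects, n_states) array."""
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_state = n * ((y.mean(axis=0) - grand) ** 2).sum()    # between states
    ss_subject = k * ((y.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_total = ((y - grand) ** 2).sum()
    ss_error = ss_total - ss_state - ss_subject  # state x subject residual
    df1, df2 = k - 1, (n - 1) * (k - 1)
    f = (ss_state / df1) / (ss_error / df2)
    return {
        "F": f,
        "df1": df1,
        "df2": df2,
        "p_value": stats.f.sf(f, df1, df2),
        "np2": ss_state / (ss_state + ss_error),  # partial eta-squared
    }
```

For the 3 x 2 table [[1, 2], [2, 4], [3, 3]] (three subjects, two states), this gives F(1, 2) = 3.0 and np2 = 0.6. With only two states, F equals the square of the paired t statistic.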

Pairwise Comparisons

Performed as post-hoc tests following the LMM and ANOVA analyses, using pingouin.pairwise_tests.

  • Normality Check: Data subsets are checked for normality using the Shapiro-Wilk test (pingouin.normality) before the appropriate test is chosen.
  • Tests: Parametric tests (t-tests) are used for normally distributed data. Non-parametric tests (Wilcoxon signed-rank for paired data, Mann-Whitney U for unpaired data) are used otherwise.
  • Correction & Effect Size: The user-specified multiple comparison correction and effect size methods are applied.
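
The normality-gated choice of test can be sketched with scipy.stats. This is an approximation of the behavior described above, not the tool's code. Two details are assumptions: paired data are checked on their differences, and unpaired data must pass the check in both samples. The function name pairwise_test is hypothetical.

```python
import numpy as np
from scipy import stats


def pairwise_test(a, b, paired, alpha=0.05):
    """Pick a parametric or non-parametric test from a Shapiro-Wilk check."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if paired:
        # Assumed rule: test normality of the paired differences.
        normal = stats.shapiro(a - b).pvalue > alpha
        res = stats.ttest_rel(a, b) if normal else stats.wilcoxon(a, b)
    else:
        # Assumed rule: both independent samples must look normal.
        normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
        res = stats.ttest_ind(a, b) if normal else stats.mannwhitneyu(a, b, alternative="two-sided")
    return {
        "parametric": bool(normal),
        "statistic": float(res.statistic),
        "p_unc": float(res.pvalue),  # uncorrected; correct across tests afterwards
    }
```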

Effect Size Calculation Options

Method | Description
Cohen's d | A measure of effect size that expresses the difference between two means in standard deviation units. Used in pairwise tests.
Hedges' g | Similar to Cohen's d, but includes a correction for small sample sizes. Used in pairwise tests.
eta-squared (η²) / partial η² (np2) | Measures the proportion of variance explained by a factor in ANOVA. np2 is reported by pingouin ANOVA functions.
Odds ratio | A measure of association between two binary variables. Available in pairwise_tests.
Area under the curve (AUC) | Related to ROC curves; available in pairwise_tests (often from Mann-Whitney U).
Common Language Effect Size (CLES) | Probability that a randomly drawn score from one group is greater than a randomly drawn score from the other. Available in pairwise_tests.
r (correlation coefficient) | Effect size for non-parametric tests such as Wilcoxon. Available in pairwise_tests.
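
Three of the two-sample effect sizes above can be computed directly from their definitions. This is a minimal pure-Python sketch for two independent samples. It uses the common pooled-SD form of Cohen's d, the usual approximate Hedges correction J = 1 - 3/(4·df - 1), and counts CLES ties as one half. These are standard textbook definitions and are not guaranteed to match pingouin's implementation exactly.

```python
import math
from statistics import mean, variance


def cohen_d(x, y):
    """Cohen's d for two independent samples using the pooled SD."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled)


def hedges_g(x, y):
    """Cohen's d scaled by the approximate small-sample factor J."""
    df = len(x) + len(y) - 2
    return cohen_d(x, y) * (1 - 3 / (4 * df - 1))


def cles(x, y):
    """Probability that a random x exceeds a random y (ties count 0.5)."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))
```

For x = [2, 4, 6] and y = [1, 3, 5], cohen_d gives 0.5, hedges_g gives 0.4, and cles gives 6/9.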

Multiple Comparison Correction Options

Method | Description
Bonferroni ('bonf') | Divides the significance threshold by the number of comparisons (equivalently, multiplies each p-value by that number). Controls the family-wise error rate (FWER).
Sidak ('sidak') | Similar to Bonferroni but slightly less conservative; assumes independent tests. Controls FWER.
Holm-Bonferroni ('holm') | Step-down method; more powerful than Bonferroni. Controls FWER.
Benjamini-Hochberg ('fdr_bh') | Controls the false discovery rate (FDR). More powerful than FWER methods, especially with many tests.
Benjamini-Yekutieli ('fdr_by') | Controls FDR under arbitrary dependence between tests. More conservative than Benjamini-Hochberg.
None ('none') | No correction applied. Increases the risk of Type I errors.
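
Most of these corrections are short enough to write out, which makes the differences between them concrete. This is a minimal pure-Python sketch returning adjusted p-values, reusing the method labels from the table. It is an illustration, not the tool's code, and it leaves out 'fdr_by'.

```python
def adjust_pvalues(pvals, method="bonf"):
    """Return adjusted p-values (same order as the input list)."""
    m = len(pvals)
    if method == "none":
        return list(pvals)
    if method == "bonf":
        return [min(1.0, p * m) for p in pvals]
    if method == "sidak":
        return [1.0 - (1.0 - p) ** m for p in pvals]

    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    if method == "holm":
        # Step-down: multiply the r-th smallest p by (m - r), keep monotone.
        running = 0.0
        for rank, i in enumerate(order):
            running = max(running, (m - rank) * pvals[i])
            adjusted[i] = min(1.0, running)
        return adjusted
    if method == "fdr_bh":
        # Step-up: scale by m / rank, then take running minima from the top.
        running = 1.0
        for rank in range(m - 1, -1, -1):
            i = order[rank]
            running = min(running, pvals[i] * m / (rank + 1))
            adjusted[i] = running
        return adjusted
    raise ValueError(f"unsupported method: {method!r}")
```

For p = [0.01, 0.04, 0.03, 0.005], Holm gives [0.03, 0.06, 0.06, 0.02] and Benjamini-Hochberg gives [0.02, 0.04, 0.04, 0.02], up to floating-point rounding.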

Outputs

Combination Outputs

Combination Data

A csv file (e.g., population_activity_data_{group_name}.csv) containing the combined raw data from all recordings in a group, including added metadata.

Example columns: Name, Modulation Scores (State 1), P-Values (State 1), Modulation (State 1), Mean Activity (State 1), ..., File, Comparison (e.g., 'trace_activity'), subject_id, normalized_subject_id, total_cell_count.

Combination Previews

(Previews are associated with the combined group data file.)

  • Modulation Distribution: Histogram showing the distribution of modulation scores per state for the group (e.g., activity_modulation_distribution_{group_name}.preview.svg).

Statistical Comparison Outputs

ANOVA/LMM Comparison Data (aov_comparisons.csv)

A csv file containing the results from LMM (cell-level) and ANOVA (subject-level) analyses.

Example Output:

Source | sum_squares | df1 | df2 | mean_square | F_statistic | p_value | effect_size | sphericity | analysis_level | comparison_type | Comparison | Measure | status | group
state | 450.67 | 1 | 4.0 | 450.67 | 1.72 | 0.26 | 0.30 | NaN | subject | state | trace_activity | modulation | up_modulated | Group 1
Intercept | 123.45 | 1 | 150.2 | 123.45 | 25.6 | 0.001 | NaN | NaN | cell | state | trace_activity | activity | NaN | Group 1
state | 56.78 | 1 | 150.2 | 56.78 | 11.8 | 0.002 | NaN | NaN | cell | state | trace_activity | activity | NaN | Group 1
group | 21.60 | 1 | 3.0 | 21.60 | 0.07 | 0.81 | 0.02 | NaN | subject | group | trace_activity | modulation | up_modulated | combined
state:group | 48.40 | 1 | 3.0 | 48.40 | 1.22 | 0.35 | 0.29 | 1.0 | subject | interaction | trace_activity | modulation | up_modulated | combined

Column Descriptions:

Column Name | Description
Source | Source of variance (e.g., 'state', 'group', 'state:group', 'Intercept') or coefficient name for LMM.
sum_squares (SS) | Sum of squares (ANOVA).
df1 | Numerator degrees of freedom (ANOVA).
df2 | Denominator degrees of freedom (ANOVA) or residual degrees of freedom (LMM).
mean_square (MS) | Mean square, SS / df1 (ANOVA).
F_statistic (F) | F-statistic (ANOVA).
p_value | P-value (uncorrected or from LMM).
effect_size | Effect size (e.g., partial eta-squared 'np2' from ANOVA).
sphericity | Epsilon value for sphericity correction (ANOVA).
analysis_level | 'cell' (LMM) or 'subject' (ANOVA).
comparison_type | 'state', 'group', or 'interaction'.
Comparison | Metric compared (e.g., 'trace_activity', 'event_rate').
Measure | 'activity' or 'modulation'.
status | 'up_modulated' or 'down_modulated' (for modulation analysis).
group | Group associated with the stats (specific group name or 'combined').
(LMM-specific columns) | coefficient, std_error, t_value/z_value, conf_int_lower, and conf_int_upper may also be present.

Pairwise Comparison Data (pairwise_comparisons.csv)

A csv file containing the results from post-hoc pairwise comparisons following LMM and ANOVA.

Example Output:

Contrast | A | B | Paired | Parametric | T | dof | alternative | p-unc | p-corr | p-adjust | BF10 | cohen | Comparison | Measure | status | analysis_level | comparison_type | group | State | U-val | W-val
state | immobile | mobile | True | False | NaN | NaN | two-sided | 0.38 | 0.76 | bonf | 0.5 | -1.07 | trace_activity | modulation | up_modulated | subject | state | Group 1 | NaN | NaN | 2.0
state | immobile | mobile | True | True | -2.5 | 150.2 | two-sided | 0.015 | 0.03 | bonf | 3.1 | -0.45 | trace_activity | activity | NaN | cell | state | Group 1 | NaN | NaN | NaN
group | Group 1 | Group 2 | False | True | 1.9 | 8.0 | two-sided | 0.09 | 0.18 | bonf | 1.2 | 0.88 | trace_activity | activity | NaN | subject | group | combined | immobile | NaN | NaN
state * group | Group 1 | Group 2 | False | True | 0.5 | 8.0 | two-sided | 0.63 | 1.0 | bonf | 0.3 | 0.25 | trace_activity | activity | NaN | subject | interaction | combined | mobile | NaN | NaN

Column Descriptions (common columns shown; test-specific columns such as T, U-val, and W-val vary):

Column Name | Description
Contrast | Type of comparison (e.g., 'state', 'group', 'state * group').
A | First group/condition in the comparison.
B | Second group/condition in the comparison.
Paired | Whether the test was paired.
Parametric | Whether a parametric test (e.g., t-test) was used.
T / U-val / W-val | Test statistic (depends on the test run).
dof | Degrees of freedom.
alternative | Hypothesis direction ('two-sided', 'less', 'greater').
p-unc | Uncorrected p-value.
p-corr | Corrected p-value (if correction applied).
p-adjust | Multiple comparison correction method used.
BF10 | Bayes Factor (if calculated).
Effect Size | Calculated effect size (column name matches the method, e.g., 'cohen', 'AUC', 'CLES').
Comparison | Metric compared (e.g., 'trace_activity').
Measure | 'activity' or 'modulation'.
status | 'up_modulated' or 'down_modulated' (for modulation analysis).
analysis_level | 'cell' or 'subject'.
comparison_type | 'state', 'group', or 'interaction'.
group | Group associated with the stats.
State | State involved in the comparison (often for 'group' or 'interaction' contrasts).
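
The output files are ordinary CSV tables, so results can be filtered with pandas. This sketch loads a subset of the columns from the example pairwise rows above using an in-memory string. With real output you would call pd.read_csv("pairwise_comparisons.csv") instead. The 0.05 cutoff on p-corr is an arbitrary choice for illustration.

```python
import io

import pandas as pd

# A few columns from the example rows above (in-memory stand-in for the file).
example = io.StringIO(
    "Contrast,A,B,Paired,Parametric,p-unc,p-corr,Measure,analysis_level,comparison_type,group,State\n"
    "state,immobile,mobile,True,False,0.38,0.76,modulation,subject,state,Group 1,\n"
    "state,immobile,mobile,True,True,0.015,0.03,activity,cell,state,Group 1,\n"
    "group,Group 1,Group 2,False,True,0.09,0.18,activity,subject,group,combined,immobile\n"
    "state * group,Group 1,Group 2,False,True,0.63,1.0,activity,subject,interaction,combined,mobile\n"
)
df = pd.read_csv(example)

# Keep comparisons that survive multiple comparison correction.
significant = df[df["p-corr"] < 0.05]
print(significant[["Contrast", "A", "B", "analysis_level", "p-corr"]])

# Subject-level results only (ANOVA follow-ups).
subject_level = df[df["analysis_level"] == "subject"]
```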

Statistical Comparison Previews

(Previews are associated with both aov_comparisons.csv and pairwise_comparisons.csv)

  • State Comparison (LMM based): Violin plot showing cell-level activity/event rate distribution per state, colored by group (if applicable). Title may show significance from LMM/pairwise tests. (e.g., activity_state_lmm.preview.svg)

    Example: State Comparison (LMM)

  • Group Comparison (ANOVA based): Scatter plot showing subject-averaged activity/event rate per state, colored by group. Points for the same subject may be connected for paired analyses. Title may show significance from ANOVA/pairwise tests. (e.g., activity_group_mixed_anova.preview.svg or activity_group_rm_anova.preview.svg)

    Example: Activity Group Comparison (Repeated Measures)

    Example: Events Group Comparison (Repeated Measures)

  • Modulation Comparison (ANOVA based): Scatter plot showing the number of modulated cells per subject per state, colored by group. Similar to the group comparison plot, but for modulation counts. Title may show significance. (e.g., activity_modulation_distribution.preview.svg. Note: this filename resembles the combination preview, but this plot shows the statistical comparison.)

    Example: Modulation Comparison (ANOVA)