Combine and Compare Population Activity Data¶
This tool uses 1.0 compute credits per hour.
Overview¶
This tool combines population activity metrics (trace activity and optionally event rates) from multiple recordings, potentially across two experimental groups. It performs statistical comparisons to identify differences between defined behavioral states and, if applicable, between the two groups.
The tool employs a multi-level statistical approach:
- Cell-Level State Comparison: Utilizes a Linear Mixed Model (LMM) to analyze the activity or event rate of individual cells, accounting for repeated measures within subjects. This compares activity across different states and assesses the interaction between state and group (if two groups are provided).
- Subject-Level Modulation Comparison: Performs ANOVA (Repeated Measures or Mixed, depending on pairing) on the number of significantly modulated (up or down) cells per subject per state. This compares the overall modulation counts across states and groups.
- Subject-Level Activity/Event Rate Comparison: Calculates the average activity or event rate per subject for each state (and group, if applicable). It then performs ANOVA (Repeated Measures for paired data, Mixed for unpaired data) on these subject averages to compare overall activity levels between groups and states.
Key features include:
* Handling of single-group or two-group experimental designs.
* Support for paired (e.g., within-subject treatment) or unpaired (independent groups) comparisons between groups, with options for subject matching (`number`, `filename`, `order`).
* Separate comparisons for average trace activity and event rate (event rate files are optional).
* User-selectable options for multiple comparison correction (e.g., Bonferroni, Benjamini-Hochberg) and effect size calculation (e.g., Cohen's d, eta-squared).
Parameters¶
Parameter | Required? | Default | Description |
---|---|---|---|
Group 1 Population Activity Files | True | N/A | Select population activity data from the first group to use for analysis |
Group 1 Population Event Rate Files | False | N/A | Select population event rate data from the first group to use for analysis |
Group 1 Name | False | group1 | Name of the first group |
Group 1 Color | False | tab:red | Color for the first group |
Group 2 Population Activity Files | False | N/A | Select population activity data from the second group to use for analysis |
Group 2 Population Event Rate Files | False | N/A | Select population event rate data from the second group to use for analysis |
Group 2 Name | False | N/A | Name of the second group |
Group 2 Color | False | N/A | Color for the second group |
State Names | False | N/A | Names of the state comparisons in the columns of the input files |
State Colors | False | N/A | Colors for the state comparisons |
Modulation Colors | False | N/A | Colors that represent up- and down-modulation respectively |
Significance Threshold | False | N/A | p-value threshold |
Multiple Comparison Correction | True | N/A | Method for correcting for multiple comparisons |
Effect Size Method | True | N/A | Method for calculating the effect size |
Data Pairing | False | unpaired | Indicates whether observations should be paired for statistical comparison. |
Subject Matching Method | False | order | Method for matching subjects between groups in paired analysis |
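The three subject matching options can be illustrated with a minimal sketch. The exact rules the tool applies are not spelled out here, so the behaviour below is an assumption: `order` pairs files by list position, `number` pairs files that share the first integer in their names, and `filename` pairs files with identical names. The function name `match_subjects` is hypothetical.

```python
import re


def _first_number(name):
    """Assumed 'number' rule: the first integer in the file name identifies the subject."""
    found = re.search(r"\d+", name)
    if found is None:
        raise ValueError(f"No subject number found in {name!r}")
    return int(found.group())


def _whole_name(name):
    """Assumed 'filename' rule: files pair up when their names are identical."""
    return name


def match_subjects(group1_files, group2_files, method="order"):
    """Pair files from two groups and return a list of (file1, file2) tuples."""
    if len(group1_files) != len(group2_files):
        raise ValueError("Paired analysis needs the same number of files per group.")

    if method == "order":
        # Assumed 'order' rule: the i-th file of group 1 pairs with the i-th of group 2.
        return list(zip(group1_files, group2_files))

    key_functions = {"number": _first_number, "filename": _whole_name}
    if method not in key_functions:
        raise ValueError(f"Unknown matching method: {method!r}")
    key = key_functions[method]

    lookup = {key(name): name for name in group2_files}
    pairs = []
    for name in group1_files:
        if key(name) not in lookup:
            raise ValueError(f"No match in group 2 for {name!r}")
        pairs.append((name, lookup[key(name)]))
    return pairs


pairs = match_subjects(
    ["m1_pre.csv", "m2_pre.csv"], ["m2_post.csv", "m1_post.csv"], method="number"
)
```

With `method="number"` the two files for subject 1 are paired even though the groups list them in a different order, which is why checking file naming before a paired analysis matters.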
Input Files¶
The inputs to this tool are population activity metric files generated by the Compare Neural Activity Across States tool. Event rate files are optional but must be provided for both groups if used in a two-group comparison.
Input Requirements¶
- Each group must contain at least two recording files (subjects).
- All input files (activity and events) must contain data for the same set of states with identical state names. The tool will validate this consistency.
- If `data_pairing` is set to "paired" for a two-group comparison:
  - Both groups must have the same number of recording files.
  - The `subject_matching` parameter determines how files are paired between groups. Ensure filenames or numbering allow for correct matching.
- If state names are numeric, they might be processed as numbers initially (e.g., by pandas), potentially truncating characters like '+' or '.0'. Ensure consistent naming.
- Event files, if provided, must correspond to the activity files (e.g., same number of files per group, matchable subjects for paired analysis).
- Consistency from Upstream Tool: All input CSV files must be generated using consistent parameters in the upstream Population Activity tool:
  - `method`: This is a CRITICAL requirement. The comparison method used in the upstream tool MUST NOT be `PAIRWISE`. You MUST use one of the methods that compares each state to a reference, such as `NOT_STATE` (state vs. not state), `BASELINE` (state vs. a specified baseline state), or `NOT_DEFINED` (state vs. timepoints not belonging to any listed state). This downstream tool requires input columns named like `... in {state}`. The `PAIRWISE` method generates columns named `... in {s1} vs {s2}`, which are incompatible.
  - `state_names`: The exact same states must have been analyzed.
  - `trace_scale_method` / `event_scale_method`: The scaling method applied to traces (or events) must be identical across all respective input files to ensure valid comparisons of the `mean activity` / `mean event rate` columns.
  - `column_name`: The same column must have been used to define states.
  - `n_shuffle` / `alpha`: While the tool can re-evaluate significance based on its own `significance_threshold`, using consistent parameters for the upstream statistical calculations (`n_shuffle`, `alpha`) is recommended for comparable input p-values.
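The column-naming requirement can be checked before upload with a short sketch like the one below. The function name `check_state_columns` and the sample column names are hypothetical; only the `... in {state}` and `... in {s1} vs {s2}` patterns come from the requirements listed here, and the tool's own validation may differ.

```python
def check_state_columns(columns, state_names):
    """Return a list of problems found in the column names of one input file."""
    problems = []

    # Columns produced by the PAIRWISE method contain "{s1} vs {s2}".
    for col in columns:
        if " vs " in col:
            problems.append(f"pairwise-style column: {col!r}")

    # Every expected state needs at least one column ending in " in {state}".
    for state in state_names:
        suffix = f" in {state}"
        if not any(col.endswith(suffix) for col in columns):
            problems.append(f"no column ends with {suffix!r}")

    return problems


# Hypothetical column names, for illustration only.
good = ["name", "modulation scores in mobile", "modulation scores in immobile"]
bad = ["name", "modulation scores in mobile vs immobile"]

good_problems = check_state_columns(good, ["mobile", "immobile"])
bad_problems = check_state_columns(bad, ["mobile", "immobile"])
```

An empty list means the file passes both checks. The pairwise-style file fails twice over: it has a `vs` column and no per-state columns.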
Source Parameter | File Type | File Format |
---|---|---|
Group 1 Population Activity Files | modulation_data | csv |
Group 1 Population Event Rate Files | modulation_data | csv |
Group 2 Population Activity Files | modulation_data | csv |
Group 2 Population Event Rate Files | modulation_data | csv |
Algorithm Description¶
The workflow involves combining data, performing cell-level and subject-level statistical analyses, and generating outputs.
The tool executes the following steps:
- Data Combination: Concatenates data from all files within each group. Adds metadata columns like `file`, `subject_id`, `normalized_subject_id`, and `Comparison` (e.g., 'trace_activity', 'event_rate'). Re-classifies neuron modulation based on the provided significance threshold if necessary.
- Cell-Level Analysis (LMM): Fits a Linear Mixed Model to the individual cell activity/event rate data. The model typically includes fixed effects for state, group (if applicable), and their interaction, with random intercepts for subjects to account for repeated measures. (`statsmodels.regression.mixed_linear_model.MixedLM`)
- Subject-Level Modulation Analysis (ANOVA): Counts the number of significantly up-modulated and down-modulated cells per subject, per state, per group. Performs ANOVA (`pingouin.rm_anova` or `pingouin.mixed_anova`) on these counts to compare modulation across states and groups.
- Subject-Level Activity/Event Rate Analysis (ANOVA): Calculates the average activity/event rate per subject, per state, per group. Performs ANOVA (`pingouin.rm_anova` for paired, `pingouin.mixed_anova` for unpaired) on these averages.
- Pairwise Comparisons: Performs post-hoc pairwise tests (`pingouin.pairwise_tests`) following LMM and ANOVA analyses to identify specific differences between states or groups. Applies user-selected multiple comparison correction and calculates effect sizes. Normality is checked with the Shapiro-Wilk test (`pingouin.normality`) to choose between parametric (t-test) and non-parametric (Wilcoxon signed-rank or Mann-Whitney U) tests.
- Output Generation: Saves statistical results (LMM, ANOVA, pairwise) to CSV files and generates preview plots (histograms, violin plots, scatter plots with means/connections).
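The data combination and modulation counting steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the input layout (a mapping of file name to per-cell `(score, p_value)` pairs), the helper names, and the re-classification rule (significant and positive score is up, significant and negative score is down) are all hypothetical.

```python
from collections import Counter


def classify(score, p_value, threshold=0.05):
    """Assumed rule: the sign of the score gives the direction, p < threshold gives significance."""
    if p_value < threshold and score > 0:
        return "up_modulated"
    if p_value < threshold and score < 0:
        return "down_modulated"
    return "non_modulated"


def combine_group(files, threshold=0.05):
    """Concatenate per-file cell rows and attach file and subject metadata."""
    combined = []
    for subject_id, (file_name, cells) in enumerate(files.items(), start=1):
        for score, p_value in cells:
            combined.append(
                {
                    "file": file_name,
                    "subject_id": subject_id,
                    "modulation": classify(score, p_value, threshold),
                }
            )
    return combined


def count_modulated(combined):
    """Count cells per (subject_id, modulation class); these counts feed the ANOVA."""
    return Counter((row["subject_id"], row["modulation"]) for row in combined)


# Made-up values for one state, for illustration only.
files = {
    "mouse_a.csv": [(0.8, 0.01), (-0.5, 0.02), (0.1, 0.60)],
    "mouse_b.csv": [(0.9, 0.001), (0.7, 0.04)],
}
counts = count_modulated(combine_group(files))
```

Lowering `threshold` moves borderline cells into `non_modulated`, which changes the per-subject counts and therefore the subject-level modulation ANOVA.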
Statistical Methods¶
Linear Mixed Model (LMM)¶
Used for cell-level analysis of activity/event rates. LMMs are suitable for hierarchical data (cells within subjects) and repeated measures (multiple states). They model fixed effects (state, group, interaction) while accounting for random variability between subjects. The tool uses `statsmodels.regression.mixed_linear_model.MixedLM`.
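For two states and two groups, a model of this kind can be written as below. The notation is an assumption introduced for illustration: $y_{cs}$ is the activity (or event rate) of cell $c$ in state $s$, $x^{\text{state}}$ and $x^{\text{group}}$ are 0/1 indicators, and $j(c)$ is the subject that cell $c$ belongs to. The exact parameterization the tool fits may differ.

```latex
y_{cs} = \beta_0
       + \beta_{\text{state}}\, x^{\text{state}}_{cs}
       + \beta_{\text{group}}\, x^{\text{group}}_{c}
       + \beta_{\text{state:group}}\, x^{\text{state}}_{cs}\, x^{\text{group}}_{c}
       + u_{j(c)} + \varepsilon_{cs},
\qquad
u_{j} \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{cs} \sim \mathcal{N}(0, \sigma^2)
```

The random intercept $u_{j(c)}$ is shared by every cell from the same subject, which is how the model accounts for cells within a subject being correlated. With a single group, the two terms involving $x^{\text{group}}$ drop out.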
ANOVA (Analysis of Variance)¶
Used for subject-level comparisons (modulation counts and averaged activity/event rates).
* Repeated Measures (RM) ANOVA: Used when comparing states within a single group, or when comparing two paired groups (`pingouin.rm_anova`). Accounts for the dependency between measurements from the same subject.
* Mixed ANOVA: Used when comparing two unpaired groups (`pingouin.mixed_anova`). Includes a within-subject factor (state) and a between-subject factor (group).
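To make the repeated measures case concrete, here is a hand-computed one-way RM ANOVA on made-up subject averages. It is a worked illustration of the arithmetic behind the `df1`, `df2`, F and partial eta-squared values, not a replacement for `pingouin.rm_anova`; the function name and the data are hypothetical, and no p-value is computed.

```python
def rm_anova_one_way(scores):
    """scores[i][j] is the value for subject i in state j (complete data only)."""
    n_subjects = len(scores)
    n_states = len(scores[0])
    grand_mean = sum(map(sum, scores)) / (n_subjects * n_states)

    state_means = [sum(row[j] for row in scores) / n_subjects for j in range(n_states)]
    subject_means = [sum(row) / n_states for row in scores]

    # Partition the total variability into state, subject, and residual parts.
    ss_state = n_subjects * sum((m - grand_mean) ** 2 for m in state_means)
    ss_subject = n_states * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_state - ss_subject

    df1 = n_states - 1
    df2 = (n_states - 1) * (n_subjects - 1)
    f_stat = (ss_state / df1) / (ss_error / df2)
    np2 = ss_state / (ss_state + ss_error)  # partial eta-squared
    return {"df1": df1, "df2": df2, "F": f_stat, "np2": np2}


# Three subjects, two states (made-up subject averages).
result = rm_anova_one_way([[1, 3], [2, 5], [3, 4]])
```

Removing the between-subject sum of squares from the error term is what distinguishes this from an ordinary one-way ANOVA. With two states, F equals the square of the paired t statistic.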
Pairwise Comparisons¶
Performed after significant LMM or ANOVA results using `pingouin.pairwise_tests`.
* Normality Check: Data subsets are checked for normality using the Shapiro-Wilk test (`pingouin.normality`) before choosing the appropriate test.
* Tests: Parametric tests (t-tests) are used for normally distributed data, while non-parametric tests (Wilcoxon signed-rank for paired, Mann-Whitney U for unpaired) are used otherwise.
* Correction & Effect Size: User-specified multiple comparison correction and effect size methods are applied.
Effect Size Calculation Options¶
Method | Description |
---|---|
Cohen's d | A measure of effect size that expresses the difference between two means in standard deviation units. Used in pairwise tests. |
Hedges' g | Similar to Cohen's d, but includes a correction for small sample sizes. Used in pairwise tests. |
eta-squared (η²) / partial η² (np2) | Measures the proportion of variance explained by a factor in ANOVA. np2 is reported by pingouin ANOVA functions. |
Odds ratio | A measure of association between two binary variables. Available in `pairwise_tests`. |
Area under the curve (AUC) | Related to ROC curves, available in `pairwise_tests` (often from Mann-Whitney U). |
Common Language Effect Size (CLES) | Probability that a random score from one group is greater than a random score from another. Available in `pairwise_tests`. |
r (correlation coefficient) | Effect size for non-parametric tests like Wilcoxon. Available in `pairwise_tests`. |
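As a worked example of the first two rows, the sketch below computes Cohen's d for two independent samples using the pooled standard deviation, and Hedges' g using the common approximate small-sample correction factor 1 - 3 / (4N - 9). The data are made up, and the values reported by the tool may be computed with a different variant (for example, for paired samples).

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Difference in means divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def hedges_g(x, y):
    """Cohen's d shrunk by an approximate small-sample bias correction."""
    n_total = len(x) + len(y)
    return cohens_d(x, y) * (1 - 3 / (4 * n_total - 9))


group_a = [2, 4, 6]
group_b = [1, 2, 3]
d = cohens_d(group_a, group_b)
g = hedges_g(group_a, group_b)
```

With only three values per group the correction factor is 0.8, so g is noticeably smaller than d. The gap closes as the number of subjects grows.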
Multiple Comparison Correction Options¶
Method | Description |
---|---|
Bonferroni ('bonf') | Adjusts p-value threshold by dividing by the number of comparisons. Controls FWER. |
Sidak ('sidak') | Similar to Bonferroni but less conservative, assumes independent tests. Controls FWER. |
Holm-Bonferroni ('holm') | Step-down method, more powerful than Bonferroni. Controls FWER. |
Benjamini-Hochberg ('fdr_bh') | Controls the False Discovery Rate (FDR). More powerful, especially with many tests. |
Benjamini-Yekutieli ('fdr_by') | Controls FDR under arbitrary dependence between tests. More conservative than Benjamini-Hochberg. |
None ('none') | No correction applied. Increases risk of Type I errors. |
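The difference between these methods is easiest to see on a small set of p-values. The sketch below is a plain-Python illustration of the Bonferroni, Holm-Bonferroni and Benjamini-Hochberg adjustments, expressed as adjusted p-values rather than adjusted thresholds. The function name `adjust_pvalues` is hypothetical; the tool relies on the library implementation.

```python
def adjust_pvalues(pvals, method="bonf"):
    """Return adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m

    if method == "none":
        return list(pvals)
    if method == "bonf":
        return [min(1.0, p * m) for p in pvals]
    if method == "holm":
        # Step-down: multiply by the number of remaining tests, keep values non-decreasing.
        running_max = 0.0
        for rank, idx in enumerate(order):
            running_max = max(running_max, pvals[idx] * (m - rank))
            adjusted[idx] = min(1.0, running_max)
        return adjusted
    if method == "fdr_bh":
        # Step-up: scale by m / rank, keep values non-increasing from the largest p down.
        running_min = 1.0
        for rank in range(m - 1, -1, -1):
            idx = order[rank]
            running_min = min(running_min, pvals[idx] * m / (rank + 1))
            adjusted[idx] = running_min
        return adjusted
    raise ValueError(f"Unknown method: {method!r}")


pvals = [0.01, 0.04, 0.03]
bonf = adjust_pvalues(pvals, "bonf")
holm = adjust_pvalues(pvals, "holm")
fdr_bh = adjust_pvalues(pvals, "fdr_bh")
```

At a threshold of 0.05, Bonferroni and Holm each keep only the first comparison, while Benjamini-Hochberg keeps all three, which illustrates the trade-off between family-wise error control and power.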
Outputs¶
Combination Outputs¶
Combination Data¶
A `csv` file (e.g., `population_activity_data_{group_name}.csv`) containing the combined raw data from all recordings in a group, including added metadata.
Example columns: `Name`, `Modulation Scores (State 1)`, `P-Values (State 1)`, `Modulation (State 1)`, `Mean Activity (State 1)`, ..., `File`, `Comparison` (e.g., 'trace_activity'), `subject_id`, `normalized_subject_id`, `total_cell_count`.
Combination Previews¶
(Previews are associated with the combined group data file)
* Modulation Distribution: Histogram showing the distribution of modulation scores per state for the group (e.g., `activity_modulation_distribution_{group_name}.preview.svg`).
Statistical Comparison Outputs¶
ANOVA/LMM Comparison Data (`aov_comparisons.csv`)¶
A `csv` file containing the results from LMM (cell-level) and ANOVA (subject-level) analyses.
Example Output:
Source | sum_squares | df1 | df2 | mean_square | F_statistic | p_value | effect_size | sphericity | analysis_level | comparison_type | Comparison | Measure | status | group |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
state | 450.67 | 1 | 4.0 | 450.67 | 1.72 | 0.26 | 0.30 | NaN | subject | state | trace_activity | modulation | up_modulated | Group 1 |
Intercept | 123.45 | 1 | 150.2 | 123.45 | 25.6 | 0.001 | NaN | NaN | cell | state | trace_activity | activity | NaN | Group 1 |
state | 56.78 | 1 | 150.2 | 56.78 | 11.8 | 0.002 | NaN | NaN | cell | state | trace_activity | activity | NaN | Group 1 |
group | 21.60 | 1 | 3.0 | 21.60 | 0.07 | 0.81 | 0.02 | NaN | subject | group | trace_activity | modulation | up_modulated | combined |
state:group | 48.40 | 1 | 3.0 | 48.40 | 1.22 | 0.35 | 0.29 | 1.0 | subject | interaction | trace_activity | modulation | up_modulated | combined |
Column Descriptions:
Column Name | Description |
---|---|
Source | Source of variance (e.g., 'state', 'group', 'state:group', 'Intercept') or Coefficient name for LMM. |
sum_squares (SS) | Sum of squares (ANOVA). |
df1 | Numerator degrees of freedom (ANOVA). |
df2 | Denominator degrees of freedom (ANOVA) or Residual degrees of freedom (LMM). |
mean_square (MS) | Mean square (SS / df1) (ANOVA). |
F_statistic (F) | F-statistic (ANOVA). |
p_value | P-value (uncorrected or from LMM). |
effect_size | Effect size (e.g., partial eta-squared 'np2' from ANOVA). |
sphericity | Epsilon value for sphericity correction (ANOVA). |
analysis_level | 'cell' (LMM) or 'subject' (ANOVA). |
comparison_type | 'state', 'group', or 'interaction'. |
Comparison | Metric compared (e.g., 'trace_activity', 'event_rate'). |
Measure | 'activity', 'modulation'. |
status | 'up_modulated', 'down_modulated' (for modulation analysis). |
group | Group associated with the stats (specific group name or 'combined'). |
LMM Specific | `coefficient`, `std_error`, `t_value`/`z_value`, `conf_int_lower`, `conf_int_upper` may also be present. |
Pairwise Comparison Data (`pairwise_comparisons.csv`)¶
A `csv` file containing the results from post-hoc pairwise comparisons following LMM and ANOVA.
Example Output:
Contrast | A | B | Paired | Parametric | T | dof | alternative | p-unc | p-corr | p-adjust | BF10 | cohen | Comparison | Measure | status | analysis_level | comparison_type | group | State | U-val | W-val |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
state | immobile | mobile | True | False | NaN | NaN | two-sided | 0.38 | 0.76 | bonf | 0.5 | -1.07 | trace_activity | modulation | up_modulated | subject | state | Group 1 | NaN | NaN | 2.0 |
state | immobile | mobile | True | True | -2.5 | 150.2 | two-sided | 0.015 | 0.03 | bonf | 3.1 | -0.45 | trace_activity | activity | NaN | cell | state | Group 1 | NaN | NaN | NaN |
group | Group 1 | Group 2 | False | True | 1.9 | 8.0 | two-sided | 0.09 | 0.18 | bonf | 1.2 | 0.88 | trace_activity | activity | NaN | subject | group | combined | immobile | NaN | NaN |
state * group | Group 1 | Group 2 | False | True | 0.5 | 8.0 | two-sided | 0.63 | 1.0 | bonf | 0.3 | 0.25 | trace_activity | activity | NaN | subject | interaction | combined | mobile | NaN | NaN |
Column Descriptions: (Common columns shown, specific test columns like T, U-val, W-val vary)
Column Name | Description |
---|---|
Contrast | Type of comparison (e.g., 'state', 'group', 'state * group'). |
A | First group/condition in the comparison. |
B | Second group/condition in the comparison. |
Paired | Whether the test was paired. |
Parametric | Whether a parametric test (e.g., t-test) was used. |
T / U-val / W-val | Test statistic (depends on the test run). |
dof | Degrees of freedom. |
alternative | Hypothesis direction ('two-sided', 'less', 'greater'). |
p-unc | Uncorrected p-value. |
p-corr | Corrected p-value (if correction applied). |
p-adjust | Multiple comparison correction method used. |
BF10 | Bayes Factor (if calculated). |
Effect Size | Calculated effect size (column name matches method, e.g., 'cohen', 'AUC', 'CLES'). |
Comparison | Metric compared (e.g., 'trace_activity'). |
Measure | 'activity', 'modulation'. |
status | 'up_modulated', 'down_modulated' (for modulation analysis). |
analysis_level | 'cell' or 'subject'. |
comparison_type | 'state', 'group', 'interaction'. |
group | Group associated with the stats. |
State | State involved in the comparison (often for 'group' or 'interaction' contrasts). |
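Once downloaded, the pairwise results can be filtered with the standard library. The sketch below reads a few columns copied from the example rows above out of an in-memory string and keeps the comparisons whose corrected p-value falls below a threshold. The helper name `significant_rows` is hypothetical, and the comma delimiter is an assumption about the file on disk.

```python
import csv
import io

# A subset of the columns and rows from the example output above.
SAMPLE = """Contrast,A,B,p-unc,p-corr,analysis_level,group
state,immobile,mobile,0.38,0.76,subject,Group 1
state,immobile,mobile,0.015,0.03,cell,Group 1
group,Group 1,Group 2,0.09,0.18,subject,combined
"""


def significant_rows(handle, threshold=0.05):
    """Return the rows whose corrected p-value is below the threshold."""
    hits = []
    for row in csv.DictReader(handle):
        value = row.get("p-corr", "")
        # Skip rows without a usable corrected p-value (blank or NaN).
        if value in ("", "NaN", "nan"):
            continue
        if float(value) < threshold:
            hits.append(row)
    return hits


hits = significant_rows(io.StringIO(SAMPLE))
```

To read the real file, replace `io.StringIO(SAMPLE)` with `open("pairwise_comparisons.csv", newline="")`. Filtering on `p-corr` rather than `p-unc` respects the selected multiple comparison correction.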
Statistical Comparison Previews¶
(Previews are associated with both `aov_comparisons.csv` and `pairwise_comparisons.csv`)
- State Comparison (LMM based): Violin plot showing cell-level activity/event rate distribution per state, colored by group (if applicable). Title may show significance from LMM/pairwise tests. (e.g., `activity_state_lmm.preview.svg`)

  Example: State Comparison (LMM)
- Group Comparison (ANOVA based): Scatter plot showing subject-averaged activity/event rate per state, colored by group. Points for the same subject may be connected for paired analyses. Title may show significance from ANOVA/pairwise tests. (e.g., `activity_group_mixed_anova.preview.svg` or `activity_group_rm_anova.preview.svg`)

  Example: Activity Group Comparison (Repeated Measures)

  Example: Events Group Comparison (Repeated Measures)
- Modulation Comparison (ANOVA based): Scatter plot showing the number of modulated cells per subject per state, colored by group. Similar to the group comparison plot but for modulation counts. Title may show significance. (e.g., `activity_modulation_distribution.preview.svg`; note that this filename might overlap with the combination preview, but the content shows the comparison)

  Example: Modulation Comparison (ANOVA)