Combine And Compare Locomotion Data¶

This tool uses 1.0 compute credits per hour.

Overview¶

This tool combines locomotion metrics from multiple recordings and statistically compares them across two groups. Specifically, this tool tests whether the distributions of speeds and the proportions of time spent moving & at rest are statistically different between the 2 groups of input data.

To perform the statistical test, the user can choose:

whether or not the inputs are paired across the two groups
what specific hypothesis to test (one-tailed or two-tailed)
at what level to compare the data (by recording, by frame, or using hierarchical bootstrapping)

Parameters¶

Parameter	Required?	Default	Description
Group 1 Locomotion Files	True	N/A	Locomotion metrics files belonging to the first group
Group 2 Locomotion Files	True	N/A	Locomotion metrics files belonging to the second group
Group 1 Name	True	N/A	Name of the first group being compared
Group 2 Name	True	N/A	Name of the second group being compared
Group 1 Color	False	Blue	Display color for the first group
Group 2 Color	False	Orange	Display color for the second group
Sample Method	False	hierarchical bootstrap	Determine what method to compare the data at
Comparison Type	False	Two-Tailed (Unequal)	Alternative hypothesis for statistical analysis
Data Pairing	False	unpaired	Whether the data is paired or unpaired
Number Of Bootstrap Iterations	False	1000	How many times to sample during the bootstrap

Input Files¶

The inputs to this tool are a list of locomotion metrics files that are generated from the locomotion_metrics tool.

Input File Requirements¶

Each group must contain at least two recordings, otherwise the tool will terminate.
All input files must be generated using the same speed units to be compared (e.g. smoothed cm/s, cm/s, smoothed px/s, px/s). The tool preferentially looks for smoothed cm/s, then cm/s, then smoothed px/s, then px/s. If the input files have different speed units, the tool will terminate.
The input files must have more than one state in a recording to be compared. If the input files have only one state, the tool will terminate.

Source Parameter	File Type	File Format
Group 1 Locomotion Files	experiment_annotations, experiment_annotations	csv, parquet
Group 2 Locomotion Files	experiment_annotations, experiment_annotations	parquet, csv

Algorithm Description¶

The tool is summarized using a flowchart below.

graph TD A[Combine locomotion metrics files from first group] --> C[Compare speed data of the two input groups]; B[Combine locomotion metrics files data from second group] --> C; A --> D[Compare rest and move data of two input groups]; B --> D;

The tool begins by combining the data from each group. Then the two groups are compared to determine if there is statistically significant difference between the speeds estimates and the move and rest states.

Combination¶

The combination step is performed independently for each input group. The combination step concatenates the data from each recording end to end, retaining all original columns, and adds a column to indicate the source file.

Comparison¶

Two comparisons are made using this tool. The first comparison is made between the two groups to determine if there is a difference in the distribution of speed estimates. The second comparison is made to determine if there is a difference in the proportion of the recordings spent in either move or rest states. There are three sampling methods that can be used to compare the data, which will result in different statistical tests being performed.

Hierarchical Bootstrapping Method¶

If the hierarchical bootstrapping method is selected, the data will be hierarchically resampled to generate a distribution of mean estimates¹.

Hierarchical resampling involves resampling data with replacement at the level of the individual recordings, then resampling the frames of the recordings with replacement. For example, if there are 2 recordings in group 1 (A,B) and 3 recordings in group 2 (X,Y,Z), the hierarchical resampling method will resample 2 recordings from group 1 (e.g. AA, AB, BB, etc) and 3 recordings from group 2 (XXY, XYZ, YZZ, etc). The speed estimates of each the resampled recordings are then resampled with replacement, such that if recording A has 1000 frames, the hierarchical resampling method will resample 1000 frames. From this synthetic dataset, a single average metric is calculated for each group.

This process is repeated as many times as specified by the N Bootstraps parameter (by default 1000 times). The p-value is then calculated as the proportion of averages from group 1 that are greater than the averages from group 2. As a result, the precision of the p-value is directly correlated to the size of the null distribution, defined by the number of random shuffles. For example, if the null distribution is created by 1000 random shuffles, the resolution of the p-value will be 0.001. In this example the smallest p-value greater than 0 that can be obtained is 0.001. If there are no values in the group 1 that are greater than group 2, the p-value will be 0.

Figure

Hierarchical Bootstrap

a) An example of a hierarchical dataset. Here the dataset is divided into 3 levels, with the first level containing the experimental groups to be compared, the second containing the individual subjects and the third containing the individual neurons per subject. Each subject is color coded and the neurons per subject are distinguished by the position of the colored diamond. b) In “Traditional” statistics, the means for each group is computed across all the neurons and are then compared using a two sample t-test. c) In “Summarized” statistics, the mean for each subject is computed first. These means are then used to compute an overall mean for each group and the groups are compared using a two-sample t-test. d) In the “Hierarchical Bootstrap” method, we create new datasets Nbootstrap times by resampling with replacement first at the level of subjects followed by neurons within a subject. We then compute the mean across all neurons every time we perform resampling. The final statistic is computed on this population of resampled means.¹c is computed on this population of resampled means.¹

Move vs Rest State Comparison¶

If a one sided alternative hypothesis is selected, the move vs rest state comparison will perform opposite tests for the move and rest proportions. For example, if the alternative hypothesis is selected as greater, the following hypothesis will be tested:

group 1 move proportion < group 2 move proportion

group 1 rest proportion > group 2 rest proportion

because we expect the move and rest proportions to be inversely related.

By Recording¶

You can also compare data on a recording by recording basis. This method will average the speed estimates and the proportion of time spent in move and rest states for each recording. The following statistical test will then be performed on the averages of the recordings.

First the normality of the data is first tested based on D'Agostino and Pearson's²³. If the data is normally distributed, and the data is paired, a related t-test is performed. If the data is normally distributed and the data is unpaired, an independant t-test is performed. If the data is not normally distributed, and the data is paired, a Wilcoxon signed-rank test is performed. If the data is not normally distributed and the data is unpaired, a Mann-Whitney U test is performed.

Normality	Pairing	Test
Normal	Paired	Related t-test
Normal	Unpaired	Independent t-test
Non-normal	Paired	Wilcoxon signed-rank test
Non-normal	Unpaired	Mann-Whitney U test

Move vs Rest State Comparison¶

If a one sided alternative hypothesis is selected, the move vs rest state comparison will perform opposite tests for the move and rest proportions. For example, if the alternative hypothesis is selected as greater, the following hypothesis will be tested:

group 1 move proportion < group 2 move proportion

group 1 rest proportion > group 2 rest proportion

because we expect the move and rest proportions to be inversely related.

By Frame¶

You can also compare data on a frame by frame basis. This method will compare the total distribution of speed estimates per frame, discarding recording information. The same statistical tests as the recording by recording method will be performed on the speed estimates.

Move vs Rest State Comparison¶

For the move vs rest state comparison, a Two Proportion Z-test is performed on the total number of frames moving compared to non-moving, and resting compared to non-resting, discarding recording information (undefined is included in non-moving and non-resting).

If a one sided alternative hypothesis is selected, the move vs rest state comparison will perform opposite tests for the move and rest proportions. For example, if the alternative hypothesis is selected as greater, the following hypothesis will be tested:

group 1 move proportion < group 2 move proportion

group 1 rest proportion > group 2 rest proportion

because we expect the move and rest proportions to be inversely related.

Outputs¶

Combination Outputs¶

Combination Locomotion Data¶

A csv file in IDEAS experiment annotations format containing the locomotion activity data of all recordings in the group. This will carry over all the columns that were included in the input files, as well as a column that indicates the file that the data came from.

Here are the possible columns that can be included in the output file:

Required	Description
Global Frame Number	The frame number in the global video series
Movie Number	The number of the movie in the series
Local Frame Number	The frame number within a particular movie
Hardware counter (us)	Counter representing hardware timestamp in microseconds
Frame Timestamp (s)	Timestamp of the frame in seconds
Bounding Box Left	Left coordinate of the bounding box
Bounding Box Top	Top coordinate of the bounding box
Bounding Box Right	Right coordinate of the bounding box
Bounding Box Bottom	Bottom coordinate of the bounding box
Bounding Box Center X	X-coordinate of the center of the bounding box
Bounding Box Center Y	Y-coordinate of the center of the bounding box
Confidence	Confidence level of the detected subject
Displacement (px)	Frame-frame displacement of the object in pixels
Speed (px/s)	Speed of the object in pixels per second
File Name	Name of the file containing the frame

Situationally Present	Description
Smoothed Speed (px/s)	Filtered speed of the object in pixels per second
Displacement (cm)	Displacement of the object in centimeters
Speed (cm/s)	Speed of the object in centimeters per second
Smoothed Speed (cm/s)	Filtered speed of the object in centimeters per second
State	State of the subject (e.g., move, rest, undefined)

The output figures will vary depending on the sampling method selected.

Combination Figures¶

The combination figures will show a preview of the data in each group. If sample method is set to by recording or hierarchical bootstrap, the combination preview will show data averaged by recording.

Group 1 Recording Preview

Left column shows distribution of average speed for all recordings in group 1. Right panel shows the average percentage of time spent in each state for all recordings in group 1.

If sample method is set to by frame, the data will be all frames.

Group 1 Frame Preview

Left column shows distribution of speeds for all data in group 1. Right panel shows the percentage of time spent in each state for all data in group 1.

Comparison Ouputs¶

The comparison figures will also vary depending on whether the comparison is made using the hierarchical bootstrapping method or not.

Speed Comparison Outputs¶

Hierarchical Bootstrapping Method Data¶

The hierarchical bootstrapping method will output a csv file containing the distribution of average speed estimates for each group, where the number of values in the distribution is determined by the N Bootstraps parameter.

Example Output

Group	Speed
Control	7.8265
Control	4.2917
Drug	6.1593
Drug	8.7451

Hierarchical Bootstrapping Method Preview¶

The hierarchical bootstrapping method will show the distribution of average speed estimates for each group, where the number of values in the distribution is determined by the N Bootstraps parameter.

Speed Comparison Bootstrap

Comparison of speed estimates between group 1 and group 2. Left panel shows overlaid distributions of speed estimates. Right panel statistical result of comparison on a box plot

By Recording Data¶

The by recording method will output a csv file containing the average speed estimates for each recording in each group. The length of the file will be the number of recordings in all groups.

Group	File Name	Speed
Control	locomotion_1.csv	7.8265
Control	locomotion_2.csv	4.2917
Drug	locomotion_3.csv	6.1593
Drug	locomotion_4.csv	8.7451

By Recording Preview¶

The by recording method will show the average speed estimates for each recording in each group. The box plot will include a strip plot of the average of each recording.

Speed Comparison Recording

By Frame Data¶

The by frame method will output a csv file in IDEAS experiment annotations format containing the distribution of speed estimates for all frames in each group. The length of the file will be the number of frames in all groups.

Group	File Name	Speed
Control	locomotion_1.csv	0.7812
Control	locomotion_1.csv	0.4292
Control	locomotion_1.csv	0.6159
Control	locomotion_1.csv	0.8745

By Frame¶

The by frame method will show the distribution of speed estimates for all frames in each group.

Speed Comparison Frame

Comparison of speed estimates between group 1 and group 2. Left panel shows overlaid distributions of speed estimates. Right panel statistical result of comparison on a box plot

Move vs Rest State Comparison Outputs¶

Hierarchical Bootstrapping Method Data¶

The hierarchical bootstrapping method will output a csv file containing the distribution of the percent time moving between group 1 and group 2 for move and rest states. The estimates for moving and resting are calculated independently. The length of the file will be 2 * N Bootstraps.

Group	Moving	Resting
Control	0.2401	0.0365
Control	0.4823	0.1807
Drug	0.0936	0.2948
Drug	0.4019	0.4982

Hierarchical Bootstrapping Method Preview¶

The hierarchical bootstrapping method will show the distribution of the average percent time moving between group 1 and group 2 for move (left), and rest (right).

Move Rest Comparison Bootstrap

Comparison of the percent time moving between group 1 and group 2 for move (left), and rest (right)

By Recording Data¶

The by recording method will output a csv file containing the percent time moving between group 1 and group 2 for move and rest states. The estimates for moving and resting are calculated independently. The length of the file will be the number of total recordings

Group	File Name	Moving	Resting
Control	locomotion_1.csv	0.2401	0.0365
Drug	locomotion_2.csv	0.4823	0.1807
Control	locomotion_3.csv	0.0936	0.2948
Drug	locomotion_4.csv	0.4019	0.4982

By Recording Preview¶

The by recording method will show the average percent time moving and resting per recording between group 1 and group 2 for move (left), and rest (right). These plots will include a strip plot of the average of each recording.

Move Rest Comparison Recording

Comparison of the percent time moving between group 1 and group 2 for move (left), and rest (right)

By Frame Data¶

The by frame method will output a csv file containing a boolean value whether the subject was moving or resting as a boolean value for every frame in all recordings for both groups. The length of the file will be the number of total frames in all recordings.

Group	File Name	Resting
Control	locomotion_1.csv	1
Control	locomotion_1.csv	1
Control	locomotion_1.csv	1
Control	locomotion_1.csv	0

By Frame Preview¶

The by frame method will show a bar chart of the total proportion of time spent moving, resting, or undefined for each group (mean ± sem).

Move Rest Comparison Frame

FAQ¶

Question	Answer
What are the advantages of using the hierarchical bootstrapping method?	The hierarchical bootstrapping method is a non-parametric method that does not assume normality of the data. It is especially useful if you have a small sample size or if the data is not normally distributed. Furthermore, the hierarchical bootstrapping method is agnostic to the length of the recordings, as it resamples the data at the level of the individual recordings.
Why does my data look different when using the hierarchical bootstrapping method?	The plots using hierarchical bootstrapping method will look different because the data being ploted is a distribution of mean estimates rather than the raw values.
Why is the p-value different when using the hierarchical bootstrapping method?	Your p-value may differ between the hierarchical bootstrapping method and the non-hierarchical bootstrapping method because the hierarchical bootstrapping is reliant on the number of iterations specified by the `N Bootstraps` parameter. The more iterations you perform, the more accurate the p-value will be. The non-hierarchical bootstrapping method is reliant on number of total frames or the number of recordings.

Saravanan, V., Berman, G. J., & Sober, S. J. (2020). Application of the hierarchical bootstrap to multi-level data in neuroscience. Neurons, behavior, data analysis, and theory, 3(5), https://nbdt.scholasticahq.com/article/13927-application-of-the-hierarchical-bootstrap-to-multi-level-data-in-neuroscience. ↩↩↩
D’Agostino, R. B. (1971), “An omnibus test of normality for moderate and large sample size”, Biometrika, 58, 341-348 ↩
D’Agostino, R. and Pearson, E. S. (1973), “Tests for departure from normality”, Biometrika, 60, 613-622 ↩