Inscopix Bottom View Mouse Pose Estimation¶
This tool uses 2.0 compute credits per hour.
DeepLabCut is an open-source library for pose estimation based on deep learning. This tool applies a pre-trained DeepLabCut model to Inscopix behavioral movies recorded from a bottom-up camera view angle.
Parameters¶
Parameter | Required? | Default | Description |
---|---|---|---|
Behavior Movie | True | N/A | Behavioral movies to analyze. Must be in one of the following formats: .isxb, .mp4, or .avi. |
Experiment Annotations Format | True | parquet | The file format of the output experiment annotations file. Can be either .parquet or .csv. |
Crop Rectangle | False | N/A | Draw a cropping rectangle on the input movie, which will be used by DeepLabCut to crop movie frames before running the model. |
Window Length | True | 5 | Length of the median filter applied to the predictions. Must be an odd number. If set to one, then no filtering is applied. |
Displayed Body Parts | True | all | Selects the body parts that are plotted in the video. If "all", then all 8 body parts are plotted. Otherwise, a comma-separated list of strings selects a subset of body parts to plot, e.g., "nose","neck". The body parts available for plotting are: tail_tip, tail_base, R_hind, L_hind, neck, R_fore, L_fore, nose. |
P Cutoff | True | 0.6 | Cutoff threshold for predictions when labeling the input movie. Predictions below the threshold are not displayed. |
Dot Size | True | 5 | Size in pixels of the point drawn to label a body part. |
Color Map | True | rainbow | Color map used to color body part labels. Any matplotlib colormap name is acceptable. |
Keypoints Only | True | False | Only display keypoints, not video frames. |
Output Frame Rate | False | N/A | Positive number specifying the frame rate of the labeled output video. If None, the input movie frame rate is used. |
Draw Skeleton | True | False | If True, adds lines connecting the body parts, drawing a skeleton on each frame. The body parts to be connected and the color of the connecting lines are specified by the Color Map. |
Trail Points | True | 0 | Number of previous frames whose body parts are plotted in a frame (for displaying history). |
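To illustrate what the Window Length parameter controls, here is a minimal sketch of a median filter applied to a single coordinate trace. The trace values are made up for illustration, and the tool applies DeepLabCut's own median filter internally; this sketch only demonstrates the idea.

```python
import numpy as np
from scipy.signal import medfilt

# Hypothetical x-coordinate trace for one body part, with a one-frame outlier.
x = np.array([100.0, 101.0, 102.0, 250.0, 104.0, 105.0, 106.0])

# Window Length = 5: each value is replaced by the median of the
# 5-frame window centered on it, suppressing the outlier at frame 3.
x_filtered = medfilt(x, kernel_size=5)
print(x_filtered)  # the 250.0 spike is replaced by a neighboring value
```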
Inputs¶
The following table summarizes the valid input file types for this tool:
Source Parameter | File Type |
---|---|
Behavior Movie | nVision Movie [.isxb] or Movie (general) [.mp4, .avi] |
The following sections explain in further detail the expected format for these input files.
Behavior Movie¶
The tool will analyze behavioral movies using the Inscopix Bottom View Pre-Trained Model.
The behavior movies can be in one of the following file formats: .isxb, .mp4, .avi.
Multiple input movies
The tool can accept multiple behavior movies (of the same file format) as input, applying the model on each movie individually.
If the behavior movies are in .isxb file format, then the movies can be constructed into a series on IDEAS and then used as input to the tool as well.
See the Outputs section for more information on how outputs are formatted for multiple input movies.
Algorithm Description¶
This tool consists of three main analysis steps, summarized in the following diagram:
The first step is running the deeplabcut.analyze_videos API function, documented here. This function takes a path to the DeepLabCut project which contains the trained model to use, and applies that model to the input behavior movies. The output of this step is an h5 file containing the pose estimates of the model.
The second step is running the deeplabcut.filterpredictions API function, documented here. This function filters the raw pose estimates, removing outliers and noise from the results. The output of this step is also an h5 file containing the filtered pose estimates of the model. This step is recommended in the DeepLabCut documentation; however, it can be skipped by setting the Window Length parameter to one.
The third and final step is running the deeplabcut.create_labeled_video API function, documented here. This function creates a video of the input movie annotated with the pose estimation results. If filtering was applied to the pose estimates, the filtered results are used for the annotations; otherwise, the raw pose estimates are used. The output of this step is an mp4 file with the annotated movie frames. This result makes it easy to visualize and assess the quality of the model results.
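As a hedged sketch of how these three steps map onto the DeepLabCut 2.x API (the config and movie paths are hypothetical, and the exact arguments the tool passes are not specified here):

```python
import deeplabcut

# Hypothetical paths; the tool supplies its own pre-trained project config.
config = "/path/to/dlc_project/config.yaml"
movies = ["/path/to/behavior_movie.mp4"]

# Step 1: run the pre-trained model on the input movies (writes an .h5 file).
deeplabcut.analyze_videos(config, movies)

# Step 2: median-filter the raw pose estimates (skipped when filtering is disabled).
deeplabcut.filterpredictions(config, movies, filtertype="median", windowlength=5)

# Step 3: write an .mp4 with the (filtered) pose estimates drawn on each frame.
deeplabcut.create_labeled_video(
    config, movies, filtered=True, draw_skeleton=False, trailpoints=0
)
```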
Model¶
This tool employs a pre-trained convolutional deep neural network model for pose estimation. The model was trained using DeepLabCut, an open-source library for pose estimation based on deep learning; the tool uses DeepLabCut version 2.3.0. The following figure, adapted from the DeepLabCut paper, shows an example of the structure of the model.
The pre-trained model tracks 8 different body parts of a mouse (nose, neck, left forepaw, right forepaw, left hindpaw, right hindpaw, tail base, and tail tip), as illustrated in the figure below.
Training¶
In order to generalize well across different behavioral movies, the model was trained across a number of bottom view movies with different lighting conditions and environments. The following table summarizes the types of movies that were used to train the model.
Lighting Condition | Arena/Environment | Number of Movies | Number of Frames (overall) |
---|---|---|---|
White Light | Open Field | 10 | 200 |
IR Light | Open Field | 13 | 750 |
IR Light | Social Preference Test | 3 | 115 |
Overall, the model was trained on a total of 26 movies, 1065 frames, across 2 lighting conditions and 2 arenas/environments. The figure below shows examples of these different types of movies.
In addition, data augmentation (imgaug) was used to further extend model generalization, e.g., to black-and-white movies and to a range of mouse sizes, image contrast, and sharpness.
The model was trained over 1,000,000 iterations, and the snapshot (model coefficients) minimizing the test set error was selected (minimum error of 5.07 pixels at 420,000 iterations).
Outputs¶
This tool produces three different outputs, each corresponding to one of the analysis steps executed by the tool.
Pose Estimates H5 File¶
An .h5 file containing the model predictions.
The predictions are stored as a pandas DataFrame with MultiIndex columns.
The file contains the name of the network, the body part name, the (x, y) label position in pixels, and the likelihood (i.e., how confident the model was in placing the body part label at those coordinates) for each frame and body part.
The following figure shows an example of this output.
Filtering
If the Window Length parameter is set to one, then no filtering is performed and the raw pose estimates (also known as coordinates or keypoints) are output.
Otherwise, if the Window Length is greater than one, filtering is performed and the filtered pose estimates are output.
Multiple input movies
If there are multiple input movies, a pose estimates .h5 file is created for each movie.
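As a hedged sketch, the pose estimates can be loaded with pandas (the file name is hypothetical; the actual name produced by DeepLabCut embeds the network name):

```python
import pandas as pd

# Hypothetical output file name.
df = pd.read_hdf("behavior_movie_DLC_pose_estimates.h5")

# Columns are a MultiIndex: (scorer/network name, body part, coordinate).
scorer = df.columns.get_level_values(0)[0]
nose_x = df[(scorer, "nose", "x")]
nose_likelihood = df[(scorer, "nose", "likelihood")]
print(nose_x.head(), nose_likelihood.head())
```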
Previews¶
Each pose estimates file will have the following previews to visualize the data stored in these files.
All of these figures are generated using the plot_trajectories DeepLabCut API function.
Histogram:
The first preview is a histogram plot of the consecutive differences in x and y coordinates for each body part. This plot can help determine if there are any outliers in the results, based on significant jumps in the coordinates of a body part. Ideally, the distribution for each body part should be concentrated near zero.
Likelihood vs. Time Plot:
The second preview is a plot of the likelihood of each body part over time. This plot can help identify body parts or time periods where likelihood is low, indicating less reliable model results that can be filtered out in subsequent analysis.
Body Parts vs. Time Plot:
The third preview is a plot of the x & y coordinates of each body part over time. This plot can help identify how the position of body parts changes over time.
Body Parts Trajectory:
The fourth preview is a plot of the x & y coordinates of each body part plotted spatially in the FOV (field of view). This plot can help identify where body parts were located in the FOV throughout the duration of the input movies.
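These previews correspond to what a direct call to the DeepLabCut API would produce; a minimal sketch, assuming hypothetical paths:

```python
import deeplabcut

# plot_trajectories writes the histogram, likelihood-vs-time,
# coordinate-vs-time, and spatial trajectory figures for each movie.
config = "/path/to/dlc_project/config.yaml"
deeplabcut.plot_trajectories(config, ["/path/to/behavior_movie.mp4"], filtered=True)
```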
Experiment Annotations File¶
The pose estimates in IDEAS experiment annotations format. This can be either a .csv or .parquet file, depending on the Experiment Annotations Format parameter. The following figure shows an example of this output. Note: Only a subset of the model body parts are shown in the figure.
Frame number | Movie number | Local frame number | Time since start (s) | Hardware counter (us) | tail_tip x | tail_tip y | tail_tip likelihood | tail_base x | tail_base y | tail_base likelihood | R_hind x | R_hind y | R_hind likelihood | L_hind x | L_hind y | L_hind likelihood | neck x | neck y | neck likelihood | R_fore x | R_fore y | R_fore likelihood | L_fore x | L_fore y | L_fore likelihood | nose x | nose y | nose likelihood |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 2.47904E+11 | 564.8240966796875 | 317.3860778808594 | 0.9447007775306702 | 509.9263000488281 | 351.1965332 | 0.9994614124298096 | 528.3774414 | 390.61273193359375 | 0.9971207976341248 | 476.92999267578125 | 352.4303283691406 | 0.977459729 | 491.11468505859375 | 454.84625244140625 | 0.9713097810745239 | 492.2207031 | 422.94659423828125 | 0.9939277768135071 | 467.0288086 | 440.4466247558594 | 0.9846889972686768 | 488.8327637 | 484.03619384765625 | 0.9942392706871033 |
1 | 0 | 1 | 0.052012 | 2.47904E+11 | 565.0961914 | 323.2819824 | 0.34924545884132385 | 509.9263000488281 | 356.41339111328125 | 0.9949443936347961 | 528.3774414 | 391.04302978515625 | 0.9976459741592407 | 476.92999267578125 | 358.0550842285156 | 0.8112503886222839 | 492.7936706542969 | 461.8389892578125 | 0.8834290504455566 | 492.35577392578125 | 423.30364990234375 | 0.9897199869155884 | 469.6249694824219 | 445.0766296386719 | 0.989734411 | 490.6102600097656 | 492.1567687988281 | 0.9948519468307495 |
2 | 0 | 2 | 0.100012 | 2.47904E+11 | 566.8184814453125 | 525.7069091796875 | 0.6965065002441406 | 509.9263000488281 | 358.21875 | 0.9983623623847961 | 528.3774414 | 391.3701477050781 | 0.9962753653526306 | 476.92999267578125 | 362.00274658203125 | 0.9659325480461121 | 492.7936706542969 | 467.57122802734375 | 0.98145169 | 492.7096862792969 | 430.7136535644531 | 0.9121096134185791 | 470.1773986816406 | 445.0766296386719 | 0.7179741859436035 | 490.6102600097656 | 494.5678405761719 | 0.9978989958763123 |
3 | 0 | 3 | 0.15201 | 2.47904E+11 | 566.8184814453125 | 525.7069091796875 | 0.8461886644363403 | 498.8354492 | 362.7205505371094 | 0.9997307658195496 | 527.5282592773438 | 391.3701477050781 | 0.9964771270751953 | 459.6620178222656 | 398.22320556640625 | 0.9799673557281494 | 492.7936706542969 | 467.57122802734375 | 0.9810561537742615 | 496.5051574707031 | 451.73236083984375 | 0.9870792627334595 | 470.1773986816406 | 444.61700439453125 | 0.9764858484268188 | 490.6102600097656 | 494.5678405761719 | 0.9971500039100647 |
4 | 0 | 4 | 0.200012 | 2.47904E+11 | 566.8184814453125 | 525.7069091796875 | 0.9621436595916748 | 498.6442565917969 | 362.7205505371094 | 0.9995316863059998 | 527.5282592773438 | 391.4361267089844 | 0.9963982105255127 | 459.53948974609375 | 400.02703857421875 | 0.9914501905441284 | 487.4304199 | 467.57122802734375 | 0.9598495960235596 | 496.5051574707031 | 451.73236083984375 | 0.9904251098632812 | 470.1773986816406 | 444.1321105957031 | 0.9979997873306274 | 479.2045898 | 496.55084228515625 | 0.9930460453033447 |
For every body part estimated by the model, three columns are output: x, y, and likelihood. In addition, the following columns are included:
- Frame number: The frame number in the input movie. If there are multiple input movies, this column refers to the global frame number across the frames in all the input movies.
- Movie number: The movie number that the frame is from, within the multiple input movies.
- Local frame number: The frame number within the individual movie that the frame is from.
- Time since start (s): Timestamp in seconds for every frame of the input movie, relative to the start of the movie.
- Hardware counter (us): Only included for .isxb input movies. This column contains the hardware counter timestamps in microseconds for every frame of the input movie. These are the hardware counter values generated by the nVision system, which can be compared to corresponding hardware counter values in .isxd, .gpio, and .imu files from the same synchronized recording. These timestamps are used downstream in the Map Annotations to ISXD Data tool in order to map frames from annotations to .isxd data, so the two datasets can be compared for analysis.
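As a hedged sketch, the experiment annotations can be loaded with pandas and thresholded by the P Cutoff value (the file name is hypothetical; the column names follow the table above):

```python
import pandas as pd

# Hypothetical file name; use pd.read_csv for the .csv format instead.
annotations = pd.read_parquet("experiment_annotations.parquet")

# Mask nose coordinates whose likelihood falls below the P Cutoff (0.6 by default).
low_confidence = annotations["nose likelihood"] < 0.6
annotations.loc[low_confidence, ["nose x", "nose y"]] = float("nan")
```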
Multiple input movies
If there are multiple input movies, there will still only be one experiment annotations file, which concatenates the pose estimates across all input movies into a single result.
Labeled MP4 File (for visualization)¶
A .mp4 file containing the model predictions annotated on every frame of the input movie.
The following figure shows an example of this output.
Multiple input movies
If there are multiple input movies, a labeled movie .mp4 file is created for each movie.
Next Steps¶
Here are some examples of subsequent analyses that can be executed using the outputs of this tool:
- Average DeepLabCut Keypoints: Average the keypoint estimates in each frame of the tool output, and use this to represent the mouse center of mass (COM).
- Compute Locomotion Metrics: Compute instantaneous speed and label states of rest and movement from averaged keypoints.
- Compare Neural Activity Across States: Compute population activity during states of rest and movement from locomotion metrics.
- Compare Neural Circuit Correlations Across States: Compute correlations of cell activity during states of rest and movement from locomotion metrics.
- Combine and Compare Population Activity Data: Combine and compare population activity from multiple recordings and compare across states of rest and movement.
- Combine and Compare Correlation Data: Combine and compare correlations data from multiple recordings and compare across states of rest and movement.