Automated quantification of the schooling behaviour of sticklebacks
© Ardekani et al.; licensee Springer. 2013
Received: 31 January 2013
Accepted: 26 September 2013
Published: 9 November 2013
Sticklebacks have long been used as model organisms in behavioural biology. An important anti-predator behaviour in sticklebacks is schooling. We plan to use quantitative trait locus mapping to identify the genetic basis for differences in schooling behaviour between marine and benthic sticklebacks. To do this, we need to quantify the schooling behaviour of thousands of fish. We have developed a robust high-throughput video analysis method that allows us to screen a few thousand individuals automatically. We propose a non-local background modelling approach that allows us to detect and track sticklebacks and obtain the schooling parameters efficiently.
Our goal is to dissect the genetic basis for the divergent schooling behaviour between marine and benthic sticklebacks. Quantitative trait locus (QTL) mapping has successfully identified the genetic basis of many variant traits in sticklebacks. We plan to use QTL mapping in benthic-marine hybrids to identify genetic loci that contribute to differences in schooling behaviour.
To assay the hundreds of fish necessary for this technique, a robust high-throughput video analysis system is essential. In this paper, we present a custom approach for analysing videos from our assay. We propose a background modelling method for videos that are (semi-)periodic, i.e. those in which some or all of the background in each frame is repeated in at least a few other frames of the video. We show the results of this simple yet effective method on videos from our experiments.
Target detection for video tracking
For any video tracking system, target detection is an essential ingredient. One approach is to detect an object of interest based on appearance features such as geometric shape, texture and colour. In this approach, the visual features should be chosen so that the target can be easily distinguished from other objects in the scene. This approach has become more popular recently, partially due to the great progress in object detection. Another approach to detecting moving objects in the scene is background subtraction. This approach is especially useful for surveillance systems, such as those monitoring parking lots, offices, and controlled experimental environments, in which cameras are fixed and directed at the area of interest. The main property of these systems is that the background is largely static, so a model of the background can be calculated for each frame. For example, Wu et al. used this method for the detection and tracking of a colony of Brazilian free-tailed bats in nature. Different methods have been developed to robustly maintain the background model in scenes with possible changes in background, such as gradual changes in lighting or sudden changes in illumination due to light switches [8, 9]. Moreover, there are studies that address background modelling in dynamic scenes with significant stochastic motion, such as water or waving trees [11, 12]. Unfortunately, the aforementioned approaches are not applicable to our experiments due to our experimental set-up (see the 'Challenges' section). In this paper, we propose a non-local background modelling approach, which exploits the semi-periodic nature of the videos and overcomes the limitations of other approaches.
Moreover, since the model school is rotating, the associated poles and wires are also moving in the scene, but they are not the desired targets. Therefore, detecting the real fish by background subtraction, using either a static model or the most recent frames as the background model, is not effective. We define a new 'background' model in which all objects (including moving ones) are part of the background, and only the target, the real fish, is detected as foreground. It is possible to create such a background if the objects in the video have a predictable motion model. Our main contribution is to exploit the periodicity of the videos and build a background model that enables us to discount all moving parts of the set-up except the fish; a sketch of this idea follows.
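To make this concrete, the sketch below shows one plausible way to realize such a model in C++ with OpenCV: given the set of frames judged most similar to the current frame (i.e. frames at the same phase of the model school's rotation), a pixel-wise median over them yields a background into which everything that moves predictably is absorbed, leaving the real fish as foreground. This is a minimal illustration; the combination rule (a median) and the threshold value are assumptions for illustration, not necessarily the exact choices in our implementation.

```cpp
// Illustrative sketch: background model as the pixel-wise median of the
// frames most similar to the current one (same phase of the rotation).
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Pixel-wise median over a set of same-sized 8-bit grayscale frames.
cv::Mat medianBackground(const std::vector<cv::Mat>& similarFrames) {
    CV_Assert(!similarFrames.empty());
    const cv::Mat& first = similarFrames.front();
    cv::Mat background(first.size(), CV_8UC1);
    std::vector<uchar> values(similarFrames.size());
    for (int y = 0; y < first.rows; ++y) {
        for (int x = 0; x < first.cols; ++x) {
            for (size_t k = 0; k < similarFrames.size(); ++k)
                values[k] = similarFrames[k].at<uchar>(y, x);
            std::nth_element(values.begin(),
                             values.begin() + values.size() / 2, values.end());
            background.at<uchar>(y, x) = values[values.size() / 2];
        }
    }
    return background;
}

// Foreground = pixels deviating from the background model by more than a
// threshold (the value 30 here is an arbitrary placeholder).
cv::Mat detectForeground(const cv::Mat& frame, const cv::Mat& background) {
    cv::Mat diff, mask;
    cv::absdiff(frame, background, diff);
    cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);
    return mask;
}
```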
Model school detection
To detect the schooling behaviour of the fish, we need to detect the model school. As can be seen in Figure 3, the model fish are suspended from a circular wire. An obvious choice for circle detection is the generalized Hough transform, and since the radius of the circle is constant (aside from negligible variation due to perspective), the model school is located effectively. Detection can be expedited by using information from the previous frame: instead of searching the whole image, we search for the circle in the neighbourhood of the position detected in the last frame. From the centre of the circle at each frame, the movement direction of the model fish can be extracted; this is needed to calculate the statistics we need from each experiment.
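As an illustration, a circle search of this kind can be implemented with OpenCV's cv::HoughCircles, restricting both the radius range (the wire radius is known and nearly constant) and the search window around the previous detection. The parameter values below are placeholders rather than tuned values from our system.

```cpp
// Sketch: find the circular wire near its previous position, with the
// radius band pinned to the known (nearly constant) wire radius.
#include <opencv2/opencv.hpp>
#include <vector>

// prevCentre: centre found in the previous frame; radius: known wire radius.
bool findModelSchool(const cv::Mat& gray, cv::Point2f prevCentre, int radius,
                     cv::Point2f& centre) {
    int margin = 3 * radius;  // search window around the last detection
    cv::Rect roi(cv::Point(prevCentre) - cv::Point(margin, margin),
                 cv::Size(2 * margin, 2 * margin));
    roi &= cv::Rect(0, 0, gray.cols, gray.rows);  // clip to image bounds

    std::vector<cv::Vec3f> circles;
    cv::HoughCircles(gray(roi), circles, cv::HOUGH_GRADIENT,
                     /*dp=*/1, /*minDist=*/2 * radius,
                     /*param1=*/100, /*param2=*/30,
                     radius - 2, radius + 2);  // radius is essentially fixed
    if (circles.empty()) return false;
    centre = cv::Point2f(circles[0][0] + roi.x, circles[0][1] + roi.y);
    return true;
}
```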
Real fish detection
We want to build a background model for each frame such that the only 'foreground' is the real fish. This means we want the model school, poles and wires to be part of the background.
To identify frames that are similar to a given frame, we compute a normalized mean-intensity feature over the region of interest:

F_f = C \sum_{i=1}^{h} \sum_{j=1}^{w} I_f(i,j),

in which h and w are the height and width of the region of interest, respectively, C is a normalization factor and I_f(i,j) is the intensity value of the pixel (i,j) at frame f, which is between 0 and 255. To keep F_f between 0 and 1, we choose C to be (255 × w × h)^{-1}.
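Because this feature is a sum over a rectangular region, it can be computed in constant time per region using a summed-area table (integral image), for example via OpenCV's cv::integral. The sketch below shows the four-look-up rectangle sum; variable names are illustrative.

```cpp
// Sketch: the normalized mean-intensity feature F_f for a region of
// interest, computed from an integral image (summed-area table).
#include <opencv2/opencv.hpp>

double regionFeature(const cv::Mat& integral, const cv::Rect& roi) {
    // cv::integral(src, sum, CV_64F) produces a (rows+1) x (cols+1) image in
    // which entry (y, x) is the sum of all pixels above and left of (y, x),
    // so any rectangle sum costs four look-ups.
    double sum = integral.at<double>(roi.y + roi.height, roi.x + roi.width)
               - integral.at<double>(roi.y, roi.x + roi.width)
               - integral.at<double>(roi.y + roi.height, roi.x)
               + integral.at<double>(roi.y, roi.x);
    double C = 1.0 / (255.0 * roi.width * roi.height);  // normalization from the text
    return C * sum;  // lies in [0, 1]
}

// Usage: cv::Mat ii; cv::integral(grayFrame, ii, CV_64F);
//        double F = regionFeature(ii, roi);
```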
Implementation and results
We implemented our method in C++ using the OpenCV library. In a pre-processing block, the Haar-like features as well as the position of the model fish are extracted at each frame. In the processing step, we use the extracted features to identify the similar frames for each frame and detect the fish as described in the 'Methods' section. Since the model school is moving semi-periodically, we can limit the search for similar frames to a small number of candidate frames instead of searching all frames. In our set-up, the model school turns approximately 25 times during the 5-min video (approximately 9,000 frames). As mentioned, the period of turning is not constant and differs between and within videos. By assuming a nominal period of 350 frames per turn, we find the frames in the other periods that should be most similar to the current frame; we then add the 10 frames before and after each of these to the search space. Thus, instead of searching all 9,000 frames, we find the most similar frame by examining around 500 frames, which greatly expedites the processing of the videos. Finally, in the post-processing block (implemented in the R language), we examine the extracted trajectory of the fish relative to the model school and annotate each frame using the distance between the fish and the model school as well as the speed of the fish.
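The candidate-frame selection can be sketched as follows, using the nominal period of 350 frames and the ±10-frame window described above; the similarity measure itself (comparison of the extracted features) is omitted.

```cpp
// Sketch: collect the frames at the same phase of every other turn, plus a
// +/- window frames around each, as candidates for the most similar frame.
#include <vector>

std::vector<int> candidateFrames(int current, int totalFrames,
                                 int period = 350, int window = 10) {
    std::vector<int> candidates;
    int phase = current % period;
    for (int base = phase; base < totalFrames; base += period) {
        if (base / period == current / period) continue;  // skip the current turn
        for (int f = base - window; f <= base + window; ++f)
            if (f >= 0 && f < totalFrames) candidates.push_back(f);
    }
    return candidates;  // ~24 turns x 21 frames, i.e. around 500 candidates
}
```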
Additional file 1: SticklebackTracking.avi - sample video. This video shows one typical experiment that has been processed. The detected fish is indicated in blue, and a red circle shows the position of the model school at each frame. (AVI 11 MB)
Table 1: Detection performance in five video segments of 1,000 frames each
Table 2: Comparison of automated and manual schooling time (in seconds) for 10 experiments, each lasting 5 min
For each video, what we are ultimately interested in is the proportion of time in which the fish schools. Each video lasts 300 s, and for each second, we determine whether the fish is schooling. This results in two vectors of 0s and 1s (0 for not schooling and 1 for schooling), one from the manual and one from the automated annotation. To assess the concordance between the manual and the automated annotation, we used the Kappa statistic. Values of Kappa are at most 1, with larger values corresponding to better agreement between human and machine; the observed values are given in Table 2. To determine the significance of the Kappa statistic for each experiment, we produced 1,000 permutations of the automated annotation and computed the Kappa statistic for the comparison between the human annotation and each permuted one. The observed value of Kappa was then compared to the values obtained under this permutation procedure. In all experiments, the observed value was larger than the largest permuted statistic; this corresponds to a nominal p value of 0.001, confirming the agreement between the manual and automated annotations.
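For concreteness, the following sketch shows the Kappa computation and the permutation test for two binary annotation vectors. Our post-processing is implemented in R; this C++ version is only illustrative, and the random seed is arbitrary.

```cpp
// Illustrative sketch: Cohen's Kappa for two binary per-second annotation
// vectors, plus a permutation p value from shuffling the automated one.
#include <algorithm>
#include <random>
#include <vector>

double cohensKappa(const std::vector<int>& a, const std::vector<int>& b) {
    const int n = static_cast<int>(a.size());
    int agree = 0, onesA = 0, onesB = 0;
    for (int i = 0; i < n; ++i) {
        agree += (a[i] == b[i]);
        onesA += a[i];
        onesB += b[i];
    }
    const double po = static_cast<double>(agree) / n;   // observed agreement
    const double pA = static_cast<double>(onesA) / n;
    const double pB = static_cast<double>(onesB) / n;
    const double pe = pA * pB + (1 - pA) * (1 - pB);    // chance agreement
    return (po - pe) / (1 - pe);
}

// Fraction of permutations whose Kappa reaches the observed value; with no
// exceedances among 1,000 permutations this yields the nominal p = 0.001.
double permutationP(const std::vector<int>& manualAnn, std::vector<int> autoAnn,
                    int nPerm = 1000) {
    const double observed = cohensKappa(manualAnn, autoAnn);
    std::mt19937 rng(12345);  // arbitrary seed, for reproducibility
    int atLeast = 0;
    for (int p = 0; p < nPerm; ++p) {
        std::shuffle(autoAnn.begin(), autoAnn.end(), rng);
        if (cohensKappa(manualAnn, autoAnn) >= observed) ++atLeast;
    }
    return static_cast<double>(atLeast + 1) / (nPerm + 1);
}
```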
We have proposed a method to automate the quantitative analysis of stickleback schooling behaviour. We exploit the semi-periodic nature of the videos to build an accurate background model for each frame. Since we are processing recorded videos, our background modelling algorithm does not need to be causal; however, it can be extended to causal systems, e.g. real-time applications. The proposed method enables us to detect the fish in difficult situations, for example, when the fish is very close to the model school and/or is partially occluded. Most modern online tracking methods rely on the visual features and/or motion model of the targets [6, 7]. These approaches would fail in frames in which the actual fish is swimming close to the models, since the fish and the models are similar in appearance and movement pattern. If the tracker switches from the real fish to one of the model fish, it might track the model throughout the rest of the video, thereby assigning a much higher schooling score to the real fish. This highlights another advantage of the proposed method: since the detection in each frame is independent of the neighbouring frames, detection errors do not propagate to other frames. Using our approach, we can extract the important parameters of schooling behaviour. This enables us to screen many individuals with different genotypes efficiently and to conduct association studies between genotype and schooling behaviour. Moreover, the new definition of background can be used in situations where the moving part of the background is predictable or periodic, for example, in detecting an object on an assembly line that uses robotic arms with repetitive movements.
Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under award numbers P50HG002790 (RA, ST) and P50HG002568 (AKG, CLP), and by National Science Foundation grant IOS 1145866 (AKG, CLP). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.
- Tinbergen N: The curious behavior of the stickleback. Sci. Am. 1952, 187: 22-26.
- Bell MA, Foster SA: The Evolutionary Biology of the Threespine Stickleback. Oxford: Oxford University Press; 1994.
- Wootton RJ: The Biology of the Sticklebacks. London: Academic Press; 1976.
- Kingsley DM, Peichel CL: The molecular genetics of evolutionary change in sticklebacks. In Biology of the Three-Spined Stickleback. Edited by: Ostlund-Nilsson S, Mayer I, Huntingford F. Boca Raton: CRC Press; 2007.
- Wark AR, Greenwood AK, Taylor EM, Yoshida K, Peichel CL: Heritable differences in schooling behavior among threespine stickleback populations revealed by a novel assay. PLoS ONE 2011, 6: e18316. doi:10.1371/journal.pone.0018316
- Yilmaz A, Javed O, Shah M: Object tracking: a survey. ACM Comput. Surv. 2006. doi:10.1145/1177352.1177355
- Hare S, Saffari A, Torr PH: Struck: structured output tracking with kernels. IEEE International Conference on Computer Vision (ICCV), Barcelona, 6–13 Nov 2011.
- Piccardi M: Background subtraction techniques: a review. IEEE Int. Conf. Syst. Man Cybern. 2004, 4: 3099-3104.
- Toyama K, Krumm J, Brumitt B, Meyers B: Wallflower: principles and practice of background maintenance. ICCV 1999, 1: 255-261.
- Wu Z, Kunz TH, Betke M: Efficient track linking methods for track graphs using network-flow and set-cover techniques. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, 20–25 June 2011.
- Sheikh Y, Shah M: Bayesian modeling of dynamic scenes for object detection. PAMI 2005, 27: 1778-1792.
- Chan AB, Mahadevan V, Vasconcelos N: Generalized Stauffer-Grimson background subtraction for dynamic scenes. Mach. Vision Appl. 2011, 22: 751-766. doi:10.1007/s00138-010-0262-3
- Duda RO, Hart PE: Use of the Hough transformation to detect lines and curves in pictures. Commun. ACM 1972, 15: 11-15. doi:10.1145/361237.361242
- Buades A, Coll B, Morel JM: A non-local algorithm for image denoising. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) 2005, 2: 60-65.
- Papageorgiou CP, Oren M, Poggio T: A general framework for object detection. Sixth International Conference on Computer Vision (ICCV 98), Bombay, 4–7 Jan 1998.
- Viola P, Jones M: Robust real-time face detection. Int. J. Comput. Vis. 2004, 57: 137-154.
- Crow FC: Summed-area tables for texture mapping. Proc. SIGGRAPH 1984, 18: 207-212. doi:10.1145/964965.808600
- Cohen J: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20: 37-46. doi:10.1177/001316446002000104
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.