• Size: 0.25M
    File type: .pdf
    Coins: 1
    Downloads: 0
    Published: 2021-03-27
  • Language: Other
  • Tags: Other

Resource description


Dalal's classic paper: 2005-cvpr-Histograms of oriented gradients for human detection.pdf
Figure 2. Some sample images from our new human detection database. The subjects are always upright, but with some partial occlusions and a wide range of variations in pose, appearance, clothing, illumination and background.

... probabilities to be distinguished more easily. We will often use miss rate at $10^{-4}$ FPPW as a reference point for results. This is arbitrary but no more so than, e.g., Area Under ROC. In a multiscale detector it corresponds to a raw error rate of about 0.8 false positives per 640x480 image tested. (The full detector has an even lower false positive rate owing to non-maximum suppression.) Our DET curves are usually quite shallow, so even very small improvements in miss rate are equivalent to large gains in FPPW at constant miss rate. For example, for our default detector at $10^{-4}$ FPPW, every 1% absolute (9% relative) reduction in miss rate is equivalent to reducing the FPPW at constant miss rate by a factor of 1.57.

5 Overview of results

Before presenting our detailed implementation and performance analysis, we compare the overall performance of our final HOG detectors with that of some other existing methods. Detectors based on rectangular (R-HOG) or circular log-polar (C-HOG) blocks and linear or kernel SVM are compared with our implementations of the Haar wavelet, PCA-SIFT, and shape context approaches. Briefly, these approaches are as follows.

Generalized Haar Wavelets. This is an extended set of oriented Haar-like wavelets similar to (but better than) that used in [17]. The features are rectified responses from 9x9 and 12x12 oriented 1st and 2nd derivative box filters at 45° intervals and the corresponding 2nd derivative xy filter.

PCA-SIFT. These descriptors are based on projecting gradient images onto a basis learned from training images using PCA [11]. Ke & Sukthankar found that they outperformed SIFT for key point based matching, but this is controversial [14]. Our implementation uses 16x16 blocks with the same derivative scale, overlap, etc., as our HOG descriptors. The PCA basis is calculated using the positive training images.

Shape Contexts. The original Shape Contexts [1] used binary edge-presence voting into log-polar spaced bins, irrespective of edge orientation. We simulate this using our C-HOG descriptor (see below) with just 1 orientation bin. 16 angular and 3 radial intervals with inner radius 2 pixels and outer radius 8 pixels gave the best results. Both gradient-strength and edge-presence based voting were tested, with the edge threshold chosen automatically to maximize detection performance (the values selected were somewhat variable, in the region of 20-50 graylevels).

Results. Fig. 3 shows the performance of the various detectors on the MIT and INRIA data sets. The HOG-based detectors greatly outperform the wavelet, PCA-SIFT and Shape Context ones, giving near-perfect separation on the MIT test set and at least an order of magnitude reduction in FPPW on the INRIA one. Our Haar-like wavelets outperform MIT wavelets because we also use 2nd order derivatives and contrast normalize the output vector. Fig. 3(a) also shows MIT's best parts based and monolithic detectors (the points are interpolated from [17]); however, beware that an exact comparison is not possible as we do not know how the database in [17] was divided into training and test parts, and the negative images used are not available. The performances of the final rectangular (R-HOG) and circular (C-HOG) detectors are very similar, with C-HOG having the slight edge. Augmenting R-HOG with primitive bar detectors (oriented 2nd derivatives, 'R2-HOG') doubles the feature dimension but further improves the performance (by 2% at $10^{-4}$ FPPW). Replacing the linear SVM with a Gaussian kernel one improves performance by about 3% at $10^{-4}$ FPPW, at the cost of much higher run times (see the footnote below). Using binary edge voting (EC-HOG) instead of gradient magnitude weighted voting (C-HOG) decreases performance by 5% at $10^{-4}$ FPPW, while omitting orientation information decreases it by much more, even if additional spatial or radial bins are added (by 33% at $10^{-4}$ FPPW, for both edges (E-ShapeC) and gradients (G-ShapeC)). PCA-SIFT also performs poorly. One reason is that, in comparison to [11], many more (80 of 512) principal vectors have to be retained to capture the same proportion of the variance. This may be because the spatial registration is weaker when there is no keypoint detector.

Footnote: We use the hard examples generated by linear R-HOG to train the kernel R-HOG detector, as kernel R-HOG generates so few false positives that its hard example set is too sparse to improve the generalization significantly.

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) 1063-6919/05 $20.00 © 2005 IEEE

[Figure 3 plots: "DET - different descriptors on MIT database" (left) and "DET - different descriptors on INRIA database" (right); x-axis: false positives per window (FPPW). Curves include Lin. R-HOG, Lin. C-HOG, Lin. EC-HOG, Wavelet, PCA-SIFT, Lin. G-ShapeC, Lin. R2-HOG, Ker. R-HOG, MIT best (part) and MIT baseline.]

Figure 3. The performance of selected detectors on (left) MIT and (right) INRIA data sets. See the text for details.

6 Implementation and performance study

We now give details of our HOG implementations and systematically study the effects of the various choices on detector performance. Throughout this section we refer results to our default detector, which has the following properties, described below: RGB colour space with no gamma correction; [-1, 0, 1] gradient filter with no smoothing; linear gradient voting into 9 orientation bins in 0°-180°; 16x16 pixel blocks of four 8x8 pixel cells; Gaussian spatial window with $\sigma = 8$ pixel; L2-Hys (Lowe-style clipped L2 norm) block normalization; block spacing stride of 8 pixels (hence 4-fold coverage of each cell); 64x128 detection window; linear SVM classifier.

Fig. 4 summarizes the effects of the various HOG parameters on overall detection performance. These will be examined in detail below. The main conclusions are that for good performance, one should use fine scale derivatives (essentially no smoothing), many orientation bins, and moderately sized, strongly normalized, overlapping descriptor blocks.

6.1 Gamma/Colour normalization

We evaluated several input pixel representations including grayscale, RGB and LAB colour spaces, optionally with power law (gamma) equalization. These normalizations have only a modest effect on performance, perhaps because the subsequent descriptor normalization achieves similar results. We do use colour information when available. RGB and LAB colour spaces give comparable results, but restricting to grayscale reduces performance by 1.5% at $10^{-4}$ FPPW. Square root gamma compression of each colour channel improves performance at low FPPW (by 1% at $10^{-4}$ FPPW), but log compression is too strong and worsens it by 2% at $10^{-4}$ FPPW.

6.2 Gradient Computation

Detector performance is sensitive to the way in which gradients are computed, but the simplest scheme turns out to be the best. We tested gradients computed using Gaussian smoothing followed by one of several discrete derivative masks. Several smoothing scales were tested, including $\sigma = 0$ (none). Masks tested included various 1-D point derivatives (uncentred [-1, 1], centred [-1, 0, 1] and cubic-corrected [1, -8, 0, 8, -1]) as well as 3x3 Sobel masks and 2x2 diagonal ones $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ (the most compact centred 2-D derivative masks). Simple 1-D [-1, 0, 1] masks at $\sigma = 0$ work best. Using larger masks always seems to decrease performance, and smoothing damages it significantly: for Gaussian derivatives, moving from $\sigma = 0$ to $\sigma = 2$ reduces the recall rate from 89% to 80% at $10^{-4}$ FPPW. At $\sigma = 0$, cubic-corrected 1-D width 5 filters are about 1% worse than [-1, 0, 1] at $10^{-4}$ FPPW, while the 2x2 diagonal masks are 1.5% worse. Using uncentred [-1, 1] derivative masks also decreases performance (by 1.5% at $10^{-4}$ FPPW), presumably because orientation estimation suffers as a result of the x and y filters being based at different centres.

For colour images, we calculate separate gradients for each colour channel, and take the one with the largest norm as the pixel's gradient vector.

6.3 Spatial / Orientation Binning

The next step is the fundamental nonlinearity of the descriptor. Each pixel calculates a weighted vote for an edge orientation histogram channel based on the orientation of the gradient element centred on it, and the votes are accumulated into orientation bins over local spatial regions that we call cells. Cells can be either rectangular or radial (log-polar sectors). The orientation bins are evenly spaced over 0°-180° ("unsigned" gradient) or 0°-360° ("signed" gradient). To reduce aliasing, votes are interpolated bilinearly between the neighbouring bin centres in both orientation and position. The vote is a function of the gradient magnitude at the pixel, either the magnitude itself, its square, its square root, or a clipped form of the magnitude representing soft presence/absence of an edge at the pixel. In practice, using the magnitude itself gives the best results. Taking the square root reduces performance slightly, while using binary edge presence voting decreases it significantly (by 5% at $10^{-4}$ FPPW).

[Figure 4 plots, x-axis: false positives per window (FPPW). Panels: (a) DET - effect of gradient scale $\sigma$; (b) DET - effect of number of orientation bins $\beta$; (c) DET - effect of normalization methods; (d) DET - effect of overlap (cell size = 8, num cell = 2x2, wt = 0); (e) DET - effect of window size; (f) DET - effect of kernel width, $\gamma$, on kernel SVM.]

Figure 4. For details see the text. (a) Using fine derivative scale significantly increases the performance. ('c-cor' is the 1D cubic-corrected point derivative.) (b) Increasing the number of orientation bins increases performance significantly up to about 9 bins spaced over 0°-180°. (c) The effect of different block normalization schemes (see §6.4). (d) Using overlapping descriptor blocks decreases the miss rate by around 5%. (e) Reducing the 16 pixel margin around the 64x128 detection window decreases the performance by about 4%. (f) Using a Gaussian kernel SVM, $\exp(-\gamma \|x_1 - x_2\|^2)$, improves the performance by about 3%.

Fine orientation coding turns out to be essential for good performance, whereas (see below) spatial binning can be rather coarse. As Fig. 4(b) shows, increasing the number of orientation bins improves performance significantly up to about 9 bins, but makes little difference beyond this. This is for bins spaced over 0°-180°, i.e. the 'sign' of the gradient is ignored. Including signed gradients (orientation range 0°-360°, as in the original SIFT descriptor) decreases the performance, even when the number of bins is also doubled to preserve the original orientation resolution. For humans, the wide range of clothing and background colours presumably makes the signs of contrasts uninformative. However, note that including sign information does help substantially in some other object recognition tasks, e.g. cars, motorbikes.

[Figure 5 plot: miss rate (%) against cell size (pixels) and block size (cells).]

Figure 5. The miss rate at $10^{-4}$ FPPW as the cell and block sizes change. The stride (block overlap) is fixed at half of the block size. 3x3 blocks of 6x6 pixel cells perform best, with 10.4% miss rate.

6.4 Normalization and descriptor blocks

Gradient strengths vary over a wide range owing to local variations in illumination and foreground-background contrast, so effective local contrast normalization turns out to be essential for good performance. We evaluated a number of different normalization schemes. Most of them are based on grouping cells into larger spatial blocks and contrast normalizing each block separately. The final descriptor is then the vector of all components of the normalized cell responses from all of the blocks in the detection window. In fact, we typically overlap the blocks so that each scalar cell response contributes several components to the final descriptor vector, each normalized with respect to a different block. This may seem redundant, but good normalization is critical and including overlap significantly improves the performance. Fig. 4(d) shows that performance increases by 4% at $10^{-4}$ FPPW as we increase the overlap from none (stride 16) to 16-fold area / 4-fold linear coverage (stride 4).

We evaluated two classes of block geometries: square or rectangular ones partitioned into grids of square or rectangular spatial cells, and circular blocks partitioned into cells in log-polar fashion. We will refer to these two arrangements as R-HOG and C-HOG (for rectangular and circular HOG).

R-HOG. R-HOG blocks have many similarities to SIFT descriptors [12] but they are used quite differently. They are computed in dense grids at a single scale without dominant orientation alignment and used as part of a larger code vector that implicitly encodes spatial position relative to the detection window, whereas SIFTs are computed at a sparse set of scale-invariant key points, rotated to align their dominant orientations, and used individually. SIFTs are optimized for sparse wide baseline matching, R-HOGs for dense robust coding of spatial form. Other precursors include the edge orientation histograms of Freeman & Roth [4]. We usually use square R-HOGs, i.e. $\varsigma \times \varsigma$ grids of $\eta \times \eta$ pixel cells each containing $\beta$ orientation bins, where $\varsigma$, $\eta$, $\beta$ are parameters.

Fig. 5 plots the miss rate at $10^{-4}$ FPPW w.r.t. cell size and block size in cells. For human detection, 3x3 cell blocks of 6x6 pixel cells perform best, with 10.4% miss rate at $10^{-4}$ FPPW. Our standard 2x2 cell blocks of 8x8 cells are a close second. In fact, 6-8 pixel wide cells do best irrespective of the block size, an interesting coincidence as human limbs are about 6-8 pixels across in our images. 2x2 and 3x3 cell blocks work best. Adaptivity to local imaging conditions is weakened when the block becomes too big, and when it is too small (1x1 cell block, i.e. normalization over orientation alone) valuable spatial information is suppressed.

As in [12], it is useful to downweight pixels near the edges of the block by applying a Gaussian spatial window to each pixel before accumulating orientation votes into cells. This improves performance by 1% at $10^{-4}$ FPPW for a Gaussian with $\sigma = 0.5 \times$ block width.

We also tried including multiple block types with different cell and block sizes in the overall descriptor. This slightly improves performance (by around 3% at $10^{-4}$ FPPW), at the cost of greatly increased descriptor size.

Besides square R-HOG blocks, we also tested vertical (2x1 cell) and horizontal (1x2 cell) blocks and a combined descriptor including both vertical and horizontal pairs. Vertical and vertical+horizontal pairs are significantly better than horizontal pairs alone, but not as good as 2x2 or 3x3 cell blocks (1% worse at $10^{-4}$ FPPW).

C-HOG. Our circular block (C-HOG) descriptors are reminiscent of Shape Contexts [1] except that, crucially, each spatial cell contains a stack of gradient-weighted orientation cells instead of a single orientation-independent edge-presence count. The log-polar grid was originally suggested by the idea that it would allow fine coding of nearby structure to be combined with coarser coding of wider context, and the fact that the transformation from the visual field to the V1 cortex in primates is logarithmic [21]. However, small descriptors with very few radial bins turn out to give the best performance, so in practice there is little inhomogeneity or context. It is probably better to think of C-HOGs simply as an advanced form of centre-surround coding.

We evaluated two variants of the C-HOG geometry: ones with a single circular central cell (similar to the GLOH feature of [14]), and ones whose central cell is divided into angular sectors as in shape contexts. We present results only for the circular-centre variants, as these have fewer spatial cells than the divided centre ones and give the same performance in practice. A technical report will provide further details. The C-HOG layout has four parameters: the numbers of angular and radial bins; the radius of the central bin in pixels; and the expansion factor for subsequent radii. At least two radial bins (a centre and a surround) and four angular bins (quartering) are needed for good performance. Including additional radial bins does not change the performance much, while increasing the number of angular bins decreases performance (by 1.3% at $10^{-4}$ FPPW when going from 4 to 12 angular bins). 4 pixels is the best radius for the central bin, but 3 and 5 give similar results. Increasing the expansion factor from 2 to 3 leaves the performance essentially unchanged. With these parameters, neither Gaussian spatial weighting nor inverse weighting of cell votes by cell area changes the performance, but combining these two reduces it slightly. These values assume fine orientation sampling. Shape contexts (1 orientation bin) require much finer spatial subdivision to work well.

Block Normalization schemes. We evaluated four different block normalization schemes for each of the above HOG geometries. Let $v$ be the unnormalized descriptor vector, $\|v\|_k$ be its $k$-norm for $k = 1, 2$, and $\epsilon$ be a small constant. The schemes are: (a) L2-norm, $v \to v / \sqrt{\|v\|_2^2 + \epsilon^2}$; (b) L2-Hys, L2-norm followed by clipping (limiting the maximum values of $v$ to 0.2) and renormalizing, as in [12]; (c) L1-norm, $v \to v / (\|v\|_1 + \epsilon)$; and (d) L1-sqrt, L1-norm followed by square root, $v \to \sqrt{v / (\|v\|_1 + \epsilon)}$, which amounts to treating the descriptor vectors as probability distributions and using the Bhattacharya distance between them. Fig. 4(c) shows that L2-Hys, L2-norm and L1-sqrt all perform equally well, while simple L1-norm reduces performance by 5%, and omitting normalization entirely reduces it by 27%, at $10^{-4}$ FPPW. Some regularization $\epsilon$ is needed as we evaluate descriptors densely, including on empty patches, but the results are insensitive to $\epsilon$'s value over a large range.

[Figure 6 images: panels (a) to (g).]

Figure 6. Our HOG detectors cue mainly on silhouette contours (especially the head, shoulders and feet). The most active blocks are centred on the image background just outside the contour. (a) The average gradient image over the training examples. (b) Each "pixel" shows the maximum positive SVM weight in the block centred on the pixel. (c) Likewise for the negative SVM weights. (d) A test image. (e) Its computed R-HOG descriptor. (f, g) The R-HOG descriptor weighted by respectively the positive and the negative SVM weights.

Centre-surround normalization. We also investigated an alternative centre-surround style cell normalization scheme, in which the image is tiled with a grid of cells and for each cell the total energy in the cell and its surrounding region (summed over orientations and pooled using Gaussian weighting) is used to normalize the cell. However, as Fig. 4(c) ("window norm") shows, this decreases performance relative to the corresponding block based scheme (by 2% at $10^{-4}$ FPPW, for pooling with $\sigma = 1$ cell widths). One reason is that there are no longer any overlapping blocks, so each cell is coded only once in the final descriptor. Including several normalizations for each cell based on different pooling scales $\sigma$ provides no perceptible change in performance, so it seems that it is the existence of several pooling regions with different spatial offsets relative to the cell that is important here, not the pooling scale.

To clarify this point, consider the R-HOG detector with overlapping blocks. The coefficients of the trained linear SVM give a measure of how much weight each cell of each block can have in the final discrimination decision. Close examination of Fig. 6(b, f) shows that the most important cells are the ones that typically contain major human contours (especially the head and shoulders and the feet), normalized w.r.t. blocks lying outside the contour. In other words, despite the complex, cluttered backgrounds that are common in our training set, the detector cues mainly on the contrast of silhouette contours against the background, not on internal edges or on silhouette contours against the foreground. Patterned clothing and pose variations may make internal regions unreliable as cues, or foreground-to-contour transitions may be confused by smooth shading and shadowing effects. Similarly, Fig. 6(c, g) illustrate that gradients inside the person (especially vertical ones) typically count as negative cues, presumably because this suppresses false positives in which long vertical lines trigger vertical head and leg cells.

6.5 Detector Window and Context

Our 64x128 detection window includes about 16 pixels of margin around the person on all four sides. Fig. 4(e) shows that this border provides a significant amount of context that helps detection. Decreasing it from 16 to 8 pixels (48x112 detection window) decreases performance by 4% at $10^{-4}$ FPPW. Keeping a 64x128 window but increasing the person size within it (again decreasing the border) causes a similar loss of performance, even though the resolution of the person is actually increased.

6.6 Classifier

By default we use a soft (C = 0.01) linear SVM trained with SVMLight [10] (slightly modified to reduce memory usage for problems with large dense descriptor vectors). Using a Gaussian kernel SVM increases performance by about 3% at $10^{-4}$ FPPW at the cost of a much higher run time.

6.7 Discussion

Overall, there are several notable findings in this work. The fact that HOG greatly out-performs wavelets, and that any significant degree of smoothing before calculating gradients damages the HOG results, emphasizes that much of the available image information is from abrupt edges at fine scales, and that blurring this in the hope of reducing the sensitivity to spatial position is a mistake. Instead, gradients should be calculated at the finest available scale in the current pyramid layer, rectified or used for orientation voting, and only then blurred spatially. Given this, relatively coarse spatial quantization suffices (6-8 pixel wide cells / one limb width). On the other hand, at least for human detection, it pays to sample orientation rather finely: both wavelets and shape contexts lose out significantly here.

Secondly, strong local contrast normalization is essential for good results, and traditional centre-surround style schemes are not the best choice. Better results can be achieved by normalizing each element (edge, cell) several times with respect to different local supports, and treating the results as independent signals. In our standard detector, each HOG cell appears four times with different normalizations, and including this 'redundant' information improves performance from 84% to 89% at $10^{-4}$ FPPW.

7 Summary and Conclusions

We have shown that using locally normalized histogram of gradient orientations features similar to SIFT descriptors [12] in a dense overlapping grid gives very good results for person detection, reducing false positive rates by more than an order of magnitude relative to the best Haar wavelet based detector from [17]. We studied the influence of various descriptor parameters and concluded that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good performance. We also introduced a new and more challenging pedestrian database, which is publicly available.

Future work: Although our current linear SVM detector is reasonably efficient, processing a 320x240 scale-space image (4000 detection windows) in less than a second, there is still room for optimization, and to further speed up detections it would be useful to develop a coarse-to-fine or rejection-chain style detector based on HOG descriptors. We are also working on HOG-based detectors that incorporate motion information using block matching or optical flow fields. Finally, although the current fixed-template-style detector has proven difficult to beat for fully visible pedestrians, humans are highly articulated and we believe that including a parts based model with a greater degree of local spatial invariance would help to improve the detection results in more general situations.

Acknowledgments. This work was supported by the European Union research projects ACEMEDIA and PASCAL. We thank Cordelia Schmid for many useful comments. SVM-Light [10] provided reliable training of large-scale SVMs.

References

[1] S. Belongie, J. Malik, and J. Puzicha. Matching shapes. The 8th ICCV, Vancouver, Canada, pages 454-461, 2001.
[2] V. de Poortere, J. Cant, B. Van den Bosch, J. de Prins, F. Fransens, and L. Van Gool. Efficient pedestrian detection: a test case for SVM based categorization. Workshop on Cognitive Vision, 2002. Available online: http://www.vision.ethz.ch/cogvis02/.
[3] P. Felzenszwalb and D. Huttenlocher. Efficient matching of pictorial structures. CVPR, Hilton Head Island, South Carolina, USA, pages 66-75, 2000.
[4] W. T. Freeman and M. Roth. Orientation histograms for hand gesture recognition. Intl. Workshop on Automatic Face and Gesture Recognition, IEEE Computer Society, Zurich, Switzerland, pages 296-301, June 1995.
[5] W. T. Freeman, K. Tanaka, J. Ohta, and K. Kyuma. Computer vision for computer games. 2nd International Conference on Automatic Face and Gesture Recognition, Killington, VT, USA, pages 100-105, October 1996.
[6] D. M. Gavrila. The visual analysis of human movement: A survey. CVIU, 73(1):82-98, 1999.
[7] D. M. Gavrila, J. Giebel, and S. Munder. Vision-based pedestrian detection: the PROTECTOR+ system. Proc. of the IEEE Intelligent Vehicles Symposium, Parma, Italy, 2004.
[8] D. M. Gavrila and V. Philomin. Real-time object detection for smart vehicles. CVPR, Fort Collins, Colorado, USA, pages 87-93, 1999.
[9] S. Ioffe and D. A. Forsyth. Probabilistic methods for finding people. IJCV, 43(1):45-68, 2001.
[10] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. The MIT Press, Cambridge, MA, USA, 1999.
[11] Y. Ke and R. Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. CVPR, Washington, DC, USA, pages 66-75, 2004.
[12] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110, 2004.
[13] R. K. McConnell. Method of and apparatus for pattern recognition, January 1986. U.S. Patent No. 4,567,610.
[14] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. PAMI, 2004. Accepted.
[15] K. Mikolajczyk and C. Schmid. Scale and affine invariant interest point detectors. IJCV, 60(1):63-86, 2004.
[16] K. Mikolajczyk, C. Schmid, and A. Zisserman. Human detection based on a probabilistic assembly of robust part detectors. The 8th ECCV, Prague, Czech Republic, volume I, pages 69-81, 2004.
[17] A. Mohan, C. Papageorgiou, and T. Poggio. Example-based object detection in images by components. PAMI, 23(4):349-361, April 2001.
[18] C. Papageorgiou and T. Poggio. A trainable system for object detection. IJCV, 38(1):15-33, 2000.
[19] R. Ronfard, C. Schmid, and B. Triggs. Learning to parse pictures of people. The 7th ECCV, Copenhagen, Denmark, volume IV, pages 700-714, 2002.
[20] Henry Schneiderman and Takeo Kanade. Object detection using the statistics of parts. IJCV, 56(3):151-177, 2004.
[21] Eric L. Schwartz. Spatial mapping in the primate sensory projection: analytic structure and relevance to perception. Biological Cybernetics, 25(4):181-194, 1977.
[22] P. Viola, M. J. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. The 9th ICCV, Nice, France, volume 1, pages 734-741, 2003.
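The default R-HOG configuration described in Section 6 (centred [-1, 0, 1] gradients with no smoothing, 9 unsigned orientation bins, 8x8 pixel cells, 2x2 cell blocks at a stride of one cell, L2-Hys normalization with clipping at 0.2) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names `cell_histograms`, `l2_hys` and `hog_descriptor` are hypothetical, and so are the clamped image borders and the value `eps=1e-3`. For brevity the sketch takes a single grayscale channel, interpolates votes in orientation only (not in position), and omits the Gaussian spatial window.

```python
import math


def cell_histograms(img, cell=8, nbins=9):
    """Magnitude-weighted, unsigned (0-180 degree) orientation histogram per cell."""
    h, w = len(img), len(img[0])
    ch, cw = h // cell, w // cell
    hist = [[[0.0] * nbins for _ in range(cw)] for _ in range(ch)]
    bin_width = 180.0 / nbins
    for y in range(ch * cell):
        for x in range(cw * cell):
            # Centred [-1, 0, 1] derivative, no smoothing.
            # Assumption: pixels outside the image are clamped to the border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            # Split the vote linearly between the two nearest bin centres,
            # wrapping around at 0/180 degrees.
            pos = ang / bin_width - 0.5
            lo = math.floor(pos)
            frac = pos - lo
            target = hist[y // cell][x // cell]
            target[lo % nbins] += mag * (1.0 - frac)
            target[(lo + 1) % nbins] += mag * frac
    return hist


def l2_hys(v, eps=1e-3, clip=0.2):
    """L2-normalise, clip each component at `clip`, then renormalise."""
    norm = math.sqrt(sum(a * a for a in v) + eps * eps)
    v = [min(a / norm, clip) for a in v]
    norm = math.sqrt(sum(a * a for a in v) + eps * eps)
    return [a / norm for a in v]


def hog_descriptor(img, cell=8, block=2, nbins=9):
    """Concatenate L2-Hys normalised blocks taken at a stride of one cell."""
    hist = cell_histograms(img, cell, nbins)
    ch, cw = len(hist), len(hist[0])
    desc = []
    for by in range(ch - block + 1):
        for bx in range(cw - block + 1):
            v = []
            for cy in range(by, by + block):
                for cx in range(bx, bx + block):
                    v.extend(hist[cy][cx])
            desc.extend(l2_hys(v))
    return desc


# Synthetic 64x128 window (64 wide, 128 high) containing one vertical step edge.
window = [[0.0 if x < 32 else 255.0 for x in range(64)] for _ in range(128)]
descriptor = hog_descriptor(window)
print(len(descriptor))  # 7 x 15 blocks x 4 cells x 9 bins = 3780
```

Because the blocks overlap at a stride of one cell, every interior cell histogram appears in four different blocks, each time normalized against a different neighbourhood. This is the redundancy that Section 6.7 credits with raising performance from 84% to 89% at $10^{-4}$ FPPW.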
