
Appendix A

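To prove the Data Processing Theorem, we need to show that

$$H(W;R) - H(W;D) \le 0.$$

We do this as follows:

$$
\begin{aligned}
H(W;R) - H(W;D)
&\overset{\text{def}}{=} H(W) - H(W \mid R) - H(W) + H(W \mid D) \\
&= -H(W \mid R) + H(W \mid D) \\
&= -\sum_{w,r} P(w,r) \log \frac{1}{P(w \mid r)} + \sum_{w,d} P(w,d) \log \frac{1}{P(w \mid d)} \\
&= -\sum_{w,d,r} P(w,d,r) \log \frac{1}{P(w \mid r)} + \sum_{w,d,r} P(w,d,r) \log \frac{1}{P(w \mid d)} \\
&= \sum_{w,d,r} P(w,d,r) \log \frac{P(w \mid r)}{P(w \mid d)} \\
&= \sum_{w,d,r} P(d,r)\, P(w \mid d,r) \log \frac{P(w \mid r)}{P(w \mid d)}. \qquad (5.0)
\end{aligned}
$$

The last step follows from the definition of conditional probabilities, $P(w \mid d,r) = \frac{P(w,d,r)}{P(d,r)}$. Since $P(w,d,r) = P(w)\,P(d \mid w)\,P(r \mid d)$, we further have

$$
\begin{aligned}
P(w \mid d,r) &= \frac{P(w)\,P(d \mid w)\,P(r \mid d)}{P(d,r)} \\
&= \frac{P(w)\,P(d,w)\,P(r,d)}{P(d,r)\,P(w)\,P(d)} \\
&= \frac{P(d,w)}{P(d)} \\
&= P(w \mid d).
\end{aligned}
$$

Plugging this result into equation (5.0) we obtain

$$
\begin{aligned}
H(W;R) - H(W;D)
&= \sum_{w,d,r} P(d,r)\, P(w \mid d) \log \frac{P(w \mid r)}{P(w \mid d)} \\
&= \sum_{d,r} P(d,r) \left[ \sum_{w} P(w \mid d) \log \frac{P(w \mid r)}{P(w \mid d)} \right].
\end{aligned}
$$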

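For each fixed pair $(d, r)$, the quantity in brackets is minus the relative entropy between $P(w \mid d)$ and $P(w \mid r)$. Relative entropy is non-negative (Gibbs' inequality), so every bracketed term is at most zero. Weighting these terms by $P(d,r) \ge 0$ and summing therefore gives $H(W;R) - H(W;D) \le 0$.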
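The inequality can also be checked numerically. The following is a minimal sketch, not part of the original text: it draws random chains with the factorization $P(w,d,r) = P(w)\,P(d \mid w)\,P(r \mid d)$ used in the proof and compares the two mutual informations. The function names, alphabet sizes, and number of trials are illustrative assumptions.

```python
import math
import random


def random_distribution(n, rng):
    """Return a random probability vector of length n (all entries > 0)."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [x / total for x in weights]


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a 2-D list."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    info = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                info += p * math.log2(p / (row_marginal[i] * col_marginal[j]))
    return info


def markov_chain_joints(n_w, n_d, n_r, rng):
    """Build P(w, d) and P(w, r) for a random chain W -> D -> R."""
    p_w = random_distribution(n_w, rng)
    p_d_given_w = [random_distribution(n_d, rng) for _ in range(n_w)]
    p_r_given_d = [random_distribution(n_r, rng) for _ in range(n_d)]
    # P(w, d) = P(w) P(d | w)
    joint_wd = [[p_w[w] * p_d_given_w[w][d] for d in range(n_d)]
                for w in range(n_w)]
    # P(w, r) = sum_d P(w) P(d | w) P(r | d)
    joint_wr = [[sum(joint_wd[w][d] * p_r_given_d[d][r] for d in range(n_d))
                 for r in range(n_r)]
                for w in range(n_w)]
    return joint_wd, joint_wr


def check_data_processing(trials=200, seed=0):
    """Return the largest observed H(W;R) - H(W;D) over random chains."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        joint_wd, joint_wr = markov_chain_joints(3, 4, 5, rng)
        gap = mutual_information(joint_wr) - mutual_information(joint_wd)
        worst = max(worst, gap)
    return worst
```

Calling `check_data_processing()` returns the largest gap seen across the sampled chains. Up to floating-point rounding it should never be positive, which is what the theorem predicts. A random search of this kind only illustrates the result and does not replace the proof.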


Index

1D image projection, 49

abstraction, 4
accuracy, 117
actions, 1
alignment, 2, 3
alphabet, 8
amorphous, 19
applications, 126
articulate natural objects, 19
audio, 117
autonomous, 5
awareness, 14

background differencing, 119
background knowledge, 104
background model, 119
background subtraction, 119
bar-edges, 32
basis functions, x, 9, 29, 32, 34, 68
biology, 9
    microscopy, 9
    neutron activation analysis, 9
bottom-up approach, 4

chemistry, 9
    chromatography, 9
    spectroscopy, 9
chemoreceptors, 9
classification, 8, 19, 20, 65, 117
    animals, 93
    classification error, 71
    clouds, 93, 94
    convergence, 85
    convergence speed, 73
    deciduous trees, 86, 88
    exhaust, 94
    feature set, 71
    generalization, 80, 115
    grass, 93
    human-made structures, 94
    memorization, 80
    misclassification, 101, 105
    over-training, 80
    pre-processing data, 82
    problem space, 73
    quality of solution, 73, 85
    region classification, 20
    robust classification, 9
    robustness, 22
    rock, 93
    sky, 93, 94
    task simplification, 74
    training set, 71
    trees, 93
classifier, 19
classifiers, 65, 111
    back-propagation neural network, 79
    best discriminating functions, 67
    BPNN, 79
    CNN, 65, 79
    comparison, 71, 79
    convergence speed, 81
    convolutional neural network, 65, 79
    decision tree, 111
    eigen-analysis based, 65, 68
    Fisher's linear discriminating functions, 67
    limitations, 66
    linear, 65, 79, 114
    linear discriminating functions, 67
    maximally discriminating functions, 67
    neural network, 19, 79
    performance, 71
    quadratic, 65, 79, 114
    quadratic discriminating functions, 67
clutter, 118
color, 16, 19-21, 23, 118
    HSI, 52
    hue, 52
    intensity, 52
    opponent colors, 24
    saturation, 52
commonsense, 14, 104
complexity, 118
    analysis, 118
compression artifacts, 21
compression schemes, 1
computer vision, 8, 9
    alternative approach, 9
    constrained environments, 12
    early efforts, 8
    paradigm, 9
condition number, 70, 75
context, 14, 104
conventional approach to object recognition, 2
convolution, 30
    kernel, 32, 81
corner detector, 120
correlation, 113
correspondence, 118
    matching, 125
    problem, 118
covariance, 22, 30, 66, 113
    between-class covariance, 70
    covariance-based classification, 66
    limitations, 30
    numerical stability, 74
    within-class covariance, 70
cross fades, 98

data fusion, 122
Data Processing Theorem, v, 11, 129
decision surfaces, 6
descriptions of video content, 6
descriptors, 21, 53
    intermediate-level descriptors, 53
        object, 54
        spatial, 54
        temporal, 54
    low-level descriptors, 21
        color, 21
        motion, 21
        spatial texture, 21
        spatio-temporal texture, 21
        texture, 21
difference image, 20, 48
dimensionality, 7
discernibility of patterns, 8
discussion, 111
dissolves, 98
distance metrics, 39
domain specific inference, 17
dot-product, 68
duality, 31

edge
    bar-edges, 32
    energy, 32
    lines, 32
    power, 33
    step-edges, 32
edge detection, 2, 3
eigen-analysis, 68, 75
    data pre-processing, 69
    eigen-decomposition, 76
    eigen-value, 68
    eigen-vector, 68
    global analysis, 69
    interpretation, 68
    most discriminating features, 70
    most expressive features, 69
electromagnetic waves, 10
    infrared, 10
    IR, 10
    microwave, 10
    radar, 10
    ultra violet, 10
    UV, 10
    visible, 10
encapsulation, 117
energy, 32
entropy, 21, 37, 38, 114
    entropy distance, 39, 114
    joint entropy, 38, 114
    Kullback-Leibler divergence, 39
    marginal entropy, 38
    mutual information, 39, 114
    relative entropy, 125
EVENT DETECTION DESIGN OF EVENT DETECTORS, v, 19
event detection, v, 2, 8, 12, 14, 19, 94, 116
    animal hunts, 14
    bottom-up approach, 13
    depositing an object, 14
    entering a room, 14
    explosions, 14
    hunt detection, 16
    landing events, 16
    office monitoring, 14
    rocket launches, 16
    semantic event detection, 16
event inference, 58, 99
events, 1, 58, 94, 99, 116, 126
    animal hunts, 19
    applications, 126
    complex events, 9
    detection failure, 106, 109
    domain-specific knowledge, 58
    hunt event model, 59
    hunts, 58, 99
        state diagram, 59
    landing event model, 61
    landings, 19, 60, 100
        state diagram, 61
    misclassification, 101
    object shape, 58
    object variations, 60
    precision, 99
    primitive events, 9
    recall, 99
    rocket launch event model, 62
    rocket launches, 19, 60, 106
        state diagram, 62
    rules, 58

F-test, 67
false negatives, 3
false positives, 3
fat hierarchy of abstractions, 6
feature de-correlation, 74
feature extraction, 19
feature relevance
    F-test, 67
    Wilks test, 67
feature representation, 19
feature sets
    each method in isolation, 72
    randomly selected, 72
feature space representation, 47
feature vectors, 4
features, 2, 65, 111
    best subset, 84
    blue, 75
    color, 19, 21, 23, 86
        blue, 21, 23
        green, 21, 23
        HSI, 23
        HSV, 23
        hue, 23
        red, 21, 23
        RGB, 23
        saturation, 23
    color edges, 3
    corners, 3
    correlated features, 74
    correlation between features, 75
    de-correlation, 65, 74
    edges, 3
    entropy, 21, 37, 38
    extraction methods
        each method in isolation, 71
        leaving one out, 73
    feature extraction, 21, 65
    feature relevance, 65
        covariance-based, 65
        F-test, 67
        Wilks test, 67
    feature space, 86
    Fourier transform, 19, 21, 27
    fractal dimension, 19, 21, 36, 86
    Gabor filter, 29
    Gabor filter bank, 21
    Gabor filters, 19, 86
    good subsets of features, 83
        a greedy algorithm, 83
        better than greedy, 83
    gray-level co-occurrence matrix, 19, 21, 24, 86
        Angular Second Moment, 25
        Contrast, 26
        Correlation, 27
        Difference Angular Second Moment, 26
        Difference Entropy, 27
        Difference Variance, 27
        Entropy, 26
        Inverse Difference Moment, 26
        Mean, 26
        Prominence, 27
        Shade, 27
        Sum Entropy, 26
    gray-level intensity, 23
    green, 75
    HSI, 75
    hue, 75
    independence, 113
    intensity, 75
    interest points, 3
    linearly dependent features, 113
    motion, 21
    random sets, 82, 115
    red, 75
    redundancy, 22
    relevance of, 22
    RGB, 75
    rich feature set, 21
    saturation, 75
    spatial, 22
    spatio-temporal, 22
    steerable filter
        orthogonal basis set, 32
    steerable filters, 21, 31, 116
        infinite basis set, 32
    subset selection, 82
    texture, 21
        regularity, 28
    texture boundaries, 3
flat hierarchy of abstractions, 6
flavors, 9
Fourier series, 35
Fourier transform, 19, 21, 27
fractal dimension, 19, 36
    multi-fractals, 37
    self-similarity, 36
frame differencing, 48
    residual error, 105
framework, ix, 20
frequency domain, 27
functions, 2
    basis functions, x, 9, 29, 32, 34
    cosine function, 29
    cumulative density function, 30
    even function, 32
    Fourier function, 27
    Fourier series, 35
    fractal dimension function, 37
    Gabor function, 29
    Gaussian function, 30, 33
    Hilbert function, 33, 35
    interpolation functions, 34
    linear discriminating functions, 67
    multi-fractals, 37
    non-orthogonal basis functions, 30
    odd function, 32
    orthogonal function, 33
    probability density function, 30
    quadratic discriminating functions, 67
    sine function, 29
    sinusoidal function, 30
fuse, 2

Gabor filter bank, 21
Gabor filters, 19, 29
    complexity, 30
    multi resolution, 19
Gabor functions, 29
geometries, 42
    affine geometry, 44
    Euclidean geometry, 43
    projective geometry, 44
    similarity geometry, 43
    translational geometry, 43
GLCM, 19
global motion estimation, 20, 42, 94
global motion parameters, 2
good features for classification, 70, 77
gray-level co-occurrence matrix, 19, 24
gray-level intensity, 23
greedy algorithm, 77
grouping, 118

hierarchy of abstractions, 2
high dimensional representations, 9
high-dimensional spaces, 4
Hilbert function, 35
Hilbert transform, 35
histogram intersection, 52
histogram normalization, 52
HSI, 52
hue, 52

illumination invariance, 125
image categorization, 15
image pyramid, 46, 94
image retrieval, 15
    graphics, 15
    indoor vs. outdoor, 15
    landscape vs. cityscape, 15
    logos, 15
    shape descriptors, 15
    tree vs. non-tree regions, 15
independence, 113
indexing, 118
information, 4, 5, 8, 11
intensity, 52
intermediate-level descriptors, 53
    object, 54
    spatial, 54
    temporal, 54
intractable, 3
invariance, 28
    frequency invariance, 28
    rotation invariance, 28
inverse, 70
invertibility, 70
isotropic scaling, 37

joint entropy, 38

Kanerva, 4
Kullback-Leibler divergence, 39, 125

labeling, 19
language parsing, 8
lens distortion, 45
lighting changes, 21, 45
linear classifier, 114
linear combination, 68
linear dependence, 113
linear redundancy, 22
linearly separable, 7
lines, 32
low-level descriptors, 21
low-level features, 2

Markov chain, 11 matching, 2 measures, 2

blue, 75 colOr, 12, 16, 19,21,23

blue, 21, 23 green, 21, 23 HSI,23 HSV,23 hue, 23 opponent color, 21 opponent colors, 24 red, 21, 23 RGB,23 saturation, 23

correlated measures, 74 entropy, 3, 21, 37, 38

INDEX

Fourier transform, 19, 21, 27 annular ring sampling, 28 parallel-slit sampling, 28 wedge sampling, 28

fractal dimension, 3, 19,21,36 frequency, 3 Gabor filter bank, 21 Gabor filters, 19, 29 gray-level co-occurrence matrix, 19, 21,

24 Angular Second Moment, 25 Contrast, 26 Correlation, 27 Difference Angular Second Moment,

26 Difference Entropy, 27 Difference Variance, 27 Entropy, 26 Inverse Difference Moment, 26 Mean,26 Prominence, 27 Sbade,27 Sum Entropy, 26

gray-level intensity, 23 green, 75 HSI,75 hue, 75 intensity, 21, 75 motion, 12, 16,21 orientation, 3 red, 75 RGB,75 saturation, 75 spatial,22 spatial texture, 21 spatio-temporal, 22 spatio-temporal texture, 21 steerable filters, 21, 31 texture, 12, 16, 21

mesures

orientation, 32 regularity, 28 strength,32

fractal dimension self-similarity, 36

minimally correlated features, 70 mosaics, 119 motion, 16,21

affine model, 44 dominant motion, 49 Euclidean model, 43 motion model, 44 motion-blob, 48 projective model, 44 qualitative measures, II, 55 quantitative measures, II, 55 similarity model, 43

translational model, 43 motion estimation, 19,42,94

affine, 120 aperture problem, 45,105 background differencing, 51 center of an object, 49 difference image, 48 estimation failure, 105 frame-to-frame, 20, 94, 119 frame-to-frame differencing, 51 frame-ta-mosaic differencing, 51 global motion estimation, 94 image pyramid, 94 interlaced video, 44 large displacements, 44 lens distortion, 45 lighting changes, 45 motion blur, 45 motion compensation, 94, 97 motion-blob detection, 51, 96 occlusion, 45 orthographic camera model, 47 parallax, 46 perspective, 120 predicted motion, 94 prediction, 47 problems, 44 qualitative measures, 55 quantitative measures, 55 residnal error, 105 robust estimation, 50

trimmed mean, 50 trimmed standard deviation, 50

robust estimation of the mean, 49 size of an object, 49

motion models, 42 motion-blob, 48

absence of motion-blobs, 52 motion information, 51 multiple motion-blobs, 52 region-classification information, 51 spatial information, 51 spatio-temporal information, 51

motion-blob detection, 42, 51, 96 motion-blob information, 51 motion-blob verification, 51 moving blobs, 16 multi-resolution, 19

Gabor filters, 19 multi-fractals, 37 multi-resolution image analysis, 30 mutual information, 39

neural network, 9, 16, 19-21, 115, 117 activation function, 40

back-propagation, 19, 21 back-propagation neural network, 40, 115, 116

CNN, 65 convolutional neural network, 65 hidden layer, 40 sigmoidal activation function, 40

non-rigid, 19 non-rigid deformations, 125 normal distribution, 114 numerical stability, 74

object hypotheses, 4 object recognition, v, 8, 12, 19, 86, 115

segmentation, 12 alignment algorithms, 12 animals, 13 automatic model construction, 12 block worlds, 12 blocks, 12 bottom-up approach, 13 chicken-and-egg problem, 13 clouds, 13 complexity, 3 deciduous trees, 16 end-to-end recognition systems, 12 fire, 13 geometry, 14 grass, 13 human-made objects, 13 if it looks like a duck ..., 13 indexing, 12 model to image alignment, 3

2D to 2D alignment, 3 3D to 2D alignment, 3

model-based vision, 12 mountains, 13 natural objects, 13 pose estimation, 12 rocks, 13 segmentation, 13 shape-based representations, 12 sky, 13 staplers, 12 telephones, 12 top-down approach, 14 toy-cars, 12 trees, 13 water, 13

objects, 9 Occam's Razor, 7 occlusion, 45, 118 odors, 9 opponent colors, 24 orientation, 30, 32 orthogonal, 32, 68 orthogonality, 113 overlays, 98

Parzen window, 39

physics, 9 chromatography, 9 spectroscopy, 9

post-production editing artifacts, 117 precipitation, 8

quadratic classifier, 114 quadrature pair, 32

region classification, 20, 40 ambiguities, 41 categorization, 42 classes, 41

registration, 2 relative entropy, 125 relevant primitives, 9 remote sensing, v, 10

air quality, 10 algae growth, 10 chemical warfare agents, 10 coniferous forest, 10 crop conditions, 10 deciduous forest, 10 desert, 10 evaporation, 10 freshwater bodies, 10 granite soil, 10 limestone soil, 10 marsh, 10 metals, 10 minerals, 10 plankton blooms, 10 rainforest, 10 saltwater bodies, 10 shrub, 10 soil, 10 steppe, 10 swamp, 10 tundra, 10 urban areas, 10 water, 10

results, 79 rich image descriptions, 4, 5, 21, 125 rich intermediate representations, 2 rich internal representation, 9 rich object descriptions, 11 robotics, 13 robust estimation of the mean, 49 robustness, 22, 117

saturation, 52 scalability, 117 scale, 3 scales, 30 scene descriptions, 9 scents, 9 segmentation, 94, 96, 97, 119 selection, 118

self-similarity, 36 semantic event detection, 16 semantic events, 19 semantics, 2, 15 shadows, 118, 125 shape, 118 shot boundary, 16, 19, 52

artificial, 53, 97 content localization, 53 detection, 20 motion homogeneity, 53 real, 53

shot descriptions, 16 shot detection, 52 shot features, 17 shot summary, 21, 53, 97

domain specific information, 54, 55 hunts, 55 landings, 56 predicates, 55 range of shot, 54 rocket launches, 57 shot information, 54 shot statistics, 54

signature responses, 10 signatures, v, 4, 10, 117

antennae, 10 blue, 11 brightness, 11 green, 11 IR, 10 lenses, 10 opponent colors, 11 red, 11 remote sensing, v, 10 signature based recognition, v, 9 signature based tracking, 10 signature responses, 10 UV, 10 visual signatures, v, 10

singular matrix, 70 smarts, 4 software re-use, 117 spatial domain, 27 spatio-temporal auto-regression with moving average, 22 specular reflections, 118 STARMA, 22 statistics, 30

cdf,30 covariance, 30 cumulative density function, 30 maximum, 30 mean, 30 median, 30 minimum, 30 pdf, 30

probability density function, 30 variance, 30

steerable filters, 21, 31, 116 quadrature pair, 32

step-edges, 32 subsumption architecture, 13 sunshine, 8 syntactic primitives, 9 syntactic structurization, 15 syntactic visual measures, 8

taxonomy, 54 temporal texture, 125 texture, 16, 20, 21, 118

amount of texture, 46 complexity, 37 energy, 32 lack of texture, 45, 46 orientation, 32 regularity, 28, 37 roughness, 37 smoothness, 37 strength, 32 uniformity, 37

top-down architectures, 4 tracking, 125

human face tracking, 49 transform, 27, 33

affine transform, 44 Euclidean transform, 43 Fourier transform, 27, 29, 31 Hilbert transform, 33, 35 projective transform, 44 similarity transform, 43 translational transform, 43

video categorization, 15 video retrieval, 15

action movies, 15 baseball, 16 basketball, 16 close-ups, 15 crowds, 15 fast browsing, 16 football, 16 key frame extraction, 15 news, 15 query-by-sketch, 16 semantically relevant, 15 shot boundary detection, 15 shot clustering, 15 soccer, 16 sports, 15 surveillance, 16 table of content creation, 15 video skimming, 15 video summarization, 15

viewing angles, 21

visual languages, 54

visual primitives, v, 8

visual vocabulary, 54

Wilks test, 67