Computer Vision and Artificial Intelligence

Gang Hua, Ph.D.

Principal Researcher/Research Manager

Microsoft Research Asia

firstnamelastname AT 


[Vision Lab][Teaching] @Stevens Institute of Technology

CS 598 Visual Information Retrieval

Term: Spring 2014
Instructor: Prof. Gang Hua
Time: Monday 6:15pm – 8:45pm
Building/Room: McLean Chemical Sciences Building 414 
Office Hours: Wednesday 4:00pm to 5:00pm, by appointment
Office Hour Location: Lieb Building/Room 305
Course Assistant: Haoxiang Li

Course Overview:
Visual information retrieval studies the processing, indexing, querying, organization, classification, search, and browsing of visual information from images, videos, and other emerging visual media. This course will cover traditional techniques as well as recent advances in visual information retrieval, especially in the context of web-scale image and video search. Students will acquire in-depth knowledge of state-of-the-art algorithms and technologies that transform unstructured visual data into structured representations for indexing and retrieval. These algorithms and technologies have enabled a broad range of applications, including Internet image and video search engines, mobile augmented reality, location recognition, and online shopping.
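
As a rough illustration of what turning unstructured visual data into a structured representation for retrieval can look like, the following is a minimal Python sketch and not part of the course materials. It assumes images are given as flat lists of RGB pixels, represents each one by a coarse color histogram, and ranks database images by histogram intersection. All function names and the toy data are hypothetical, chosen only for this example.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels (0-255 each) into a normalized color histogram."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of per-bin minima of two normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def build_index(images):
    """Map each image name to its histogram (the structured representation)."""
    return {name: color_histogram(pixels) for name, pixels in images.items()}


def search(index, query_pixels, top_k=3):
    """Return the names of the top_k indexed images most similar to the query."""
    q = color_histogram(query_pixels)
    ranked = sorted(
        index,
        key=lambda name: histogram_intersection(q, index[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy "images": flat lists of RGB pixels (hypothetical data for illustration only).
images = {
    "sunset": [(250, 120, 30)] * 90 + [(40, 20, 60)] * 10,
    "forest": [(20, 140, 40)] * 80 + [(90, 60, 30)] * 20,
    "ocean": [(10, 60, 200)] * 95 + [(240, 240, 240)] * 5,
}
index = build_index(images)
query = [(245, 110, 20)] * 50 + [(30, 10, 50)] * 50
print(search(index, query, top_k=1))
```

A linear scan like this is only workable for small collections. At web scale, the comparison step is usually replaced by an index structure such as an inverted file or a hashing scheme.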

Prerequisites:
CS 182, CS 385, or CS 590, or the instructor's permission

Text Books:
None (see a list of suggested online references below)

Grading:
Students will be graded on course participation (10%), one written homework (10%), four course projects (Project #1: 10%, Project #2: 10%, Project #3: 15%, Project #4: 15%), and a final project (30%: 15% system demo, 15% final report). All projects are team projects. The final project will build on the first four projects. Final grade: A: 90% to 100%; B: 80% to 89%; C: 70% to 79%; D: 60% to 69%; F: below 60%.





Course Schedule:

| Week | Date | Topic | Suggested Reading | Assignments | Lecture Notes |
| --- | --- | --- | --- | --- | --- |
|  |  | Introduction to Computer Vision | Ref. [1-2] | HW#1 & Teaming report | Lecture I |
| 2 | 01/27/2014 | No class (Prof. Hua is traveling) |  |  |  |
| 3 | 02/03/2014 | Class canceled due to snowstorm |  |  |  |
|  |  | Representation: color, shape, and texture | [4-7] | HW#1 due, Project I out | Lecture II |
|  |  | Representation: invariant local descriptor | [8-13] |  | Lecture III |
| 6 | 02/24/2014 | Representation: feature coding and pooling |  | Project I due, Project II out | Lecture IV |
|  |  | Representation: attributes and semantics | [20-22] |  | Lecture V |
|  |  | No class (Spring break) |  |  |  |
| 8 | 03/17/2014 | Indexing: hashing |  | Project II due, Project III out | Guest Lecture by Dr. Wei Liu (IBM Research) |
|  |  | Spatiotemporal visual representation | [27-28] |  | Lecture VII |
|  | 04/01/2014 | Indexing: TF-IDF and inverted file | [29] | Project III due, Project IV out | Lecture VIII |
|  |  | Indexing: words and pictures |  |  | Lecture IX |
|  |  | Retrieval: user interaction | [3] | Project IV due, Final Project out | Lecture X |
|  |  | Case study: IBM Multimedia Analysis and Retrieval System | TBA |  | Guest Lecture by Dr. John R. Smith (IBM Research) |
|  |  | Case study: sky search | TBA |  |  |
|  |  | Final Presentation and Competition & Final Report Due |  | Final Project due |  |

Required references:
  1. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. (E-version available online.)

  2. A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain. Content Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.

  3. S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. In Proc. World-Wide Web Conference, 1998.

  4.  R. Datta, D. Joshi, J. Li, J.Z. Wang. Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys, 2008.

  5. M. Flickner, et al. Query by Image and Video Content: The QBIC System. IEEE Computer, 1995.

  6. J.R. Smith and S.F. Chang. Visually Searching the Web for Content. IEEE MultiMedia, 1997.

  7. T. Deselaers, D. Keysers, H. Ney. Features for Image Retrieval: An Experimental Comparison. Information Retrieval, 2008.

  8. D. G. Lowe. Distinctive Image Features from Scale-invariant Keypoints. International Journal of Computer Vision, 2004.

  9.  T. Tuytelaars, K. Mikolajczyk. Local Invariant Feature Detectors: A Survey. Foundations and Trends in Computer Graphics and Vision, 2008.

  10.  K. Mikolajczyk and C. Schmid. A Performance Evaluation of Local Descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.

  11. J. Sivic and A. Zisserman. Efficient Visual Search Cast as Text Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.

  12.  A. Ferencz, E.G. Learned-Miller and J. Malik. Learning to Locate Informative Features for Visual Identification. International Journal of Computer Vision, 2008.

  13.  J. Heinly, E. Dunn, and J-M. Frahm, Comparative Evaluation of Binary Features, in Proc. European Conf. on Computer Vision, 2012.

  14.  J. Yang, K. Yu, Y. Gong, and T. S. Huang, Linear spatial pyramid matching using sparse coding for image classification, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.

  15. Y. Boureau, J. Ponce, and Y. LeCun, A theoretical analysis of feature pooling in visual recognition, in Proc. International Conf. on Machine Learning, 2010.

  16. J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong, Locality-constrained Linear Coding for image classification, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2010.

  17.  P. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.

  18.  P. Felzenszwalb, R. B. Girshick, and D. McAllester, Cascade Object Detection with Deformable Part Models, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2010.

  19. H. Song, S. Zickler, T. Althoff, R. Girshick, M. Fritz, C. Geyer, P. Felzenszwalb, and T. Darrell, Sparselet Models for Efficient Multiclass Object Detection, in Proc. European Conf. on Computer Vision, 2012.

  20. D. Parikh and K. Grauman, Relative Attributes, in Proc. IEEE International Conf. on Computer Vision, 2011.

  21. A. Kovashka et al., WhittleSearch: Image Search with Relative Attribute Feedback, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2012.

  22. A. Parkash and D. Parikh, Attributes for Classifier Feedback, in Proc. European Conf. on Computer Vision, 2012.

  23. M. Naphade, J. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, J. Curtis. Large-Scale Concept Ontology for Multimedia. IEEE MultiMedia, 2006.

  24. L. von Ahn and L. Dabbish. General Techniques for Designing Games with a Purpose. Communications of the ACM, 2008.

  25. B. Russell, A. Torralba, K. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 2008.

  26. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.

  27. I. Laptev. On Space-Time Interest Points. International Journal of Computer Vision, 2005.

  28. X. Zhang, G. Hua, L. Zhang, and H-Y Shum, Interest Seam Image, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2010.

  29. D. Nistér, H. Stewenius, Scalable Recognition with a Vocabulary Tree, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2006.

  30. O. Chum, M. Perdoch, and J. Matas, Geometric min-Hashing: Finding a (thick) needle in a haystack, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.

  31. H. J. Wolfson and I. Rigoutsos, Geometric Hashing: An Overview. IEEE Computational Science and Engineering, 4(4), pp. 10-21, 1997.

  32. A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. In Proc. the 25th Very Large Database (VLDB) Conference, 1999.

  33. K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M.I. Jordan. Matching Words and Pictures. Journal of Machine Learning Research, 2003.

  34. T. Berg, A. Berg, J. Edwards, M. Maire, R. White, Y. Teh, E. Learned-Miller and D. Forsyth. Names and Faces in the News. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2004.

  35. X. Li, C.G.M. Snoek, and M. Worring. Unsupervised Multi-Feature Tag Relevance Learning for Social Image Retrieval. In Proc. ACM Conf. on Image and Video Retrieval, 2010.

  36. T.S. Huang, C.K. Dagli, S. Rajaram, E.Y. Chang, M.I. Mandel, G.E. Poliner, D.P.W. Ellis. Active Learning for Interactive Multimedia Retrieval. Proceedings of the IEEE, 2008.

  37. C.G.M. Snoek, M. Worring, O. de Rooij, K.E.A. van de Sande, R. Yan, and A.G. Hauptmann. VideOlympics: Real-Time Evaluation of Multimedia Retrieval Systems. IEEE MultiMedia, 2008.

  38. N.A. Chinchor, J.J. Thomas, P.C. Wong, M.G. Christel, W. Ribarsky. Multimedia Analysis + Visual Analytics = Multimedia Analytics. IEEE Computer Graphics and Applications, 2010.

  39. Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, Relevance Feedback: A Power Tool in Interactive Content-Based Image Retrieval, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 8, No. 5, pp. 644-655, September 1998.
