RGB-D Video and Deep Learning Research
|Lecture Title||RGB-D Video and Deep Learning Research|
|Speaker Biography||Hsueh-Ming Hang received the B.S. and M.S. degrees from National Chiao Tung University, Hsinchu, Taiwan, in 1978 and 1980, respectively, and the Ph.D. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1984. From 1984 to 1991, he was with AT&T Bell Laboratories, Holmdel, NJ, and he joined the Electronics Engineering Department of National Chiao Tung University (NCTU), Hsinchu, Taiwan, in December 1991. From 2006 to 2009, he was appointed Dean of the EECS College at National Taipei University of Technology (NTUT). From 2014 to 2017, he served as Dean of the ECE College at NCTU. He has been actively involved in the international MPEG standards since 1984, and his current research interests include multimedia compression, multiview image/video processing, and deep-learning-based image/video processing.
Dr. Hang holds 13 patents (Taiwan, US, and Japan) and has published over 190 technical papers on image compression, signal processing, and video codec architecture. He was an associate editor (AE) of the IEEE Transactions on Image Processing (1992-1994, 2008-2012) and the IEEE Transactions on Circuits and Systems for Video Technology (1997-1999). He is a co-editor of and contributor to the Handbook of Visual Communications, published by Academic Press in 1995. He was an IEEE Circuits and Systems Society Distinguished Lecturer (2014-2015) and is a Board Member of the Asia-Pacific Signal and Information Processing Association (APSIPA) (2013-2018). He is a recipient of the IEEE Third Millennium Medal, a Fellow of the IEEE and the IET, and a member of Sigma Xi.
|Lecture Abstract||One part of our research addresses 3D video data with depth. The focus of next-generation 3D research is the so-called virtual-viewpoint (or free-viewpoint) video system, which is also an ongoing standardization item in the international ITU/MPEG standards. Typically, a densely arranged camera array acquires the input images, and a number of virtual-view pictures are synthesized at the receiver using the depth-image-based rendering (DIBR) technique. Three essential components are needed to build a virtual-view system: depth estimation, data compression, and view synthesis.
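The core idea of DIBR can be illustrated with a toy forward warp. The sketch below is not the speaker's implementation; it is a minimal illustration assuming rectified, horizontally shifted cameras, where each pixel's disparity is `focal * baseline / depth` and a z-buffer resolves collisions. The function name `dibr_warp` and the hole marker `-1` are hypothetical choices for this example.

```python
import numpy as np

def dibr_warp(rgb, depth, focal, baseline):
    """Forward-warp a reference view to a horizontally shifted virtual view.

    Toy DIBR sketch (hypothetical, not the speaker's system): assumes
    rectified cameras, so the warp is a per-pixel horizontal shift by
    disparity = focal * baseline / depth. A z-buffer keeps the nearest
    surface; unfilled (disoccluded) pixels remain marked as holes (-1).
    """
    h, w, _ = rgb.shape
    virtual = np.full(rgb.shape, -1.0)       # -1 marks holes (disocclusions)
    zbuf = np.full((h, w), np.inf)           # z-buffer: keep nearest surface
    disparity = focal * baseline / depth     # per-pixel horizontal shift
    for y in range(h):
        for x in range(w):
            xv = int(round(x - disparity[y, x]))  # target column in virtual view
            if 0 <= xv < w and depth[y, x] < zbuf[y, xv]:
                zbuf[y, xv] = depth[y, x]
                virtual[y, xv] = rgb[y, x]
    return virtual
```

The holes left by disocclusion are exactly why view synthesis (hole filling / inpainting) is listed as an essential component alongside depth estimation and compression.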
Another line of research applies deep learning techniques to image analysis. The focus here is combining information from different sources, which may have different physical characteristics or meanings; the term "information fusion" is used in a broad sense. Another core technique in our systems is machine learning, particularly the convolutional neural network (CNN). A typical CNN system is end-to-end, but in order to combine information from different sources, we propose multiple-stage system structures. We will show two examples in this talk: human detection based on RGB-D data and multiple-query image retrieval.
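The multiple-stage fusion idea can be sketched in a few lines: each modality (RGB and depth) passes through its own feature-extraction stage, the resulting features are concatenated, and a later stage produces the final scores. All dimensions, weights, and the linear-plus-ReLU branches below are hypothetical stand-ins for the CNN branches described in the talk, not the speaker's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch_features(x, w):
    """One per-modality stage: a linear map + ReLU standing in for a CNN branch."""
    return np.maximum(x @ w, 0.0)

# Hypothetical dimensions: 128-d RGB input, 32-d depth input,
# 64-d features per branch, 2 output classes (human / non-human).
w_rgb = rng.standard_normal((128, 64))
w_depth = rng.standard_normal((32, 64))
w_cls = rng.standard_normal((128, 2))

rgb_in = rng.standard_normal((1, 128))    # flattened RGB crop (toy input)
depth_in = rng.standard_normal((1, 32))   # flattened depth crop (toy input)

f_rgb = branch_features(rgb_in, w_rgb)       # stage 1a: RGB branch
f_depth = branch_features(depth_in, w_depth) # stage 1b: depth branch
fused = np.concatenate([f_rgb, f_depth], axis=1)  # stage 2: feature-level fusion
scores = fused @ w_cls                            # stage 3: classification scores
print(scores.shape)  # (1, 2)
```

Fusing at the feature level, rather than averaging the per-modality decisions, is one common way to let a later stage learn cross-modal interactions; the talk's two examples presumably instantiate such staged structures with full CNN branches.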