Title: A Guide to Convolutional Neural Networks for Computer Vision
Author: Salman Khan
Publisher: Ingram
Genre: Software
Series: Synthesis Lectures on Computer Vision
ISBN: 9781681732824
7.3.2 Deep Deconvolution Network (DDN)
7.3.3 DeepLab
7.4 Scene Understanding
7.4.1 DeepContext
7.4.2 Learning Rich Features from RGB-D Images
7.4.3 PointNet for Scene Understanding
7.5 Image Generation
7.5.1 Generative Adversarial Networks (GANs)
7.5.2 Deep Convolutional Generative Adversarial Networks (DCGANs)
7.5.3 Super Resolution Generative Adversarial Network (SRGAN)
7.6 Video-based Action Recognition
7.6.1 Action Recognition From Still Video Frames
7.6.2 Two-stream CNNs
7.6.3 Long-term Recurrent Convolutional Network (LRCN)
8 Deep Learning Tools and Libraries
8.1 Caffe
8.2 TensorFlow
8.3 MatConvNet
8.4 Torch7
8.5 Theano
8.6 Keras
8.7 Lasagne
8.8 Marvin
8.9 Chainer
8.10 PyTorch
Preface
The primary goal of this book is to provide a comprehensive treatment of the subject of convolutional neural networks (CNNs) from the perspective of computer vision. In this regard, this book covers basic, intermediate, as well as advanced topics relating to both the theoretical and practical aspects.
This book is organized into nine chapters. The first chapter introduces the computer vision and machine learning disciplines and presents their highly relevant application domains. This sets the stage for the main subject of this book, “Deep Learning”, which is first defined in the latter part of the first chapter. The second chapter provides background material, presenting hand-crafted features and classifiers that have remained popular in computer vision during the last two decades. These include feature descriptors such as Scale-Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), and Speeded-Up Robust Features (SURF), and classifiers such as Support Vector Machines (SVM) and Random Decision Forests (RDF).
Chapter 3 describes neural networks and covers preliminary concepts related to their architecture, basic building blocks, and learning algorithms. Chapter 4 builds on this and serves as a thorough introduction to CNN architecture. It covers its layers, including the basic ones (e.g., sub-sampling, convolution) as well as more advanced ones (e.g., pyramid pooling, spatial transform). Chapter 5 comprehensively presents techniques to learn and regularize CNN parameters. It also provides tools to visualize and understand the learned parameters.
Chapter 6 and onward are more focused on the practical aspects of CNNs. Specifically, Chapter 6 presents state-of-the-art CNN architectures that have demonstrated excellent performance on a number of vision tasks. It also provides a comparative analysis and discusses their relative pros and cons. Chapter 7 goes into further depth regarding applications of CNNs to core vision problems. For each task, it discusses a set of representative works using CNNs and reports their key ingredients for success. Chapter 8 covers popular software libraries for deep learning such as Theano, TensorFlow, Caffe, and Torch. Finally, in Chapter 9, open problems and challenges for deep learning are presented along with a succinct summary of the book.
The purpose of the book is not to provide a literature survey for the applications of CNNs in computer vision. Rather, it succinctly covers key concepts and provides a bird’s eye view of recent state-of-the-art models designed for practical problems in computer vision.
Salman Khan, Hossein Rahmani, Syed Afaq Ali Shah, and Mohammed Bennamoun
January 2018
Acknowledgments
We would like to thank Gerard Medioni and Sven Dickinson, the editors of this Synthesis Lectures on Computer Vision series, for giving us an opportunity to contribute to this series. We greatly appreciate the help and support of Diane Cerra, Executive Editor at Morgan & Claypool, who managed the complete book preparation process. We are indebted to the colleagues, students, collaborators, and co-authors we have worked with during our careers, who contributed to the development of our interest in this subject. We are also deeply thankful to the wider research community, whose work has led to major advancements in computer vision and machine learning, a part of which is covered in this book. More importantly, we want to express our gratitude toward the people who allowed us to use their figures or tables in some portions of this book. This book has greatly benefited from the constructive comments and appreciation of the reviewers, which helped us improve the presented content. Finally, this effort would not have been possible without the help and support of our families.
We would like to acknowledge support from the Australian Research Council (ARC), whose funding and support were crucial to some of the contents of this book.
Salman Khan, Hossein Rahmani, Syed Afaq Ali Shah, and Mohammed Bennamoun
January 2018
CHAPTER 1
Introduction
Computer Vision and Machine Learning have together played decisive roles in the development of a variety of image-based applications within the last decade (e.g., various services provided by Google, Facebook, Microsoft, Snapchat). During this time, vision-based technology has transformed from just a sensing modality to intelligent computing systems which can understand the