Johan Suykens
[introductory/intermediate] Deep Learning, Neural Networks and Kernel Machines
Summary
Neural networks & Deep learning and Support vector machines & Kernel methods are among the most powerful and successful techniques in machine learning and data driven modelling. Universal approximators and flexible models are available with neural networks and deep learning, while support vector machines and kernel methods have solid foundations in learning theory and optimization theory. In this course we will explain several synergies between neural networks, deep learning, least squares support vector machines and kernel methods. A key role at this point is played by primal and dual model representations and different duality principles. In this way the bigger and unifying picture will be obtained and future perspectives will be outlined.
A recent example is restricted kernel machines, which connect least squares support vector machines and kernel principal component analysis to restricted Boltzmann machines. New developments within this framework will be shown for deep learning, generative models, multi-view and tensor-based models, latent space exploration, robustness and explainability. The framework also makes it possible to work with either explicit or implicit feature maps and to choose model representations tailored to the given problem characteristics, such as high dimensionality or large problem sizes.
Syllabus
The material is organized into 3 parts:
- Part I – Neural networks, Support vector machines and Kernel methods
- Part II – Restricted Boltzmann machines, Generative restricted kernel machines and Deep learning
- Part III – Deep kernel machines and future perspectives
In Part I a basic introduction is given to support vector machines (SVM) and kernel methods, with emphasis on their artificial neural network (ANN) interpretations. The latter can be understood in view of primal and dual model representations, expressed in terms of the feature map and the kernel function, respectively. Feature maps may be chosen either explicitly or implicitly in connection with kernel functions. For least squares support vector machines (LS-SVM), such characterizations exist for supervised and unsupervised learning, including classification, regression, kernel principal component analysis (KPCA), kernel spectral clustering (KSC), kernel canonical correlation analysis (KCCA), and others. Primal and dual representations are also relevant for obtaining efficient training algorithms, tailored to the nature of the given application (high-dimensional input spaces versus large data sizes). Application examples are given, e.g. in black-box weather forecasting, pollution modelling, prediction of energy consumption, and community detection in networks.
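To make the primal-dual picture concrete, here is a minimal sketch of the LS-SVM classifier (following Suykens & Vandewalle, 1999; the notation is simplified for illustration). The primal problem is

\min_{w,b,e} \ \frac{1}{2} w^\top w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \quad \text{s.t.} \quad y_i \,(w^\top \varphi(x_i) + b) = 1 - e_i, \quad i = 1, \dots, N,

with primal model representation \hat{y}(x) = \mathrm{sign}(w^\top \varphi(x) + b) in terms of the feature map \varphi. Eliminating w and e from the Lagrangian conditions yields the dual linear system

\begin{bmatrix} 0 & y^\top \\ y & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_N \end{bmatrix}, \qquad \Omega_{ij} = y_i y_j \, K(x_i, x_j),

with dual model representation \hat{y}(x) = \mathrm{sign}\big(\sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b\big) in terms of the kernel K(x, z) = \varphi(x)^\top \varphi(z). The primal form is convenient when \varphi is chosen explicitly and the number of data points N is large; the dual form works directly with the kernel and is convenient when \varphi is only defined implicitly or is very high dimensional.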
In Part II we explain how to obtain a so-called restricted kernel machine (RKM) representation for least squares support vector machine related models. By using a principle of conjugate feature duality, it is possible to obtain a representation similar to that of restricted Boltzmann machines (RBM) (with visible and hidden units), which are used in deep belief networks (DBN) and deep Boltzmann machines (DBM). The principle is explained for both supervised and unsupervised learning. Related to kernel principal component analysis, a generative model is obtained within the restricted kernel machine framework. Furthermore, we discuss Generative Restricted Kernel Machines (Gen-RKM), a framework for multi-view generation and disentangled feature learning, and compare it with Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE). The use of tensor-based models is also very natural within this new RKM framework, and either explicit feature maps (e.g. convolutional feature maps) or implicit feature maps in connection with kernel functions can be used. Latent space exploration with Gen-RKM and aspects of robustness and explainability will also be explained.
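A minimal sketch of the conjugate feature duality principle (following Suykens, 2017; the notation here is a simplified illustration) uses, for \lambda > 0,

\frac{1}{2\lambda}\, e^\top e \;\geq\; e^\top h - \frac{\lambda}{2}\, h^\top h \qquad \text{for all } e, h,

with equality for h = e/\lambda. Applying this bound to the quadratic error terms of an LS-SVM objective introduces hidden features h_i conjugate to the errors e_i. After substituting the model expression for e_i (e.g. e_i = y_i - W^\top \varphi(x_i) - b in the regression case), the objective contains coupling terms of the form \varphi(x_i)^\top W h_i between the "visible" features \varphi(x_i) and the hidden features h_i, analogous to the -v^\top W h coupling in the energy of a restricted Boltzmann machine; the stationarity conditions with respect to h_i recover h_i = e_i/\lambda.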
In Part III deep restricted kernel machines (Deep RKM) are explained, which consist of restricted kernel machines taken in a deep architecture. In these models a distinction is made between depth in a layer sense and depth in a level sense. Links and differences between Deep RKM and stacked autoencoders and deep Boltzmann machines are given. The framework makes it possible to conceive both deep feedforward neural networks (DNN) and deep kernel machines, through primal and dual model representations. Feature maps and related kernel functions are then chosen for each of the levels. By combining the objectives of the different levels (e.g. several KPCA levels followed by an LS-SVM classifier) in the deep architecture, the training process becomes faster and gives improved solutions. Furthermore, deep kernel machines that incorporate orthogonality constraints for deep unsupervised learning are explained. Finally, recent developments, future perspectives and challenges will be outlined.
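As a rough illustration of the "KPCA level followed by an LS-SVM classifier" idea, the following minimal numpy sketch uses one KPCA level purely as a feature extractor and trains the classifier separately; the actual Deep RKM discussed in the course combines the objectives of the levels and trains them jointly, which this toy sketch does not do. All function names and hyperparameter values below are illustrative assumptions, not the course implementation.

import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gram matrix of the RBF kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def kpca_fit_transform(X, n_components=2, sigma=1.0):
    # one KPCA "level": scores of the training points on the leading components
    N = X.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N               # centering matrix
    Kc = C @ rbf_kernel(X, X, sigma) @ C              # centered Gram matrix
    eigval, eigvec = np.linalg.eigh(Kc)               # ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:n_components]     # keep the leading components
    alphas = eigvec[:, idx] / np.sqrt(np.maximum(eigval[idx], 1e-12))
    return Kc @ alphas

def lssvm_classifier_fit(Z, y, gamma=1.0, sigma=1.0):
    # LS-SVM classifier in the dual: solve the linear system for (b, alpha)
    N = Z.shape[0]
    Omega = np.outer(y, y) * rbf_kernel(Z, Z, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                            # bias b and support values alpha

def lssvm_classifier_predict(Z, y, b, alpha, Znew, sigma=1.0):
    # dual model representation: sign(sum_i alpha_i y_i K(x, x_i) + b)
    return np.sign(rbf_kernel(Znew, Z, sigma) @ (alpha * y) + b)

# toy usage on two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.3, (20, 2)), rng.normal(1.0, 0.3, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
Z = kpca_fit_transform(X, n_components=2, sigma=1.0)   # level 1: unsupervised features
b, alpha = lssvm_classifier_fit(Z, y, gamma=10.0)      # level 2: supervised classifier
print((lssvm_classifier_predict(Z, y, b, alpha, Z) == y).mean())  # training accuracy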
References
Belkin M., Ma S., Mandal S., To understand deep learning we need to understand kernel learning, Proceedings of Machine Learning Research, 80:541-549, 2018.
Belkin M., Hsu D., Ma S., Mandal S., Reconciling modern machine learning practice and the bias-variance trade-off, PNAS, 116(32), 2019.
Bengio Y., Learning deep architectures for AI, Boston: Now, 2009.
Bietti A., Mialon G., Chen D., Mairal J., A Kernel Perspective for Regularizing Deep Neural Networks, Proceedings of the 36th International Conference on Machine Learning, PMLR 97:664-674, 2019.
Binkowski M., Sutherland D.J., Arbel M., Gretton A., Demystifying MMD GANs, ICLR 2018.
Eastwood C., Williams C.K.I., A framework for the quantitative evaluation of disentangled representations, International Conference on Learning Representations, 2018.
Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y., Generative Adversarial Networks, pp. 2672-2680, NIPS 2014.
Goodfellow I., Bengio Y., Courville A., Deep learning, Cambridge, MA: MIT Press, 2016.
Hinton G.E., What kind of graphical model is the brain?, In Proc. 19th International Joint Conference on Artificial Intelligence, pp. 1765-1775, 2005.
Hinton G.E., Osindero S., Teh Y.-W., A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527-1554, 2006.
Houthuys L., Suykens J.A.K., Tensor Learning in Multi-View Kernel PCA, in Proc. of the 27th International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, pp. 205-215, Oct. 2018.
LeCun Y., Bengio Y., Hinton G., Deep learning, Nature, 521, 436-444, 2015.
Liu F., Liao Z., Suykens J.A.K., Kernel regression in high dimensions: Refined analysis beyond double descent, International Conference on Artificial Intelligence and Statistics (AISTATS), 649-657, 2021.
Mall R., Langone R., Suykens J.A.K., Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks, PLOS ONE, e99966, 9(6), 1-18, 2014.
Mehrkanoon S., Suykens J.A.K., Deep hybrid neural-kernel networks using random Fourier features, Neurocomputing, Vol. 298, pp. 46-54, July 2018.
Montavon G., Müller K.-R., Cuturi M., Wasserstein Training of Restricted Boltzmann Machines, pp. 3718-3726, NIPS 2016.
Pandey A., Schreurs J., Suykens J.A.K., Generative restricted kernel machines: A framework for multi-view generation and disentangled feature learning, Neural Networks, Vol. 135, pp. 177-191, March 2021.
Pandey A., Schreurs J., Suykens J.A.K., Robust Generative Restricted Kernel Machines using Weighted Conjugate Feature Duality, in Proc. of the International Conference on Machine Learning, Optimization, and Data Science (LOD 2020), Siena, Italy, Lecture Notes in Computer Science, vol. 12565, Springer, Cham, 2020, pp. 613-624.
Pandey A., Fanuel M., Schreurs J., Suykens J.A.K., Disentangled Representation Learning and Generation with Manifold Optimization, arXiv preprint arXiv:2006.07046
Salakhutdinov R., Hinton G.E., Deep Boltzmann machines, Proceedings of Machine Learning Research, 5:448-455, 2009.
Salakhutdinov R., Learning deep generative models, Annu. Rev. Stat. Appl., 2, 361-385, 2015.
Schölkopf B., Smola A., Müller K.-R., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998.
Schölkopf B., Smola A., Learning with kernels, Cambridge, MA: MIT Press, 2002.
Schreurs J., Suykens J.A.K., Generative Kernel PCA, ESANN 2018.
Suykens J.A.K., Vandewalle J., Training multilayer perceptron classifiers based on a modified support vector method, IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 907-911, Jul. 1999.
Suykens J.A.K., Vandewalle J., Least squares support vector machine classifiers, Neural Processing Letters, vol. 9, no. 3, pp. 293-300, Jun. 1999.
Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B., Vandewalle J., Least squares support vector machines, Singapore: World Scientific, 2002.
Suykens J.A.K., Alzate C., Pelckmans K., Primal and dual model representations in kernel-based learning, Statistics Surveys, vol. 4, pp. 148-183, Aug. 2010.
Suykens J.A.K., Deep Restricted Kernel Machines using Conjugate Feature Duality, Neural Computation, vol. 29, no. 8, pp. 2123-2163, Aug. 2017.
Tonin F., Pandey A., Patrinos P., Suykens J.A.K., Unsupervised Energy-based Out-of-distribution Detection using Stiefel-Restricted Kernel Machine, arXiv preprint arXiv:2102.08443, to appear IJCNN 2021.
Tonin F., Patrinos P., Suykens J.A.K., Unsupervised learning of disentangled representations in deep restricted kernel machines with orthogonality constraints, Neural Networks, Vol 142, pp. 661-679, Oct 2021.
Vapnik V., Statistical learning theory, New York: Wiley, 1998.
Winant D., Schreurs J., Suykens J.A.K., Latent Space Exploration Using Generative Kernel PCA, Communications in Computer and Information Science, vol 1196. Springer, Cham. (BNAIC 2019, BENELEARN 2019), Brussels, Belgium, Nov. 2019, pp. 70-82.
Zhang C., Bengio S., Hardt M., Recht B., Vinyals O., Understanding deep learning requires rethinking generalization, ICLR 2017.
Pre-requisites
Basics of linear algebra.
Short bio
https://www.esat.kuleuven.be/stadius/person.php?id=16
Johan A.K. Suykens was born in Willebroek, Belgium, on May 18, 1966. He received the master's degree in Electro-Mechanical Engineering and the PhD degree in Applied Sciences from the Katholieke Universiteit Leuven in 1989 and 1995, respectively. In 1996 he was a Visiting Postdoctoral Researcher at the University of California, Berkeley. He has been a Postdoctoral Researcher with the Fund for Scientific Research FWO Flanders and is currently a full Professor with KU Leuven. He is the author of the books “Artificial Neural Networks for Modelling and Control of Non-linear Systems” (Kluwer Academic Publishers) and “Least Squares Support Vector Machines” (World Scientific), co-author of the book “Cellular Neural Networks, Multi-Scroll Chaos and Synchronization” (World Scientific), and editor of the books “Nonlinear Modeling: Advanced Black-Box Techniques” (Kluwer Academic Publishers), “Advances in Learning Theory: Methods, Models and Applications” (IOS Press) and “Regularization, Optimization, Kernels, and Support Vector Machines” (Chapman & Hall/CRC). In 1998 he organized an International Workshop on Nonlinear Modelling with Time-series Prediction Competition. He has served as associate editor for the IEEE Transactions on Circuits and Systems (1997-1999 and 2004-2007), the IEEE Transactions on Neural Networks (1998-2009), the IEEE Transactions on Neural Networks and Learning Systems (from 2017) and the IEEE Transactions on Artificial Intelligence (from April 2020). He received an IEEE Signal Processing Society 1999 Best Paper Award, a 2019 Entropy Best Paper Award and several Best Paper Awards at international conferences. He is a recipient of the International Neural Networks Society INNS 2000 Young Investigator Award for significant contributions in the field of neural networks. He has served as Director and Organizer of the NATO Advanced Study Institute on Learning Theory and Practice (Leuven, 2002), as program co-chair for the International Joint Conference on Neural Networks 2004 and the International Symposium on Nonlinear Theory and its Applications 2005, as organizer of the International Symposium on Synchronization in Complex Networks 2007, as co-organizer of the NIPS 2010 workshop on Tensors, Kernels and Machine Learning, and as chair of ROKS 2013. He has been awarded ERC Advanced Grants in 2011 and 2017, was elevated to IEEE Fellow in 2015 for developing least squares support vector machines, and is an ELLIS Fellow. He is currently serving as program director of the Master AI programme at KU Leuven.