Volume 28, Issue 5
Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks

Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao & Zheng Ma

Commun. Comput. Phys., 28 (2020), pp. 1746-1767.

Published online: 2020-11

  • Abstract

We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a universal Frequency Principle (F-Principle), namely that DNNs often fit target functions from low to high frequencies, on high-dimensional benchmark datasets such as MNIST/CIFAR10 and deep neural networks such as VGG16. This F-Principle of DNNs is opposite to the behavior of the Jacobi method, a conventional iterative numerical scheme, which converges faster for higher frequencies in various scientific computing problems. With theory in an idealized setting, we show that the F-Principle results from the smoothness/regularity of the commonly used activation functions. The F-Principle implies an implicit bias: DNNs tend to fit training data by a low-frequency function. This understanding explains the good generalization of DNNs on most real datasets and their poor generalization on the parity function or randomized datasets.
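
To make the low-to-high-frequency behavior concrete, here is a minimal illustrative sketch (not the authors' code; the 1D two-frequency target, the tanh network, and the optimizer settings are arbitrary assumptions chosen for demonstration). It fits a target mixing a low mode (k = 1) and a high mode (k = 10) with a small fully-connected network and prints the relative residual of each Fourier mode during training; under the F-Principle the low-frequency mode is expected to converge first.

# Minimal F-Principle demo in 1D (illustrative only; not the paper's code).
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

N = 256
freqs = (1, 10)                                              # assumed example frequencies
x = torch.arange(N, dtype=torch.float32).unsqueeze(1) / N    # uniform grid on [0, 1)
y = sum(torch.sin(2 * np.pi * k * x) for k in freqs)         # low + high frequency target

# Small tanh network; tanh is a smooth activation of the kind the paper
# connects to the F-Principle.
net = nn.Sequential(nn.Linear(1, 200), nn.Tanh(),
                    nn.Linear(200, 200), nn.Tanh(),
                    nn.Linear(200, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def mode_errors(pred):
    # Relative residual of each target Fourier mode in the current fit.
    res = np.abs(np.fft.rfft((pred - y).squeeze().detach().numpy()))
    ref = np.abs(np.fft.rfft(y.squeeze().numpy()))
    return {k: float(res[k] / ref[k]) for k in freqs}

for step in range(5001):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(step, {k: round(v, 3) for k, v in mode_errors(net(x)).items()})

# Typically the k = 1 residual drops well before the k = 10 residual:
# the network captures the low frequency first, which is the F-Principle.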

  • Keywords

Deep learning, training behavior, generalization, Jacobi iteration, Fourier analysis.

  • AMS Subject Headings

68Q32, 65N06, 68T01

  • Copyright

COPYRIGHT: © Global Science Press

  • BibTeX
@Article{CiCP-28-1746,
  author  = {Zhi-Qin John Xu and Yaoyu Zhang and Tao Luo and Yanyang Xiao and Zheng Ma},
  title   = {Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks},
  journal = {Communications in Computational Physics},
  year    = {2020},
  volume  = {28},
  number  = {5},
  pages   = {1746--1767},
  issn    = {1991-7120},
  doi     = {https://doi.org/10.4208/cicp.OA-2020-0085},
  url     = {http://global-sci.org/intro/article_detail/cicp/18395.html}
}

  • RIS
TY - JOUR
T1 - Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
AU - Xu, Zhi-Qin John
AU - Zhang, Yaoyu
AU - Luo, Tao
AU - Xiao, Yanyang
AU - Ma, Zheng
JO - Communications in Computational Physics
VL - 28
IS - 5
SP - 1746
EP - 1767
PY - 2020
DA - 2020/11
SN - 1991-7120
DO - https://doi.org/10.4208/cicp.OA-2020-0085
UR - https://global-sci.org/intro/article_detail/cicp/18395.html
KW - Deep learning, training behavior, generalization, Jacobi iteration, Fourier analysis
ER -

  • TXT
Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao & Zheng Ma. (2020). Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks. Communications in Computational Physics. 28 (5). 1746-1767. doi:10.4208/cicp.OA-2020-0085