IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i9p2112-d1136534.html

Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training

Author

Listed:
  • Artem Sher

    (Phystech School of Applied Mathematics and Informatics, Moscow Institute of Physics and Technology, 141701 Moscow, Russia
    Smart Engines Service LLC, 117312 Moscow, Russia)

  • Anton Trusov

    (Phystech School of Applied Mathematics and Informatics, Moscow Institute of Physics and Technology, 141701 Moscow, Russia
    Smart Engines Service LLC, 117312 Moscow, Russia
    Department of Mathematical Software for Computer Science, Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, 119333 Moscow, Russia)

  • Elena Limonova

    (Smart Engines Service LLC, 117312 Moscow, Russia
    Department of Mathematical Software for Computer Science, Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, 119333 Moscow, Russia)

  • Dmitry Nikolaev

    (Smart Engines Service LLC, 117312 Moscow, Russia
    Vision Systems Laboratory, Institute for Information Transmission Problems (Kharkevich Institute) of Russian Academy of Sciences, 127051 Moscow, Russia)

  • Vladimir V. Arlazarov

    (Smart Engines Service LLC, 117312 Moscow, Russia
    Department of Mathematical Software for Computer Science, Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, 119333 Moscow, Russia)

Abstract

Quantized neural networks (QNNs) are widely used to achieve computationally efficient solutions to recognition problems. Overall, eight-bit QNNs have almost the same accuracy as full-precision networks while running several times faster. However, networks with lower bit widths show inferior accuracy compared with their full-precision counterparts. To address this issue, a number of quantization-aware training (QAT) approaches have been proposed. In this paper, we study QAT approaches for two- to eight-bit linear quantization schemes and propose a new combined QAT approach: neuron-by-neuron quantization with straight-through estimator (STE) gradient forwarding. It is suitable for two- to eight-bit widths and eliminates significant accuracy drops during training, which results in better accuracy of the final QNN. We experimentally evaluate our approach on CIFAR-10 and ImageNet classification and show that it is comparable to other approaches for four to eight bits and outperforms some of them for two to three bits, while being easier to implement. For example, the proposed approach applied to three-bit quantization on CIFAR-10 yields 73.2% accuracy, while the baseline direct and layer-by-layer approaches yield 71.4% and 67.2%, respectively. For two-bit quantization of ResNet18 on ImageNet, our approach achieves 63.69% accuracy versus 61.55% for the direct baseline.
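The abstract refers to linear (uniform) quantization combined with STE gradient forwarding. A minimal sketch of a k-bit linear quantizer is given below; it is an illustration of the general technique, not the paper's implementation. The function name and the default clipping range [-1, 1] are assumptions made for this example. During QAT, the rounding step is non-differentiable, so the STE treats it as the identity in the backward pass and lets gradients flow through unchanged (noted in the comments).

```python
def linear_quantize(x, bits=3, x_min=-1.0, x_max=1.0):
    """Map a real value x to one of 2**bits uniformly spaced levels in [x_min, x_max].

    This is the forward pass of a linear quantizer. In quantization-aware
    training, the backward pass with a straight-through estimator (STE)
    skips the round() and passes the incoming gradient through as-is.
    """
    levels = 2 ** bits - 1            # number of quantization steps
    scale = (x_max - x_min) / levels  # width of one step
    # clamp to the representable range, then round to the nearest level
    x_clamped = min(max(x, x_min), x_max)
    q = round((x_clamped - x_min) / scale)
    return x_min + q * scale
```

For instance, with `bits=3` a value of 1.5 is clamped and quantized to 1.0, the topmost of the eight levels; lowering `bits` coarsens the grid, which is exactly the regime (two to three bits) where the training scheme matters most.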

Suggested Citation

  • Artem Sher & Anton Trusov & Elena Limonova & Dmitry Nikolaev & Vladimir V. Arlazarov, 2023. "Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training," Mathematics, MDPI, vol. 11(9), pages 1-17, April.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:9:p:2112-:d:1136534

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/9/2112/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/9/2112/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ying Shu & Chengfu Ding & Lingbing Tao & Chentao Hu & Zhixin Tie, 2023. "Air Pollution Prediction Based on Discrete Wavelets and Deep Learning," Sustainability, MDPI, vol. 15(9), pages 1-19, April.
    2. Xue-Bo Jin & Wen-Tao Gong & Jian-Lei Kong & Yu-Ting Bai & Ting-Li Su, 2022. "PFVAE: A Planar Flow-Based Variational Auto-Encoder Prediction Model for Time Series Data," Mathematics, MDPI, vol. 10(4), pages 1-17, February.
    3. Dinggao Liu & Zhenpeng Tang & Yi Cai, 2022. "A Hybrid Model for China’s Soybean Spot Price Prediction by Integrating CEEMDAN with Fuzzy Entropy Clustering and CNN-GRU-Attention," Sustainability, MDPI, vol. 14(23), pages 1-22, November.
    4. Junbeom Park & Seongju Chang, 2021. "A Particulate Matter Concentration Prediction Model Based on Long Short-Term Memory and an Artificial Neural Network," IJERPH, MDPI, vol. 18(13), pages 1-15, June.
    5. Mei-Hsin Chen & Yao-Chung Chen & Tien-Yin Chou & Fang-Shii Ning, 2023. "PM2.5 Concentration Prediction Model: A CNN–RF Ensemble Framework," IJERPH, MDPI, vol. 20(5), pages 1-13, February.
    6. Gao, Mingyun & Yang, Honglin & Xiao, Qinzi & Goh, Mark, 2022. "COVID-19 lockdowns and air quality: Evidence from grey spatiotemporal forecasts," Socio-Economic Planning Sciences, Elsevier, vol. 83(C).
    7. Tao Zhen & Lei Yan & Jian-lei Kong, 2020. "An Acceleration Based Fusion of Multiple Spatiotemporal Networks for Gait Phase Detection," IJERPH, MDPI, vol. 17(16), pages 1-17, August.
    8. Wongchai, Anupong & Jenjeti, Durga rao & Priyadarsini, A. Indira & Deb, Nabamita & Bhardwaj, Arpit & Tomar, Pradeep, 2022. "Farm monitoring and disease prediction by classification based on deep learning architectures in sustainable agriculture," Ecological Modelling, Elsevier, vol. 474(C).


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.