Unified CNNs and transformers underlying learning mechanism reveals multi-head attention modus vivendi

Author

Listed:
  • Koresh, Ella
  • Gross, Ronit D.
  • Meir, Yuval
  • Tzach, Yarden
  • Halevi, Tal
  • Kanter, Ido

Abstract

Convolutional neural networks (CNNs) evaluate short-range correlations in input images, which progress along the layers, whereas vision transformer (ViT) architectures evaluate long-range correlations using repeated transformer encoders composed of fully connected layers. Both are designed to solve complex classification tasks, but from different perspectives. This study demonstrates that CNN and ViT architectures stem from a unified underlying learning mechanism, which quantitatively measures the single-nodal performance (SNP) of each node in the feedforward (FF) and multi-head attention (MHA) sub-blocks. Each node identifies small clusters of possible output labels, with additional noise represented as labels outside these clusters. These features are progressively sharpened along the transformer encoders, enhancing the signal-to-noise ratio. This unified underlying learning mechanism leads to two main findings. First, it enables an efficient applied nodal diagonal connection (ANDC) pruning technique that does not affect the accuracy. Second, based on the SNP, spontaneous symmetry breaking occurs among the MHA heads, such that each head focuses its attention on a subset of labels through cooperation among its SNPs. Consequently, each head becomes an expert in recognizing its designated labels, representing a quantitative MHA modus vivendi mechanism. This statistical-mechanics-inspired viewpoint makes it possible to reveal the macroscopic behavior of the entire network from the microscopic performance of each node. These results are based on a compact convolutional transformer architecture trained on the CIFAR-100 and Flowers-102 datasets, and they call for extension to other architectures and applications, such as natural language processing.
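
The abstract does not give the formal definition of the SNP, so the following is only a toy sketch of the idea it describes: a node responds strongly to a small cluster of labels, responses outside the cluster count as noise, and the nodes of one head jointly cover a subset of labels. The threshold rule (`frac`), the ratio used as a signal-to-noise measure, the function names, and the numbers in `HEAD_A` and `HEAD_B` are all assumptions made for illustration, not the authors' method or data.

```python
from typing import List, Set

def label_cluster(responses: List[float], frac: float = 0.5) -> Set[int]:
    """Labels whose response is at least `frac` of the node's peak response.

    Assumed rule for illustration; the paper's cluster criterion may differ.
    """
    peak = max(responses)
    return {i for i, r in enumerate(responses) if r >= frac * peak}

def signal_to_noise(responses: List[float], cluster: Set[int]) -> float:
    """Mean in-cluster response divided by mean out-of-cluster response."""
    inside = [r for i, r in enumerate(responses) if i in cluster]
    outside = [r for i, r in enumerate(responses) if i not in cluster]
    if not outside:
        return float("inf")  # no labels outside the cluster, so no noise term
    noise = sum(outside) / len(outside)
    if noise == 0:
        return float("inf")
    return (sum(inside) / len(inside)) / noise

def head_labels(nodes: List[List[float]], frac: float = 0.5) -> Set[int]:
    """Union of the label clusters of all nodes belonging to one head."""
    covered: Set[int] = set()
    for responses in nodes:
        covered |= label_cluster(responses, frac)
    return covered

# Hypothetical per-label mean responses (6 labels) for two heads of two nodes each.
HEAD_A = [
    [0.9, 0.8, 0.1, 0.1, 0.0, 0.1],
    [0.7, 0.1, 0.1, 0.0, 0.1, 0.1],
]
HEAD_B = [
    [0.1, 0.0, 0.1, 0.8, 0.9, 0.1],
    [0.0, 0.1, 0.1, 0.1, 0.2, 1.0],
]

if __name__ == "__main__":
    for name, head in (("A", HEAD_A), ("B", HEAD_B)):
        print("head", name, "covers labels", sorted(head_labels(head)))
    first = HEAD_A[0]
    print("SNR of first node:", signal_to_noise(first, label_cluster(first)))
```

In this toy data the two heads end up covering disjoint label subsets, which is the kind of head specialization the abstract refers to; real networks would require the measured node responses and the definitions given in the paper.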

Suggested Citation

  • Koresh, Ella & Gross, Ronit D. & Meir, Yuval & Tzach, Yarden & Halevi, Tal & Kanter, Ido, 2025. "Unified CNNs and transformers underlying learning mechanism reveals multi-head attention modus vivendi," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 666(C).
  • Handle: RePEc:eee:phsmap:v:666:y:2025:i:c:s0378437125001815
    DOI: 10.1016/j.physa.2025.130529

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437125001815
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2025.130529?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Dan & Li, Yijun & Wang, Chaoqun & Chen, Min & Wu, Qi, 2023. "Forecasting carbon prices based on real-time decomposition and causal temporal convolutional networks," Applied Energy, Elsevier, vol. 331(C).
    2. Gou, Liangjie & Yang, Zhaozhong & Min, Chao & Yi, Duo & Li, Xiaogang & Kong, Bing, 2024. "A novel domain adaptation method with physical constraints for shale gas production forecasting," Applied Energy, Elsevier, vol. 371(C).
    3. Hocke, Simone & Klee, Andreas, 2023. "Transformation in der Arbeitswelt gestalten: Welchen Beitrag leistet eine akademische Weiterbildung von Betriebs- und Personalräten?," Working Paper Forschungsförderung 309, Hans-Böckler-Stiftung, Düsseldorf.
    4. Huang, Wenyang & Gao, Tianxiao & Hao, Yun & Wang, Xiuqing, 2023. "Transformer-based forecasting for intraday trading in the Shanghai crude oil market: Analyzing open-high-low-close prices," Energy Economics, Elsevier, vol. 127(PA).
    5. Roger Fouquet & Ralph Hippe, 2022. "Twin Transitions of Decarbonisation and Digitalisation: A Historical Perspective on Energy and Information in European Economies," Working Papers 08-22, Association Française de Cliométrie (AFC).
    6. Schrape, Jan-Felix, 2025. "Artificial intelligence and social action: A techno-sociological contextualization," Research Contributions to Organizational Sociology and Innovation Studies, SOI Discussion Papers 2025-03, University of Stuttgart, Institute for Social Sciences, Department of Organizational Sociology and Innovation Studies.
    7. Pfeiffer, Sabine & Nicklich, Manuel & Henke, Michael & Heßler, Martina & Krzywdzinski, Martin & Schu (ed.), 2024. "Digitalisierung der Arbeitswelten: Zur Erfassbarkeit einer systemischen Transformation," EconStor Books, ZBW - Leibniz Information Centre for Economics, number 312546.
    8. Choung, Youngjoo & Chatterjee, Swarn & Pak, Tae-Young, 2023. "Digital Financial Literacy and Financial Well-Being," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, issue Journal P, pages 1-1.
    9. Blind, Knut & Niebel, Crispin, 2022. "5G roll-out failures addressed by innovation policies in the EU," Technological Forecasting and Social Change, Elsevier, vol. 180(C).
    10. Fehlings, Susanne & Karrar, Hasan H. & Rudaz, Philippe, 2025. "Small businesses and new adaptation capacities in Georgia and Kazakhstan," World Development, Elsevier, vol. 191(C).
    11. Chang, Chia-Hsun & Lin, Chi-Chang & Yang, Zaili & Kontovas, Christos, 2024. "Evaluating the social acceptance of autonomous ferries: An observation from passengers’ boarding willingness," Transport Policy, Elsevier, vol. 159(C), pages 83-94.
    12. Gross, Ronit & Koresh, Ella & Halevi, Tal & Hodassman, Shiri & Meir, Yuval & Tzach, Yarden & Kanter, Ido, 2025. "Multilabel classification outperforms detection-based technique," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 658(C).
    13. Qi, Yu & Yin, Aoxue & Chen, Jianwei & Yang, Chunfei & Zhan, Pengyu, 2024. "Motivating for environmental protection: Evidence from county officials in China," World Development, Elsevier, vol. 184(C).
    14. Liu, Huize & Hu, Zunyan & Li, Jianqiu & Xu, Liangfei & Shao, Yangbin & Ouyang, Minggao, 2023. "Investigation on the optimal GDL thickness design for PEMFCs considering channel/rib geometry matching and operating conditions," Energy, Elsevier, vol. 282(C).
    15. Ren, Shenggang & Zhou, Qiong & Zhang, Xinxin & Zeng, Huixiang, 2024. "How do heavily polluting firms cope with dual environmental regulation? A study from the perspective of financial asset allocation," Energy Economics, Elsevier, vol. 139(C).
    16. Meir, Yuval & Tevet, Ofek & Tzach, Yarden & Hodassman, Shiri & Kanter, Ido, 2024. "Role of delay in brain dynamics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 654(C).
    17. Qiao, Zhenhua & Xu, Xinyi & Zou, Weitao & Huang, Yingli, 2024. "Urban sustainable development goals and ecosystem services: Pathways to achieving coordination," Land Use Policy, Elsevier, vol. 146(C).
    18. Seidl, Andrew & Cumming, Tracey & Arlaud, Marco & Crossett, Cole & van den Heuvel, Onno, 2024. "Investing in the wealth of nature through biodiversity and ecosystem service finance solutions," Ecosystem Services, Elsevier, vol. 66(C).
    19. Rezazadeh, Arash & Kohns, Marco & Bohnsack, René & António, Nuno & Rita, Paulo, 2025. "Generative AI for growth hacking: How startups use generative AI in their growth strategies," Journal of Business Research, Elsevier, vol. 192(C).
    20. Daniela Kletzan-Slamanig & Angela Köppl & Hans Pitlik & Margit Schratzenstaller, 2023. "Der Finanzausgleich als Hebel zur Umsetzung der österreichischen Klimaziele. Handlungsfelder und konzeptionelle Grundlagen," WIFO Studies, WIFO, number 70785, June.
