Towards artificial general intelligence via a multimodal foundation model

Author

Listed:

Nanyi Fei (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Zhiwu Lu (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Yizhao Gao (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Guoxing Yang (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Yuqi Huo (Beijing Key Laboratory of Big Data Management and Analysis Methods; Renmin University of China)
Jingyuan Wen (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Haoyu Lu (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Ruihua Song (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Xin Gao (King Abdullah University of Science and Technology)
Tao Xiang (University of Surrey)
Hao Sun (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)
Ji-Rong Wen (Renmin University of China; Beijing Key Laboratory of Big Data Management and Analysis Methods)

Abstract

The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained on huge multimodal data, which can be quickly adapted for various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model by self-supervised learning on weakly semantically correlated data crawled from the Internet, and show that promising results can be obtained on a wide range of downstream tasks. In particular, with the model-interpretability tools we developed, we demonstrate that our foundation model now possesses strong imagination ability. We believe that our work makes a transformative stride towards AGI, from our common practice of “weak or narrow AI” to that of “strong or generalized AI”.

Suggested Citation

Nanyi Fei & Zhiwu Lu & Yizhao Gao & Guoxing Yang & Yuqi Huo & Jingyuan Wen & Haoyu Lu & Ruihua Song & Xin Gao & Tao Xiang & Hao Sun & Ji-Rong Wen, 2022. "Towards artificial general intelligence via a multimodal foundation model," Nature Communications, Nature, vol. 13(1), pages 1-13, December.

Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-30761-2
DOI: 10.1038/s41467-022-30761-2

Citations

Cited by:

Hermann, Erik & Puntoni, Stefano, 2024. "Artificial intelligence and consumer behavior: From predictive to generative AI," Journal of Business Research, Elsevier, vol. 180(C).
Xiaoqian Zhu & Jianping Li & Yinghui Wang, 2024. "Are risk disclosures in financial reports informative? A text mining-based perspective," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 11(1), pages 1-18, December.
Gu, Hanchi & Schreyer, Marco & Moffitt, Kevin & Vasarhelyi, Miklos, 2024. "Artificial intelligence co-piloted auditing," International Journal of Accounting Information Systems, Elsevier, vol. 54(C).
Zhou, Zhen & Gu, Ziyuan & Qu, Xiaobo & Liu, Pan & Liu, Zhiyuan & Yu, Wenwu, 2024. "Urban mobility foundation model: A literature review and hierarchical perspective," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 192(C).
Teng, Dequn & Ye, Chen & Martinez, Veronica, 2025. "Gen-AI’s effects on new value propositions in business model innovation: Evidence from information technology industry," Technovation, Elsevier, vol. 143(C).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Umut Güçlü & Marcel A J van Gerven, 2014. "Unsupervised Feature Learning Improves Prediction of Human Brain Activity in Response to Natural Images," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-12, August.
Rodrigo Quian Quiroga & Marta Boscaglia & Jacques Jonas & Hernan G. Rey & Xiaoqian Yan & Louis Maillard & Sophie Colnat-Coulbois & Laurent Koessler & Bruno Rossion, 2023. "Single neuron responses underlying face recognition in the human midfusiform face-selective cortex," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
Runnan Cao & Peter Brunner & Puneeth N. Chakravarthula & Krista L. Wahlstrom & Cory Inman & Elliot H. Smith & Xin Li & Adam N. Mamelak & Nicholas J. Brandmeir & Ueli Rutishauser & Jon T. Willie & Shuo, 2025. "A neuronal code for object representation and memory in the human amygdala and hippocampus," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
Leonhard Waschke & Fabian Kamp & Evi Elzen & Suresh Krishna & Ulman Lindenberger & Ueli Rutishauser & Douglas D. Garrett, 2025. "Single-neuron spiking variability in hippocampus dynamically tracks sensory content during memory formation in humans," Nature Communications, Nature, vol. 16(1), pages 1-9, December.
Martinez-Saito, Mario, 2022. "Discrete scaling and criticality in a chain of adaptive excitable integrators," Chaos, Solitons & Fractals, Elsevier, vol. 163(C).
Luca D. Kolibius & Frederic Roux & George Parish & Marije Wal & Mircea Plas & Ramesh Chelvarajah & Vijay Sawlani & David T. Rollings & Johannes D. Lang & Stephanie Gollwitzer & Katrin Walther & Rüdige, 2023. "Hippocampal neurons code individual episodic memories in humans," Nature Human Behaviour, Nature, vol. 7(11), pages 1968-1979, November.
Jakub Kopal & Kuldeep Kumar & Kimia Shafighi & Karin Saltoun & Claudia Modenato & Clara A. Moreau & Guillaume Huguet & Martineau Jean-Louis & Charles-Olivier Martin & Zohra Saci & Nadine Younis & Elis, 2024. "Using rare genetic mutations to revisit structural brain asymmetry," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
Rybalova, E. & Averyanov, V. & Lozi, R. & Strelkova, G., 2024. "Peculiarities of the spatio-temporal dynamics of a Hénon–Lozi map network in the presence of Lévy noise," Chaos, Solitons & Fractals, Elsevier, vol. 184(C).
Louis Kang & Taro Toyoizumi, 2024. "Distinguishing examples while building concepts in hippocampal and artificial networks," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
Maëlle Guyoton & Giulio Matteucci & Charlie G. Foucher & Matthew P. Getz & Julijana Gjorgjieva & Sami El-Boustani, 2025. "Cortical circuits for cross-modal generalization," Nature Communications, Nature, vol. 16(1), pages 1-23, December.
Ahalya Prabhakar & Todd Murphey, 2022. "Mechanical intelligence for learning embodied sensor-object relationships," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
Thomas P. Reber & Sina Mackay & Marcel Bausch & Marcel S. Kehl & Valeri Borger & Rainer Surges & Florian Mormann, 2023. "Single-neuron mechanisms of neural adaptation in the human temporal lobe," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
Henning Sprekeler & Christian Michaelis & Laurenz Wiskott, 2007. "Slowness: An Objective for Spike-Timing–Dependent Plasticity?," PLOS Computational Biology, Public Library of Science, vol. 3(6), pages 1-13, June.
Dock H. Duncan & Dirk Moorselaar & Jan Theeuwes, 2023. "Pinging the brain to reveal the hidden attentional priority map using encephalography," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
David Balduzzi & Giulio Tononi, 2009. "Qualia: The Geometry of Integrated Information," PLOS Computational Biology, Public Library of Science, vol. 5(8), pages 1-24, August.
Sina Mackay & Thomas P. Reber & Marcel Bausch & Jan Boström & Christian E. Elger & Florian Mormann, 2024. "Concept and location neurons in the human brain provide the ‘what’ and ‘where’ in memory formation," Nature Communications, Nature, vol. 15(1), pages 1-9, December.
Jörn Diedrichsen & Nikolaus Kriegeskorte, 2017. "Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis," PLOS Computational Biology, Public Library of Science, vol. 13(4), pages 1-33, April.
K. Paluch & M. Magnuski & W. Średniawa & D. Ivanovski & A. Rysz & M. Służewska-Niedźwiedź & T. Pasterski & W. Fortuna & K. Smarzewska & P. C. Reinacher & Sz. Kaczor & P. Tabakow & H. Babu & J. Kamińsk, 2025. "Unattended working memory items are coded by persistent activity in human medial temporal lobe neurons," Nature Human Behaviour, Nature, vol. 9(10), pages 2099-2113, October.
Chiara Gastaldi & Tilo Schwalger & Emanuela De Falco & Rodrigo Quian Quiroga & Wulfram Gerstner, 2021. "When shared concept cells support associations: Theory of overlapping memory engrams," PLOS Computational Biology, Public Library of Science, vol. 17(12), pages 1-44, December.
Carlo Baldassi & Alireza Alemi-Neissi & Marino Pagan & James J DiCarlo & Riccardo Zecchina & Davide Zoccolan, 2013. "Shape Similarity, Better than Semantic Membership, Accounts for the Structure of Visual Object Representations in a Population of Monkey Inferotemporal Neurons," PLOS Computational Biology, Public Library of Science, vol. 9(8), pages 1-20, August.
