AI4(M)S Paper Collection

Update Time: 2025-12-09 17:08:52

This is a regularly updated paper collection about AI for science, with a specific focus on materials science.

📈 Publication Timeline

📝 Review/Survey (58 papers)

450. A call to elevate the role of processing in AI-driven materials design, Nature Reviews Materials (December 2025)
440. Self-driving labs for biotechnology, Nature Computational Science (November 2025)
389. Structural Insights into Autophagy in the AlphaFold Era, Journal of Molecular Biology (September 15, 2025)
376. High-throughput platforms for machine learning-guided lipid nanoparticle design, Nature Reviews Materials (September 08, 2025)
375. AI-driven protein design, Nature Reviews Bioengineering (September 08, 2025)
373. AI-Driven Defect Engineering for Advanced Thermoelectric Materials, Advanced Materials (September 04, 2025)
372. Artificial Intelligence-Driven Approaches in Semiconductor Research, Advanced Materials (September 04, 2025)
367. Developing machine learning for heterogeneous catalysis with experimental and computational data, Nature Reviews Chemistry (September 2025)
360. From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery, Preprint (August 30, 2025)
359. A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers, Preprint (August 28, 2025)
344. Steering towards safe self-driving laboratories, Nature Reviews Chemistry (August 18, 2025)
339. Empowering Generalist Material Intelligence with Large Language Models, Advanced Materials (August 14, 2025)
337. Probing the limitations of multimodal language models for chemistry and materials research, Nature Computational Science (August 11, 2025)
323. Deep learning for property prediction of natural fiber polymer composites, Scientific Reports (July 30, 2025)
319. Biomimetic Intelligent Thermal Management Materials: From Nature-Inspired Design to Machine-Learning-Driven Discovery, Advanced Materials (July 29, 2025)
308. Artificial Intelligence Paradigms for Next-Generation Metal–Organic Framework Research, Journal of the American Chemical Society (July 09, 2025)
275. Interpretable Machine Learning Applications: A Promising Prospect of AI for Materials, Advanced Functional Materials (May 13, 2025)
251. Network renormalization, Nature Reviews Physics (April 2025)
250. Applications of natural language processing and large language models in materials discovery, npj Computational Materials (March 24, 2025)
239. Machine Learning in Solid-State Hydrogen Storage Materials: Challenges and Perspectives, Advanced Materials (February 12, 2025)
238. Machine Learning in Polymer Research, Advanced Materials (February 09, 2025)
231. Knowledge-guided large language model for material science, Review of Materials Research (February 01, 2025)
229. A guidance to intelligent metamaterials and metamaterials intelligence, Nature Communications (January 29, 2025)
226. Integrating artificial intelligence with mechanistic epidemiological modeling: a scoping review of opportunities and challenges, Nature Communications (January 10, 2025)
224. Synthesis Strategies for High Entropy Nanoparticles, Advanced Materials (January 08, 2025)
222. Machine learning for the physics of climate, Nature Reviews Physics (January 2025)
220. AI4Materials: Transforming the landscape of materials science and enigneering, Review of Materials Research (January 01, 2025)
219. Computational microscopy with coherent diffractive imaging and ptychography, Nature (January 2025)
214. Poseidon: Efficient Foundation Models for PDEs, Advances in Neural Information Processing Systems (December 16, 2024)
213. Navigating Chemical Space with Latent Flows, Advances in Neural Information Processing Systems (December 16, 2024)
212. Generative Hierarchical Materials Search, Advances in Neural Information Processing Systems (December 16, 2024)
211. Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation, Advances in Neural Information Processing Systems (December 16, 2024)
208. Multifunctional high-entropy materials, Nature Reviews Materials (December 2024)
206. Towards the holistic design of alloys with large language models, Nature Reviews Materials (December 2024)
205. Synthesis of high-entropy materials, Nature Synthesis (December 2024)
190. Generative deep learning for the inverse design of materials, Preprint (September 27, 2024)
182. AI-driven research in pure mathematics and theoretical physics, Nature Reviews Physics (September 2024)
168. Promising directions of machine learning for partial differential equations, Nature Computational Science (July 2024)
155. High-Entropy Photothermal Materials, Advanced Materials (June 2024)
136. Data-Driven Design for Metamaterials and Multiscale Systems: A Review, Advanced Materials (February 22, 2024)
129. Expanding the Horizons of Machine Learning in Nanomaterials to Chiral Nanostructures, Advanced Materials (January 19, 2024)
111. Finite-difference time-domain methods, Nature Reviews Methods Primers (October 05, 2023)
97. Machine Learning Descriptors for Data-Driven Catalysis Study, Advanced Science (August 04, 2023)
88. The rise of self-driving labs in chemical and materials sciences, Nature Synthesis (June 2023)
87. Combinatorial synthesis for AI-driven materials discovery, Nature Synthesis (June 2023)
69. On scientific understanding with artificial intelligence, Nature Reviews Physics (December 2022)
58. Deep-learning seismology, Science (August 12, 2022)
56. Machine learning in the search for new fundamental physics, Nature Reviews Physics (June 2022)
53. High-entropy nanoparticles: Synthesis-structure-property relationships and data-driven discovery, Science (April 08, 2022)
47. Innovative Materials Science via Machine Learning, Advanced Functional Materials (February 03, 2022)
45. Applied Machine Learning for Developing Next-Generation Functional Materials, Advanced Functional Materials (December 16, 2021)
39. Nanoparticle synthesis assisted by machine learning, Nature Reviews Materials (August 2021)
38. Physics-informed machine learning, Nature Reviews Physics (June 2021)
35. Materials design by synthetic biology, Nature Reviews Materials (April 2021)
22. Machine learning and the physical sciences, Reviews of Modern Physics (December 06, 2019)
21. Data-Driven Materials Science: Status, Challenges, and Perspectives, Advanced Science (September 01, 2019)
18. Structure prediction drives materials discovery, Nature Reviews Materials (May 2019)
12. Inverse molecular design using machine learning: Generative models for matter engineering, Science (July 27, 2018)

📚 Venue Index

This section provides a quick overview of papers organized by publication venues for easy navigation.

📑 Papers (Chronological Order)

451. Real-space machine learning of correlation density functionals, Nature Communications (December 01, 2025)

Authors: Elias Polak, Heng Zhao, Stefan Vuckovic

Abstract: Machine learning (ML) plays a pivotal role in extending the reach of quantum chemistry methods for simulating both molecules and materials. However, leveraging ML to overcome the limitations of human-designed density functional approximations (DFAs), the primary workhorse for quantum simulations, remains a major challenge due to their severely limited transferability to unseen chemical systems. Here, we demonstrate how transferability is achieved using real-space ML, where energies are learned point by point in space through energy densities. Central to our real-space learning strategy is the derivation and implementation of correlation energy densities from regularized perturbation theory. This enables two key advances toward constructing highly transferable DFAs, grounded in the Møller-Plesset adiabatic connection framework, for correlation energies defined with respect to the Hartree-Fock reference. First, we introduce the Local Energy Loss, whose data efficiency (expanding each system’s single energy into thousands of data points) dramatically enhances transferability when combined with a physically informed ML model. Second, we formulate a real-space, machine-learned, and regularized extension of Spin-Component-Scaled second-order Møller-Plesset perturbation theory, yielding transferable DFAs that effectively mitigate the self-interaction errors common to traditional DFAs.

Tags: Computational chemistry, Method development, Density functional theory

450. A call to elevate the role of processing in AI-driven materials design, Nature Reviews Materials (December 2025)

Authors: Sreenivas Raguraman, Adam Griebel, Maitreyee Sharma Priyadarshini, Paulette Clancy, Timothy P. Weihs

Volume & Issue: Volume 10, Issue 12

Pages: 875-876

Abstract: Despite transformative advances in materials discovery, real-world performance still hinges on an often-overlooked variable: processing. To bridge the gap between discovery and deployment, processing must be elevated from an afterthought to a central pillar in design frameworks, data generation and machine learning.

Tags: None

449. Microstructure-agnostic deep learning for mechanistic discovery of corrosion-resistant Co-Cr-Fe-Ni MPEAs, Acta Materialia (December 01, 2025)

Authors: Zhengyu Zhang, Raja Shekar B Dandu, Dennis Boakye, Jun Yeop Lee, Hugh Shortt, Xuesong Fan, Lia Amalia, Zongyang Lyu, Jonathan Poplawsky, Chuang Deng, Peter K. Liaw, Wenjun Cai

Volume & Issue: Volume 301

Pages: 121619

Abstract: In multi-principal element alloys (MPEAs), the vast compositional design space and limited availability of microstructure-resolved corrosion data pose significant challenges to the predictive design of corrosion-resistant compositions. Traditional approaches rely heavily on trial-and-error experimentation and detailed microstructural characterization, which are time-consuming and resource-intensive. In this work, we develop a microstructure-agnostic deep learning framework that predicts the corrosion resistance of Co-Cr-Fe-Ni MPEAs directly from composition-based descriptors. By integrating physics-informed features, active learning, and uncertainty quantification, the model captures key physicochemical trends without requiring explicit structural input. The framework rapidly identifies non-equiatomic compositions with superior corrosion resistance, validated experimentally to outperform 304 L stainless steel under both acidic and chloride-rich conditions. Fundamental insights into corrosion mechanisms were obtained by linking model-predicted corrosion trends with electronic structure properties from density functional theory (DFT). The most corrosion-resistant compositions—e.g., Fe34Cr34 and Co34Cr34—exhibit high electron work functions and oxygen binding energies, indicative of strong passivation tendencies. Atom probe tomography (APT) of corroded surfaces confirms that these alloys form thinner (∼9 nm) and more protective passive films compared to conventional Cr-rich MPEAs (∼14 nm). Feature importance analysis within the machine learning model aligns with DFT-calculated elemental contributions to surface energetics, revealing a consistent ranking of Cr > Fe > Co > Ni in promoting passivation.

Tags: Machine learning, Alloys, APT, Corrosion, DFT, Multicomponent

448. Accelerating science with AI, Science (November 25, 2025)

Authors: Darío Gil, Kathryn A. Moler

Volume & Issue: Volume 0, Issue 0

Pages: eaee0605

Abstract: No abstract available

Tags: None

447. Leveraging spatial-angular redundancy for self-supervised denoising of 3D fluorescence imaging without temporal dependency, Nature Communications (November 24, 2025)

Authors: Zhi Lu, Wentao Chen, Feihao Sun, Jiaqi Fan, Xinyang Li, Zhenqi Fu, Manchang Jin, Jiamin Wu, Qionghai Dai

Abstract: Photon noise is one of the fundamental limits in fluorescence imaging. Despite broad applications, existing self-supervised denoising methods rely on either temporal redundancy or spatial redundancy, leading to degradation in either temporal resolution or spatial resolution, especially for 3D imaging. Here, we propose light field denoising (LF-denoising), a self-supervised transformer framework leveraging spatial-angular redundancy based on the high-dimensional light field measurements to achieve high-fidelity denoising without relying on temporal information and avoiding associated artefacts in spatial domain. We demonstrate the advantage of LF-denoising over previous methods in highly dynamic 3D imaging with both simulations and experimental data across different species. Combined with state-of-the-art light field microscopy variants, we achieve long-term high-speed high-resolution 3D intravital imaging on diverse animals including zebrafish, Drosophila and mice, with ultra-low excitation power of 10 μW/mm². Specifically, we show that LF-denoising well preserved the temporal causality with superior denoising performance, which is critical for quantitative biology analysis in immunology and neuroscience.

Tags: Microscopy, Fluorescence imaging

446. Discovering physical laws with parallel symbolic enumeration, Nature Computational Science (November 21, 2025)

Authors: Kai Ruan, Yilong Xu, Ze-Feng Gao, Yang Liu, Yike Guo, Ji-Rong Wen, Hao Sun

Pages: 1-14

Abstract: Symbolic regression has a crucial role in modern scientific research owing to its capability of discovering concise and interpretable mathematical expressions from data. A key challenge lies in the search for parsimonious and generalizable mathematical formulas, in an infinite search space, while intending to fit the training data. Existing algorithms have faced a critical bottleneck of accuracy and efficiency over a decade when handling problems of complexity, which essentially hinders the pace of applying symbolic regression for scientific exploration across interdisciplinary domains. Here, to this end, we introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data. Experiments show that PSE achieves higher accuracy and faster computation compared with the state-of-the-art baseline algorithms across over 200 synthetic and experimental problem sets (for example, improving the recovery accuracy by up to 99% and reducing runtime by an order of magnitude). PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models (for example, underlying physical laws), and improves the scalability of symbolic learning.

Tags: Applied mathematics, Computational science, Computer science

445. Multimodal learning enables chat-based exploration of single-cell data, Nature Biotechnology (November 11, 2025)

Authors: Moritz Schaefer, Peter Peneder, Daniel Malzl, Salvo Danilo Lombardo, Mihaela Peycheva, Jake Burton, Anna Hakobyan, Varun Sharma, Thomas Krausgruber, Celine Sin, Jörg Menche, Eleni M. Tomazou, Christoph Bock

Pages: 1-11

Abstract: Single-cell sequencing characterizes biological samples at unprecedented scale and detail, but data interpretation remains challenging. Here, we present CellWhisperer, an artificial intelligence (AI) model and software tool for chat-based interrogation of gene expression. We establish a multimodal embedding of transcriptomes and their textual annotations, using contrastive learning on 1 million RNA sequencing profiles with AI-curated descriptions. This embedding informs a large language model that answers user-provided questions about cells and genes in natural-language chats. We benchmark CellWhisperer’s performance for zero-shot prediction of cell types and other biological annotations and demonstrate its use for biological discovery in a meta-analysis of human embryonic development. We integrate a CellWhisperer chat box with the CELLxGENE browser, allowing users to interactively explore gene expression through a combined graphical and chat interface. In summary, CellWhisperer leverages large community-scale data repositories to connect transcriptomes and text, thereby enabling interactive exploration of single-cell RNA-sequencing data with natural-language chats.

Tags: Machine learning, Software, Transcriptomics, Gene regulation in immune cells, Preclinical research

444. Multilingualism protects against accelerated aging in cross-sectional and longitudinal analyses of 27 European countries, Nature Aging (November 10, 2025)

Authors: Lucia Amoruso, Hernan Hernandez, Hernando Santamaria-Garcia, Sebastian Moguilner, Agustina Legaz, Pavel Prado, Jhosmary Cuadros, Liset Gonzalez, Raul Gonzalez-Gomez, Joaquín Migeot, Carlos Coronel-Oliveros, Josephine Cruzat, Manuel Carreiras, Vicente Medel, Marcelo Adrián Maito, Claudia Duran-Aniotz, Enzo Tagliazucchi, Sandra Baez, Adolfo M. García, Agustin Ibanez

Pages: 1-15

Abstract: In cross-sectional and longitudinal analyses of 86,149 participants across 27 European countries, Amoruso, Hernandez and colleagues identify multilingualism as a protective factor against accelerated aging, using biobehavioral age gaps.

Tags: None

443. Emulating human-like adaptive vision for efficient and flexible machine visual perception, Nature Machine Intelligence (November 06, 2025)

Authors: Yulin Wang, Yang Yue, Huanqian Wang, Haojun Jiang, Yizeng Han, Zanlin Ni, Yifan Pu, Minglei Shi, Rui Lu, Qisen Yang, Andrew Zhao, Zhuofan Xia, Shiji Song, Gao Huang

Pages: 1-19

Abstract: A deep learning approach, AdaptiveNN, shifts machine vision models from passive to active to mimic human-like perception. The method achieves inference costs that are up to 28-times lower without accuracy loss, while showcasing online-adaptable and interpretable behaviours.

Tags: None

442. Squidiff: predicting cellular development and responses to perturbations using a diffusion model, Nature Methods (November 03, 2025)

Authors: Siyu He, Yuefei Zhu, Daniel Naveed Tavakol, Haotian Ye, Yeh-Hsing Lao, Zixian Zhu, Cong Xu, Shradha Chauhan, Guy Garty, Raju Tomer, Gordana Vunjak-Novakovic, James Zou, Elham Azizi, Kam W. Leong

Pages: 1-13

Abstract: Squidiff is a diffusion-based model to predict transcriptomic changes in response to perturbations.

Tags: None

441. Language models cannot reliably distinguish belief from knowledge and fact, Nature Machine Intelligence (November 03, 2025)

Authors: Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E. Ho, Thomas Icard, Dan Jurafsky, James Zou

Pages: 1-11

Abstract: Suzgun et al. find that current large language models cannot reliably distinguish between belief, knowledge and fact, raising concerns for their use in healthcare, law and journalism, where such distinctions are critical.

Tags: None

440. Self-driving labs for biotechnology, Nature Computational Science (November 2025)

Authors: Evan Collins, Robert Langer, Daniel G. Anderson

Volume & Issue: Volume 5, Issue 11

Pages: 976-979

Abstract: Self-driving laboratories that integrate robotic production with artificial intelligence have the potential to accelerate innovation in biotechnology. Because self-driving labs can be complex and not universally applicable, it is useful to consider their suitable use cases for successful integration into discovery workflows. Here, we review strategies for assessing the suitability of self-driving labs for biochemical design problems.

Tags: None

439. Enzyme specificity prediction using cross-attention graph neural networks, Nature (November 2025)

Authors: Haiyang Cui, Yufeng Su, Tanner J. Dean, Tianhao Yu, Zhengyi Zhang, Jian Peng, Diwakar Shukla, Huimin Zhao

Volume & Issue: Volume 647, Issue 8090

Pages: 639-647

Abstract: Enzymes are the molecular machines of life, and a key property that governs their function is substrate specificity—the ability of an enzyme to recognize and selectively act on particular substrates. This specificity originates from the three-dimensional (3D) structure of the enzyme active site and complicated transition state of the reaction1,2. Many enzymes can promiscuously catalyse reactions or act on substrates beyond those for which they were originally evolved1,3–5. However, millions of known enzymes still lack reliable substrate specificity information, impeding their practical applications and comprehensive understanding of the biocatalytic diversity in nature. Here we developed a cross-attention-empowered SE(3)-equivariant graph neural network architecture named EZSpecificity for predicting enzyme substrate specificity, which was trained on a comprehensive, tailor-made database of enzyme–substrate interactions at sequence and structural levels. EZSpecificity outperformed the existing machine learning models for enzyme substrate specificity prediction, as demonstrated by both an unknown substrate and enzyme database and seven proof-of-concept protein families. Experimental validation with eight halogenases and 78 substrates showed that EZSpecificity achieved a 91.7% accuracy in identifying the single potential reactive substrate, significantly higher than that of the state-of-the-art model enzyme substrate prediction (58.3%). EZSpecificity represents a general machine learning model for the accurate prediction of substrate specificity for enzymes related to fundamental and applied research in biology and medicine.

Tags: Biocatalysis, Machine learning, Protein function predictions

438. A multimodal whole-slide foundation model for pathology, Nature Medicine (November 2025)

Authors: Tong Ding, Sophia J. Wagner, Andrew H. Song, Richard J. Chen, Ming Y. Lu, Andrew Zhang, Anurag J. Vaidya, Guillaume Jaume, Muhammad Shaban, Ahrong Kim, Drew F. K. Williamson, Harry Robertson, Bowen Chen, Cristina Almagro-Pérez, Paul Doucet, Sharifa Sahai, Chengkuan Chen, Christina S. Chen, Daisuke Komura, Akihiro Kawabe, Mieko Ochi, Shinya Sato, Tomoyuki Yokose, Yohei Miyagi, Shumpei Ishikawa, Georg Gerber, Tingying Peng, Long Phi Le, Faisal Mahmood

Volume & Issue: Volume 31, Issue 11

Pages: 3749-3761

Abstract: The field of computational pathology has been transformed with recent advances in foundation models that encode histopathology region-of-interests (ROIs) into versatile and transferable feature representations via self-supervised learning. However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data in disease-specific cohorts, especially for rare clinical conditions. We propose Transformer-based pathology Image and Text Alignment Network (TITAN), a multimodal whole-slide foundation model pretrained using 335,645 whole-slide images via visual self-supervised learning and vision-language alignment with corresponding pathology reports and 423,122 synthetic captions generated from a multimodal generative AI copilot for pathology. Without any fine-tuning or requiring clinical labels, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis. We evaluate TITAN on diverse clinical tasks and find that it outperforms both ROI and slide foundation models across machine learning settings, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval and pathology report generation.

Tags: Machine learning, Pathology

437. Discovery of highly fluorescent covalent organic frameworks through AI-assisted iterative experiment–learning cycles, Nature Chemistry (November 2025)

Authors: Liang Zhang, Jiahui Du, Zikai Xie, Linjiang Chen, Wenlang Li, Weihang Geng, Yi Zhou, Xinwen Ou, Chengtao Gong, Yijun Gao, Shan He, Chunxing Yan, Chengbin Zhao, Yantian Jiao, Sheng-Yi Yang, Bing Huang, Jacky W. Y. Lam, Jun Qian, Jun Jiang, Ben Zhong Tang, Hexiang Deng

Volume & Issue: Volume 17, Issue 11

Pages: 1645-1654

Abstract: Developing porous crystalline materials with tailored properties is challenging because of the vast design space and the high cost of screening. Now, highly fluorescent covalent organic frameworks have been identified through an AI-assisted iterative experiment–learning cycle workflow that integrates electronic configuration and quantum-level insights into the learning process.

Tags: None

436. Automatic network structure discovery of physics informed neural networks via knowledge distillation, Nature Communications (October 29, 2025)

Authors: Ziti Liu, Yang Liu, Xunshi Yan, Wen Liu, Han Nie, Shuaiqi Guo, Chen-an Zhang

Volume & Issue: Volume 16, Issue 1

Pages: 9558

Abstract: Partial differential equations (PDEs) are fundamental for modeling complex physical processes, often exhibiting structural features such as symmetries and conservation laws. While physics-informed neural networks (PINNs) can simulate and invert PDEs, they mainly rely on external loss functions for physical constraints, making it difficult to automatically discover and embed physically consistent network structures. We propose a physics structure-informed neural network discovery method based on physics-informed distillation, which decouples physical and parameter regularization via staged optimization in teacher and student networks. After distillation, clustering and parameter reconstruction are used to extract and embed physically meaningful structures. Numerical experiments on Laplace, Burgers, and Poisson equations, as well as fluid mechanics, show that the method can automatically extract relevant structures, improve accuracy and training efficiency, and enhance structural adaptability and transferability. This approach offers a new perspective for efficient modeling and automatic discovery of structured neural networks.

Tags: Computational science, Fluid dynamics

435. MultiGATE: integrative analysis and regulatory inference in spatial multi-omics data via graph representation learning, Nature Communications (October 24, 2025)

Authors: Jishuai Miao, Jinzhao Li, Jingxue Xin, Jiajuan Tu, Muyang Ge, Ji Qi, Xiaocheng Zhou, Ying Zhu, Can Yang, Zhixiang Lin

Volume & Issue: Volume 16, Issue 1

Pages: 9403

Abstract: New spatial multi-omics technologies, which jointly profile transcriptome and epigenome/protein markers for the same tissue section, expand the frontiers of spatial techniques. Here, we introduce MultiGATE, which utilizes a two-level graph attention auto-encoder to integrate the multi-modality and spatial information in spatial multi-omics data. The key feature of MultiGATE is that it simultaneously performs embedding of the spatial pixels and infers the cross-modality regulatory relationship, which allows deeper data integration and provides insights on transcriptional regulation. We evaluate the performance of MultiGATE on spatial multi-omics datasets obtained from different tissues and platforms. Through effectively integrating spatial multi-omics data, MultiGATE both enhances the extraction of latent embeddings of the pixels and boosts the inference of transcriptional regulation for cross-modality genomic features.

Tags: Machine learning, Computational models, Data integration, Statistical methods

434. Deep-learning-based virtual screening of antibacterial compounds, Nature Biotechnology (October 24, 2025)

Authors: Gabriele Scalia, Steven T. Rutherford, Ziqing Lu, Kerry R. Buchholz, Nicholas Skelton, Kangway Chuang, Nathaniel Diamant, Jan-Christian Hütter, Jerome-Maxim Luescher, Anh Miu, Jeff Blaney, Leo Gendelev, Elizabeth Skippington, Greg Zynda, Nia Dickson, Micha Koziarski, Yoshua Bengio, Aviv Regev, Man-Wah Tan, Tommaso Biancalani

Pages: 1-14

Abstract: Antibacterial compounds are discovered by combining phenotypic screening with deep learning.

Tags: None

433. Energy-Based Coarse-Graining in Molecular Dynamics: A Flow-Based Framework without Data, Journal of Chemical Theory and Computation (October 24, 2025)

Authors: Maximilian Stupp, P. S. Koutsourelakis

Abstract: Coarse-grained (CG) models provide an effective route to reduce the complexity of molecular simulations, but conventional approaches depend heavily on long, all-atom molecular dynamics trajectories to adequately sample the configurational space. This data dependence limits accuracy and generalizability, as unvisited configurations remain excluded from the resulting CG models. We introduce a fully data-free, generative framework for coarse-graining that directly targets the all-atom Boltzmann distribution. The model defines a structured latent space comprising slow collective variables, associated with multimodal marginal densities capturing metastable states, and fast variables, represented through simple, unimodal conditional distributions. A learnable, bijective map from latent space to atomistic coordinates enables the automatic and accurate reconstruction of molecular structures. Training relies solely on the interatomic potential and minimizes the reverse Kullback–Leibler (KL) divergence via an energy-based objective. To stabilize optimization and ensure mode coverage, we employ an adaptive tempering scheme that promotes the exploration of diverse configurations. Once trained, the model can generate independent, one-shot equilibrium samples at full atomic resolution. Validation on two synthetic systems, a double-well potential and a Gaussian mixture model, as well as on the benchmark alanine dipeptide, demonstrates that the method captures all relevant modes of the Boltzmann distribution, reconstructs atomic configurations with high fidelity, and automatically learns physically meaningful CG representations. These results suggest that the proposed framework provides a promising, data-free alternative to traditional CG techniques, offering both a principled approach to addressing the long-standing “chicken-and-egg” challenge in coarse-graining and an effective solution to the back-mapping problem by enabling the accurate reconstruction of all-atom configurations.

Tags: None

Authors: Jin Su, Zhikai Li, Tianli Tao, Chenchen Han, Yan He, Fengyuan Dai, Qingyan Yuan, Yuan Gao, Tong Si, Xuting Zhang, Yuyang Zhou, Junjie Shan, Xibin Zhou, Xing Chang, Shiyu Jiang, Dacheng Ma, Martin Steinegger, Sergey Ovchinnikov, Fajie Yuan

Pages: 1-7

Abstract: SaprotHub is a community repository for protein language models.

Tags: None

431. Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes, Science (October 23, 2025)

Authors: Benjamin DeMeo, Charlotte Nesbitt, Samuel A. Miller, Daniel B. Burkhardt, Inna Lipchina, Doris Fu, Peter Holderrieth, David Kim, Sergey Kolchenko, Artur Szalata, Ishan Gupta, Christine Kerr, Thomas Pfefer, Raziel Rojas-Rodriguez, Sunil Kuppassani, Laurens Kruidenier, Parul B. Doshi, Mahdi Zamanighomi, James J. Collins, Alex K. Shalek, Fabian J. Theis, Mauricio Cortes

Volume & Issue: Volume 0, Issue 0

Pages: eadi8577

Abstract: Phenotypic drug screening remains constrained by the vastness of chemical space and technical challenges scaling experimental workflows. To overcome these barriers, computational methods have been developed to prioritize compounds, but they rely on either single-task models lacking generalizability or heuristic-based genomic proxies that resist optimization. We designed an active deep-learning framework that leverages omics to enable scalable, optimizable identification of compounds that induce complex phenotypes. Our generalizable algorithm outperformed state-of-the-art models on classical recall, translating to a 13-17x increase in phenotypic hit-rate across two hematological discovery campaigns. Combining this algorithm with a lab-in-the-loop signature refinement step, we achieved an additional two-fold increase in hit-rate and molecular insights. In sum, our framework enables efficient phenotypic hit identification campaigns, with broad potential to accelerate drug discovery.

Tags: None

430. Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes, Science (October 23, 2025)

Volume & Issue: Volume 390, Issue 6776

Pages: eadi8577

Abstract: Phenotypic drug screening remains constrained by the vastness of chemical space and the technical challenges of scaling experimental workflows. To overcome these barriers, computational methods have been developed to prioritize compounds, but they rely on either single-task models lacking generalizability or heuristic-based genomic proxies that resist optimization. We designed an active deep learning framework that leverages omics to enable scalable, optimizable identification of compounds that induce complex phenotypes. Our generalizable algorithm outperformed state-of-the-art models on classical recall, translating to a 13- to 17-fold increase in phenotypic hit rate across two hematological discovery campaigns. Combining this algorithm with a lab-in-the-loop signature refinement step, we achieved an additional twofold increase in hit rate along with molecular insights. In sum, our framework enables efficient phenotypic hit identification campaigns, with broad potential to accelerate drug discovery.

Tags: None

429. Discovering network dynamics with neural symbolic regression, Nature Computational Science (October 23, 2025)

Authors: Zihan Yu, Jingtao Ding, Yong Li

Pages: 1-13

Abstract: This study presents a neural symbolic regression approach that autonomously uncovers network dynamics from data. It was demonstrated to refine existing models of gene regulation and ecology, and identify epidemic transmission patterns across spatial scales to yield scientific insights.

Tags: None

428. CrystalFlow: a flow-based generative model for crystalline materials, Nature Communications (October 20, 2025)

Authors: Xiaoshan Luo, Zhenyu Wang, Qingchang Wang, Xuechen Shao, Jian Lv, Lei Wang, Yanchao Wang, Yanming Ma

Volume & Issue: Volume 16, Issue 1

Pages: 9267

Abstract: Deep learning-based generative models hold significant promise for exploring the configuration space of crystalline materials, though their application remains in its early stages. In this study, we present CrystalFlow, a flow-based generative model designed to address the unique challenges of this domain. By combining Continuous Normalizing Flows and Conditional Flow Matching with a graph-based equivariant neural network and symmetry-aware data representations, CrystalFlow efficiently models lattice parameters, atomic coordinates, and atom types. This architecture enables data-efficient learning and the generation of high-quality crystal structures. Our results indicate that CrystalFlow achieves performance comparable to state-of-the-art models on established benchmarks while exhibiting versatile conditional generation capabilities (e.g., predicting structures under specific pressures or material properties), and is approximately an order of magnitude more efficient than diffusion-based models in terms of integration steps.

Tags: Computational methods, Structure of solids and liquids

427. Flow matching for accelerated simulation of atomic transport in crystalline materials, Nature Machine Intelligence (October 16, 2025)

Authors: Juno Nam, Sulin Liu, Gavin Winter, KyuJung Jun, Soojung Yang, Rafael Gómez-Bombarelli

Pages: 1-11

Abstract: A generative framework that accelerates the simulations of atomic transport in crystalline solids is developed, enabling large-scale screening and extending simulations to larger spatiotemporal scales for energy storage materials.

Tags: None

426. gReLU: a comprehensive framework for DNA sequence modeling and design, Nature Methods (October 15, 2025)

Authors: Avantika Lal, Laura Gunsalus, Surag Nair, Tommaso Biancalani, Gokcen Eraslan

Pages: 1-5

Abstract: Deep learning models trained on DNA sequences can predict cell-type-specific regulatory activity, reveal cis-regulatory grammar, prioritize genetic variants and design synthetic DNA. However, building and interpreting these models correctly remains difficult, and models and software built by different groups are often not interoperable. Here we present gReLU, a comprehensive software framework that enables advanced sequence modeling pipelines, including data preprocessing, modeling, evaluation, interpretation, variant effect prediction and regulatory element design.

Tags: Machine learning, Software, Genomics

425. A neural symbolic model for space physics, Nature Machine Intelligence (October 15, 2025)

Authors: Jie Ying, Haowei Lin, Chao Yue, Yajie Chen, Chao Xiao, Quanqi Shi, Yitao Liang, Shing-Tung Yau, Yuan Zhou, Jianzhu Ma

Pages: 1-16

Abstract: Ying and colleagues present PhyE2E, an AI framework incorporating symbolic search techniques for discovering physics formulas directly from data. The method has already led to improvements in space physics models when compared, for example, with NASA’s 1993 formula for solar activity.

Tags: None

424. Transferable Generative Models Bridge Femtosecond to Nanosecond Time-Step Molecular Dynamics, Preprint (October 08, 2025)

Authors: Juan Viguera Diez, Mathias Schreiner, Simon Olsson

Abstract: Understanding molecular structure, dynamics, and reactivity requires bridging processes that occur across widely separated time scales. Conventional molecular dynamics simulations provide atomistic resolution, but their femtosecond time steps limit access to the slow conformational changes and relaxation processes that govern chemical function. Here, we introduce a deep generative modeling framework that accelerates sampling of molecular dynamics by four orders of magnitude while retaining physical realism. Applied to small organic molecules and peptides, the approach enables quantitative characterization of equilibrium ensembles and dynamical relaxation processes that were previously only accessible by costly brute-force simulation. Importantly, the method generalizes across chemical composition and system size, extrapolating to peptides larger than those used for training, and captures chemically meaningful transitions on extended time scales. By expanding the accessible range of molecular motions without sacrificing atomistic detail, this approach opens new opportunities for probing conformational landscapes, thermodynamics, and kinetics in systems central to chemistry and biophysics.

Tags: Statistics - Machine Learning, Physics - Chemical Physics

423. A deep single cell mass cytometry approach to capture canonical and noncanonical cell cycle states, Nature Communications (October 03, 2025)

Authors: Meelad Amouzgar, Patricia Favaro, Daniel Ho, Trevor Bruce, Sean C. Bendall

Volume & Issue: Volume 16, Issue 1

Pages: 8821

Abstract: The cell cycle (CC) underpins diverse cell processes like cell differentiation, cell expansion, and tumorigenesis but current single-cell (sc) strategies study CC as: coarse phases, rely on transcriptomic signatures, use imaging modalities limited to adherent cells, or lack high-throughput multiplexing. To solve this, we develop an expanded, Mass Cytometry (MC) approach with 48 CC-related molecules that deeply phenotypes the diversity of scCC states. Using Cytometry by Time of Flight, we quantify scCC states across suspension and adherent cell lines, and stimulated primary human T cells. Our approach captures the diversity of scCC states, including atypical CC states beyond canonical definitions. Pharmacologically-induced CC arrest reveals that perturbations exacerbate noncanonical states and induce previously unobserved states. Notably, primary cells escaping CC inhibition demonstrated aberrant CC states compared to untreated cells. Our approach enables deeper phenotyping of CC biology that generalizes to diverse cell systems with simultaneous multiplexing and integration with MC platforms.

Tags: Systems biology, Assay systems, Cell biology, Immunology, Proteomics

422. A generative artificial intelligence approach for the discovery of antimicrobial peptides against multidrug-resistant bacteria, Nature Microbiology (October 03, 2025)

Authors: Yihui Wang, Lanlan Zhao, Ziyun Li, Yaxuan Xi, Yingmiao Pan, Guoping Zhao, Lei Zhang

Pages: 1-16

Abstract: The discovery of novel antimicrobial peptides (AMPs) against clinical superbugs is urgently needed to address the ongoing antibiotic resistance crisis. AMPs are promising candidates due to their broad-spectrum activity, rapid bactericidal mechanisms and reduced likelihood of inducing resistance compared with conventional antibiotics. Here, a pre-trained protein large language model (LLM), ProteoGPT, was established and further developed into multiple specialized subLLMs to assemble a sequential pipeline. This pipeline enables rapid screening across hundreds of millions of peptide sequences, ensuring potent antimicrobial activity and minimizing cytotoxic risks. Through transfer learning, we endowed the LLMs with different domain-specific knowledge to achieve high-throughput mining and generation of AMPs within a unified methodological framework. Notably, both mined and generated AMPs exhibited reduced susceptibility to resistance development in ICU-derived carbapenem-resistant Acinetobacter baumannii (CRAB) and methicillin-resistant Staphylococcus aureus (MRSA) in vitro. The AMPs also showed comparable or superior therapeutic efficacy in in vivo thigh infection mouse models compared with clinical antibiotics, without causing organ damage and disrupting gut microbiota. The mechanisms of action of these AMPs involve disruption of the cytoplasmic membrane and membrane depolarization. Overall, this study presents a generative artificial intelligence approach for the discovery of novel antimicrobials against multidrug-resistant bacteria, enabling efficient and extensive exploration of AMP space.

Tags: High-throughput screening, Antibiotics, Antimicrobial resistance, Data mining

421. A comprehensive genetic catalog of human double-strand break repair, Science (October 02, 2025)

Authors: Ernesto López de Alba, Israel Salguero, Daniel Giménez-Llorente, Javier Montes-Torres, Ángel Fernández-Sanromán, Ester Casajús-Pelegay, José Terrón-Bautista, Jonathan Barroso-González, Juan A. Bernal, Geoff Macintyre, Rafael Fernández-Leiro, Ana Losada, Felipe Cortés-Ledesma

Volume & Issue: Volume 390, Issue 6768

Pages: eadr5048

Abstract: The analysis of DNA sequence outcomes provides molecular insights into double-strand break (DSB) repair mechanisms. Using parallel in-pool profiling of Cas9-induced insertions and deletions (indels) within a genome-wide knockout library, we present a comprehensive catalog that assesses the influence of nearly every human gene on DSB repair outcomes. This REPAIRome resource uncovers uncharacterized mechanisms, pathways, and factors involved in DSB repair, including opposing roles for XLF and PAXX, a molecular explanation for Cas9-induced multinucleotide insertions, HLTF functions in Cas9-induced DSB repair, the involvement of the SAGA complex in microhomology-mediated end joining, and an indel mutational signature linked to VHL loss, renal carcinoma, and hypoxia. These results exemplify the potential of REPAIRome to drive future discoveries in DSB repair, CRISPR-Cas gene editing and the etiology of cancer mutational signatures.

Tags: None

420. Additively Manufacturable High-Strength Aluminum Alloys with Coarsening-Resistant Microstructures Achieved via Rapid Solidification, Advanced Materials (October 02, 2025)

Authors: S. Mohadeseh Taheri-Mousavi, Michael Xu, Florian Hengsbach, Clay Houser, Zhaoxuan Ge, Benjamin Glaser, Shaolou Wei, Mirko Schaper, James M. LeBeau, Greg B. Olson, A. John Hart

Volume & Issue: Volume n/a, Issue n/a

Pages: e09507

Abstract: Additively manufactured aluminum (Al) alloys with high strength have broad industrial applications. Strength promotion necessitates a high-volume fraction of small, closely spaced precipitates to effectively impede dislocation motion. Here, it is shown that for certain compositions in the Al-Er-Zr-Y-Yb-Ni alloy class, L12-Al3M phases, the primary strength contributor, can initially precipitate as submicron-scale (≈100 nm) metastable ternary phases under the rapid solidification of powder bed additive manufacturing; yet the subsequent coarsening-resistant L12-Al3M phases that precipitate during heat treatment remain at the nanometer scale, imparting high strength. A candidate alloy is designed using hybrid calculation of phase diagrams (CALPHAD)-based integrated computational materials engineering (ICME) and Bayesian optimization algorithms. Powder is manufactured for this alloy and is additively manufactured into crack-free macroscale specimens with a strength that is five-fold that of the equivalent cast alloy and comparable to wrought Al 7075. After aging at 400 °C for 8 h, the room-temperature tensile strength reaches 395 MPa, which is 50% stronger than the best-known benchmark printable Al alloy. This integrated computational-experimental workflow shows the considerable potential to exploit rapid solidification in additive manufacturing to design alloys with commercially deployable properties.

Tags: machine learning, CALPHAD-based ICME, coarsening resistance, high strength, printable aluminum alloys

419. Strengthening nucleic acid biosecurity screening against generative protein design tools, Science (October 02, 2025)

Authors: Bruce J. Wittmann, Tessa Alexanian, Craig Bartling, Jacob Beal, Adam Clore, James Diggans, Kevin Flyangolts, Bryan T. Gemler, Tom Mitchell, Steven T. Murphy, Nicole E. Wheeler, Eric Horvitz

Volume & Issue: Volume 390, Issue 6768

Pages: 82-87

Abstract: Advances in artificial intelligence (AI)–assisted protein engineering are enabling breakthroughs in the life sciences but also introduce new biosecurity challenges. Synthesis of nucleic acids is a choke point in AI-assisted protein engineering pipelines. Thus, an important focus for efforts to enhance biosecurity given AI-enabled capabilities is bolstering methods used by nucleic acid synthesis providers to screen orders. We evaluated the ability of open-source AI-powered protein design software to create variants of proteins of concern that could evade detection by the biosecurity screening tools used by nucleic acid synthesis providers, identifying a vulnerability where AI-redesigned sequences could not be detected reliably by current tools. In response, we developed and deployed patches, greatly improving detection rates of synthetic homologs more likely to retain wild type–like function.

Tags: None

418. Resolving data bias improves generalization in binding affinity prediction, Nature Machine Intelligence (October 2025)

Authors: David Graber, Peter Stockinger, Fabian Meyer, Siddhartha Mishra, Claus Horn, Rebecca Buller

Volume & Issue: Volume 7, Issue 10

Pages: 1713-1725

Abstract: The field of computational drug design requires accurate scoring functions to predict binding affinities for protein–ligand interactions. However, train–test data leakage between the PDBbind database and the Comparative Assessment of Scoring Function benchmark datasets has severely inflated the performance metrics of currently available deep-learning-based binding affinity prediction models, leading to overestimation of their generalization capabilities. Here we address this issue by proposing PDBbind CleanSplit, a training dataset curated by a new structure-based filtering algorithm that eliminates train–test data leakage as well as redundancies within the training set. Retraining current top-performing models on CleanSplit caused their benchmark performance to drop substantially, indicating that the performance of existing models is largely driven by data leakage. By contrast, our graph neural network model maintains high benchmark performance when trained on CleanSplit. Leveraging a sparse graph modelling of protein–ligand interactions and transfer learning from language models, our model is able to generalize to strictly independent test datasets.

Tags: Machine learning, Drug discovery, Cheminformatics, Scientific data

417. Machine learning of charges and long-range interactions from energies and forces, Nature Communications (October 01, 2025)

Authors: Daniel S. King, Dongjin Kim, Peichen Zhong, Bingqing Cheng

Volume & Issue: Volume 16, Issue 1

Pages: 8763

Abstract: Accurate modeling of long-range forces is critical in atomistic simulations, as they play a central role in determining the properties of material and chemical systems. However, standard machine learning interatomic potentials (MLIPs) often rely on short-range approximations, limiting their applicability to systems with significant electrostatics and dispersion forces. We recently introduced the Latent Ewald Summation (LES) method, which captures long-range electrostatics without explicitly learning atomic charges or charge equilibration. We benchmark LES on diverse and challenging systems, including charged molecules, ionic liquids, electrolyte solutions, polar dipeptides, surface adsorption, electrolyte/solid interfaces, and solid-solid interfaces. Here we show that LES can reproduce the exact atomic charges for classical systems with fixed charges and can infer dipole and quadrupole moments, as well as the dipole derivative with respect to atomic positions, for quantum mechanical systems. Moreover, LES can achieve better accuracy in energy and force predictions compared to methods that explicitly learn from charges.

Tags: Atomistic models, Computational science, Method development

416. A full-featured 2D flash chip enabled by system integration, Nature (October 2025)

Authors: Chunsen Liu, Yongbo Jiang, Boqian Shen, Shengchao Yuan, Zhenyuan Cao, Zhongyu Bi, Chong Wang, Yutong Xiang, Tanjun Wang, Haoqi Wu, Zizheng Liu, Yang Wang, Shuiyuan Wang, Peng Zhou

Volume & Issue: Volume 646, Issue 8087

Pages: 1081-1088

Abstract: Two-dimensional (2D) materials have extended the device scalability1–3 of silicon (Si) technology and enabled fundamental innovations in device mechanisms4–6. Both industry7–9 and academia10–13, particularly in the field of integrated circuits, are pursuing integration breakthroughs to demonstrate the superiority of 2D electronics at the system level. Despite considerable integration progress on either 2D material integration11–13 or 2D-CMOS hybrid integration14, a system that can migrate the advantages of the device to the application is still lacking. Here we report a full-featured 2D NOR flash memory chip realized by an atomic device to chip (ATOM2CHIP) technology, which combines a superior 2D electronic device as a memory core and a powerful CMOS platform to support complex instruction control. The ATOM2CHIP blueprint includes a full-stack on-chip process and a cross-platform system design, providing a complete framework to bridge the gap from emerging device concept to an applicable chip. The full-stack on-chip process is a specially designed flow that incorporates planar integration, three-dimensional (3D) architecture and chip packaging, contributing to a high yield of 94.34% based on a full-chip test. The cross-platform system design handles both the 2D circuit design and the 2D-CMOS modules compatibility verification design, contributing to a highly complex, instruction-driven, full-featured chip with 8-bit commands and 32-bit parallelism. These results demonstrate an efficient system integration strategy that showcases the advantages of the 2D electronic system.

Tags: Electrical and electronic engineering, Electronic and spintronic devices

415. An adaptive autoregressive diffusion approach to design active humanized antibodies and nanobodies, Nature Machine Intelligence (October 2025)

Authors: Jian Ma, Fandi Wu, Tingyang Xu, Shaoyong Xu, Wei Liu, Liang Yan, Minghao Qu, Xiaoke Yang, Qifeng Bai, Junyu Xiao, Jianhua Yao

Volume & Issue: Volume 7, Issue 10

Pages: 1698-1712

Abstract: Jian Ma et al. present HuDiff, a diffusion-based deep learning framework that humanizes antibodies and nanobodies (a small type of antibody) without templates. The model achieves improved humanness while preserving or enhancing binding strength, and the authors show promising results in virus neutralization experiments.

Tags: None

414. Heat-rechargeable computation in DNA logic circuits and neural networks, Nature (October 01, 2025)

Authors: Tianqi Song, Lulu Qian

Pages: 1-8

Abstract: Metabolism enables life to sustain dynamics and to repeatedly interact with the environment by storing and consuming chemical energy. A major challenge for artificial molecular machines is to find a universal energy source akin to ATP for biological organisms and electricity for electromechanical machines. More than 20 years ago, DNA was first used as fuel to drive nanomechanical devices1,2 and catalytic reactions3. However, each system requires distinct fuel sequences, preventing DNA alone from becoming a universal energy source. Despite extensive efforts4, we still lack an ATP-like or electricity-like power supply to sustain diverse molecular machines. Here we show that heat can restore enzyme-free DNA circuits from equilibrium to out-of-equilibrium states. During heating and cooling, nucleic acids with strong secondary structures reach kinetically trapped states5,6, providing energy for subsequent computation. We demonstrate that complex logic circuits and neural networks, involving more than 200 distinct molecular species, can respond to a temperature ramp and recharge within minutes, allowing at least 16 rounds of computation with varying sequential inputs. Our strategy enables diverse systems to be powered by the same energy source without problematic waste build-up, thereby ensuring consistent performance over time. This scalable approach supports the sustained operation of enzyme-free molecular circuits and opens opportunities for advanced autonomous behaviours, such as iterative computation and unsupervised learning in artificial chemical systems.

Tags: DNA computing, Thermodynamics

413. Predictive model for the discovery of sinter-resistant supports for metallic nanoparticle catalysts by interpretable machine learning, Nature Catalysis (September 29, 2025)

Authors: Chenggong Jiang, Bill Yan, Bryan R. Goldsmith, Suljo Linic

Pages: 1-13

Abstract: The activity and stability of supported metal catalysts is in large part influenced by their interaction with the support. Now, neural network molecular dynamics simulations are combined with interpretable machine learning to reveal the governing factors of metal–support interactions for Pt nanoparticles on various oxide supports, identifying key features and proposing sinter-resistant supports.

Tags: None

412. InterPLM: discovering interpretable features in protein language models via sparse autoencoders, Nature Methods (September 29, 2025)

Authors: Elana Simon, James Zou

Pages: 1-11

Abstract: InterPLM is a computational framework to extract and analyze interpretable features from protein language models using sparse autoencoders. By training sparse autoencoders on ESM-2 embeddings, this study identifies thousands of interpretable biological features learned by the different layers of the ESM-2 model.

Tags: None

411. SimpleFold: Folding Proteins is Simpler than You Think, Preprint (September 27, 2025)

Authors: Yuyang Wang, Jiarui Lu, Navdeep Jaitly, Josh Susskind, Miguel Angel Bautista

Abstract: Protein folding models have achieved groundbreaking results typically via a combination of integrating domain knowledge into the architectural blocks and training pipelines. Nonetheless, given the success of generative models across different but related problems, it is natural to question whether these architectural designs are a necessary condition to build performant models. In this paper, we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving triangular updates, explicit pair representations or multiple training objectives curated for this specific domain. Instead, SimpleFold employs standard transformer blocks with adaptive layers and is trained via a generative flow-matching objective with an additional structural term. We scale SimpleFold to 3B parameters and train it on approximately 9M distilled protein structures together with experimental PDB data. On standard folding benchmarks, SimpleFold-3B achieves competitive performance compared to state-of-the-art baselines, in addition SimpleFold demonstrates strong performance in ensemble prediction which is typically difficult for models trained via deterministic reconstruction objectives. Due to its general-purpose architecture, SimpleFold shows efficiency in deployment and inference on consumer-level hardware. SimpleFold challenges the reliance on complex domain-specific architectures designs in protein folding, opening up an alternative design space for future progress.

Tags: Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods

410. Design of a potent interleukin-21 mimic for cancer immunotherapy, Science Immunology (September 26, 2025)

Authors: Jung-Ho Chun, Birkley S. Lim, Suyasha Roy, Michael J. Walsh, Gita C. Abhiraman, Kevin Zhangxu, Tavus Atajanova, Or-Yam Revach, Elisa C. Clark, Peng Li, Claire A. Palin, Asheema Khanna, Samantha Tower, Rakeeb Kureshi, Megan T. Hoffman, Tatyana Sharova, Aleigha Lawless, Sonia Cohen, Genevieve M. Boland, Tina Nguyen, Frank Peprah, Julissa G. Tello, Samantha Y. Liu, Chan Johng Kim, Hojeong Shin, Alfredo Quijano-Rubio, Kevin M. Jude, Stacey Gerben, Analisa Murray, Piper Heine, Michelle DeWitt, Umut Y. Ulge, Lauren Carter, Neil P. King, Daniel-Adriano Silva, Hao Yuan Kueh, Vandana Kalia, Surojit Sarkar, Russell W. Jenkins, K. Christopher Garcia, Warren J. Leonard, Michael Dougan, Stephanie K. Dougan, David Baker

Volume & Issue: Volume 10, Issue 111

Pages: eadx1582

Abstract: Long-standing goals of cancer immunotherapy are to activate cytotoxic antitumor T cells across a range of affinities for tumor antigens while suppressing regulatory T cells. Computational protein design has enabled the precise tailoring of proteins to meet specific needs. Here, we report a de novo designed IL-21 mimic, 21h10, with high stability and signaling potency in humans and mice. In murine and ex vivo human organotypic tumor models, 21h10 showed robust antitumor activity, with more prolonged signaling and stronger antitumor activity than native IL-21. 21h10 induced pancreatitis that could be mitigated by TNF blockade without compromising antitumor efficacy. Although antidrug antibodies to 21h10 formed, they were not neutralizing. 21h10 induced highly cytotoxic T cells with a range of affinities, robustly expanding intratumoral low-affinity cytotoxic T cells and driving high expression of IFN-γ and granzyme B compared with native IL-21, while increasing the frequency of IFN-γ+ T helper 1 cells and reducing regulatory T cells. The full human-mouse cross-reactivity, high stability and potency, and low-affinity antitumor responses support the translational potential of 21h10.

Tags: None

409. EpiAgent: foundation model for single-cell epigenomics, Nature Methods (September 25, 2025)

Authors: Xiaoyang Chen, Keyi Li, Xuejian Cui, Zian Wang, Qun Jiang, Jiacheng Lin, Zhen Li, Zijing Gao, Hairong Lv, Rui Jiang

Pages: 1-12

Abstract: EpiAgent is a foundation model for analyzing scATAC-seq data that excels in standard downstream tasks such as feature extraction, cell type annotation and data imputation while also enabling simulation of cCRE knockouts and cell state changes.

Tags: None

408. Activation entropy of dislocation glide in body-centered cubic metals from atomistic simulations, Nature Communications (September 24, 2025)

Authors: Arnaud Allera, Thomas D. Swinburne, Alexandra M. Goryaeva, Baptiste Bienvenu, Fabienne Ribeiro, Michel Perez, Mihai-Cosmin Marinica, David Rodney

Volume & Issue: Volume 16, Issue 1

Pages: 8367

Abstract: The activation entropy of dislocation glide, a key process controlling the strength of many metals, is often assumed to be constant or linked to enthalpy through the empirical Meyer-Neldel law-both of which are simplified approximations. In this study, we take a more direct approach by calculating the activation Gibbs energy for kink-pair nucleation on screw dislocations of two body-centered cubic metals, iron and tungsten. To ensure reliability, we develop machine learning interatomic potentials for both metals, carefully trained on dislocation data from density functional theory. Our findings reveal that dislocations undergo harmonic transitions between Peierls valleys, with an activation entropy that remains largely constant, regardless of temperature or applied stress. We use these results to parameterize a thermally-activated model of yield stress, which consistently matches experimental data in both iron and tungsten. Our work challenges recent studies using classical potentials, which report highly varying activation entropies, and suggests that simulations relying on classical potentials-widely used in materials modeling-could be significantly influenced by overestimated entropic effects.

Tags: Atomistic models, Materials chemistry, Metals and alloys

407. Design of facilitated dissociation enables timing of cytokine signalling, Nature (September 24, 2025)

Authors: Adam J. Broerman, Christoph Pollmann, Yang Zhao, Mauriz A. Lichtenstein, Mark D. Jackson, Maxx H. Tessmer, Won Hee Ryu, Masato Ogishi, Mohamad H. Abedi, Danny D. Sahtoe, Aza Allen, Alex Kang, Joshmyn De La Cruz, Evans Brackenbrough, Banumathi Sankaran, Asim K. Bera, Daniel M. Zuckerman, Stefan Stoll, K. Christopher Garcia, Florian Praetorius, Jacob Piehler, David Baker

Pages: 1-8

Abstract: Protein design has focused on the design of ground states, ensuring that they are sufficiently low energy to be highly populated1. Designing the kinetics and dynamics of a system requires, in addition, the design of excited states that are traversed in transitions from one low-lying state to another2,3. This is a challenging task because such states must be sufficiently strained to be poorly populated, but not so strained that they are not populated at all, and because protein design methods have focused on generating near-ideal structures4–7. Here we describe a general approach for designing systems that use an induced-fit power stroke8 to generate a structurally frustrated9 and strained excited state, allosterically driving protein complex dissociation. X-ray crystallography, double electron–electron resonance spectroscopy and kinetic binding measurements show that incorporating excited states enables the design of effector-induced increases in dissociation rates as high as 5,700-fold. We highlight the power of this approach by designing rapid biosensors, kinetically controlled circuits and cytokine mimics that can be dissociated from their receptors within seconds, enabling dissection of the temporal dynamics of interleukin-2 signalling.

Tags: Protein design, Deformation dynamics, Interleukins, Kinetics, X-ray crystallography

406. Tailoring polymer electrolyte solvation for 600 Wh kg−1 lithium batteries, Nature (September 24, 2025)

Authors: Xue-Yan Huang, Chen-Zi Zhao, Wei-Jin Kong, Nan Yao, Zong-Yao Shuang, Pan Xu, Shuo Sun, Yang Lu, Wen-Ze Huang, Jin-Liang Li, Liang Shen, Xiang Chen, Jia-Qi Huang, Lynden A. Archer, Qiang Zhang

Pages: 1-8

Abstract: An in-built fluoropolyether-based quasi-solid-state polymer electrolyte enables high-capacity lithium-rich manganese-based layered oxide cathodes with stable interfaces, achieving 604 Wh kg−1 pouch-cell energy density.

Tags: None

405. EDBench: Large-Scale Electron Density Data for Molecular Modeling, Preprint (September 24, 2025)

Authors: Hongxin Xiang, Ke Li, Mingquan Liu, Zhixiang Cheng, Bin Yao, Wenjie Du, Jun Xia, Li Zeng, Xin Jin, Xiangxiang Zeng

Abstract: Existing molecular machine learning force fields (MLFFs) generally focus on the learning of atoms, molecules, and simple quantum chemical properties (such as energy and force), but ignore the importance of electron density (ED) $\rho(r)$ in accurately understanding molecular force fields (MFFs). ED describes the probability of finding electrons at specific locations around atoms or molecules, which uniquely determines all ground state properties (such as energy, molecular structure, etc.) of interactive multi-particle systems according to the Hohenberg-Kohn theorem. However, the calculation of ED relies on the time-consuming first-principles density functional theory (DFT) which leads to the lack of large-scale ED data and limits its application in MLFFs. In this paper, we introduce EDBench, a large-scale, high-quality dataset of ED designed to advance learning-based research at the electronic scale. Built upon the PCQM4Mv2, EDBench provides accurate ED data, covering 3.3 million molecules. To comprehensively evaluate the ability of models to understand and utilize electronic information, we design a suite of ED-centric benchmark tasks spanning prediction, retrieval, and generation. Our evaluation on several state-of-the-art methods demonstrates that learning from EDBench is not only feasible but also achieves high accuracy. Moreover, we show that learning-based method can efficiently calculate ED with comparable precision while significantly reducing the computational cost relative to traditional DFT calculations. All data and benchmarks from EDBench will be freely available, laying a robust foundation for ED-driven drug discovery and materials science.

Tags: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Physics - Chemical Physics

404. Structural constraint integration in a generative model for the discovery of quantum materials, Nature Materials (September 22, 2025)

Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Manasi Mandal, Kiran Mak, Denisse Córdova Carrizales, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

Pages: 1-8

Abstract: This work presents SCIGEN, a machine learning framework integrating geometric constraints into material generation. The framework enables the discovery of stable quantum material candidates, and the authors synthesize two predicted magnetic materials.

Tags: None

403. Active Learning for Machine Learning Driven Molecular Dynamics, Preprint (September 21, 2025)

Authors: Kevin Bachelor, Sanya Murdeshwar, Daniel Sabo, Razvan Marinescu

Abstract: Machine learned coarse grained (CG) potentials are fast, but degrade over time when simulations reach undersampled biomolecular conformations, and generating widespread all atom (AA) data to combat this is computationally infeasible. We propose a novel active learning framework for CG neural network potentials in molecular dynamics (MD). Building on the CGSchNet model, our method employs root mean squared deviation (RMSD) based frame selection from MD simulations in order to generate data on the fly by querying an oracle during the training of a neural network potential. This framework preserves CG level efficiency while correcting the model at precise, RMSD identified coverage gaps. By training CGSchNet, a coarse grained neural network potential, we empirically show that our framework explores previously unseen configurations and trains the model on unexplored regions of conformational space. Our active learning framework enables a CGSchNet model trained on the Chignolin protein to achieve a 33.05% improvement in the Wasserstein 1 (W1) metric in Time lagged Independent Component Analysis (TICA) space on an in house benchmark suite.

Tags: Computer Science - Machine Learning, Physics - Atomic and Molecular Clusters

402. 3D multi-omic mapping of whole nondiseased human fallopian tubes at cellular resolution reveals a large incidence of ovarian cancer precursors, Preprint (September 21, 2025)

Authors: André Forjaz, Vasco Queiroga, Yajuan Li, Alfredo Hernandez, Ashleigh Crawford, Xiaoyu Qin, Mei Zhong, Margarita Tsapatsis, Saurabh Joshi, Donald Kramer, Oliver Nizet, Habin Bea, Yuhan Li, Shuo Qin, Robert O’Flynn, Mingder Yang, Brittany Pratt, Fan Wu, Paul Gensbigler, Max Blecher, Pei-Hsun Wu, Luciane T. Kagohara, Ie-Ming Shih, David Zwicker, Mark Atkinson, Lingyan Shi, Rong Fan, Ashley L. Kiemen, Denis Wirtz

Abstract: Uncovering the spatial and molecular landscape of precancerous lesions is essential for developing meaningful cancer prevention and early detection strategies. High-Grade Serous Carcinoma (HGSC), the most lethal gynecologic malignancy, often originates from Serous Tubal Intraepithelial Carcinomas (STICs) in the fallopian tubes, yet their minute size and our historical reliance on standard 2D histology contribute to their underreporting. Here, we present a spatially resolved, multi-omics framework that integrates whole-organ 3D imaging at cellular resolution with targeted proteomic, metabolomic, and transcriptomic profiling to detect and characterize microscopic tubal lesions. Using this platform, we identified a total of 99 STICs and their presumed precursors that harbor TP53 mutations in morphologically unremarkable tubal epithelium in all five specimens obtained from cancer-free organ donors with average-risk of developing ovarian cancer. Although these lesions comprised only 0.2% of the epithelial compartment, they displayed geographic diversity, immune exclusion, metabolic rewiring, and DNA copy number changes among lesions and normal fallopian tube epithelium discovered alterations in STIC-associated genes and the pathways they control. In sum, this platform provides a comprehensive 3D atlas of early neoplastic transformation, yielding mechanistic insights into tumor initiation and informing clinical screening strategies for detecting cancer precursors in whole organs at cellular resolution.

Tags: None

401. De novo Design of All-atom Biomolecular Interactions with RFdiffusion3, Preprint (September 18, 2025)

Authors: Jasper Kenneth Veje Butcher, Rohith Krishna, Raktim Mitra, Rafael Isaac Brent, Yanjing Li, Nathaniel Corley, Paul Kim, Jonathan Funk, Simon Valentin Mathis, Saman Salike, Aiko Muraishi, Helen Eisenach, Tuscan Rock Thompson, Jie Chen, Yuliya Politanska, Enisha Sehgal, Brian Coventry, Odin Zhang, Bo Qiang, Kieran Didi, Maxwell Kazman, Frank DiMaio, David Baker

Abstract: Deep learning has accelerated protein design, but most existing methods are restricted to generating protein backbone coordinates and often neglect interactions with other biomolecules. We present RFdiffusion3 (RFD3), a diffusion model that generates protein structures in the context of ligands, nucleic acids and other non-protein constellations of atoms. Because all polymer atoms are modeled explicitly, conditioning the model on complex sets of atom-level constraints for enzyme design and other challenges is both simpler and more effective than previous approaches. RFD3 achieves improved performance compared to prior approaches on a range of in silico benchmarks with one tenth the computational cost. Finally, we demonstrate the broad applicability of RFD3 by designing and experimentally characterizing DNA binding proteins and cysteine hydrolases. The ability to rapidly generate protein structures guided by complex sets of atom-level constraints in the context of arbitrary non-protein atoms should further expand the range of functions attainable through protein design.

Tags: None

400. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning, Nature (September 17, 2025)

Authors: Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Honghui Ding, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jingchang Chen, Jingyang Yuan, Jinhao Tu, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaichao You, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingxu Zhou, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Zhen Zhang

Volume & Issue: Volume 645, Issue 8081

Pages: 633-638

Abstract: General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs)1,2 and chain-of-thought (CoT) prompting3, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models.

Tags: Computer science, Electrical and electronic engineering

399. Generative design of novel bacteriophages with genome language models, Preprint (September 17, 2025)

Authors: Samuel H. King, Claudia L. Driscoll, David B. Li, Daniel Guo, Aditi T. Merchant, Garyk Brixi, Max E. Wilkinson, Brian L. Hie

Abstract: Many important biological functions arise not from single genes, but from complex interactions encoded by entire genomes. Genome language models have emerged as a promising strategy for designing biological systems, but their ability to generate functional sequences at the scale of whole genomes has remained untested. Here, we report the first generative design of viable bacteriophage genomes. We leveraged frontier genome language models, Evo 1 and Evo 2, to generate whole-genome sequences with realistic genetic architectures and desirable host tropism, using the lytic phage ΦX174 as our design template. Experimental testing of AI-generated genomes yielded 16 viable phages with substantial evolutionary novelty. Cryo-electron microscopy revealed that one of the generated phages utilizes an evolutionarily distant DNA packaging protein within its capsid. Multiple phages demonstrate higher fitness than ΦX174 in growth competitions and in their lysis kinetics. A cocktail of the generated phages rapidly overcomes ΦX174-resistance in three E. coli strains, demonstrating the potential utility of our approach for designing phage therapies against rapidly evolving bacterial pathogens. This work provides a blueprint for the design of diverse synthetic bacteriophages and, more broadly, lays a foundation for the generative design of useful living systems at the genome scale.

Tags: None

398. Modeling-Making-Modulating High-Entropy Alloy with Activated Water-Dissociation Centers for Superior Electrocatalysis, Journal of the American Chemical Society (September 17, 2025)

Authors: Ho Ngoc Nam, Ravi Nandan, Lei Fu, Yingji Zhao, Yunqing Kang, Tetsuya Fukushima, Kazunori Sato, Yusuke Asakura, Ovidiu Cretu, Jun Kikkawa, Joel Henzie, Jonathan P. Hill, Takeshi Yanai, Quan Manh Phung, Yusuke Yamauchi

Volume & Issue: Volume 147, Issue 37

Pages: 33545-33558

Abstract: High-entropy alloys (HEAs) have recently emerged as promising electrocatalysts for complex reactions owing to their tunable electronic structures and diverse, unique binding sites. However, their vast compositional space, in terms of both elemental variety and atomic ratios, presents a major challenge to the rational design of high-performance catalysts, as experimental efforts are often hindered by ambiguous element selection and inefficient trial-and-error methods. In this work, a bottom-up research strategy using machine learning-assisted first-principles calculations was applied to accelerate the design of quinary HEAs toward efficient multielectron transfer reactions. Here, we report the design of PtPdRhRuMo, which exhibits key physicochemical properties favoring the methanol oxidation reaction. Notably, the incorporation of Mo as the fifth element significantly activates specific binding sites on HEA surfaces, enhancing methanol adsorption and, in particular, the water dissociation ability. This facilitates hydroxyl species formation, which effectively mitigates CO intermediate adherence while promoting the complete oxidation of CH3OH to CO2 via alternative reaction pathways. Guided by theoretical predictions, experimental samples with different morphologies of mesoporous PtPdRhRuMo catalyst (m-HEANP(Mo) nanoparticles and m-HEAF(Mo) thin film) were then synthesized, demonstrating superior electrocatalysis with a large current density of up to 18.20 mA cm–2 and a mass activity of 9.89 A mgPt–1, alongside the long-term durability for efficient methanol electrooxidation applications.

Tags: None

397. Learning the natural history of human disease with generative transformers, Nature (September 17, 2025)

Authors: Artem Shmatko, Alexander Wolfgang Jung, Kumar Gaurav, Søren Brunak, Laust Hvas Mortensen, Ewan Birney, Tom Fitzgerald, Moritz Gerstung

Pages: 1-9

Abstract: Decision-making in healthcare relies on understanding patients’ past and current health states to predict and, ultimately, change their future course1–3. Artificial intelligence (AI) methods promise to aid this task by learning patterns of disease progression from large corpora of health records4,5. However, their potential has not been fully investigated at scale. Here we modify the GPT6 (generative pretrained transformer) architecture to model the progression and competing nature of human diseases. We train this model, Delphi-2M, on data from 0.4 million UK Biobank participants and validate it using external data from 1.9 million Danish individuals with no change in parameters. Delphi-2M predicts the rates of more than 1,000 diseases, conditional on each individual’s past disease history, with accuracy comparable to that of existing single-disease models. Delphi-2M’s generative nature also enables sampling of synthetic future health trajectories, providing meaningful estimates of potential disease burden for up to 20 years, and enabling the training of AI models that have never seen actual data. Explainable AI methods7 provide insights into Delphi-2M’s predictions, revealing clusters of co-morbidities within and across disease chapters and their time-dependent consequences on future health, but also highlight biases learnt from training data. In summary, transformer-based models appear to be well suited for predictive and generative health-related tasks, are applicable to population-scale datasets and provide insights into temporal dependencies between disease events, potentially improving the understanding of personalized health risks and informing precision medicine approaches.

Tags: Computational science, Risk factors, Diseases

396. Discovery of Unstable Singularities, Preprint (September 17, 2025)

Authors: Yongji Wang, Mehdi Bennani, James Martens, Sébastien Racanière, Sam Blackwell, Alex Matthews, Stanislav Nikolov, Gonzalo Cao-Labora, Daniel S. Park, Martin Arjovsky, Daniel Worrall, Chongli Qin, Ferran Alet, Borislav Kozlovskii, Nenad Tomašev, Alex Davies, Pushmeet Kohli, Tristan Buckmaster, Bogdan Georgiev, Javier Gómez-Serrano, Ray Jiang, Ching-Yao Lai

Abstract: Whether singularities can form in fluids remains a foundational unanswered question in mathematics. This phenomenon occurs when solutions to governing equations, such as the 3D Euler equations, develop infinite gradients from smooth initial conditions. Historically, numerical approaches have primarily identified stable singularities. However, these are not expected to exist for key open problems, such as the boundary-free Euler and Navier-Stokes cases, where unstable singularities are hypothesized to play a crucial role. Here, we present the first systematic discovery of new families of unstable singularities. A stable singularity is a robust outcome, forming even if the initial state is slightly perturbed. In contrast, unstable singularities are exceptionally elusive; they require initial conditions tuned with infinite precision, being in a state of instability whereby infinitesimal perturbations immediately divert the solution from its blow-up trajectory. In particular, we present multiple new, unstable self-similar solutions for the incompressible porous media equation and the 3D Euler equation with boundary, revealing a simple empirical asymptotic formula relating the blow-up rate to the order of instability. Our approach combines curated machine learning architectures and training schemes with a high-precision Gauss-Newton optimizer, achieving accuracies that significantly surpass previous work across all discovered solutions. For specific solutions, we reach near double-float machine precision, attaining a level of accuracy constrained only by the round-off errors of the GPU hardware. This level of precision meets the requirements for rigorous mathematical validation via computer-assisted proofs. This work provides a new playbook for exploring the complex landscape of nonlinear partial differential equations (PDEs) and tackling long-standing challenges in mathematical physics.

Tags: Mathematics - Analysis of PDEs, Physics - Fluid Dynamics

395. Automated and modular protein binder design with BinderFlow, Preprint (September 16, 2025)

Authors: Carlos Chacon-Sanchez, Nayim Gonzalez-Rodriguez, Oscar Llorca, Rafael Fernandez-Leiro

Abstract: Deep learning has revolutionised de novo protein design, with new models achieving unprecedented success in creating novel proteins with specific functions, including artificial protein binders. However, current methods remain computationally demanding and challenging to operate without specialised infrastructure and expertise. To overcome these limitations, we developed BinderFlow, a structured and parallelised pipeline for protein binder design. Its batch-based nature enables live monitoring of design campaigns, seamless coexistence with other GPU-intensive processes, and reduces human intervention. Furthermore, BinderFlow’s modular structure enables straightforward modifications to the design pipeline to incorporate new models and tools or to implement alternative design strategies. Complementing this, we developed BFmonitor, a web-based dashboard that simplifies campaign monitoring, design evaluation, and hit selection. Together, these tools lower the entry barrier for non-specialised users and streamline expert workflows, making generative protein design more accessible, scalable and practical for both exploratory and production-level research.

Tags: None

394. A Generative Foundation Model for Antibody Design, Preprint (September 16, 2025)

Authors: Rubo Wang, Fandi Wu, Jiale Shi, Yidong Song, Yu Kong, Jian Ma, Bing He, Qihong Yan, Tianlei Ying, Xingyu Gao, Peilin Zhao, Jianhua Yao

Abstract: Antibodies are indispensable components of the immune system, yet the design of high-affinity antibodies remains a time-consuming and experimentally intensive process. To address this challenge, we present IgGM, a novel generative foundation model designed to accelerate high-affinity antibody engineering. IgGM learns the complex relationships underlying the binding interactions between antigens and antibodies, as well as the mapping between antibody sequences and structures. By conditioning on different inputs, IgGM supports a wide range of antibody design tasks, including complex structure prediction, inverse design, affinity maturation, framework optimization, humanization, and de novo antibody design. It is compatible with both conventional antibodies and nanobodies, and allows user-defined CDR loop lengths for flexible design. To prioritize candidates, we introduce a frequency-based computational screening strategy that enhances design efficiency. Extensive evaluation through both in silico benchmarks and in vitro experiments across diverse antigens such as PD-L1, Protein A, TNF-alpha, IL-33, SARS-CoV-2 RBD and its variants demonstrates that IgGM consistently generates antibodies or nanobodies with high measured affinity. These results underscore IgGM’s versatility and effectiveness as a powerful tool for next-generation antibody discovery and optimization.

Tags: None

393. MSnLib: efficient generation of open multi-stage fragmentation mass spectral libraries, Nature Methods (September 15, 2025)

Authors: Corinna Brungs, Robin Schmid, Steffen Heuckeroth, Aninda Mazumdar, Matúš Drexler, Pavel Šácha, Pieter C. Dorrestein, Daniel Petras, Louis-Felix Nothias, Václav Veverka, Radim Nencka, Zdeněk Kameník, Tomáš Pluskal

Pages: 1-4

Abstract: Untargeted high-resolution mass spectrometry is a key tool in clinical metabolomics, natural product discovery and exposomics, with compound identification remaining the major bottleneck. Currently, the standard workflow applies spectral library matching against tandem mass spectrometry (MS2) fragmentation data. Multi-stage fragmentation (MSn) yields more profound insights into substructures, enabling validation of fragmentation pathways; however, the community lacks open MSn reference data of diverse natural products and other chemicals. Here we describe MSnLib, a machine learning-ready open resource of >2 million spectra in MSn trees of 30,008 unique small molecules, built with a high-throughput data acquisition and processing pipeline in the open-source software mzmine.

Tags: Mass spectrometry, Metabolomics

392. Spatial gene expression at single-cell resolution from histology using deep learning with GHIST, Nature Methods (September 15, 2025)

Authors: Xiaohang Fu, Yue Cao, Beilei Bian, Chuhan Wang, Dinny Graham, Nirmala Pathmanathan, Ellis Patrick, Jinman Kim, Jean Yee Hwa Yang

Pages: 1-11

Abstract: The increased use of spatially resolved transcriptomics provides new biological insights into disease mechanisms. However, the high cost and complexity of these methods are barriers to broader application. Consequently, methods have been created to predict spot-based gene expression from routinely collected histology images. Recent benchmarking showed that current methodologies have limited accuracy and spatial resolution, constraining translational capacity. Here, we introduce GHIST, a deep learning-based framework that predicts spatial gene expression at single-cell resolution by leveraging subcellular spatial transcriptomics and synergistic relationships between multiple layers of biological information. We validated GHIST using public datasets and The Cancer Genome Atlas data, demonstrating its flexibility across different spatial resolutions and superior performance. Our results underscore the utility of in silico generation of single-cell spatial gene expression measurements and the capacity to enrich existing datasets with a spatially resolved omics modality, paving the way for scalable multi-omics analysis and biomarker identification.

Tags: Machine learning, Image processing, Computational models, Functional genomics, Gene expression

391. Accelerated multi-objective alloy discovery through efficient bayesian methods: Application to the FCC high entropy alloy space, Acta Materialia (September 15, 2025)

Authors: Trevor Hastings, Mrinalini Mulukutla, Danial Khatamsaz, Daniel Salas, Wenle Xu, Daniel Lewis, Nicole Person, Matthew Skokan, Braden Miller, James Paramore, Brady Butler, Douglas Allaire, Vahid Attari, Ibrahim Karaman, George Pharr, Ankit Srivastava, Raymundo Arróyave

Volume & Issue: Volume 297

Pages: 121173

Abstract: This study introduces BIRDSHOT, an integrated Bayesian materials discovery framework designed to efficiently explore complex compositional spaces while optimizing multiple material properties. We applied this framework to the CoCrFeNiVAl FCC high entropy alloy (HEA) system, targeting three key performance objectives: ultimate tensile strength/yield strength ratio, hardness, and strain rate sensitivity. The experimental campaign employed an integrated cyber-physical approach that combined vacuum arc melting (VAM) for alloy synthesis with advanced mechanical testing, including tensile and high-strain-rate nanoindentation testing. By incorporating batch Bayesian optimization schemes that allowed the parallel exploration of the alloy space, we completed five iterative design-make-test-learn loops, identifying a non-trivial three-objective Pareto set in a high-dimensional alloy space. Notably, this was achieved by exploring only 0.15 % of the feasible design space, representing a significant acceleration in discovery rate relative to traditional methods. This work demonstrates the capability of BIRDSHOT to navigate complex, multi-objective optimization challenges and highlights its potential for broader application in accelerating materials discovery.

Tags: Machine learning, Bayesian optimization, High-throughput, Materials discovery, Multi-objective optimization

390. Bridging histology and spatial gene expression across scales, Nature Methods (September 15, 2025)

Authors: Ying Ma

Pages: 1-3

Abstract: Two deep-learning frameworks — GHIST and iSCALE — turn routine histology images into a rich molecular resource, and predict spatial gene expression at single-cell resolution (GHIST) and at super-resolution across large tissue sections (iSCALE), for scalable, data-driven tissue biology.

Tags: None

389. Structural Insights into Autophagy in the AlphaFold Era, Journal of Molecular Biology (September 15, 2025)

Authors: Tatsuro Maruyama, Nobuo N. Noda

Volume & Issue: Volume 437, Issue 18

Pages: 169235

Abstract: Autophagy, a lysosomal intracellular degradation system, is characterized by the de novo biogenesis of autophagosomes. This biogenesis is mediated by approximately 20 core Atg proteins, exhibiting a high degree of conservation from yeast to humans. Over the preceding two decades, the structural biology of autophagy has been investigated predominantly through X-ray crystallography, NMR spectroscopy, and, more recently, cryo-electron microscopy, collectively contributing to the elucidation of the structural basis of core Atg proteins. The recent introduction of AlphaFold has significantly improved the precision of structure prediction and is transforming structural biology research. In this review, we aim to synthesize the structural basis of the operational mechanisms of the core Atg proteins, employing structures predicted by AlphaFold, and to discuss the molecular mechanisms that drive autophagosome biogenesis.

Tags: AlphaFold, AlphaFold2, AlphaFold3, autophagosome formation, autophagy

388. Scaling up spatial transcriptomics for large-sized tissues: uncovering cellular-level tissue architecture beyond conventional platforms with iSCALE, Nature Methods (September 15, 2025)

Authors: Amelia Schroeder, Melanie L. Loth, Chunyu Luo, Sicong Yao, Hanying Yan, Daiwei Zhang, Sarbottam Piya, Edward Plowey, Wenxing Hu, Jean R. Clemenceau, Inyeop Jang, Minji Kim, Isabel Barnfather, Su Jing Chan, Taylor L. Reynolds, Thomas Carlile, Patrick Cullen, Ji-Youn Sung, Hui-Hsin Tsai, Jeong Hwan Park, Tae Hyun Hwang, Baohong Zhang, Mingyao Li

Pages: 1-12

Abstract: Recent advances in spatial transcriptomics (ST) technologies have transformed our ability to profile gene expression while preserving crucial spatial context within tissues. However, existing ST platforms are constrained by high costs, long turnaround times, low resolution, limited gene coverage and inherently small tissue capture areas, which hinder their broad applications. Here we present iSCALE, a method that reconstructs large-scale, super-resolution gene expression landscapes and automatically annotates cellular-level tissue architecture in samples exceeding capture areas of current ST platforms. The performance of iSCALE was assessed by comprehensive evaluations involving benchmarking experiments, immunohistochemistry staining and manual annotations by pathologists. When applied to multiple sclerosis human brain samples, iSCALE uncovered lesion-associated cellular characteristics undetectable by conventional ST experiments. Our results demonstrate the utility of iSCALE in analyzing large tissues by enabling unbiased annotation, resolving cell type composition, mapping cellular microenvironments and revealing spatial features beyond the reach of standard ST analysis or routine histopathological assessment.

Tags: Machine learning, Gene expression analysis, RNA sequencing, Transcriptomics

387. Integrating diverse experimental information to assist protein complex structure prediction by GRASP, Nature Methods (September 15, 2025)

Authors: Yuhao Xie, Chengwei Zhang, Shimian Li, Xinyu Du, Yanjiao Lu, Min Wang, Yingtong Hu, Zhenyu Chen, Sirui Liu, Yi Qin Gao

Pages: 1-13

Abstract: GRASP is a protein complex structure prediction model that integrates diverse forms of simulated and real-world experimental restraints, including those from crosslinking, covalent labeling, chemical shift perturbation and deep mutational scanning.

Tags: None

386. De novo discovery of conserved gene clusters in microbial genomes with Spacedust, Nature Methods (September 15, 2025)

Authors: Ruoshi Zhang, Milot Mirdita, Johannes Söding

Pages: 1-9

Abstract: Metagenomics has revolutionized environmental and human-associated microbiome studies. However, the limited fraction of proteins with known biological processes and molecular functions presents a major bottleneck. In prokaryotes and viruses, evolution favors keeping genes participating in the same biological processes colocalized as conserved gene clusters. Conversely, conservation of gene neighborhood indicates functional association. Here we present Spacedust, a tool for systematic, de novo discovery of conserved gene clusters. To find homologous protein matches, Spacedust uses fast and sensitive structure comparison with Foldseek. Partially conserved clusters are detected using novel clustering and order conservation P values. We demonstrate Spacedust’s sensitivity with an all-versus-all analysis of 1,308 bacterial genomes, identifying 72,843 conserved gene clusters containing 58% of the 4.2 million genes. It recovered 95% of antiviral defense system clusters annotated by the specialized tool PADLOC. Spacedust’s high sensitivity and speed will facilitate the annotation of large numbers of sequenced bacterial, archaeal and viral genomes.

Tags: Software, Genome informatics, Metagenomics

385. MIFA: Metadata, Incentives, Formats and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis, Nature Methods (September 15, 2025)

Authors: Teresa Zulueta-Coarasa, Florian Jug, Aastha Mathur, Josh Moore, Arrate Muñoz-Barrutia, Liviu Anita, Kolawole Babalola, Peter Bankhead, Perrine Gilloteaux, Nodar Gogoberidze, Martin L. Jones, Gerard J. Kleywegt, Paul Korir, Anna Kreshuk, Aybüke Küpcü Yoldaş, Luca Marconato, Kedar Narayan, Nils Norlin, Bugra Oezdemir, Jessica L. Riesterer, Craig Russell, Norman Rzepka, Ugis Sarkans, Beatriz Serrano-Solano, Christian Tischer, Virginie Uhlmann, Vladimír Ulman, Matthew Hartley

Pages: 1-8

Abstract: This Perspective discusses challenges associated with sharing annotated image datasets and offers specific guidance to improve the reuse of bioimages and annotations for AI applications.

Tags: Image processing, Data publication and archiving, Standards

384. Guided multi-agent AI invents highly accurate, uncertainty-aware transcriptomic aging clocks, Preprint (September 12, 2025)

Authors: Vinayak Agarwal, Orion Li, Christopher A. Petty, Timothy Kassis, Paul W.K.Rothemund, David A. Sinclair, Ashwin Gopinath

Abstract: Scientific discovery has long relied on human creativity, with computation limited to analysis. Here we report an AI-guided system, K-Dense, that accelerates hypothesis testing and delivers robust scientific discoveries. Trained on ARCHS4 (57,584 samples, 28 tissues, 1,039 cohorts, ages 1–114), the unified ensemble clock achieved R2 = 0.854 and MAE = 4.26 years—while uniquely providing calibrated confidence intervals. This self-aware design flags predictions made at transitional or extreme ages, where biological heterogeneity peaks, suggesting clinical utility for uncertainty itself. Development revealed stage-specific markers, including CDKN2A/p16 (senescence), AMPD3 (muscle wasting), MIR29B2CHG (progeroid traits), and SEPTIN3 (neurodegeneration resistance). Sliding-window analysis across 85 overlapping ranges uncovered wave-like shifts in gene importance, showing that transcriptomic aging signatures evolve continuously, not discretely. By transforming biological age assessment from static point estimates to calibrated predictions with explicit uncertainty, this approach establishes reliable and interpretable clocks. Beyond the clock itself, K-Dense demonstrates how guided AI can compress months of exploration into weeks, pointing toward a scalable framework for accelerated scientific discovery.

Tags: None

383. Flexynesis: A deep learning toolkit for bulk multi-omics data integration for precision oncology and beyond, Nature Communications (September 12, 2025)

Authors: Bora Uyar, Taras Savchyn, Amirhossein Naghsh Nilchi, Ahmet Sarigun, Ricardo Wurmus, Mohammed Maqsood Shaik, Björn Grüning, Vedran Franke, Altuna Akalin

Volume & Issue: Volume 16, Issue 1

Pages: 8261

Abstract: Accurate decision making in precision oncology depends on integration of multimodal molecular information, for which various deep learning methods have been developed. However, most deep learning-based bulk multi-omics integration methods lack transparency, modularity, deployability, and are limited to narrow tasks. To address these limitations, we introduce Flexynesis, which streamlines data processing, feature selection, hyperparameter tuning, and marker discovery. Users can choose from deep learning architectures or classical supervised machine learning methods with a standardized input interface for single/multi-task training and evaluation for regression, classification, and survival modeling. We showcase the tool’s capability across diverse use-cases in precision oncology. To maximize accessibility, Flexynesis is available on PyPi, Guix, Bioconda, and the Galaxy Server (https://usegalaxy.eu/). This toolset makes deep-learning based bulk multi-omics data integration in clinical/pre-clinical research more accessible to users with or without deep-learning experience. Flexynesis is available at https://github.com/BIMSBbioinfo/flexynesis.

Tags: Machine learning, Software, Cancer genomics, Data integration

382. Biophysics-based protein language models for protein engineering, Nature Methods (September 11, 2025)

Authors: Sam Gelman, Bryce Johnson, Chase R. Freschlin, Arnav Sharma, Sameer D’Costa, John Peters, Anthony Gitter, Philip A. Romero

Pages: 1-12

Abstract: Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose mutational effect transfer learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics. We fine-tune METL on experimental sequence–function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL’s ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.

Tags: Machine learning, Protein design

381. Towards agentic science for advancing scientific discovery, Nature Machine Intelligence (September 10, 2025)

Authors: Hongliang Xin, John R. Kitchin, Heather J. Kulik

Pages: 1-3

Abstract: Artificial intelligence is transforming scientific discovery through (semi-)autonomous agents capable of reasoning, planning, and interacting with digital and physical environments. This Comment explores the foundations and frontiers of agentic science, outlining its emerging directions, current limitations, and the pathways for responsible integration into scientific practice.

Tags: None

380. Molecular-dynamics-simulation-guided directed evolution of flavoenzymes for atroposelective desaturation, Nature Synthesis (September 10, 2025)

Authors: Hong-Ning Yin, Zhao Chen, Xiang Zhao, Zihan Liu, Feng Yu, Sanduo Zheng, Niu Huang, Zhen Liu

Pages: 1-9

Abstract: Molecular-dynamics-simulation-guided evolution of flavoenzymes produces efficient catalysts for non-C2-symmetric biaryl synthesis with excellent atroposelectivity, offering promise for natural product synthesis and pharmaceutical applications.

Tags: None

379. RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation, Preprint (September 10, 2025)

Authors: Zongzheng Zhang, Chenghao Yue, Haobo Xu, Minwen Liao, Xianglin Qi, Huan-ang Gao, Ziwei Wang, Hao Zhao

Abstract: Robotic chemists promise to both liberate human experts from repetitive tasks and accelerate scientific discovery, yet remain in their infancy. Chemical experiments involve long-horizon procedures over hazardous and deformable substances, where success requires not only task completion but also strict compliance with experimental norms. To address these challenges, we propose \textit{RoboChemist}, a dual-loop framework that integrates Vision-Language Models (VLMs) with Vision-Language-Action (VLA) models. Unlike prior VLM-based systems (e.g., VoxPoser, ReKep) that rely on depth perception and struggle with transparent labware, and existing VLA systems (e.g., RDT, pi0) that lack semantic-level feedback for complex tasks, our method leverages a VLM to serve as (1) a planner to decompose tasks into primitive actions, (2) a visual prompt generator to guide VLA models, and (3) a monitor to assess task success and regulatory compliance. Notably, we introduce a VLA interface that accepts image-based visual targets from the VLM, enabling precise, goal-conditioned control. Our system successfully executes both primitive actions and complete multi-step chemistry protocols. Results show 23.57% higher average success rate and a 0.298 average increase in compliance rate over state-of-the-art VLA baselines, while also demonstrating strong generalization to objects and tasks.

Tags: Computer Science - Robotics

378. AI mirrors experimental science to uncover a mechanism of gene transfer crucial to bacterial evolution, Cell (September 09, 2025)

Authors: José R. Penadés, Juraj Gottweis, Lingchen He, Jonasz B. Patkowski, Alexander Daryin, Wei-Hung Weng, Tao Tu, Anil Palepu, Artiom Myaskovsky, Annalisa Pawlosky, Vivek Natarajan, Alan Karthikesalingam, Tiago R. D. Costa

Volume & Issue: Volume 0, Issue 0

Abstract: No abstract available

Tags: large language models, AI-driven discovery, bacterial evolution, cf-PICIs, gene transfer, inter-species transfer, tail hijacking

377. SurFF: a foundation model for surface exposure and morphology across intermetallic crystals, Nature Computational Science (September 09, 2025)

Authors: Jun Yin, Honghao Chen, Jiangjie Qiu, Wentao Li, Peng He, Jiali Li, Iftekhar A. Karimi, Xiaocheng Lan, Tiefeng Wang, Xiaonan Wang

Pages: 1-11

Abstract: A foundation machine learning model, SurFF, enables DFT-accurate predictions of surface energies and morphologies in intermetallic catalysts, achieving over 105-fold acceleration for high-throughput materials screening.

Tags: None

376. High-throughput platforms for machine learning-guided lipid nanoparticle design, Nature Reviews Materials (September 08, 2025)

Authors: Andrew R. Hanna, David A. Issadore, Michael J. Mitchell

Pages: 1-15

Abstract: Discovering lipid nanoparticles for unmet clinical needs relies heavily on the screening of unique formulations incorporating distinct lipids and nucleic acid cargos. This Perspective highlights how automation and parallelization have accelerated the rate of lipid nanoparticle discovery and discusses how coupling these advances with machine learning enable the predictive design of new therapeutic candidates.

Tags: None

375. AI-driven protein design, Nature Reviews Bioengineering (September 08, 2025)

Authors: Huan Yee Koh, Yizhen Zheng, Madeleine Yang, Rohit Arora, Geoffrey I. Webb, Shirui Pan, Li Li, George M. Church

Pages: 1-23

Abstract: Artificial intelligence is revolutionizing protein design by enabling precise navigation of sequence space and accelerating the creation of functional proteins. In this Review, a practical roadmap guides integration of artificial intelligence tools into design workflows, highlighting structural prediction, generative models and case studies spanning engineering therapeutic proteins to designing new ones.

Tags: None

374. Accelerating protein engineering with fitness landscape modelling and reinforcement learning, Nature Machine Intelligence (September 08, 2025)

Authors: Haoran Sun, Liang He, Pan Deng, Guoqing Liu, Zhiyu Zhao, Yuliang Jiang, Chuan Cao, Fusong Ju, Lijun Wu, Haiguang Liu, Tao Qin, Tie-Yan Liu

Pages: 1-15

Abstract: μProtein, combining deep learning and reinforcement learning, is developed to design high-function proteins. This framework, trained only on single-mutation data, discovers multi-site β-lactamase mutants with up to 2,000× growth rates.

Tags: None

373. AI-Driven Defect Engineering for Advanced Thermoelectric Materials, Advanced Materials (September 04, 2025)

Authors: Chu-Liang Fu, Mouyang Cheng, Nguyen Tuan Hung, Eunbi Rha, Zhantao Chen, Ryotaro Okabe, Denisse Córdova Carrizales, Manasi Mandal, Yongqiang Cheng, Mingda Li

Volume & Issue: Volume 37, Issue 35

Pages: 2505642

Abstract: Thermoelectric materials offer a promising pathway to directly convert waste heat to electricity. However, achieving high performance remains challenging due to intrinsic trade-offs between electrical conductivity, the Seebeck coefficient, and thermal conductivity, which are further complicated by the presence of defects. This review explores how artificial intelligence (AI) and machine learning (ML) are transforming thermoelectric materials design. Advanced ML approaches including deep neural networks, graph-based models, and transformer architectures, integrated with high-throughput simulations and growing databases, effectively capture structure-property relationships in a complex multiscale defect space and overcome the “curse of dimensionality”. This review discusses AI-enhanced defect engineering strategies such as composition optimization, entropy and dislocation engineering, and grain boundary design, along with emerging inverse design techniques for generating materials with targeted properties. Finally, it outlines future opportunities in novel physics mechanisms and sustainability, highlighting the critical role of AI in accelerating the discovery of thermoelectric materials.

Tags: machine learning, artificial intelligence, defect engineering, thermoelectrics

372. Artificial Intelligence-Driven Approaches in Semiconductor Research, Advanced Materials (September 04, 2025)

Authors: Yiqiang Zheng, Hao Xu, Zhexin Li, Linlin Li, Yongchao Yu, Pengfei Jiang, Yanmeng Shi, Jing Zhang, Yuqing Huang, Qing Luo, Zheng Lou, Lili Wang

Volume & Issue: Volume 37, Issue 35

Pages: 2504378

Abstract: To address the persistent challenges of scaling and power consumption in integrated circuits and chips, recent research has focused on exploring novel semiconductor materials beyond silicon and designing new device architectures. The vastness of the material and parameter space poses significant challenges in terms of cost and efficiency for traditional experimental and computational methods. The rise of artificial intelligence (AI) offers a highly promising avenue for accelerating semiconductor technology development. AI-driven methods demonstrate significant advantages in analyzing and interpreting large datasets, potentially freeing researchers to focus on more creative endeavors. This review provides a detailed and timely overview of how AI-driven approaches are assisting researchers across the entire semiconductor research pipeline, encompassing materials discovery, semiconductor screening, synthesis, characterization, and device performance optimization, highlighting how their integration facilitates a holistic understanding of the entire processing-structure-property-performance (PSPP) relationship. Remain challenges related to dataset quality, model generalizability, and autonomous experimentation, as well as the under-application of AI to critical needs are discussed in the semiconductor field, such as wafer-scale growth of high-quality, single-crystal semiconductor thin films beyond silicon. Addressing these challenges requires collaborative efforts from researchers across various organizations and disciplines, and represents a key focus for future research.

Tags: high-throughput, machine learning, artificial intelligence, autonomous laboratory, semiconductors

371. Supervised learning in DNA neural networks, Nature (September 03, 2025)

Authors: Kevin M. Cherry, Lulu Qian

Pages: 1-9

Abstract: Learning enables biological organisms to begin life simple yet develop immensely diverse and complex behaviours. Understanding learning principles in engineered molecular systems could enable us to endow non-living physical systems with similar capabilities. Inspired by how the brain processes information, the principles of neural computation have been developed over the past 80 years1, forming the foundation of modern machine learning. More than four decades ago, connections between neural computation and physical systems were established2. More recently, synthetic molecular systems, including nucleic acid and protein circuits, have been investigated for their abilities to implement neural computation3–7. However, in these systems, learning of molecular parameters such as concentrations and reaction rates was performed in silico to generate desired input–output functions. Here we show that DNA molecules can be programmed to autonomously carry out supervised learning in vitro, with the system learning to perform pattern classification from molecular examples of inputs and desired responses. We demonstrate a DNA neural network trained to classify three different sets of 100-bit patterns, integrating training data directly into memories of molecular concentrations and using these memories to process subsequent test data. Our work suggests that molecular circuits can learn tasks more complex than simple adaptive behaviours. This opens the door to molecular machines capable of embedded learning and decision-making in a wide range of physical systems, from biomedicine to soft materials.

Tags: Synthetic biology, DNA computing

370. Machine learning in X-ray diffraction for materials discovery and characterization, Matter (September 03, 2025)

Authors: Connor Davel, Nazanin Bassiri-Gharb, Juan-Pablo Correa-Baena

Volume & Issue: Volume 8, Issue 9

Abstract: No abstract available

Tags: machine learning, autonomous materials characterization, combinatorial characterization, X-ray diffraction

369. A generalizable pathology foundation model using a unified knowledge distillation pretraining framework, Nature Biomedical Engineering (September 02, 2025)

Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Jinbang Li, Fang Yan, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Chenglong Zhao, Danyi Li, Anjia Han, Zhenhui Li, Ronald Cheong Kin Chan, Jiguang Wang, Peng Fei, Kwang-Ting Cheng, Shaoting Zhang, Li Liang, Hao Chen

Pages: 1-20

Abstract: Generalizable Pathology Foundation Model (GPFM) consolidates expertise from a variety of existing models for use in a broad spectrum of computational pathology tasks.

Tags: None

368. PXDesign: Fast, Modular, and Accurate De Novo Design of Protein Binders, Preprint (September 02, 2025)

Authors: Milong Ren, Jinyuan Sun, Jiaqi Guan, Cong Liu, Chengyue Gong, Yuzhe Wang, Lan Wang, Qixu Cai, Xinshi Chen, Wenzhi Xiao, Protenix Team

Abstract: PXDesign achieves nanomolar binder hit rates of 20–73% across five of six diverse protein targets, surpassing prior methods such as AlphaProteo. This experimental success rate is enabled by advances in both binder generation and filtering. We develop both a diffusion-based generative model (PXDesign-d) and a hallucination-based approach (PXDesign-h), each showing strong in silico performance that outperforms existing models. Beyond generation, we systematically analyze confidence-based filtering and ranking strategies from multiple structure predictors, comparing their accuracy, efficiency, and complementarity on datasets spanning de novo binders and mutagenesis. Finally, we validate the full design process experimentally, achieving high hit rates and multiple nanomolar binders. To support future work and community use, we release a unified benchmarking framework at https://github.com/bytedance/PXDesignBench, provide public access to PXDesign via a webserver at https://protenix-server.com, and share all designed binder sequences at https://protenix.github.io/pxdesign. Download figureOpen in new tab

Tags: None

367. Developing machine learning for heterogeneous catalysis with experimental and computational data, Nature Reviews Chemistry (September 2025)

Authors: Carlota Bozal-Ginesta, Sergio Pablo-García, Changhyeok Choi, Albert Tarancón, Alán Aspuru-Guzik

Volume & Issue: Volume 9, Issue 9

Pages: 601-616

Abstract: Machine learning aids heterogeneous catalysis research by linking performance to physicochemical controllable properties. This Review discusses experimental and computational high-throughput and machine learning approaches, comparing them by modelling method, features, dataset size, accuracy and reaction type.

Tags: None

366. Protein evolution as a complex system, Nature Chemical Biology (September 2025)

Authors: Barnabas Gall, Sacha B. Pulsford, Dana S. Matthews, Matthew A. Spence, Joe A. Kaczmarski, John Chen, Mahakaran Sandhu, Eric A. Stone, James Nichols, Colin J. Jackson

Volume & Issue: Volume 21, Issue 9

Pages: 1293-1299

Abstract: Viewing protein evolution through the lens of complex systems theory may offer new insights into the principles driving biological adaptation. In this Comment we explore how characteristics such as sensitivity to initial conditions and self-organization are observed in protein evolution and discuss how machine learning can help model these dynamics to guide protein engineering.

Tags: None

365. Robot-assisted mapping of chemical reaction hyperspaces and networks, Nature (September 2025)

Authors: Yankai Jia, Rafał Frydrych, Yaroslav I. Sobolev, Wai-Shing Wong, Bibek Prajapati, Daniel Matuszczyk, Yasemin Bilgi, Louis Gadina, Juan Carlos Ahumada, Galymzhan Moldagulov, Namhun Kim, Eric S. Larsen, Maxence Deschamps, Yanqiu Jiang, Bartosz A. Grzybowski

Volume & Issue: Volume 645, Issue 8082

Pages: 922-931

Abstract: Despite decades of investigation, it remains unclear (and hard to predict1–4) how the outcomes of chemical reactions change over multidimensional ‘hyperspaces’ defined by reaction conditions5. Whereas human chemists can explore only a limited subset of these manifolds, automated platforms6–12 can generate thousands of reactions in parallel. Yet, purification and yield quantification remain bottlenecks, constrained by time-consuming and resource-intensive analytical techniques. As a result, our understanding of reaction hyperspaces remains fragmentary7,9,13–16. Are yield distributions smooth or corrugated? Do they conceal mechanistically new reactions? Can major products vary across different regions? Here, to address these questions, we developed a low-cost robotic platform using primarily optical detection to quantify yields of products and by-products at unprecedented throughput and minimal cost per condition. Scanning hyperspaces across thousands of conditions, we find and prove mathematically that, for continuous variables (concentrations, temperatures), individual yield distributions are generally slow-varying. At the same time, we uncover hyperspace regions of unexpected reactivity as well as switchovers between major products. Moreover, by systematically surveying substrate proportions, we reconstruct underlying reaction networks and expose hidden intermediates and products—even in reactions studied for well over a century. This hyperspace-scanning approach provides a versatile and scalable framework for reaction optimization and discovery. Crucially, it can help identify conditions under which complex mixtures can be driven cleanly towards different major products, thereby expanding synthetic diversity while reducing chemical input requirements.

Tags: Automation, Cheminformatics

364. Electron flow matching for generative reaction mechanism prediction, Nature (September 2025)

Authors: Joonyoung F. Joung, Mun Hong Fong, Nicholas Casetti, Jordan P. Liles, Ne S. Dassanayake, Connor W. Coley

Volume & Issue: Volume 645, Issue 8079

Pages: 115-123

Abstract: A new tool based on generative machine learning called FlowER uses flow matching to model reactions as the redistribution of electrons between reactants and products, enabling the enforcement of mass conservation in reaction prediction.

Tags: None

363. The Biodiversity Cell Atlas: mapping the tree of life at cellular resolution, Nature (September 2025)

Authors: Arnau Sebé-Pedrós, Amos Tanay, Mara K. N. Lawniczak, Detlev Arendt, Stein Aerts, John Archibald, Maria Ina Arnone, Mark Blaxter, Phillip Cleves, Susana M. Coelho, Mafalda Dias, Casey Dunn, Anamaria Elek, Jonathan Frazer, Toni Gabaldón, Jesse Gillis, Xavier Grau-Bové, Roderic Guigó, Oliver Hobert, Jaime Huerta-Cepas, Manuel Irimia, Allon Klein, Harris Lewin, Christopher J. Lowe, Heather Marlow, Jacob M. Musser, László G. Nagy, Sebastián R. Najle, Lior Pachter, Sadye Paez, Irene Papatheodorou, Michael J. Passalacqua, Nikolaus Rajewsky, Seung Y. Rhee, Thomas A. Richards, Tatjana Sauka-Spengler, Lauren M. Saunders, Eve Seuntjens, Jordi Solana, Yuyao Song, Ulrich Technau, Bo Wang

Volume & Issue: Volume 645, Issue 8082

Pages: 877-885

Abstract: The Biodiversity Cell Atlas aims to create comprehensive single-cell molecular atlases across the eukaryotic tree of life, which will be phylogenetically informed, rely on high-quality genomes and use shared standards to facilitate comparisons across species.

Tags: None

362. Generalizable descriptors for automatic titanium alloys design by learning from texts via large language model, Acta Materialia (September 01, 2025)

Authors: Ping Wang, Yuan Jiang, Weijie Liao, Rong Wang, Minjie Lai, Hongchao Kou, Xiubing Liang, Jinshan Li, Turab Lookman, Ruihao Yuan

Volume & Issue: Volume 296

Pages: 121275

Abstract: Descriptors are essential prerequisites for the success of machine learning-based materials design. However, the automatic construction of generalizable descriptors for various properties, together with their integration with design criteria, remains a long-standing challenge. Here, we overcome this obstacle by devising a framework that integrates large language model, domain theory constraints ([Mo]eq and d-electron theory), and multi-objective global optimization. This framework fuses structured (tabular composition/processing) and unstructured (texts) data to obtain enriched descriptors and facilitate materials design. Using titanium alloys as model case, we first automatically achieve descriptors that generalize well across a variety of properties for optimized prediction models. With which a design pipeline is proposed, several alloys with high promise for enhanced competing properties are identified from a vast chemical and processing space, and their reliability is validated via experimental synthesis. It is revealed that the enriched descriptors, learned from ∼ 50,000 texts without apparent physics, capture expertise related to phase stability, alloying rules for specific properties, and more. Our proposed approach can be applied to the design of other materials where descriptors are currently inadequate and structured data availability is limited.

Tags: Descriptors, Large language model, Natural language processing, Titanium alloy

361. General reactive element-based machine learning potentials for heterogeneous catalysis, Nature Catalysis (September 2025)

Authors: Changxi Yang, Chenyu Wu, Wenbo Xie, Daiqian Xie, P. Hu

Volume & Issue: Volume 8, Issue 9

Pages: 891-904

Abstract: Developing truly universal machine learning potentials for heterogeneous catalysis remains challenging. Here we introduce our element-based machine learning potential (EMLP), trained on a unique random exploration via imaginary chemicals optimization (REICO) sampling strategy. REICO samples diverse local atomic environments to build a representative dataset of atomic interactions, making the EMLP inherently general and reactive, capable of accurately predicting elementary reactions without explicit structural or reaction pathway inputs. We demonstrate the generality and reactivity of our approach by building a Ag-Pd-C-H-O EMLP targeting Pd–Ag catalysts interacting with C/H/O-containing species, achieving quantitative agreement with density functional theory even for complex scenarios such as surface reconstruction, coverage effects and solvent environments, cases for which existing foundation models typically fail. Our method paves the way to replace density functional theory calculations for large and intricate systems in heterogeneous catalysis, and offers a general framework that can readily be extended to other catalytic systems, and to broader fields such as materials science.

Tags: Computational chemistry, Heterogeneous catalysis

360. From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery, Preprint (August 30, 2025)

Authors: Tianshi Zheng, Zheye Deng, Hong Ting Tsang, Weiqi Wang, Jiaxin Bai, Zihao Wang, Yangqiu Song

Abstract: Large Language Models (LLMs) are catalyzing a paradigm shift in scientific discovery, evolving from task-specific automation tools into increasingly autonomous agents and fundamentally redefining research processes and human-AI collaboration. This survey systematically charts this burgeoning field, placing a central focus on the changing roles and escalating capabilities of LLMs in science. Through the lens of the scientific method, we introduce a foundational three-level taxonomy-Tool, Analyst, and Scientist-to delineate their escalating autonomy and evolving responsibilities within the research lifecycle. We further identify pivotal challenges and future research trajectories such as robotic automation, self-improvement, and ethical governance. Overall, this survey provides a conceptual architecture and strategic foresight to navigate and shape the future of AI-driven scientific discovery, fostering both rapid innovation and responsible advancement. Github Repository: https://github.com/HKUST-KnowComp/Awesome-LLM-Scientific-Discovery.

Tags: Computer Science - Computation and Language

359. A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers, Preprint (August 28, 2025)

Authors: Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, Junzhi Ning, Xinyao Liu, Ye Du, Changkai Ji, Cheng Tang, Huihui Xu, Ziyang Chen, Ziyan Huang, Jiyao Liu, Pengfei Jiang, Yizhou Wang, Chen Tang, Jianyu Wu, Yuchen Ren, Siyuan Yan, Zhonghua Wang, Zhongxing Xu, Shiyan Su, Shangquan Sun, Runkai Zhao, Zhisheng Zhang, Yu Liu, Fudi Wang, Yuanfeng Ji, Yanzhou Su, Hongming Shan, Chunmei Feng, Jiahao Xu, Jiangtao Yan, Wenhao Tang, Diping Song, Lihao Liu, Yanyan Huang, Lequan Yu, Bin Fu, Shujun Wang, Xiaomeng Li, Xiaowei Hu, Yun Gu, Ben Fei, Zhongying Deng, Benyou Wang, Yuewen Cao, Minjie Shen, Haodong Duan, Jie Xu, Yirong Chen, Fang Yan, Hongxia Hao, Jielan Li, Jiajun Du, Yanbo Wang, Imran Razzak, Chi Zhang, Lijun Wu, Conghui He, Zhaohui Lu, Jinhai Huang, Yihao Liu, Fenghua Ling, Yuqiang Li, Aoran Wang, Qihao Zheng, Nanqing Dong, Tianfan Fu, Dongzhan Zhou, Yan Lu, Wenlong Zhang, Jin Ye, Jianfei Cai, Wanli Ouyang, Yu Qiao, Zongyuan Ge, Shixiang Tang, Junjun He, Chunfeng Song, Lei Bai, Bowen Zhou

Abstract: Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge, emphasizing the multimodal, cross-scale, and domain-specific challenges that differentiate scientific corpora from general natural language processing datasets. We systematically review recent Sci-LLMs, from general-purpose foundations to specialized models across diverse scientific disciplines, alongside an extensive analysis of over 270 pre-/post-training datasets, showing why Sci-LLMs pose distinct demands – heterogeneous, multi-scale, uncertainty-laden corpora that require representations preserving domain invariance and enabling cross-modal reasoning. On evaluation, we examine over 190 benchmark datasets and trace a shift from static exams toward process- and discovery-oriented assessments with advanced evaluation protocols. These data-centric analyses highlight persistent issues in scientific data development and discuss emerging solutions involving semi-automated annotation pipelines and expert validation. Finally, we outline a paradigm shift toward closed-loop systems where autonomous agents based on Sci-LLMs actively experiment, validate, and contribute to a living, evolving knowledge base. Collectively, this work provides a roadmap for building trustworthy, continually evolving artificial intelligence (AI) systems that function as a true partner in accelerating scientific discovery.

Tags: Computer Science - Artificial Intelligence, Computer Science - Computation and Language

358. High-power lithium-ion battery characterization dataset for stochastic battery modeling, Scientific Data (August 28, 2025)

Authors: Muhammad Aadil Khan, Sai Thatipamula, Luigi Tresca, Le Xu, Amalie Trewartha, Simona Onori

Volume & Issue: Volume 12, Issue 1

Pages: 1506

Abstract: High-power lithium-ion battery (LIB) applications, such as electric racing cars and electric vertical take-off and landing (eVTOL) aircrafts, are growing rapidly. Degradation in LIBs such as lithium plating, particle cracking, and SEI breakdown is accelerated at high C-rate at different temperatures and depth-of-discharges (DOD); however, high-power cells are designed to better withstand these operating conditions as compared to high-energy cells. Despite this, publicly available datasets of high-power batteries are limited. In this work, we present a characterization dataset of 12 high-power NMC cells which includes capacity tests, high C-rate pulse tests, and impedance tests, all of which are conducted at temperature set points of 5 °C, 25 °C, and 40 °C. Additionally, the dataset captures cell-to-cell variations, enabling the development of stochastic battery models that account for parameter uncertainty and its impact on the cell terminal voltage.

Tags: Batteries, Energy efficiency

357. Graph attention networks decode conductive network mechanism and accelerate design of polymer nanocomposites, npj Computational Materials (August 28, 2025)

Authors: Tang Sui, Shaolong Liu, Bihui Cong, Xiaoke Xu, Dongjing Shan, Giuseppe Milano, Ying Zhao, Shuang Xu, Jiashun Mao

Volume & Issue: Volume 11, Issue 1

Pages: 280

Abstract: Conductive polymer nanocomposites have emerged as essential materials for wearable devices. In this study, we propose a novel approach that combines graph attention networks (GAT) with an improved global pooling strategy and incremental learning. We train the GAT model on homopolymer/carbon nanotube (CNT) nanocomposite data simulated by hybrid particle-field molecular dynamics (hPF-MD) method within the CNT concentration range of 1–8%. We further analyze the conductive network structure by integrating the resistor network approach with the GAT’s attention scores, revealing optimal connectivity at a 7% concentration. The comparative analysis of trained data and the reconstructed network, based on the attention scores, underscores the GAT model’s ability in learning network structural representations. This work not only validates the efficacy of the GAT model in property prediction and interpretable network structure analysis of polymer nanocomposites but also lays a cornerstone for the reverse engineering of polymer composites.

Tags: Electronic properties and materials, Nanoparticles, Carbon nanotubes and fullerenes, Graphene, Molecular self-assembly, Self-assembly

356. Deep generative models design mRNA sequences with enhanced translational capacity and stability, Science (August 28, 2025)

Authors: He Zhang, Hailong Liu, Yushan Xu, Haoran Huang, Yiming Liu, Jia Wang, Yan Qin, Haiyan Wang, Lili Ma, Zhiyuan Xun, Xuzhuang Hou, Timothy K. Lu, Jicong Cao

Volume & Issue: Volume 0, Issue 0

Pages: eadr8470

Abstract: Despite the success of mRNA COVID-19 vaccines, extending this modality to more diseases necessitates substantial enhancements. We present GEMORNA, a generative RNA model that utilizes Transformer architectures tailored for mRNA coding sequences (CDSs) and untranslated regions (UTRs), to design novel mRNAs with enhanced expression and stability. GEMORNA-designed full-length mRNAs exhibited up to a 41-fold increase in firefly luciferase expression compared to an optimized benchmark in vitro. GEMORNA-generated therapeutic mRNAs achieved up to a 15-fold enhancement in human erythropoietin (EPO) expression and substantially elicited antibody titers of COVID vaccine in mice. Additionally, GEMORNA’s versatility extends to circular RNA, substantially enhancing circular EPO expression and boosting anti-tumor cytotoxicity in CAR-T cells. These advancements highlight deep generative AI’s vast potential for mRNA therapeutics.

Tags: None

355. One-shot design of functional protein binders with BindCraft, Nature (August 27, 2025)

Authors: Martin Pacesa, Lennart Nickel, Christian Schellhaas, Joseph Schmidt, Ekaterina Pyatova, Lucas Kissling, Patrick Barendse, Jagrity Choudhury, Srajan Kapoor, Ana Alcaraz-Serna, Yehlin Cho, Kourosh H. Ghamary, Laura Vinué, Brahm J. Yachnin, Andrew M. Wollacott, Stephen Buckley, Adrie H. Westphal, Simon Lindhoud, Sandrine Georgeon, Casper A. Goverde, Georgios N. Hatzopoulos, Pierre Gönczy, Yannick D. Muller, Gerald Schwank, Daan C. Swarts, Alex J. Vecchio, Bernard L. Schneider, Sergey Ovchinnikov, Bruno E. Correia

Pages: 1-10

Abstract: Protein–protein interactions are at the core of all key biological processes. However, the complexity of the structural features that determine protein–protein interactions makes their design challenging. Here we present BindCraft, an open-source and automated pipeline for de novo protein binder design with experimental success rates of 10–100%. BindCraft leverages the weights of AlphaFold2 (ref. 1) to generate binders with nanomolar affinity without the need for high-throughput screening or experimental optimization, even in the absence of known binding sites. We successfully designed binders against a diverse set of challenging targets, including cell-surface receptors, common allergens, de novo designed proteins and multi-domain nucleases, such as CRISPR–Cas9. We showcase the functional and therapeutic potential of designed binders by reducing IgE binding to birch allergen in patient-derived samples, modulating Cas9 gene editing activity and reducing the cytotoxicity of a foodborne bacterial enterotoxin. Last, we use cell-surface-receptor-specific binders to redirect adeno-associated virus capsids for targeted gene delivery. This work represents a significant advancement towards a ‘one design-one binder’ approach in computational design, with immense potential in therapeutics, diagnostics and biotechnology.

Tags: Protein design, Biochemistry

354. Digital Twin for Chemical Science: a case study on water interactions on the Ag(111) surface, Nature Computational Science (August 27, 2025)

Authors: Jin Qian, Asmita Jana, Siddarth Menon, Andrew E. Bogdan, Rebecca Hamlyn, Johannes Mahl, Ethan J. Crumlin

Pages: 1-8

Abstract: Directly visualizing chemical trajectories offers insights into catalysis, gas-phase reactions and photoinduced dynamics. Tracking the transformation of chemical species is best achieved by coupling theory and experiment. Here we developed Digital Twin for Chemical Science (DTCS) v.01, which integrates theory, experiment and their bidirectional feedback loops into a unified platform for chemical characterization. DTCS addresses a core question: given a set of experimental conditions, what is the expected outcome and why? It consists of a forward solver that takes a chemical reaction network and predicts spectra under experimental conditions, and an inverse solver that infers kinetics from measured spectra. We applied DTCS to ambient-pressure X-ray photoelectron spectroscopy measurements of the Ag–H2O interface as an example. This approach enables real-time knowledge extraction and guides experiments until a stopping condition is met based on accuracy and degeneracy. As a step toward autonomous chemical characterization, DTCS provides mechanistic knowledge in a verified, standardized manner.

Tags: Chemistry, Materials science, Energy and society

353. Target-aware 3D molecular generation based on guided equivariant diffusion, Nature Communications (August 25, 2025)

Authors: Qiaoyu Hu, Changzhi Sun, Huan He, Jiazheng Xu, Danlin Liu, Wenqing Zhang, Sumeng Shi, Kai Zhang, Honglin Li

Volume & Issue: Volume 16, Issue 1

Pages: 7928

Abstract: Recent molecular generation models for structure-based drug design (SBDD) often produce unrealistic 3D molecules due to the neglect of structural feasibility and drug-like properties. In this paper, we introduce DiffGui, a target-conditioned E(3)-equivariant diffusion model that integrates bond diffusion and property guidance, to address the above challenges. The combination of atom diffusion and bond diffusion guarantees the concurrent generation of both atoms and bonds by explicitly modeling their interdependencies. Property guidance incorporates the binding affinity and drug-like properties of molecules into the training and sampling processes. Extensive experiments prove that DiffGui outperforms existing methods in generating molecules with high binding affinity, rational chemical structure, and desirable properties. Ablation studies confirm the importance of bond diffusion and property guidance modules. DiffGui demonstrates effectiveness in both de novo drug design and lead optimization, with validation through wet-lab experiments.

Tags: Machine learning, Computational chemistry, Computational models

352. A versatile multimodal learning framework bridging multiscale knowledge for material design, npj Computational Materials (August 25, 2025)

Authors: Yuhui Wu, Minmin Ding, Haonan He, Qijun Wu, Shaohua Jiang, Peng Zhang, Jian Ji

Volume & Issue: Volume 11, Issue 1

Pages: 276

Abstract: Artificial intelligence has achieved remarkable success in materials science, accelerating novel material design. However, real-world material systems exhibit multiscale complexity—spanning composition, processing, structure, and properties—posing significant challenges for modeling. While some approaches fuse multiscale features to improve prediction, important modalities such as microstructure are often missing due to high acquisition costs. Existing methods struggle with incomplete data and lack a framework to bridge multiscale material knowledge. To address this, we propose MatMCL, a structure-guided multimodal learning framework that jointly analyzes multiscale material information and enables robust property prediction with incomplete modalities. Using a self-constructed multimodal dataset of electrospun nanofibers, we demonstrate that MatMCL improves mechanical property prediction without structural information, generates microstructures from processing parameters, and enables cross-modal retrieval. We further extend it via multi-stage learning and apply it to nanofiber-reinforced composite design. MatMCL uncovers processing-structure-property relationships, suggesting its promise as a generalizable approach for AI-driven material design.

Tags: Theory and computation, Materials science

351. GraphVelo allows for accurate inference of multimodal velocities and molecular mechanisms for single cells, Nature Communications (August 22, 2025)

Authors: Yuhao Chen, Yan Zhang, Jiaqi Gan, Ke Ni, Ming Chen, Ivet Bahar, Jianhua Xing

Volume & Issue: Volume 16, Issue 1

Pages: 7831

Abstract: RNA velocities and generalizations emerge as powerful approaches for extracting time-resolved information from high-throughput snapshot single-cell data. Yet, several inherent limitations restrict applying the approaches to genes not suitable for RNA velocity inference due to complex transcriptional dynamics, low expression, or lacking splicing dynamics, or data of non-transcriptomic modality. Here, we present GraphVelo, a graph-based machine learning procedure that uses as input the RNA velocities inferred from existing methods and infers velocity vectors lying in the tangent space of the low-dimensional manifold formed by the single cell data. GraphVelo preserves vector magnitude and direction information during transformations across different data representations. Tests on synthetic and experimental single-cell data, including viral-host interactome, multi-omics, and spatial genomics datasets demonstrate that GraphVelo, together with downstream generalized dynamo analyses, extends RNA velocities to multi-modal data and reveals quantitative nonlinear regulation relations between genes, virus, and host cells, and different layers of gene regulation.

Tags: Machine learning, Computational biophysics, Data processing

350. Capturing short-range order in high-entropy alloys with machine learning potentials, npj Computational Materials (August 21, 2025)

Authors: Yifan Cao, Killian Sheriff, Rodrigo Freitas

Volume & Issue: Volume 11, Issue 1

Pages: 268

Abstract: Chemical short-range order (SRO) affects the distribution of elements throughout the solid-solution phase of metallic alloys, thereby modifying the background against which microstructural evolution occurs. Investigating such chemistry–microstructure relationships requires atomistic models that act at the appropriate length scales while capturing the intricacies of chemical bonds leading to SRO. Here, we consider various approaches for the construction of training data sets for machine learning potentials (MLPs) for CrCoNi and evaluate their performance in capturing SRO and its effects on materials quantities of relevance for mechanical properties, such as stacking-fault energy and phase stability. It is demonstrated that energy accuracy on test sets often does not correlate with accuracy in capturing material properties, which is fundamental in enabling large-scale atomistic simulations of metallic alloys with high physical fidelity. Based on this analysis, we systematically derive design principles for the rational construction of MLPs that capture SRO in the crystal and liquid phases of alloys.

Tags: Theory and computation, Structural materials

349. Machine learning-assisted Ru-N bond regulation for ammonia synthesis, Nature Communications (August 21, 2025)

Authors: Zichuang Li, Mingxin Zhang, Xiaozhi Su, Yangfan Lu, Jiang Li, Qing Zhang, Wenqian Li, Kailong Qian, Xiaojun Lu, Bo Dai, Hideo Hosono, Yanpeng Qi, Miao Xu, Renzhong Tai, Jie-Sheng Chen, Tian-Nan Ye

Volume & Issue: Volume 16, Issue 1

Pages: 7818

Abstract: Ruthenium-bearing intermetallics (Ru-IMCs) featured with well-defined structures and variable compositions offer new opportunities to develop ammonia synthesis catalysts under mild conditions. However, their complex phase nature and the numerous controlling parameters pose major challenges for catalyst design and exploration. Herein, we demonstrate that a combination of machine learning (ML) and model mining techniques can effectively address these challenges. Based on the combination techniques, we generate a two-dimensional activity volcano plot with adsorption energies of N2 and N, and identify the radius of atom coordinating to Ru as a key parameter. The well-designed Sc1/8Nd7/8Ru2 reaches as high as 8.18 mmol m−2 h−1 at 0.1 MPa and 400 °C. Density functional theory (DFT) calculations combined with N2-TPD and FT-IR studies reveal that Ru‒N interaction is controlled by the d-p orbital hybridization between Ru and N. These findings underscore the importance of ML towards material design on exploring IMCs for ammonia synthesis.

Tags: Chemical engineering, Heterogeneous catalysis, Catalyst synthesis

348. Revealing nanostructures in high-entropy alloys via machine-learning accelerated scalable Monte Carlo simulation, npj Computational Materials (August 20, 2025)

Authors: Xianglin Liu, Kai Yang, Yongxiang Liu, Fanli Zhou, Dengdong Fan, Zongrui Pei, Pengxiang Xu, Yonghong Tian

Volume & Issue: Volume 11, Issue 1

Pages: 267

Abstract: First-principles Monte Carlo (MC) simulations at finite temperatures are computationally prohibitive for large systems due to the high cost of quantum calculations and poor parallelizability of sequential Markov chains in MC algorithms. We introduce scalable Monte Carlo at eXtreme (SMC-X), a generalized checkerboard algorithm designed to accelerate MC simulation with arbitrary short-range interactions, including machine learning potentials, on modern accelerator hardware. The GPU implementation, SMC-GPU, harnesses massive parallelism to enable billion-atom simulations when combined with machine-learning surrogates of density functional theory (DFT). We apply SMC-GPU to explore nanostructure evolution in two high-entropy alloys, FeCoNiAlTi and MoNbTaW, revealing diverse morphologies including nanoparticles, 3D-connected NPs, and disorder-stabilized phases. We quantify their size, composition, and morphology, and simulate an atom-probe tomography (APT) specimen for direct comparison with experiments. Our results highlight the potential of large-scale, data-driven MC simulations in exploring nanostructure evolution in complex materials, opening new avenues for computationally guided alloy design.

Tags: Theory and computation, Materials science, Structural materials

347. Bayesian learning-assisted catalyst discovery for efficient iridium utilization in electrochemical water splitting, Science Advances (August 20, 2025)

Authors: Xiangfu Niu, Yanjun Chen, Mingze Sun, Satoshi Nagao, Yuki Aoki, Zhiqiang Niu, Liang Zhang

Volume & Issue: Volume 11, Issue 34

Pages: eadw0894

Abstract: Reducing noble metal dependence in oxygen evolution reaction (OER) catalysts is essential for achieving sustainable and scalable green hydrogen production. Bimetallic oxides, with their potential for high catalytic performance and reduced noble metal content, represent promising alternatives to traditional IrO2-based OER catalysts. However, optimizing these materials remains challenging due to the complex interplay of elemental selection, composition, and chemical ordering. In this study, we integrate density functional theory (DFT) calculations with Bayesian learning to accelerate the discovery of high-performance, low-Ir bimetallic oxides, identifying surface Ir-doped TiO2 as an optimal catalyst. Guided by theoretically optimized surface compositions and oxygen vacancies, we synthesized atomically dispersed Ir on TiO2, achieving a 23-fold increase in Ir mass-specific activity and a 115-millivolt reduction in overpotential compared to commercial IrO2. This work exemplifies a sustainable, data-driven pathway for electrocatalyst design that minimizes noble metal usage while maximizing efficiency, advancing scalable solutions in renewable energy and hydrogen production.

Tags: None

346. A unified pre-trained deep learning framework for cross-task reaction performance prediction and synthesis planning, Nature Machine Intelligence (August 19, 2025)

Authors: Li-Cheng Xu, Miao-Jiong Tang, Junyi An, Fenglei Cao, Yuan Qi

Pages: 1-11

Abstract: Xu et al. present RXNGraphormer, a pre-trained model that learns bond transformation patterns from over 13 million reactions, achieving state-of-the-art accuracy in reaction performance prediction and synthesis planning.

Tags: None

345. Boosting the predictive power of protein representations with a corpus of text annotations, Nature Machine Intelligence (August 18, 2025)

Authors: Haonan Duan, Marta Skreta, Leonardo Cotta, Ella Miray Rajaonson, Nikita Dhawan, Alán Aspuru-Guzik, Chris J. Maddison

Pages: 1-11

Abstract: Although protein language models have enabled major advances, they often rely on indirect signals that may not fully capture functional relevance. Fine-tuning these models on textual annotations is shown to improve their performance on function prediction tasks.

Tags: None

344. Steering towards safe self-driving laboratories, Nature Reviews Chemistry (August 18, 2025)

Authors: Shi Xuan Leong, Caleb E. Griesbach, Rui Zhang, Kourosh Darvish, Yuchi Zhao, Abhijoy Mandal, Yunheng Zou, Han Hao, Varinia Bernales, Alán Aspuru-Guzik

Pages: 1-16

Abstract: Self-driving laboratories promise accelerated discovery. As the scope of chemical processes and level of autonomy in these laboratories expand, a comprehensive safety framework is essential. We discuss here the safety development trajectory of SDLs, identifying opportunities for innovation to shape this rapidly evolving landscape.

Tags: None

343. An automated framework for exploring and learning potential-energy surfaces, Nature Communications (August 18, 2025)

Authors: Yuanbin Liu, Joe D. Morrow, Christina Ertural, Natascia L. Fragapane, John L. A. Gardner, Aakash A. Naik, Yuxing Zhou, Janine George, Volker L. Deringer

Volume & Issue: Volume 16, Issue 1

Pages: 7666

Abstract: Machine learning has become ubiquitous in materials modelling and now routinely enables large-scale atomistic simulations with quantum-mechanical accuracy. However, developing machine-learned interatomic potentials requires high-quality training data, and the manual generation and curation of such data can be a major bottleneck. Here, we introduce an automated framework for the exploration and fitting of potential-energy surfaces, implemented in an openly available software package that we call autoplex (‘automatic potential-landscape explorer’). We discuss design choices, particularly the interoperability with existing software architectures, and the ability for the end user to easily use the computational workflows provided. We show wide-ranging capability demonstrations: for the titanium–oxygen system, SiO2, crystalline and liquid water, as well as phase-change memory materials. More generally, our study illustrates how automation can speed up atomistic machine learning in computational materials science.

Tags: Computational methods, Atomistic models, Computational chemistry

342. Functional and epitope specific monoclonal antibody discovery directly from immune sera using cryo-EM, Science Advances (August 15, 2025)

Authors: James A. Ferguson, Sai Sundar Rajan Raghavan, Garazi Peña Alzua, Disha Bhavsar, Jiachen Huang, Alesandra J. Rodriguez, Jonathan L. Torres, Maria Bottermann, Julianna Han, Florian Krammer, Facundo D. Batista, Andrew B. Ward

Volume & Issue: Volume 11, Issue 33

Pages: eadv8257

Abstract: Antibodies are crucial therapeutics, comprising a substantial portion of approved drugs due to their safety and clinical efficacy. Traditional antibody discovery methods are labor-intensive, limiting scalability and high-throughput analysis. Here, we improved upon our streamlined approach combining structural analysis and bioinformatics to infer heavy and light chain sequences from cryo-EM (cryo–electron microscopy) maps of serum-derived polyclonal antibodies (pAbs) bound to antigens. Using ModelAngelo, an automated structure-building tool, we accelerated pAb sequence determination and identified sequence matches in B cell repertoires via ModelAngelo-derived hidden Markov models (HMMs) associated with pAb structures. Benchmarking against results from a nonhuman primate HIV vaccine trial, our pipeline reduced analysis time from weeks to under a day with higher precision. Validation with murine immune sera from influenza vaccination revealed multiple protective antibodies. This workflow enhances antibody discovery, enabling faster, more accurate mapping of polyclonal responses with broad applications in vaccine development and therapeutic antibody discovery.

Tags: None

341. Accelerated design of gold nanoparticles with enhanced plasmonic performance, Science Advances (August 15, 2025)

Authors: José Luis Montaño-Priede, Anish Rao, Ana Sánchez-Iglesias, Marek Grzelczak

Volume & Issue: Volume 11, Issue 33

Pages: eadx2299

Abstract: Finding the optimal dimensions of metal nanoparticles to maximize their plasmonic performance in targeted applications is a complex and time-consuming process that typically requires a trial-and-error approach. Here, we propose a universal pipeline that integrates Bayesian optimization with electrodynamics simulations to find dimensions of gold bipyramids with superior plasmonic performance in photothermal efficiency, enhancement of Raman scattering and photoluminescence, strong coupling between plasmon and exciton, and aggregation-induced color difference. Our workflow is a straightforward tool for plasmonic nanoparticle design, setting their optimal dimensions for targeted applications.

Tags: None

340. Chem3DLLM: 3D Multimodal Large Language Models for Chemistry, Preprint (August 14, 2025)

Authors: Lei Jiang, Shuzhou Sun, Biqing Qi, Yuchen Fu, Xiaohua Xu, Yuqiang Li, Dongzhan Zhou, Tianfan Fu

Abstract: In the real world, a molecule is a 3D geometric structure. Compared to 1D SMILES sequences and 2D molecular graphs, 3D molecules represent the most informative molecular modality. Despite the rapid progress of autoregressive-based language models, they cannot handle the generation of 3D molecular conformation due to several challenges: 1) 3D molecular structures are incompatible with LLMs’ discrete token space, 2) integrating heterogeneous inputs like proteins, ligands, and text remains difficult within a unified model, and 3) LLMs lack essential scientific priors, hindering the enforcement of physical and chemical constraints during generation. To tackle these issues, we present Chem3DLLM, a unified protein-conditioned multimodal large language model. Our approach designs a novel reversible text encoding for 3D molecular structures using run-length compression, achieving 3x size reduction while preserving complete structural information. This enables seamless integration of molecular geometry with protein pocket features in a single LLM architecture. We employ reinforcement learning with stability-based rewards to optimize chemical validity and incorporate a lightweight protein embedding projector for end-to-end training. Experimental results on structure-based drug design demonstrate state-of-the-art performance with a Vina score of -7.21, validating our unified multimodal approach for practical drug discovery applications.

Tags: Computer Science - Computational Engineering, Finance, and Science

339. Empowering Generalist Material Intelligence with Large Language Models, Advanced Materials (August 14, 2025)

Authors: Wenhao Yuan, Guangyao Chen, Zhilong Wang, Fengqi You

Volume & Issue: Volume 37, Issue 32

Pages: 2502771

Abstract: Large language models (LLMs) are steering the development of generalist materials intelligence (GMI), a unified framework integrating conceptual reasoning, computational modeling, and experimental validation. Central to this framework is the agent-in-the-loop paradigm, where LLM-based agents function as dynamic orchestrators, synthesizing multimodal knowledge, specialized models, and experimental robotics to enable fully autonomous discovery. Drawing from a comprehensive review of LLMs’ transformative impact across representative applications in materials science, including data extraction, property prediction, structure generation, synthesis planning, and self-driven labs, this study underscores how LLMs are revolutionizing traditional tasks, catalyzing the agent-in-the-loop paradigm, and bridging the ontology-concept-computation-experiment continuum. Then the unique challenges of scaling up LLM adoption are discussed, particularly those arising from the misalignment of foundation LLMs with materials-specific knowledge, emphasizing the need to enhance adaptability, efficiency, sustainability, interpretability, and trustworthiness in the pursuit of GMI. Nonetheless, it is important to recognize that LLMs are not universally efficient. Their substantial resource demands and inconsistent performance call for careful deployment based on demonstrated task suitability. To address these realities, actionable strategies and a progressive roadmap for equitably and democratically implementing materials-aware LLMs in real-world practices are proposed.

Tags: autonomous agents, generative artificial intelligence, large language models, material intelligence

338. SAGERank: inductive learning of protein–protein interaction from antibody–antigen recognition, Chemical Science (August 12, 2025)

Authors: Chuance Sun, Xiangyi Li, Honglin Xu, Yike Tang, Ganggang Bai, Yanjing Wang, Buyong Ma

Abstract: Predicting Antibody–Antigen (Ab–Ag) docking and structure-based design represent significant long-term and therapeutically important challenges in computational biology. We present SAGERank, a general, configurable deep learning framework for antibody design using Graph Sample and Aggregate Networks. SAGERank successfully predicted the majority of epitopes in a cancer target dataset. In nanobody–antigen structure prediction, SAGERank, coupled with a protein dynamics structure prediction algorithm, slightly outperforms Alphafold3. Most importantly, our study demonstrates the real potential of inductive deep learning to overcome the small dataset problem in molecular science. The SAGERank models trained for antibody–antigen docking can be used to examine general protein–protein interaction tasks, such as T Cell Receptor-peptide-Major Histocompatibility Complex (TCR-pMHC) recognition, classification of biological versus crystal interfaces, and prediction of ternary complexes of molecular glues. In the cases of ranking docking decoys and identifying biological interfaces, SAGERank is competitive with or outperforms state-of-the-art methods.

Tags: None

337. Probing the limitations of multimodal language models for chemistry and materials research, Nature Computational Science (August 11, 2025)

Authors: Nawaf Alampara, Mara Schilling-Wilhelmi, Martiño Ríos-García, Indrajeet Mandal, Pranav Khetarpal, Hargun Singh Grover, N. M. Anoop Krishnan, Kevin Maik Jablonka

Pages: 1-10

Abstract: Recent advancements in artificial intelligence have sparked interest in scientific assistants that could support researchers across the full spectrum of scientific workflows, from literature review to experimental design and data analysis. A key capability for such systems is the ability to process and reason about scientific information in both visual and textual forms—from interpreting spectroscopic data to understanding laboratory set-ups. Here we introduce MaCBench, a comprehensive benchmark for evaluating how vision language models handle real-world chemistry and materials science tasks across three core aspects: data extraction, experimental execution and results interpretation. Through a systematic evaluation of leading models, we find that although these systems show promising capabilities in basic perception tasks—achieving near-perfect performance in equipment identification and standardized data extraction—they exhibit fundamental limitations in spatial reasoning, cross-modal information synthesis and multi-step logical inference. Our insights have implications beyond chemistry and materials science, suggesting that developing reliable multimodal AI scientific assistants may require advances in curating suitable training data and approaches to training those models.

Tags: Chemistry, Materials science

336. Unconditional latent diffusion models memorize patient imaging data, Nature Biomedical Engineering (August 11, 2025)

Authors: Salman Ul Hassan Dar, Marvin Seyfarth, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Robert Malte Siepmann, Fabian Christopher Laqua, Jannik Kahmann, Norbert Frey, Bettina Baeßler, Sebastian Foersch, Daniel Truhn, Jakob Nikolas Kather, Sandy Engelhardt

Pages: 1-15

Abstract: A comprehensive evaluation of memorization across datasets, including training samples and patient data copies, shows that latent diffusion models can memorize a diverse set of medical images with varying properties.

Tags: None

335. Observation of dendrite formation at Li metal-electrolyte interface by a machine-learning enhanced constant potential framework, Nature Communications (August 11, 2025)

Authors: Taiping Hu, Haichao Huang, Guobing Zhou, Xinyan Wang, Jiaxin Zhu, Zheng Cheng, Fangjia Fu, Xiaoxu Wang, Fuzhi Dai, Kuang Yu, Shenzhen Xu

Volume & Issue: Volume 16, Issue 1

Pages: 7379

Abstract: Uncontrollable dendrites growth during electrochemical cycles leads to low Coulombic efficiency and critical safety issues in Li metal batteries. Hence, a comprehensive understanding of the dendrite formation mechanism is essential for further enhancing the performance of Li metal batteries. Machine learning accelerated molecular dynamics simulations can provide atomic-scale resolution for various key processes at an ab-initio level accuracy. However, traditional molecular dynamics simulation tools hardly capture Li electrochemical depositions, due to lack of an electrochemical constant potential condition. In this work, we propose a constant potential approach that combines a machine learning force field with the charge equilibration method to reveal the dynamic process of dendrites nucleation at Li metal anode surfaces. Our simulations show that inhomogeneous Li depositions, following Li aggregations in amorphous inorganic components of solid electrolyte interphases, can initiate dendrites nucleation. Our study provides microscopic insights for Li dendrites formations in Li metal anodes. More importantly, we present an efficient and accurate simulation method for modeling realistic constant potential conditions, which holds considerable potential for broader applications in modeling complex electrochemical interfaces.

Tags: Computational methods, Batteries, Reaction kinetics and dynamics

334. OmniFluids: Physics Pre-trained Modeling of Fluid Dynamics, Preprint (August 09, 2025)

Authors: Rui Zhang, Qi Meng, Han Wan, Yang Liu, Zhi-Ming Ma, Hao Sun

Abstract: Computational fluid dynamics (CFD) drives progress in numerous scientific and engineering fields, yet high-fidelity simulations remain computationally prohibitive. While machine learning approaches offer computing acceleration, they typically specialize in single physical systems or require extensive training data, hindering their applicability in highly nonlinear and 3D flow scenarios. To overcome these limitations, we propose OmniFluids, a pure physics pre-trained model that captures fundamental fluid dynamics laws and adapts efficiently to diverse downstream tasks with minimal data. We develop a training framework combining physics-only pre-training, coarse-grid operator distillation, and few-shot fine-tuning. This enables OmniFluids to retain broad physics knowledge while delivering fast and accurate predictions. Architecturally, OmniFluids integrates a mixture of operators, a multi-frame decoder, and factorized Fourier layers, seamlessly incorporating physics-based supervision while allowing efficient and scalable modeling of diverse tasks. Extensive tests on a broad range of 2D and 3D benchmarks show that OmniFluids outperforms state-of-the-art AI-driven methods in terms of flow field prediction and turbulence statistics. It delivers 10–100$\times$ speedups over traditional solvers while maintaining a comparable accuracy and accurately identifies unknown physical parameters from sparse, noisy data. This work demonstrates the potential of training a unified CFD solver exclusively from physics knowledge, offering a new approach for efficient and generalizable modeling across complex fluid systems.

Tags: Computer Science - Machine Learning, Physics - Fluid Dynamics

333. Designing Pb-Free High-Entropy Relaxor Ferroelectrics with Machine Learning Assistance for High Energy Storage, Journal of the American Chemical Society (August 06, 2025)

Authors: Banghua Zhu, Xingcheng Wang, Ji Zhang, Huajie Luo, Laijun Liu, Joerg C. Neuefeind, Hui Liu, Jun Chen

Volume & Issue: Volume 147, Issue 31

Pages: 27912-27921

Abstract: High-entropy tactics present exceptional promise in advancing the dielectric energy storage of relaxor ferroelectrics, thereby benefiting various pulsed-power electronic systems. However, their vast composition space poses challenges in the rational design of a high-performance system. Herein, we present a machine learning-supplemented strategy to design high-entropy relaxors, demonstrating an ultrahigh energy-storage density of 17.2 J cm–3 and high efficiency of 87% at a high breakdown strength of 79 kV mm–1. By integrating six A-site and one B-site critical intrinsic features of constituent ions, deduced from a constructed random forest regression model, the (Bi2/5Na1/5K1/5Ba1/5)(Ti,Hf)O3 high-entropy system is identified. Atomic-level local structural analysis reveals that incorporating these certified cations, with diverse local polar and lattice construction characteristics, results in a highly fluctuating local polarization structure. This favorable structure is characterized by pronounced orientation disorder and a broadly distributed length of unit-cell polarization vectors within the expanded lattice framework. Macroscopically, the optimized relaxor displays high dielectric susceptibility and large resistance. Moreover, a large discharge energy density of 5.8 J cm–3 and power energy density of 447 MW cm–3, along with outstanding operational stability, are achieved. This study presents a data-driven model to explore complex intrinsic features and facilitate the design of high-performance relaxors.

Tags: None

332. PreMode predicts mode-of-action of missense variants by deep graph representation learning of protein sequence and structural context, Nature Communications (August 05, 2025)

Authors: Guojie Zhong, Yige Zhao, Demi Zhuang, Wendy K. Chung, Yufeng Shen

Volume & Issue: Volume 16, Issue 1

Pages: 7189

Abstract: Accurate prediction of the functional impact of missense variants is important for disease gene discovery, clinical genetic diagnostics, therapeutic strategies, and protein engineering. Previous efforts have focused on predicting a binary pathogenicity classification, but the functional impact of missense variants is multi-dimensional. Pathogenic missense variants in the same gene may act through different modes of action (i.e., gain/loss-of-function) by affecting different aspects of protein function. They may result in distinct clinical conditions that require different treatments. We develop a new method, PreMode, to perform gene-specific mode-of-action predictions. PreMode models effects of coding sequence variants using SE(3)-equivariant graph neural networks on protein sequences and structures. Using the largest-to-date set of missense variants with known modes of action, we show that PreMode reaches state-of-the-art performance in multiple types of mode-of-action predictions by efficient transfer-learning. Additionally, PreMode’s prediction of G/LoF variants in a kinase is consistent with inactive-active conformation transition energy changes. Finally, we show that PreMode enables efficient study design of deep mutational scans and can be expanded to fitness optimization of non-human proteins with active learning.

Tags: Computational models, Disease genetics

331. Quantifying large language model usage in scientific papers, Nature Human Behaviour (August 04, 2025)

Authors: Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D. Manning, James Zou

Pages: 1-11

Abstract: Liang et al. estimate the prevalence of text modified by large language models in recent scientific papers and preprints, finding widespread use (up to 17.5% of papers in computer science).

Tags: None

330. Modeling protein conformational ensembles by guiding AlphaFold2 with Double Electron Electron Resonance (DEER) distance distributions, Nature Communications (August 02, 2025)

Authors: Tianqi Wu, Richard A. Stein, Te-Yu Kao, Benjamin Brown, Hassane S. Mchaourab

Volume & Issue: Volume 16, Issue 1

Pages: 7107

Abstract: We describe a modified version of AlphaFold2 that incorporates experimental distance distributions into the network architecture for protein structure prediction. Harnessing the OpenFold platform, we fine-tune AlphaFold2 on structurally dissimilar proteins to explicitly model distance distributions between spin labels determined from Double Electron-Electron Resonance (DEER) spectroscopy. We benchmark the performance of the modified AlphaFold2, refer to as DEERFold, in switching the predicted conformations of a set of membrane transporters using experimental DEER distance distributions. Guided by sparse sets of simulated distance distributions, we showcase the generality of DEERFold in predicting conformational ensembles on a large benchmark set of water soluble and membrane proteins. We find that the intrinsic performance of AlphaFold2 substantially reduces the number of required distributions and the accuracy of their widths needed to drive conformational selection thereby increasing the experimental throughput. The blueprint of DEERFold can be generalized to other experimental methods where distance constraints can be represented by distributions.

Tags: Machine learning, Protein structure predictions, Structural biology, Biochemistry, Biophysics

329. Navigating protein landscapes with a machine-learned transferable coarse-grained model, Nature Chemistry (August 2025)

Authors: Nicholas E. Charron, Klara Bonneau, Aldo S. Pasos-Trejo, Andrea Guljas, Yaoyi Chen, Félix Musil, Jacopo Venturin, Daria Gusew, Iryna Zaporozhets, Andreas Krämer, Clark Templeton, Atharva Kelkar, Aleksander E. P. Durumeric, Simon Olsson, Adrià Pérez, Maciej Majewski, Brooke E. Husic, Ankit Patel, Gianni De Fabritiis, Frank Noé, Cecilia Clementi

Volume & Issue: Volume 17, Issue 8

Pages: 1284-1292

Abstract: The most popular and universally predictive protein simulation models employ all-atom molecular dynamics, but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep-learning methods with a large and diverse training set of all-atom protein simulations, we here develop a bottom–up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences not used during model parameterization. We demonstrate that the model successfully predicts metastable states of folded, unfolded and intermediate structures, the fluctuations of intrinsically disordered proteins and relative folding free energies of protein mutants, while being several orders of magnitude faster than an all-atom model. This showcases the feasibility of a universal and computationally efficient machine-learned CG model for proteins.

Tags: Computational chemistry, Computational biophysics

328. Data-driven de novo design of super-adhesive hydrogels, Nature (August 2025)

Authors: Hongguang Liao, Sheng Hu, Hu Yang, Lei Wang, Shinya Tanaka, Ichigaku Takigawa, Wei Li, Hailong Fan, Jian Ping Gong

Volume & Issue: Volume 644, Issue 8075

Pages: 89-95

Abstract: Data-driven methodologies have transformed the discovery and prediction of hard materials with well-defined atomic structures by leveraging standardized datasets, enabling accurate property predictions and facilitating efficient exploration of design spaces1–3. However, their application to soft materials remains challenging because of complex, multiscale structure–property relationships4–6. Here we present a data-driven approach that integrates data mining, experimentation and machine learning to design high-performance adhesive hydrogels from scratch, tailored for demanding underwater environments. By leveraging protein databases, we developed a descriptor strategy to statistically replicate protein sequence patterns in polymer strands by ideal random copolymerization, enabling targeted hydrogel design and dataset construction. Using machine learning, we optimized hydrogel formulations from an initial dataset of 180 bioinspired hydrogels, achieving remarkable improvements in adhesive strength, with a maximum value exceeding 1 MPa. These super-adhesive hydrogels hold immense potential across diverse applications, from biomedical engineering to deep-sea exploration, marking a notable advancement in data-driven innovation for soft materials.

Tags: Cheminformatics, Gels and hydrogels

327. Electron-density-informed effective and reliable de novo molecular design and optimization with ED2Mol, Nature Machine Intelligence (August 2025)

Authors: Mingyu Li, Kun Song, Jixiao He, Mingzhu Zhao, Gengshu You, Jie Zhong, Mengxi Zhao, Arong Li, Yu Chen, Guobin Li, Ying Kong, Jiacheng Wei, Zhaofu Wang, Jiamin Zhou, Hongbing Yang, Shichao Ma, Hailong Zhang, Irakoze Loïca Mélita, Weidong Lin, Yuhang Lu, Zhengtian Yu, Xun Lu, Yujun Zhao, Jian Zhang

Volume & Issue: Volume 7, Issue 8

Pages: 1355-1368

Abstract: A deep generative model is developed for de novo molecular design and optimization by leveraging electron density. Wet-laboratory assays validated its reliability to generate diverse bioactive molecules—orthosteric and allosteric, inhibitors and activators.

Tags: None

326. Kolmogorov–Arnold graph neural networks for molecular property prediction, Nature Machine Intelligence (August 2025)

Authors: Longlong Li, Yipeng Zhang, Guanghui Wang, Kelin Xia

Volume & Issue: Volume 7, Issue 8

Pages: 1346-1354

Abstract: Graph neural networks (GNNs) have shown remarkable success in molecular property prediction as key models in geometric deep learning. Meanwhile, Kolmogorov–Arnold networks (KANs) have emerged as powerful alternatives to multi-layer perceptrons, offering improved expressivity, parameter efficiency and interpretability. To combine the strengths of both frameworks, we propose Kolmogorov–Arnold GNNs (KA-GNNs), which integrate KAN modules into the three fundamental components of GNNs: node embedding, message passing and readout. We further introduce Fourier-series-based univariate functions within KAN to enhance function approximation and provide theoretical analysis to support their expressiveness. Two architectural variants, KA-graph convolutional networks and KA-augmented graph attention networks, are developed and evaluated across seven molecular benchmarks. Experimental results show that KA-GNNs consistently outperform conventional GNNs in terms of both prediction accuracy and computational efficiency. Moreover, our models exhibit improved interpretability by highlighting chemically meaningful substructures. These findings demonstrate that KA-GNNs offer a powerful and generalizable framework for molecular data modelling, drug discovery and beyond.

Tags: Computational models, Applied mathematics

325. An actor–critic algorithm to maximize the power delivered from direct methanol fuel cells, Nature Energy (August 2025)

Authors: Hongbin Xu, Yang Jeong Park, Zhichu Ren, Daniel J. Zheng, Davide Menga, Haojun Jia, Chenru Duan, Guanzhou Zhu, Yuriy Román-Leshkov, Yang Shao-Horn, Ju Li

Volume & Issue: Volume 10, Issue 8

Pages: 951-961

Abstract: Direct methanol fuel cells offer high energy densities but face challenges including catalyst degradation and surface fouling, which reduce performance over time. Here the authors introduce a control system inspired by reinforcement learning to optimize the power output and mitigate degradation of direct methanol fuel cells by dynamically adjusting the voltage.

Tags: None

324. GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis, Preprint (July 31, 2025)

Authors: Haoyang Liu, Yijiang Li, Haohan Wang

Abstract: Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.

Tags: Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multiagent Systems, Quantitative Biology - Genomics

323. Deep learning for property prediction of natural fiber polymer composites, Scientific Reports (July 30, 2025)

Authors: Ivan P. Malashin, Dmitry Martysyuk, Vladimir Nelyub, Aleksei Borodulin, Andrei Gantimurov, Vadim Tynchenko

Volume & Issue: Volume 15, Issue 1

Pages: 27837

Abstract: The increasing availability of diverse experimental and computational data has accelerated the application of deep learning (DL) techniques for predicting polymer properties. A literature review was conducted to show recent advances in DL applied to this field. For example, Li et al. (2023) achieved an $R^2>0.96$for predicting stiffness tensors of carbon fiber composites using a hybrid CNN–MLP model trained on microstructure images and two-point statistics. Aligning with this approach, Xue et al. (2023) compared DNN performance with genetic programming and minimax probability machine regression in predicting the lateral confinement coefficient for CFRP-wrapped RC columns, showing competitive predictive capability. These studies demonstrate that specialized architectures, including hybrid CNN–MLP models, feedforward ANNs, graph convolutional networks, and DNNs, provide high accuracy in predicting mechanical, thermal, and chemical properties of polymer composites and biodegradable plastics. Among these, DNNs have consistently shown superior performance in capturing complex nonlinear relationships within heterogeneous datasets, highlighting their suitability for materials characterization and optimization tasks. Building on these insights, this study investigates the effects of four natural fibers (flax, cotton, sisal, hemp) with densities around 1.48–1.54 g/cm$^3$, incorporated at 30 wt.% into three polymer matrices (PLA, PP, epoxy resin) with varying surface treatments (untreated, alkaline, silane). Samples were prepared via extrusion and injection molding (or casting for epoxy) under controlled processing conditions. Mechanical properties (tensile strength, modulus, elongation at break, impact toughness) were measured per ASTM standards, and density was determined by Archimedes’ method. Using 180 experimental samples, augmented up to 1500 using bootstrap technique, several regression models–linear, random forest, gradient boosting, DNNs–were developed to predict mechanical behavior. Best DNN model architecture (four hidden layers (128–64–32–16 neurons), ReLU activation, 20% dropout, a batch size of 64, and the AdamW optimizer with a learning rate of $10^{-3}$) obtained through hyperparameter optimization using Optuna, delivered the best performance (R$^2$up to 0.89) and MAE reductions of 9–12% compared to gradient boosting, driven by the DNN’s ability to capture nonlinear synergies between fiber-matrix interactions, surface treatments, and processing parameters while aligning architectural complexity with multiscale material behavior.

Tags: Mechanical engineering, Mechanical properties, Scientific data

322. Accelerating primer design for amplicon sequencing using large language model-powered agents, Nature Biomedical Engineering (July 30, 2025)

Authors: Yi Wang, Yuejie Hou, Lin Yang, Shisen Li, Weiting Tang, Hui Tang, Qiushun He, Siyuan Lin, Yanyan Zhang, Xingyu Li, Shiwen Chen, Yusheng Huang, Lingsong Kong, Huijun Zhang, Duncan Yu, Feng Mu, Huanming Yang, Jian Wang, Nattiya Hirankarn, Meng Yang

Pages: 1-16

Abstract: The PrimeGen framework is a multi-agent large language model system used to navigate intricate primer design workflows for amplicon sequencing.

Tags: None

321. The Virtual Lab of AI agents designs new SARS-CoV-2 nanobodies, Nature (July 29, 2025)

Authors: Kyle Swanson, Wesley Wu, Nash L. Bulaong, John E. Pak, James Zou

Pages: 1-8

Abstract: Human collaboration with a team of artificial intelligence (AI) agents powered by large language models was used to efficiently design a complex interdisciplinary research project leading to the design of novel nanobodies against SARS-CoV-2 spike protein.

Tags: None

320. Geographic-style maps with a local novelty distance help navigate in the materials space, Scientific Reports (July 29, 2025)

Authors: Daniel Widdowson, Vitaliy Kurlin

Volume & Issue: Volume 15, Issue 1

Pages: 27588

Abstract: With the advent of self-driving labs promising to synthesize large numbers of new materials, new automated tools are required for checking potential duplicates in existing structural databases before a material can be claimed as novel. To avoid duplication, we rigorously define the novelty metric of any periodic material as the smallest distance to its nearest neighbor among already known materials. Using ultra-fast structural invariants, all such nearest neighbors can be found within seconds on a typical computer even if a given crystal is disguised by changing a unit cell, perturbing atoms, or replacing chemical elements. This real-time novelty check is demonstrated by finding near-duplicates of the 43 materials produced by Berkeley’s A-lab in the world’s largest collections of inorganic structures, the Inorganic Crystal Structure Database and the Materials Project. To help future self-driving labs successfully identify novel materials, we propose navigation maps of the materials space where any new structure can be quickly located by its invariant descriptors similar to a geographic location on Earth.

Tags: Structure of solids and liquids, Mathematics and computing

319. Biomimetic Intelligent Thermal Management Materials: From Nature-Inspired Design to Machine-Learning-Driven Discovery, Advanced Materials (July 29, 2025)

Authors: Heng Zhang, Qingxia He, Fei Zhang, Yanshuai Duan, Mengmeng Qin, Wei Feng

Volume & Issue: Volume 37, Issue 30

Pages: 2503140

Abstract: The development of biomimetic intelligent thermal management materials (BITMs) is essential for tackling thermal management challenges in electronics and aerospace applications. These materials possess not only exceptional thermal conductivity but also environmental compatibility. However, developing such materials necessitates overcoming intricate challenges, such as precise control over the material structure and optimization of the material’s properties and stability. This review comprehensively overviews the research progress of BITMs, emphasizing the synergy between biomimetic design principles and artificial-intelligence-driven methodologies to enhance their performance. The unique nature-inspired structures are explored and valuable insights are provided into adaptive thermal management strategies, which can be further enhanced through data analytics and machine learning (ML). This review offers insights into overcoming design challenges and outlines future prospects for advanced BITMs by integrating ML and biomimetic design principles.

Tags: machine learning, thermal conductivity, artificial intelligence, biomimetic, thermal management materials

318. Atomistic Generative Diffusion for Materials Modeling, Preprint (July 24, 2025)

Authors: Nikolaj Rønne, Bjørk Hammer

Abstract: We present a generative modeling framework for atomistic systems that combines score-based diffusion for atomic positions with a novel continuous-time discrete diffusion process for atomic types. This approach enables flexible and physically grounded generation of atomic structures across chemical and structural domains. Applied to metallic clusters and two-dimensional materials using the QCD and C2DB datasets, our models achieve strong performance in fidelity and diversity, evaluated using precision-recall metrics against synthetic baselines. We demonstrate atomic type interpolation for generating bimetallic clusters beyond the training distribution, and use classifier-free guidance to steer sampling toward specific crystallographic symmetries in two-dimensional materials. These capabilities are implemented in Atomistic Generative Diffusion (AGeDi), an open-source, extensible software package for atomistic generative diffusion modeling.

Tags: Physics - Computational Physics

317. Decoding nature’s grammar with DNA language models, Proceedings of the National Academy of Sciences (July 22, 2025)

Authors: Peter L. Morrell, Serguei V. Pakhomov

Volume & Issue: Volume 122, Issue 29

Pages: e2512889122

Abstract: No abstract available

Tags: None

316. Uni-Electrolyte: An Artificial Intelligence Platform for Designing Electrolyte Molecules for Rechargeable Batteries, Angewandte Chemie (July 21, 2025)

Authors: Xiang Chen, Mingkang Liu, Shiqiu Yin, Yu-Chen Gao, Nan Yao, Qiang Zhang

Volume & Issue: Volume 137, Issue 30

Pages: e202503105

Abstract: Electrolytes are an essential part of rechargeable batteries, such as lithium batteries. However, electrolyte innovation is facing grand challenges due to the complicated solution chemistry and infinite molecular space (>1060 for small molecules). This work reported an artificial intelligence (AI) platform, namely Uni-Electrolyte, for designing advanced electrolyte molecules, which mainly includes three parts, i.e., EMolCurator, EMolForger, and EMolNetKnittor. New molecules can be designed by combining high-throughput screening and generative AI models from more than 100 million alternative molecules in the EMolCurator module. The molecular properties, including frontier molecular orbital information, formation energy, binding energy with a Li ion, viscosity, and dielectric constant, can be adopted as the screening parameters. The EMolForger and EMolNetKnittor modules can predict the retrosynthesis pathway and solid electrolyte interphase (SEI) formation mechanism for a given molecule, respectively. With the assistance of advanced AI methods, the Uni-Electrolyte is strongly supposed to discover new electrolyte molecules and chemical principles, promoting the practical application of next-generation rechargeable batteries.

Tags: Machine learning, Artificial intelligence, Electrolyte molecules, High-throughput design, Rechargeable batteries

315. AutoMAT: A Hierarchical Framework for Autonomous Alloy Discovery, Preprint (July 21, 2025)

Authors: Penghui Yang, Chendong Zhao, Bijun Tang, Zhonghan Zhang, Xinrun Wang, Yanchen Deng, Yuhao Lu, Cuntai Guan, Zheng Liu, Bo An

Abstract: Alloy discovery is central to advancing modern industry but remains hindered by the vastness of compositional design space and the costly validation. Here, we present AutoMAT, a hierarchical and autonomous framework grounded in and validated by experiments, which integrates large language models, automated CALPHAD-based simulations, and AI-driven search to accelerate alloy design. Spanning the entire pipeline from ideation to validation, AutoMAT achieves high efficiency, accuracy, and interpretability without the need for manually curated large datasets. In a case study targeting a lightweight, high-strength alloy, AutoMAT identifies a titanium alloy with 8.1% lower density and comparable yield strength relative to the state-of-the-art reference, achieving the highest specific strength among all comparisons. In a second case targeting high-yield-strength high-entropy alloys, AutoMAT achieves a 28.2% improvement in yield strength over the base alloy. In both cases, AutoMAT reduces the discovery timeline from years to weeks, illustrating its potential as a scalable and versatile platform for next-generation alloy design.

Tags: Condensed Matter - Materials Science, Computer Science - Artificial Intelligence, Computer Science - Machine Learning

314. DiffuMeta: Algebraic Language Models for Inverse Design of Metamaterials via Diffusion Transformers, Preprint (July 21, 2025)

Authors: Li Zheng, Siddhant Kumar, Dennis M. Kochmann

Abstract: Generative machine learning models have revolutionized material discovery by capturing complex structure-property relationships, yet extending these approaches to the inverse design of three-dimensional metamaterials remains limited by computational complexity and underexplored design spaces due to the lack of expressive representations. Here, we present DiffuMeta, a generative framework integrating diffusion transformers with a novel algebraic language representation, encoding 3D geometries as mathematical sentences. This compact, unified parameterization spans diverse topologies while enabling direct application of transformers to structural design. DiffuMeta leverages diffusion models to generate novel shell structures with precisely targeted stress-strain responses under large deformations, accounting for buckling and contact while addressing the inherent one-to-many mapping by producing diverse solutions. Uniquely, our approach enables simultaneous control over multiple mechanical objectives, including linear and nonlinear responses beyond training domains. Experimental validation of fabricated structures further confirms the efficacy of our approach for accelerated design of metamaterials and structures with tailored properties.

Tags: Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science

313. Accurate prediction of synthesizability and precursors of 3D crystal structures via large language models, Nature Communications (July 15, 2025)

Authors: Zhilong Song, Shuaihua Lu, Minggang Ju, Qionghua Zhou, Jinlan Wang

Volume & Issue: Volume 16, Issue 1

Pages: 6530

Abstract: Accessing the synthesizability of crystal structures is crucial for transforming theoretical materials into real-world applications. Nevertheless, there is a significant gap between actual synthesizability and thermodynamic or kinetic stability commonly used to screen synthesizable structures. Herein, we develop the Crystal Synthesis Large Language Models (CSLLM) framework, which utilizes three specialized LLMs to predict the synthesizability of arbitrary 3D crystal structures, possible synthetic methods, and suitable precursors, respectively. We construct a comprehensive dataset including synthesizable/non-synthesizable crystal structures and develop an efficient text representation for crystal structures to fine-tune LLMs. Our Synthesizability LLM achieves state-of-the-art accuracy (98.6%), significantly outperforming traditional synthesizability screening based on thermodynamic and kinetic stability. Its outstanding generalization ability is further demonstrated in experimental structures with complexity considerably exceeding that of the training data. Furthermore, both the Method and Precursor LLMs exceed 90% accuracy in classifying possible synthetic methods and identifying solid-state synthetic precursors for common binary and ternary compounds, respectively. Leveraging CSLLM, tens of thousands of synthesizable theoretical structures are successfully identified, with their 23 key properties predicted using accurate graph neural network models.

Tags: Computational methods, Method development

312. Generative AI enables medical image segmentation in ultra low-data regimes, Nature Communications (July 14, 2025)

Authors: Li Zhang, Basu Jindal, Ahmed Alaa, Robert Weinreb, David Wilson, Eran Segal, James Zou, Pengtao Xie

Volume & Issue: Volume 16, Issue 1

Pages: 6486

Abstract: Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning automates this task effectively, it struggles in ultra low-data regimes for the scarcity of annotated segmentation masks. To address this, we propose a generative deep learning framework that produces high-quality image-mask pairs as auxiliary training data. Unlike traditional generative models that separate data generation from model training, ours uses multi-level optimization for end-to-end data generation. This allows segmentation performance to guide the generation process, producing data tailored to improve segmentation outcomes. Our method demonstrates strong generalization across 11 medical image segmentation tasks and 19 datasets, covering various diseases, organs, and modalities. It improves performance by 10–20% (absolute) in both same- and out-of-domain settings and requires 8–20 times less training data than existing approaches. This greatly enhances the feasibility and cost-effectiveness of deep learning in data-limited medical imaging scenarios.

Tags: Machine learning, Software, Medical imaging, Cancer imaging

311. La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching, Preprint (July 13, 2025)

Authors: Tomas Geffner, Kieran Didi, Zhonglin Cao, Danny Reidenbach, Zuobai Zhang, Christian Dallago, Emine Kucukbenli, Karsten Kreis, Arash Vahdat

Abstract: Recently, many generative models for de novo protein structure design have emerged. Yet, only few tackle the difficult task of directly generating fully atomistic structures jointly with the underlying amino acid sequence. This is challenging, for instance, because the model must reason over side chains that change in length during generation. We introduce La-Proteina for atomistic protein design based on a novel partially latent protein representation: coarse backbone structure is modeled explicitly, while sequence and atomistic details are captured via per-residue latent variables of fixed dimensionality, thereby effectively side-stepping challenges of explicit side-chain representations. Flow matching in this partially latent space then models the joint distribution over sequences and full-atom structures. La-Proteina achieves state-of-the-art performance on multiple generation benchmarks, including all-atom co-designability, diversity, and structural validity, as confirmed through detailed structural analyses and evaluations. Notably, La-Proteina also surpasses previous models in atomistic motif scaffolding performance, unlocking critical atomistic structure-conditioned protein design tasks. Moreover, La-Proteina is able to generate co-designable proteins of up to 800 residues, a regime where most baselines collapse and fail to produce valid samples, demonstrating La-Proteina’s scalability and robustness.

Tags: Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods

310. AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model, Preprint (July 11, 2025)

Authors: Žiga Avsec, Natasha Latysheva, Jun Cheng, Guido Novati, Kyle R. Taylor, Tom Ward, Clare Bycroft, Lauren Nicolaisen, Eirini Arvaniti, Joshua Pan, Raina Thomas, Vincent Dutordoir, Matteo Perino, Soham De, Alexander Karollus, Adam Gayoso, Toby Sargeant, Anne Mottram, Lai Hong Wong, Pavol Drotár, Adam Kosiorek, Andrew Senior, Richard Tanburn, Taylor Applebaum, Souradeep Basu, Demis Hassabis, Pushmeet Kohli

Abstract: Deep learning models that predict functional genomic measurements from DNA sequence are powerful tools for deciphering the genetic regulatory code. Existing methods trade off between input sequence length and prediction resolution, thereby limiting their modality scope and performance. We present AlphaGenome, which takes as input 1 megabase of DNA sequence and predicts thousands of functional genomic tracks up to single base pair resolution across diverse modalities – including gene expression, transcription initiation, chromatin accessibility, histone modifications, transcription factor binding, chro- matin contact maps, splice site usage, and splice junction coordinates and strength. Trained on human and mouse genomes, AlphaGenome matches or exceeds the strongest respective available external models on 24 out of 26 evaluations on variant effect prediction. AlphaGenome’s ability to simultaneously score variant effects across all modalities accurately recapitulates the mechanisms of clinically-relevant variants near the TAL1 oncogene. To facilitate broader use, we provide tools for making genome track and variant effect predictions from sequence.

Tags: None

309. Scalable emulation of protein equilibrium ensembles with generative deep learning, Science (July 10, 2025)

Authors: Sarah Lewis, Tim Hempel, José Jiménez-Luna, Michael Gastegger, Yu Xie, Andrew Y. K. Foong, Victor García Satorras, Osama Abdin, Bastiaan S. Veeling, Iryna Zaporozhets, Yaoyi Chen, Soojung Yang, Adam E. Foster, Arne Schneuing, Jigyasa Nigam, Federico Barbero, Vincent Stimper, Andrew Campbell, Jason Yim, Marten Lienen, Yu Shi, Shuxin Zheng, Hannes Schulz, Usman Munir, Roberto Sordillo, Ryota Tomioka, Cecilia Clementi, Frank Noé

Volume & Issue: Volume 389, Issue 6761

Pages: eadv9817

Abstract: Following the sequence and structure revolutions, predicting functionally relevant protein structure changes at scale remains an outstanding challenge. We introduce BioEmu, a deep learning system that emulates protein equilibrium ensembles by generating thousands of statistically independent structures per hour on a single graphics processing unit (GPU). BioEmu integrates more than 200 milliseconds of molecular dynamics (MD) simulations, static structures, and experimental protein stabilities using new training algorithms. It captures diverse functional motions—including cryptic pocket formation, local unfolding, and domain rearrangements—and predicts relative free energies with 1 kilocalorie per mole accuracy compared with millisecond-scale MD and experimental data. BioEmu provides mechanistic insights by jointly modeling structural ensembles and thermodynamic properties. This approach amortizes the cost of MD and experimental data generation, demonstrating a scalable path toward understanding and designing protein function.

Tags: None

308. Artificial Intelligence Paradigms for Next-Generation Metal–Organic Framework Research, Journal of the American Chemical Society (July 09, 2025)

Authors: Aydin Ozcan, François-Xavier Coudert, Sven M. J. Rogge, Greta Heydenrych, Dong Fan, Antonios P. Sarikas, Seda Keskin, Guillaume Maurin, George E. Froudakis, Stefan Wuttke, Ilknur Erucar

Volume & Issue: Volume 147, Issue 27

Pages: 23367-23380

Abstract: After the development of the famous “Transformer” network architecture and the meteoric rise of artificial intelligence (AI)-powered chatbots, large language models (LLMs) have become an indispensable part of our daily activities. In this rapidly evolving era, “all we need is attention” as Google’s famous transformer paper’s title [Vaswani et al., Adv. Neural Inf. Process. Syst. 2017, 30] implies: We need to focus on and give “attention” to what we have at hand, then consider what we can do further. What can LLMs offer for immediate short-term adaptation? Currently, the most common applications in metal–organic framework (MOF) research include automating literature reviews and data extraction to accelerate the material discovery process. In this perspective, we discuss the latest developments in machine-learning and deep-learning research on MOF materials and reflect on how their utilization has evolved within the LLM domain from this standpoint. We finally explore future benefits to accelerate and automate materials development research.

Tags: None

307. Accelerated data-driven materials science with the Materials Project, Nature Materials (July 03, 2025)

Authors: Matthew K. Horton, Patrick Huck, Ruo Xi Yang, Jason M. Munro, Shyam Dwaraknath, Alex M. Ganose, Ryan S. Kingsbury, Mingjian Wen, Jimmy X. Shen, Tyler S. Mathis, Aaron D. Kaplan, Karlo Berket, Janosh Riebesell, Janine George, Andrew S. Rosen, Evan W. C. Spotte-Smith, Matthew J. McDermott, Orion A. Cohen, Alex Dunn, Matthew C. Kuner, Gian-Marco Rignanese, Guido Petretto, David Waroquiers, Sinead M. Griffin, Jeffrey B. Neaton, Daryl C. Chrzan, Mark Asta, Geoffroy Hautier, Shreyas Cholia, Gerbrand Ceder, Shyue Ping Ong, Anubhav Jain, Kristin A. Persson

Pages: 1-11

Abstract: Materials design and informatics have become increasingly prominent over the past several decades. Using the Materials Project as an example, this Perspective discusses how properties are calculated and curated, how this knowledge can be used for materials discovery, and the challenges in modelling complex material systems or managing software architecture.

Tags: None

306. Natural-Language-Interfaced Robotic Synthesis for AI-Copilot-Assisted Exploration of Inorganic Materials, Journal of the American Chemical Society (July 02, 2025)

Authors: Lin Huang, Chao Zhang, Yun Fu, Yibin Jiang, Enyu He, Ming-Qiang Qi, Ming-Hao Du, Xiang-Jian Kong, Jun Cheng, Leroy Cronin, Cheng Wang

Volume & Issue: Volume 147, Issue 26

Pages: 23014-23025

Abstract: The automation of chemical synthesis presents opportunities to enhance experimental reproducibility and accelerate discovery. Traditional closed-loop approaches, while effective in specific domains, are often constrained by rigid workflows and the requirement for specialized expertise. Here, we introduce a chemical robotic explorer integrated with an artificial intelligence (AI) copilot to enable a more flexible and adaptive synthesis, simplifying the process from inspiration to experimentation. This modular platform uses a large language model (LLM) to map natural language synthetic descriptions to executable unit operations, including temperature control, stirring, liquid and solid handling, filtration, etc. By integrating AI-driven literature searches, real-time experimental design, conversational human–AI interaction, and feedback-based optimization, we demonstrate the capabilities of AI in successfully synthesizing 13 compounds across four distinct classes of inorganic materials: coordination complexes, metal–organic frameworks, nanoparticles, and polyoxometalates. Notably, this approach enabled the discovery of a previously unreported family of Mn–W polyoxometalate clusters, showing the potential of AI-enhanced robotics as a generalizable and adaptable platform for material innovation.

Tags: None

305. Self-Evolving Discovery of Carrier Biomaterials with Ultra-Low Nonspecific Protein Adsorption for Single Cell Analysis, Advanced Materials (July 02, 2025)

Authors: Songtao Hu, Wenhui Lu, Xijia Ding, Yingying Xue, Congcong Liu, Tian Xie, Yinjun Deng, Haoran Li, Zhuocheng Gong, Yanming Xia, Peishen He, Lingliao Zeng, Zhong Wang, Jian Jin, Zhi Luo, Xi Shi, Zhike Peng, Tao Xu, Xiaobao Cao

Volume & Issue: Volume n/a, Issue n/a

Pages: 2506243

Abstract: Carrier biomaterials used in single-cell analysis face a bottleneck in protein detection sensitivity, primarily attributed to elevated false positives caused by nonspecific protein adsorption. Toward carrier biomaterials with ultra-low nonspecific protein adsorption, a self-evolving discovery is developed to address the challenge of high-dimensional parameter spaces. Automation across nine self-developed or modified workstations is integrated to achieve a “can-do” capability, and develop a synergy-enhanced Bayesian optimization algorithm as the artificial intelligence brain to enable a “can-think” capability for small-data problems inherent to time-consuming biological experiments, thereby establishing a self-evolving discovery for carrier biomaterials. Through this approach, carrier biomaterials with an ultra-low nonspecific protein adsorption index of 0.2537 are successfully discovered, representing an over 80% decrease, while achieving a 10 000-fold reduction in experiment workload. Furthermore, the discovered biomaterials are fabricated into microfluidic-used carriers for protein-analysis applications, showing a 9-fold enhancement in detection sensitivity compared to conventional carriers. This is the very demonstration of a self-evolving discovery for carrier biomaterials, paving the way for advancements in single-cell protein analysis and further its integration with genomics and transcriptomics.

Tags: artificial intelligence, automated experiments, biomaterials, single-cell analysis

304. El Agente: An autonomous agent for quantum chemistry, Matter (July 02, 2025)

Authors: Yunheng Zou, Austin H. Cheng, Abdulrahman Aldossary, Jiaru Bai, Shi Xuan Leong, Jorge Arturo Campos-Gonzalez-Angulo, Changhyeok Choi, Cher Tian Ser, Gary Tom, Andrew Wang, Zijian Zhang, Ilya Yakavets, Han Hao, Chris Crebolder, Varinia Bernales, Alán Aspuru-Guzik

Volume & Issue: Volume 8, Issue 7

Abstract: No abstract available

Tags: artificial intelligence, MAP 4: Demonstrate, density functional theory, large language models, agentic systems, foundation models, quantum computational chemistry, time-dependent density functional theory

303. Zero shot molecular generation via similarity kernels, Nature Communications (July 01, 2025)

Authors: Rokas Elijošius, Fabian Zills, Ilyes Batatia, Sam Walton Norwood, Dávid Péter Kovács, Christian Holm, Gábor Csányi

Volume & Issue: Volume 16, Issue 1

Pages: 5991

Abstract: Generative modelling aims to accelerate the discovery of novel chemicals by directly proposing structures with desirable properties. Recently, score-based, or diffusion, generative models have significantly outperformed previous approaches. Key to their success is the close relationship between the score and physical force, allowing the use of powerful equivariant neural networks. However, the behaviour of the learnt score is not yet well understood. Here, we analyse the score by training an energy-based diffusion model for molecular generation. We find that during the generation the score resembles a restorative potential initially and a quantum-mechanical force at the end, exhibiting special properties in between that enable the building of large molecules. Building upon these insights, we present Similarity-based Molecular Generation (SiMGen), a new zero-shot molecular generation method. SiMGen combines a time-dependent similarity kernel with local many-body descriptors to generate molecules without any further training. Our approach allows shape control via point cloud priors. Importantly, it can also act as guidance for existing trained models, enabling fragment-biased generation. We also release an interactive web tool, ZnDraw, for online SiMGen generation (https://zndraw.icp.uni-stuttgart.de).

Tags: Cheminformatics, Computational science

302. Enabling large language models for real-world materials discovery, Nature Machine Intelligence (July 2025)

Authors: Santiago Miret, N. M. Anoop Krishnan

Volume & Issue: Volume 7, Issue 7

Pages: 991-998

Abstract: Miret and Krishnan discuss the promise of large language models (LLMs) to revolutionize materials discovery via automated processing of complex, interconnected, multimodal materials data. They also consider critical limitations and research opportunities needed to unblock LLMs for breakthroughs in materials science.

Tags: None

301. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists, Nature Chemistry (July 2025)

Authors: Adrian Mirza, Nawaf Alampara, Sreekanth Kunchapu, Martiño Ríos-García, Benedict Emoekabu, Aswanth Krishnan, Tanya Gupta, Mara Schilling-Wilhelmi, Macjonathan Okereke, Anagha Aneesh, Mehrdad Asgari, Juliane Eberhardt, Amir Mohammad Elahi, Hani M. Elbeheiry, María Victoria Gil, Christina Glaubitz, Maximilian Greiner, Caroline T. Holick, Tim Hoffmann, Abdelrahman Ibrahim, Lea C. Klepsch, Yannik Köster, Fabian Alexander Kreth, Jakob Meyer, Santiago Miret, Jan Matthias Peschel, Michael Ringleb, Nicole C. Roesner, Johanna Schreiber, Ulrich S. Schubert, Leanne M. Stafast, A. D. Dinga Wonanke, Michael Pieler, Philippe Schwaller, Kevin Maik Jablonka

Volume & Issue: Volume 17, Issue 7

Pages: 1027-1034

Abstract: Large language models (LLMs) have gained widespread interest owing to their ability to process human language and perform tasks on which they have not been explicitly trained. However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here we introduce ChemBench, an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2,700 question–answer pairs, evaluated leading open- and closed-source LLMs and found that the best models, on average, outperformed the best human chemists in our study. However, the models struggle with some basic tasks and provide overconfident predictions. These findings reveal LLMs’ impressive chemical capabilities while emphasizing the need for further research to improve their safety and usefulness. They also suggest adapting chemistry education and show the value of benchmarking frameworks for evaluating LLMs in specific domains.

Tags: Cheminformatics, Chemical safety

300. Best practices for multi-fidelity Bayesian optimization in materials and molecular research, Nature Computational Science (July 2025)

Authors: Víctor Sabanza-Gil, Riccardo Barbano, Daniel Pacheco Gutiérrez, Jeremy S. Luterbacher, José Miguel Hernández-Lobato, Philippe Schwaller, Loïc Roch

Volume & Issue: Volume 5, Issue 7

Pages: 572-581

Abstract: Multi-fidelity Bayesian optimization methods are studied on molecular and material discovery tasks, and guidelines are provided to recommend cheaper and informative low-fidelity sources when using this technique in experimental settings.

Tags: None

299. A generalized platform for artificial intelligence-powered autonomous enzyme engineering, Nature Communications (July 01, 2025)

Authors: Nilmani Singh, Stephan Lane, Tianhao Yu, Jingxia Lu, Adrianna Ramos, Haiyang Cui, Huimin Zhao

Volume & Issue: Volume 16, Issue 1

Pages: 5648

Abstract: Proteins are the molecular machines of life with numerous applications in energy, health, and sustainability. However, engineering proteins with desired functions for practical applications remains slow, expensive, and specialist-dependent. Here we report a generally applicable platform for autonomous enzyme engineering that integrates machine learning and large language models with biofoundry automation to eliminate the need for human intervention, judgement, and domain expertise. Requiring only an input protein sequence and a quantifiable way to measure fitness, this automated platform can be applied to engineer a wide array of proteins. As a proof of concept, we engineer Arabidopsis thaliana halide methyltransferase (AtHMT) for a 90-fold improvement in substrate preference and 16-fold improvement in ethyltransferase activity, along with developing a Yersinia mollaretii phytase (YmPhytase) variant with 26-fold improvement in activity at neutral pH. This is accomplished in four rounds over 4 weeks, while requiring construction and characterization of fewer than 500 variants for each enzyme. This platform for autonomous experimentation paves the way for rapid advancements across diverse industries, from medicine and biotechnology to renewable energy and sustainable chemistry.

Tags: Biocatalysis, Machine learning, Protein design, Synthetic biology

298. Machine-learning design of ductile FeNiCoAlTa alloys with high strength, Nature (July 2025)

Authors: Yasir Sohail, Chongle Zhang, Dezhen Xue, Jinyu Zhang, Dongdong Zhang, Shaohua Gao, Yang Yang, Xiaoxuan Fan, Hang Zhang, Gang Liu, Jun Sun, En Ma

Volume & Issue: Volume 643, Issue 8070

Pages: 119-124

Abstract: The pursuit of strong yet ductile alloys has been ongoing for centuries. However, for all alloys developed thus far, including recent high-entropy alloys, those possessing good tensile ductility rarely approach 2-GPa yield strength at room temperature. The few that do are mostly ultra-strong steels1–3; however, their stress–strain curves exhibit plateaus and serrations because their tensile flow suffers from plastic instability (such as Lüders strains)1–4, and the elongation is pseudo-uniform at best. Here we report that a group of carefully engineered multi-principal-element alloys, with a composition of Fe35Ni29Co21Al12Ta3 designed by means of domain knowledge-informed machine learning, can be processed to reach an unprecedented range of simultaneously high strength and ductility. An example of this synergy delivers 1.8-GPa yield strength combined with 25% truly uniform elongation. We achieved strengthening by pushing microstructural heterogeneities to the extreme through unusually large volume fractions of not only coherent L12 nanoprecipitates but also incoherent B2 microparticles. The latter, being multicomponent with a reduced chemical ordering energy, is a deformable phase that accumulates dislocations inside to help sustain a high strain hardening rate that prolongs uniform elongation.

Tags: Materials science, Metals and alloys

297. Tracking 35 years of progress in metallic materials for extreme environments via text mining, Scientific Reports (July 01, 2025)

Authors: Xin Wang, Anshu Raj, Yanqing Su, Shuozhi Xu, Kun Lu

Volume & Issue: Volume 15, Issue 1

Pages: 21219

Abstract: As global energy demands rise, the advancement of new energy technologies increasingly relies on the development of metals that can endure extreme pressures, temperatures, and fluxes of energetic particles and photons, as well as aggressive chemical reactions. One way to assist in the design and manufacturing of metals for the future is by learning from their past. Here we track the progress of metallic materials for extreme environments in the past 35 years using the text mining method, which allows us to discover patterns from a large scale of literature in the field. Specifically, we leverage transfer learning and dynamic word embeddings. Approximately one million relevant abstracts ranging from 1989 to 2023 were collected from the Web of Science. The literature was then mapped to a 200-dimensional vector space, generating time-series word embeddings across six time periods. Subsequent orthogonal Procrustes analysis was employed to align and compare vectors across these periods, overcoming challenges posed by training randomness and the non-uniqueness of singular value decomposition. This enabled the comparison of the semantic evolution of terms related to metals under extreme conditions. The model’s performance was evaluated using inputs categorized into materials, properties, and applications, demonstrating its ability to identify relevant metallic materials to the three input categories. The study also revealed the temporal changes in keyword associations, indicating shifts in research focus or industrial interest towards high-performance alloys for applications in aerospace and biomedical engineering, among others. This showcases the model’s capability to track the progress in metallic materials for extreme environments over time.

Tags: Materials science, Information technology

296. Ultrabroadband and band-selective thermal meta-emitters by machine learning, Nature (July 2025)

Authors: Chengyu Xiao, Mengqi Liu, Kan Yao, Yifan Zhang, Mengqi Zhang, Max Yan, Ya Sun, Xianghui Liu, Xuanyu Cui, Tongxiang Fan, Changying Zhao, Wansu Hua, Yinqiao Ying, Yuebing Zheng, Di Zhang, Cheng-Wei Qiu, Han Zhou

Volume & Issue: Volume 643, Issue 8070

Pages: 80-88

Abstract: An unconventional machine learning-based inverse design framework enables the generation of ultrabroadband and band-selective thermal meta-emitters with complex 3D architectures and diverse material compositions.

Tags: None

295. Large language models to accelerate organic chemistry synthesis, Nature Machine Intelligence (July 2025)

Authors: Yu Zhang, Yang Han, Shuai Chen, Ruijie Yu, Xin Zhao, Xianbin Liu, Kaipeng Zeng, Mengdi Yu, Jidong Tian, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu

Volume & Issue: Volume 7, Issue 7

Pages: 1010-1022

Abstract: Large language models (LLMs) can be useful tools for science, but they often lack expert understanding of complex domains that they were not trained on. Zhang and colleagues fine-tuned a LLaMA-2-7b-based LLM with questions on organic chemistry reactions.

Tags: None

294. UMA: A Family of Universal Models for Atoms, Preprint (June 30, 2025)

Authors: Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, Sushree Jagriti Sahoo, Zachary W. Ulissi, C. Lawrence Zitnick

Abstract: The ability to quickly and accurately compute properties from atomic simulations is critical for advancing a large number of applications in chemistry and materials science including drug discovery, energy storage, and semiconductor manufacturing. To address this need, Meta FAIR presents a family of Universal Models for Atoms (UMA), designed to push the frontier of speed, accuracy, and generalization. UMA models are trained on half a billion unique 3D atomic structures (the largest training runs to date) by compiling data across multiple chemical domains, e.g. molecules, materials, and catalysts. We develop empirical scaling laws to help understand how to increase model capacity alongside dataset size to achieve the best accuracy. The UMA small and medium models utilize a novel architectural design we refer to as mixture of linear experts that enables increasing model capacity without sacrificing speed. For example, UMA-medium has 1.4B parameters but only ~50M active parameters per atomic structure. We evaluate UMA models on a diverse set of applications across multiple domains and find that, remarkably, a single model without any fine-tuning can perform similarly or better than specialized models. We are releasing the UMA code, weights, and associated data to accelerate computational workflows and enable the community to continue to build increasingly capable AI models.

Tags: Computer Science - Machine Learning

293. Rethinking chemical research in the age of large language models, Nature Computational Science (June 24, 2025)

Authors: Robert MacKnight, Daniil A. Boiko, Jose Emilio Regio, Liliana C. Gallegos, Théo A. Neukomm, Gabe Gomes

Pages: 1-12

Abstract: This Perspective highlights the potential integrations of large language models (LLMs) in chemical research and provides guidance on the effective use of LLMs as research partners, noting the ethical and performance-based challenges that must be addressed moving forward.

Tags: None

292. Agent-based multimodal information extraction for nanomaterials, npj Computational Materials (June 23, 2025)

Authors: R. Odobesku, K. Romanova, S. Mirzaeva, O. Zagorulko, R. Sim, R. Khakimullin, J. Razlivina, A. Dmitrenko, V. Vinogradov

Volume & Issue: Volume 11, Issue 1

Pages: 194

Abstract: Automating structured data extraction from scientific literature is a critical challenge with broad implications across domains. We introduce nanoMINER, a multi-agent system combining large language models and multimodal analysis to extract essential information from scientific research articles on nanomaterials. This system processes documents end-to-end, utilizing tools such as YOLO for visual data extraction and GPT-4o for linking textual and visual information. At its core, the ReAct agent orchestrates specialized agents to ensure comprehensive data extraction. We demonstrate the efficacy of the system by automating the assembly of nanomaterial and nanozyme datasets previously manually curated by domain experts. NanoMINER achieves high precision in extracting nanomaterial properties like chemical formulas, crystal systems, and surface characteristics. For nanozymes, we obtain near-perfect precision (0.98) for kinetic parameters and essential features such as Cmin and Cmax. To benchmark the system performance, we also compare nanoMINER to several baseline LLMs, including the most recent multimodal GPT-4.1, and show consistently higher extraction precision and recall. Our approach is extensible to other domains of materials science and fields like biomedicine, advancing data-driven research methodologies and automated knowledge extraction.

Tags: Heterogeneous catalysis, Nanoscale materials

291. All-atom Diffusion Transformers: Unified generative modelling of molecules and materials, International Conference on Machine Learning (June 18, 2025)

Authors: Chaitanya K. Joshi, Xiang Fu, Yi-Lun Liao, Vahe Gharakhanyan, Benjamin Kurt Miller, Anuroop Sriram, Zachary Ward Ulissi

Abstract: Diffusion models are the standard toolkit for generative modelling of 3D atomic systems. However, for different types of atomic systems – such as molecules and materials – the generative processes are usually highly specific to the target system despite the underlying physics being the same. We introduce the All-atom Diffusion Transformer (ADiT), a unified latent diffusion framework for jointly generating both periodic materials and non-periodic molecular systems using the same model: (1) An autoencoder maps a unified, all-atom representations of molecules and materials to a shared latent embedding space; and (2) A diffusion model is trained to generate new latent embeddings that the autoencoder can decode to sample new molecules or materials. Experiments on MP20, QM9 and GEOM-DRUGS datasets demonstrate that jointly trained ADiT generates realistic and valid molecules as well as materials, obtaining state-of-the-art results on par with molecule and crystal-specific models. ADiT uses standard Transformers with minimal inductive biases for both the autoencoder and diffusion model, resulting in significant speedups during training and inference compared to equivariant diffusion models. Scaling ADiT up to half a billion parameters predictably improves performance, representing a step towards broadly generalizable foundation models for generative chemistry. Open source code: https://github.com/facebookresearch/all-atom-diffusion-transformer

Tags: None

290. Data-Driven Design of Random Heteropolypeptides as Synthetic Polyclonal Antibodies, Journal of the American Chemical Society (June 18, 2025)

Authors: Haisen Zhou, Guangqi Wu, Zuo Zhang, Zhen Zhu, Tanfeng Zhao, Qiankang Zhou, Haolin Chen, Zhenzhong Zhao, Yuanhao Dai, Xiaodong Jing, Wei Ji, Xuehai Yan, Lijiang Yang, Yiqin Gao, Huan Wang, Hua Lu

Volume & Issue: Volume 147, Issue 24

Pages: 21077-21088

Abstract: Antibodies are indispensable in biomedicine. However, conventional antibody development faces challenges, including high costs and lengthy production timelines spanning months. Here, we present a data-driven workflow for engineering random heteropolypeptides (RHPs) as synthetic polyclonal antibodies (SpAbs) with programmable binding properties. By combining high-throughput synthesis of selenopolypeptide derivatives with algorithm-assisted optimization, we rapidly identified SpAbs targeting human interferon-α (IFN) and tumor necrosis factor-α (TNF-α) within 2 weeks. The SpAbs exhibited binding affinities comparable to natural antibodies, with the top candidate achieving a dissociation constant of 7.9 nM for TNF-α and 418-fold selectivity over human serum albumin, effectively neutralizing TNF-α-induced cytotoxicity. Liquid-phase electron microscopy revealed flexible, intrinsically disordered protein-like conformations and folding-upon-binding dynamics. This study establishes a robust framework for SpAb discovery, demonstrating that sequence-independent RHPs can serve as functional antibody mimics with tunable binding properties, rapid optimization, and broad potential in diagnostics and therapeutics.

Tags: None

289. Novel machine learning driven design strategy for high strength Zn Alloys optimization with multiple constraints, npj Computational Materials (June 06, 2025)

Authors: Chenfeng Pan, Wenwen Lin, Jianxing Zhou, Wei Jian, Ka Chun Chan, Yuk Lun Chan, Lu Ren

Volume & Issue: Volume 11, Issue 1

Pages: 169

Abstract: Zinc (Zn) alloys offer advantages such as abundant resources and low cost. Nevertheless, their current mechanical properties limit application in more advanced fields. Due to the lack of clear compositional design methods, the development of high-performance Zn alloys is urgently needed. To this end, this work proposes a fast and effective design strategy for Zn alloys based on machine learning (ML). The prediction models for the ultimate tensile strength, elongation, and hardness were successfully developed, with accuracies exceeding 90%. Interpretability analysis of the models was performed using the SHAP method with particle swarm optimization (PSO). Furthermore, a ML-based Zn alloy composition design system (ZACDS) was proposed by integrating the Bayesian optimization algorithm. A novel high-strength Zn alloy was successfully designed using ZACDS, demonstrating good agreement between predicted and experimental mechanical properties. This approach offers a new strategy for Zn alloy design under different compositional constraints and performance requirements.

Tags: Computational methods, Materials science

288. Agents for self-driving laboratories applied to quantum computing, Preprint (June 05, 2025)

Authors: Shuxiang Cao, Zijian Zhang, Mohammed Alghadeer, Simone D. Fasciati, Michele Piscitelli, Mustafa Bakr, Peter Leek, Alán Aspuru-Guzik

Abstract: Fully automated self-driving laboratories are promising to enable high-throughput and large-scale scientific discovery by reducing repetitive labour. However, effective automation requires deep integration of laboratory knowledge, which is often unstructured, multimodal, and difficult to incorporate into current AI systems. This paper introduces the k-agents framework, designed to support experimentalists in organizing laboratory knowledge and automating experiments with agents. Our framework employs large language model-based agents to encapsulate laboratory knowledge including available laboratory operations and methods for analyzing experiment results. To automate experiments, we introduce execution agents that break multi-step experimental procedures into agent-based state machines, interact with other agents to execute each step and analyze the experiment results. The analyzed results are then utilized to drive state transitions, enabling closed-loop feedback control. To demonstrate its capabilities, we applied the agents to calibrate and operate a superconducting quantum processor, where they autonomously planned and executed experiments for hours, successfully producing and characterizing entangled quantum states at the level achieved by human scientists. Our knowledge-based agent system opens up new possibilities for managing laboratory knowledge and accelerating scientific discovery.

Tags: Computer Science - Artificial Intelligence, Quantum Physics

287. SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning, Advanced Materials (June 05, 2025)

Authors: Alireza Ghafarollahi, Markus J. Buehler

Volume & Issue: Volume 37, Issue 22

Pages: 2413523

Abstract: A key challenge in artificial intelligence (AI) is the creation of systems capable of autonomously advancing scientific understanding by exploring novel domains, identifying complex patterns, and uncovering previously unseen connections in vast scientific data. In this work, SciAgents, an approach that leverages three core concepts is presented: (1) large-scale ontological knowledge graphs to organize and interconnect diverse scientific concepts, (2) a suite of large language models (LLMs) and data retrieval tools, and (3) multi-agent systems with in-situ learning capabilities. Applied to biologically inspired materials, SciAgents reveals hidden interdisciplinary relationships that were previously considered unrelated, achieving a scale, precision, and exploratory power that surpasses human research methods. The framework autonomously generates and refines research hypotheses, elucidating underlying mechanisms, design principles, and unexpected material properties. By integrating these capabilities in a modular fashion, the system yields material discoveries, critiques and improves existing hypotheses, retrieves up-to-date data about existing research, and highlights strengths and limitations. This is achieved by harnessing a “swarm of intelligence” similar to biological systems, providing new avenues for discovery. How this model accelerates the development of advanced materials by unlocking Nature’s design principles, resulting in a new biocomposite with enhanced mechanical properties and improved sustainability through energy-efficient production is shown.

Tags: large language model, bio-inspired materials, biological design, knowledge graph, materials design, multi-agent system, natural language processing, scientific AI

286. A data-driven platform for automated characterization of polymer electrolytes, Matter (June 04, 2025)

Authors: Michael A. Stolberg, Jeffrey Lopez, Sawyer D. Cawthern, Abraham Herzog-Arbeitman, Ha-Kyung Kwon, Daniel Schweigert, Abraham Anapolosky, Brian D. Storey, Jeremiah A. Johnson, Yang Shao-Horn

Volume & Issue: Volume 8, Issue 6

Abstract: No abstract available

Tags: data-driven materials, high-throughput, ionic conductivity, MAP 2: Benchmark, polymer electrolyte

285. IvoryOS: an interoperable web interface for orchestrating Python-based self-driving laboratories, Nature Communications (June 04, 2025)

Authors: Wenyu Zhang, Lucy Hao, Veronica Lai, Ryan Corkery, Jacob Jessiman, Jiayu Zhang, Junliang Liu, Yusuke Sato, Maria Politi, Matthew E. Reish, Rebekah Greenwood, Noah Depner, Jiyoon Min, Rama El-khawaldeh, Paloma Prieto, Ekaterina Trushina, Jason E. Hein

Volume & Issue: Volume 16, Issue 1

Pages: 5182

Abstract: Self-driving laboratories (SDLs), powered by robotics, automation and artificial intelligence, accelerate scientific discoveries through autonomous experimentation. However, their adoption and transferability are limited by the lack of standardized software across diverse SDLs. In this work, we introduce IvoryOS – an open-source orchestrator that automatically generates web interfaces for Python-based SDLs. It ensures interoperability by dynamically updating the user interfaces with the plugged components and their functionalities. The interfaces enable users to directly control SDLs and design workflows through a drag-and-drop user interface. Additionally, the workflow manager provides no-code configuration for iterative execution, supporting both human-in-the-loop and closed-loop experimentation. We demonstrate the integration of IvoryOS with six SDLs across two institutes, showcasing its adaptability and utility across platforms at various development stages. The plug-and-play and low-code feature of IvoryOS addresses the rapidly evolving demands of SDL development and significantly lowers the barrier to entry for building and managing SDLs.

Tags: Automation, Cheminformatics

284. An unsupervised machine learning based approach to identify efficient spin-orbit torque materials, npj Computational Materials (June 03, 2025)

Authors: Shehrin Sayed, Hannah Calzi Kleidermacher, Giulianna Hashemi-Asasi, Cheng-Hsiang Hsu, Sayeef Salahuddin

Volume & Issue: Volume 11, Issue 1

Pages: 167

Abstract: Materials with large spin–orbit torque (SOT) hold considerable significance for many spintronic applications because of their potential for energy-efficient magnetization switching. Unfortunately, most of the existing materials exhibit an SOT efficiency factor that is much less than unity, requiring a large current for magnetization switching. The search for new materials that can exhibit an SOT efficiency much greater than unity is a topic of active research, and only a few such materials have been identified using conventional approaches. In this paper, we present a machine learning-based approach using a word embedding model that can identify new results by deciphering non-trivial correlations among various items in a specialized scientific text corpus. We show that such a model can be used to identify materials likely to exhibit high SOT and rank them according to their expected SOT strengths. The model captured the essential spintronics knowledge embedded in scientific abstracts within various materials science, physics, and engineering journals and identified 97 new materials to exhibit high SOT. Among them, 16 candidate materials are expected to exhibit an SOT efficiency greater than unity, and one of them has recently been confirmed with experiments with quantitative agreement with the model prediction.

Tags: Electronic devices, Electrical and electronic engineering, Spintronics

283. Biomni: A General-Purpose Biomedical AI Agent, Preprint (June 02, 2025)

Authors: Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, Yingzhou Lu, Yusuf Roohani, Ryan Li, Lin Qiu, Gavin Li, Junze Zhang, Di Yin, Shruti Marwaha, Jennefer N. Carter, Xin Zhou, Matthew Wheeler, Jonathan A. Bernstein, Mengdi Wang, Peng He, Jingtian Zhou, Michael Snyder, Le Cong, Aviv Regev, Jure Leskovec

Abstract: Biomedical research underpins progress in our understanding of human health and disease, drug discovery, and clinical care. However, with the growth of complex lab experiments, large datasets, many analytical tools, and expansive literature, biomedical research is increasingly constrained by repetitive and fragmented workflows that slow discovery and limit innovation, underscoring the need for a fundamentally new way to scale scientific expertise. Here, we introduce Biomni, a general-purpose biomedical AI agent designed to autonomously execute a wide spectrum of research tasks across diverse biomedical subfields. To systematically map the biomedical action space, Biomni first employs an action discovery agent to create the first unified agentic environment – mining essential tools, databases, and protocols from tens of thousands of publications across 25 biomedical domains. Built on this foundation, Biomni features a generalist agentic architecture that integrates large language model (LLM) reasoning with retrieval-augmented planning and code-based execution, enabling it to dynamically compose and carry out complex biomedical workflows – entirely without relying on predefined templates or rigid task flows. Systematic benchmarking demonstrates that Biomni achieves strong generalization across heterogeneous biomedical tasks – including causal gene prioritization, drug repurposing, rare disease diagnosis, micro-biome analysis, and molecular cloning – without any task-specific prompt tuning. Real-world case studies further showcase Biomni’s ability to interpret complex, multi-modal biomedical datasets and autonomously generate experimentally testable protocols. Biomni envisions a future where virtual AI biologists operate alongside and augment human scientists to dramatically enhance research productivity, clinical insight, and healthcare. Biomni is ready to use at https://biomni.stanford.edu, and we invite scientists to explore its capabilities, stress-test its limits, and co-create the next era of biomedical discoveries.

Tags: None

282. A multimodal conversational agent for DNA, RNA and protein tasks, Nature Machine Intelligence (June 2025)

Authors: Bernardo P. de Almeida, Guillaume Richard, Hugo Dalla-Torre, Christopher Blum, Lorenz Hexemer, Priyanka Pandey, Stefan Laurent, Chandana Rajesh, Marie Lopez, Alexandre Laterre, Maren Lang, Uğur Şahin, Karim Beguir, Thomas Pierrot

Volume & Issue: Volume 7, Issue 6

Pages: 928-941

Abstract: Language models are thriving, powering conversational agents that assist and empower humans to solve a number of tasks. Recently, these models were extended to support additional modalities including vision, audio and video, demonstrating impressive capabilities across multiple domains, including healthcare. Still, conversational agents remain limited in biology as they cannot yet fully comprehend biological sequences. Meanwhile, high-performance foundation models for biological sequences have been built through self-supervision over sequencing data, but these need to be fine-tuned for each specific application, preventing generalization between tasks. In addition, these models are not conversational, which limits their utility to users with coding capabilities. Here we propose to bridge the gap between biology foundation models and conversational agents by introducing ChatNT, a multimodal conversational agent with an advanced understanding of biological sequences. ChatNT achieves new state-of-the-art results on the Nucleotide Transformer benchmark while being able to solve all tasks at once, in English, and to generalize to unseen questions. In addition, we have curated a set of more biologically relevant instruction tasks from DNA, RNA and proteins, spanning multiple species, tissues and biological processes. ChatNT reaches performance on par with state-of-the-art specialized methods on those tasks. We also present a perplexity-based technique to help calibrate the confidence of our model predictions. By applying attribution methods through the English decoder and DNA encoder, we demonstrate that ChatNT’s answers are based on biologically coherent features such as detecting the promoter TATA motif or splice site dinucleotides. Our framework for genomics instruction tuning can be extended to more tasks and data modalities (for example, structure and imaging), making it a widely applicable tool for biology. ChatNT provides a potential direction for building generally capable agents that understand biology from first principles while being accessible to users with no coding background.

Tags: Computational models, Systems biology

281. Chinese cities show disparity trends in carbon efficiency: A hybrid approach combining explainable machine learning and econometrics, Sustainable Cities and Society (June 01, 2025)

Authors: Le Ma, Tong Cheng, Longgang Xiang, Chao Wang, Liyuan Wei, Jianing Wang, Chaoya Dang, Wenjie Fu, Huayi Wu

Volume & Issue: Volume 127

Pages: 106455

Abstract: Exploring the disparities in carbon emission intensity (CEI) across different city types and their underlying causes is essential for achieving carbon neutrality. However, the nonlinear drivers of CEI and the spatial spillover effects among different city types remain insufficiently studied. This study employs an innovative methodology by integrating SOM-K-means clustering, explainable machine learning (LightGBM-SHAP), and spatial econometric models to systematically analyze CEI disparities accross various city types, focusing on the nonlinear effects of key influencing factors and their spatial effects. The results reveal significant heterogeneity in CEI among city types. High-service cities exhibit the lowest CEI due to advanced economic structures, yet their concentrated economic activities sustain high total emissions. Resource-based cities have achieved a 46.9 % CEI reduction, but delays in energy structure optimization and smart manufacturing hinder further improvements. Heavy manufacturing cities continue to maintain high CEI, with widening disparities and risks of carbon leakage. Key findings highlight the dual role of urbanization: while CEI declines when urbanization rates range from 30 % to 75 %, exceeding 60 % in resource-based cities reverses this trend. Additionally, a 1 % increase in urbanization raises CEI by 1.05 % and increases emissions in neighboring cities by 0.32 %, underscoring spatial spillover effects. Government investment shows diminishing marginal returns, and technological development initially increases CEI but contributes to long-term reductions. These findings underscore the need for tailored carbon reduction strategies that address the distinct socio-economic and industrial dynamics of each city type. This study offers critical insights for advancing China’s carbon reduction goals and promoting sustainable, low-carbon urban development.

Tags: Carbon emission intensity, City clusters, Explainable machine learning, Low-carbon development, SHAP values, Spatial spillover effects

280. Predicting expression-altering promoter mutations with deep learning, Science (May 29, 2025)

Authors: Kishore Jaganathan, Nicole Ersaro, Gherman Novakovsky, Yuchuan Wang, Terena James, Jeremy Schwartzentruber, Petko Fiziev, Irfahan Kassam, Fan Cao, Johann Hawe, Henry Cavanagh, Ashley Lim, Grace Png, Jeremy McRae, Abhimanyu Banerjee, Arvind Kumar, Jacob Ulirsch, Yan Zhang, Francois Aguet, Pierrick Wainschtein, Laksshman Sundaram, Adriana Salcedo, Sofia Kyriazopoulou Panagiotopoulou, Delasa Aghamirzaie, Evin Padhi, Ziming Weng, Shan Dong, Damian Smedley, Mark Caulfield, Anne O’Donnell-Luria, Heidi L. Rehm, Stephan J. Sanders, Anshul Kundaje, Stephen B. Montgomery, Mark T. Ross, Kyle Kai-How Farh

Volume & Issue: Volume 389, Issue 6760

Pages: eads7373

Abstract: Only a minority of patients with rare genetic diseases are presently diagnosed by exome sequencing, suggesting that additional unrecognized pathogenic variants may reside in noncoding sequence. In this work, we describe PromoterAI, a deep neural network that accurately identifies noncoding promoter variants that dysregulate gene expression. We show that promoter variants with predicted expression-altering consequences produce outlier expression at both the RNA and protein levels in thousands of individuals and that these variants experience strong negative selection in human populations. We observed that clinically relevant genes in patients with rare diseases are enriched for such variants and validated their functional impact through reporter assays. Our estimates suggest that promoter variation accounts for 6% of the genetic burden associated with rare diseases.

Tags: None

Authors: Ming-Chiang Chang, Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Lan Zhou, John M. Gregoire, Carla P. Gomes, R. Bruce van Dover, Michael O. Thompson

Volume & Issue: Volume 11, Issue 1

Pages: 1-13

Abstract: X-ray diffraction (XRD) is a powerful method for determining a material’s crystal structure in high-throughput experimentation, and is widely being incorporated in artificially intelligent agents for autonomous scientific discovery. However, rapid, automated, and reliable analysis of XRD data at rates that match the pace of experimental measurements at a synchrotron source remains a major challenge. To address these issues, we developed CrystalShift for rapid and efficient probabilistic XRD phase labeling employing symmetry-constrained optimization, best-first tree search, and Bayesian model comparison. The algorithm estimates probabilities for phase combinations without requiring additional phase space information or training. We demonstrate that CrystalShift provides robust probability estimates, outperforming existing methods on synthetic and experimental datasets, and can be readily integrated into high-throughput experimental workflows. In addition to efficient phase labeling, CrystalShift offers quantitative insights into materials’ structural parameters, which facilitate both expert evaluation and AI-based modeling of the phase space, ultimately accelerating materials identification and discovery.

Tags: None

278. Data-Driven Design of Mechanically Hard Soft Magnetic High-Entropy Alloys, Advanced Science (May 22, 2025)

Authors: Mian Dai, Yixuan Zhang, Xiaoqing Li, Stephan Schönecker, Liuliu Han, Ruiwen Xie, Chen Shen, Hongbin Zhang

Volume & Issue: Volume 12, Issue 19

Pages: 2500867

Abstract: The design and optimization of mechanically hard soft magnetic materials, which combine high hardness with magnetically soft properties, represent a critical frontier in materials science for advanced technological applications. To address this challenge, a data-driven framework is presented for exploring the vast compositional space of high-entropy alloys (HEAs) and identifying candidates optimized for multifunctionality. The study employs a comprehensive dataset of 1 842 628 density functional theory calculations, comprising 45 886 quaternary and 414 771 quinary equimolar HEAs derived from 42 elements. Using ensemble learning, predictive models are integrated to capture the relationships between composition, crystal structure, mechanical, and magnetic properties. This framework offers a robust pathway for accelerating the discovery of next-generation alloys with high hardness and magnetic softness, highlighting the transformative impact of data-driven strategies in material design.

Tags: machine learning, density functional theory, high-entropy alloys, high-throughput calculations, mechanically hard soft magnets

277. A novel training-free approach to efficiently extracting material microstructures via visual large model, Acta Materialia (May 15, 2025)

Authors: Changtai Li, Xu Han, Chao Yao, Yu Guo, Zixin Li, Lei Jiang, Wei Liu, Haiyou Huang, Huadong Fu, Xiaojuan Ban

Volume & Issue: Volume 290

Pages: 120962

Abstract: The precise quantitative description of material microstructures is essential for deeply exploring the relationship between material composition and property. This significant understanding efficiently enables composition design, process optimization, and property enhancement. Traditionally, the analysis of material microstructures has relied heavily on professional expertise. Even with machine /deep learning (ML/DL)-based analysis methods, substantial expert annotation is required for training, and the trained models often suffer from weak generalizability and poor recognition of new images. This study proposed MatSAM (Materials Segment Anything Model), a novel training-free approach for efficient material microstructure extraction based on the Segment Anything Model (SAM), a type of visual large model (VLM). Integrating region marking and microscopy-adapted points, an automated point-based prompt strategy was developed to achieve accurate and efficient material microstructure recognition. Without any manual annotations, MatSAM precisely identified 11 kinds of metallic material microstructures obtained through various characterization methods. Compared to optimal conventional rule-based methods that do not involve a learning process (non-ML/DL), MatSAM achieved an average relative improvement of 35.4 % in metrics combining the adjusted Rand index (ARI) and Intersection over Union (IoU), outperforming the original SAM by an average of 13.9 %. On four public microstructure segmentation datasets, the IoU of MatSAM showed an average improvement of 7.5 % over corresponding specialist deep models requiring annotations. Meanwhile, MatSAM satisfied the generalization capability of a single model for various microstructures, including grain boundaries, phases, and defects. This approach significantly reduces the labor and computational costs of quantitatively characterizing material microstructures, further accelerating the development of advanced materials.

Tags: Deep learning, Image recognition, Material microstructure, Visual large model

276. Experimentally validated inverse design of FeNiCrCoCu MPEAs and unlocking key insights with explainable AI, npj Computational Materials (May 15, 2025)

Authors: Fangxi Wang, Allana G. Iwanicki, Abhishek T. Sose, Lucas A. Pressley, Tyrel M. McQueen, Sanket A. Deshmukh

Volume & Issue: Volume 11, Issue 1

Pages: 124

Abstract: A computational workflow integrating a stacked ensemble machine learning (SEML) model and a convolutional neural network (CNN) model with evolutionary algorithms has been developed to identify new compositions of FeNiCrCoCu MPEAs with high bulk modulus and unstable stacking fault energies. The identified compositions were synthesized and tested for their crystal structures and mechanical properties (hardness and Young’s modulus), resulting in single-phase face-centered cubic (FCC) structures. Additionally, the measured Young’s moduli were in good qualitative agreement with computational predictions. The SHapley Additive exPlanations (SHAP) analysis of the SEML model revealed a relationship between elemental concentration and USFE. Meanwhile, SHAP analysis of the CNN models uncovered correlations between the local clustering of MPEA elements and their mechanical properties. This computational workflow, along with the fundamental insights gained, can be readily expanded and applied to the design of MPEAs with different elemental compositions, as well as to materials beyond MPEAs.

Tags: Atomistic models, Metals and alloys

275. Interpretable Machine Learning Applications: A Promising Prospect of AI for Materials, Advanced Functional Materials (May 13, 2025)

Authors: Xue Jiang, Huadong Fu, Yang Bai, Lei Jiang, Hongtao Zhang, Weiren Wang, Peiwen Yun, Jingjin He, Dezhen Xue, Turab Lookman, Yanjing Su, Jianxin Xie

Volume & Issue: Volume n/a, Issue n/a

Pages: 2507734

Abstract: In recent years, data-driven machine learning has significantly advanced the design of new materials and transformed the research and development landscape. However, its heavy reliance on data and the “black-box” nature of its model-mapping mechanisms have hindered its application in materials science research. Integrating material knowledge with machine learning to enhance model generalization and prediction accuracy remains an important objective. Such integration can deepen the understanding of material mechanisms by screening physical and chemical features to uncover explicit intrinsic relationships. Thus, it promotes the advancement of materials science, representing a promising avenue for artificial intelligence (AI) applications in this field. In this review, the algorithms, functionalities, and applications in materials underlying interpretable machine learning approaches are summarized and analyzed. The impact of composition and microstructure on material properties is explored and mathematical expressions for intrinsic relationships of materials are developed. In addition, recent advancements in data- and knowledge-driven strategies for new material discovery, key property enhancement, multi-objective design trade-offs, and optimizing the entire preparation and processing workflow are reviewed. Finally, the future prospects and challenges associated with applying AI in materials science and its broader implications for the field are discussed.

Tags: AI for materials, data- and knowledge-driven approaches, interpretable machine learning

274. Token-Mol 1.0: tokenized drug design with large language models, Nature Communications (May 13, 2025)

Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Jingxuan Ge, Zhourui Wu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

Volume & Issue: Volume 16, Issue 1

Pages: 4416

Abstract: The integration of large language models (LLMs) into drug design is gaining momentum; however, existing approaches often struggle to effectively incorporate three-dimensional molecular structures. Here, we present Token-Mol, a token-only 3D drug design model that encodes both 2D and 3D structural information, along with molecular properties, into discrete tokens. Built on a transformer decoder and trained with causal masking, Token-Mol introduces a Gaussian cross-entropy loss function tailored for regression tasks, enabling superior performance across multiple downstream applications. The model surpasses existing methods, improving molecular conformation generation by over 10% and 20% across two datasets, while outperforming token-only models by 30% in property prediction. In pocket-based molecular generation, it enhances drug-likeness and synthetic accessibility by approximately 11% and 14%, respectively. Notably, Token-Mol operates 35 times faster than expert diffusion models. In real-world validation, it improves success rates and, when combined with reinforcement learning, further optimizes affinity and drug-likeness, advancing AI-driven drug discovery.

Tags: Machine learning, Cheminformatics, Drug discovery and development

273. Exploration of crystal chemical space using text-guided generative artificial intelligence, Nature Communications (May 12, 2025)

Authors: Hyunsoo Park, Anthony Onwuli, Aron Walsh

Volume & Issue: Volume 16, Issue 1

Pages: 4379

Abstract: The vastness of chemical space presents a long-standing challenge for the exploration of new compounds with pre-determined properties. In materials science, crystal structure prediction has become a mature tool for mapping from composition to structure based on global optimisation techniques. Generative artificial intelligence now offers the means to efficiently navigate larger regions of crystal chemical space informed by structure-property datasets of materials. Here, we introduce a model, named Chemeleon, designed to generate chemical compositions and crystal structures by learning from both textual descriptions and three-dimensional structural data. The model employs denoising diffusion techniques for compound generation using textual inputs aligned with structural data via cross-modal contrastive learning. The potential of this approach is demonstrated for multi-component compound generation, including the Zn-Ti-O ternary space, and the prediction of stable phases in the Li-P-S-Cl quaternary space of relevance to solid-state batteries.

Tags: Atomistic models, Computational chemistry

272. Using GNN property predictors as molecule generators, Nature Communications (May 08, 2025)

Authors: Félix Therrien, Edward H. Sargent, Oleksandr Voznyy

Volume & Issue: Volume 16, Issue 1

Pages: 4301

Abstract: Graph neural networks (GNNs) have emerged as powerful tools to accurately predict materials and molecular properties in computational and automated discovery pipelines. In this article, we exploit the invertible nature of these neural networks to directly generate molecular structures with desired electronic properties. Starting from a random graph or an existing molecule, we perform a gradient ascent while holding the GNN weights fixed in order to optimize its input, the molecular graph, towards the target property. Valence rules are enforced strictly through a judicious graph construction. The method relies entirely on the property predictor; no additional training is required on molecular structures. We demonstrate the application of this method by generating molecules with specific energy gaps verified with density functional theory (DFT) and with specific octanol-water partition coefficients (logP). Our approach hits target properties with rates comparable to or better than state-of-the-art generative models while consistently generating more diverse molecules. Moreover, while validating our framework we created a dataset of 1617 new molecules and their corresponding DFT-calculated properties that could serve as an out-of-distribution test set for QM9-trained models.

Tags: Cheminformatics, Computational science

271. Discovery of Sustainable Energy Materials Via the Machine-Learned Material Space, Small (May 05, 2025)

Authors: Malte Grunert, Max Großmann, Erich Runge

Volume & Issue: Volume n/a, Issue n/a

Pages: 2412519

Abstract: Does a machine learning (ML) model capture the intrinsic structure of the material space? The example of the OptiMate model, a graph attention network trained to predict the optical properties of semiconductors and insulators, provides an affirmative answer. By applying the UMAP dimensionality reduction technique to its latent embeddings, it is demonstrated that the model captures a nuanced and interpretable representation of the materials space, reflecting chemical and physical principles, without any user-induced bias. This enables clustering of almost 10,000 materials based on optical properties and chemical similarities. Furthermore, it is shown how the learned material space can be used to identify more sustainable alternatives to critical materials in energy-related technologies, such as photovoltaics. These findings demonstrate the dual utility of ML models in materials science: Accurately predicting material properties while providing insights into the underlying materials space. The approach demonstrates the broader potential of leveraging learned materials spaces for the discovery and design of materials for diverse applications, and is easily applicable to any state-of-the-art ML model.

Tags: machine learning, energy materials, material space, optical materials, sustainability

270. End-to-end data-driven weather prediction, Nature (May 2025)

Authors: Anna Allen, Stratis Markou, Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, Nicholas D. Lane, Matthew Chantry, J. Scott Hosking, Richard E. Turner

Volume & Issue: Volume 641, Issue 8065

Pages: 1172-1179

Abstract: Weather prediction is critical for a range of human activities, including transportation, agriculture and industry, as well as for the safety of the general public. Machine learning transforms numerical weather prediction (NWP) by replacing the numerical solver with neural networks, improving the speed and accuracy of the forecasting component of the prediction pipeline1–6. However, current models rely on numerical systems at initialization and to produce local forecasts, thereby limiting their achievable gains. Here we show that a single machine learning model can replace the entire NWP pipeline. Aardvark Weather, an end-to-end data-driven weather prediction system, ingests observations and produces global gridded forecasts and local station forecasts. The global forecasts outperform an operational NWP baseline for several variables and lead times. The local station forecasts are skilful for up to ten days of lead time, competing with a post-processed global NWP baseline and a state-of-the-art end-to-end forecasting system with input from human forecasters. End-to-end tuning further improves the accuracy of local forecasts. Our results show that skilful forecasting is possible without relying on NWP at deployment time, which will enable the realization of the full speed and accuracy benefits of data-driven models. We believe that Aardvark Weather will be the starting point for a new generation of end-to-end models that will reduce computational costs by orders of magnitude and enable the rapid, affordable creation of customized models for a range of end users.

Tags: Computer science, Atmospheric science, Environmental sciences, Natural hazards

269. Engineering principles for self-driving laboratories, Nature Chemical Engineering (May 2025)

Authors: Fernando Delgado-Licona, Daniel Addington, Abdulrahman Alsaiari, Milad Abolhasani

Volume & Issue: Volume 2, Issue 5

Pages: 277-280

Abstract: The chemical engineering field stands at a pivotal moment of transformation, driven in part by the convergence of process automation, robotics and artificial intelligence into self-driving laboratories (SDLs) to accelerate scientific discoveries. This Comment explores how process intensification principles can guide the development of SDLs to accelerate innovation while ensuring efficient use of resources across the multiscale chemical domain.

Tags: None

268. Large language model-driven database for thermoelectric materials, Computational Materials Science (May 01, 2025)

Authors: Suman Itani, Yibo Zhang, Jiadong Zang

Volume & Issue: Volume 253

Pages: 113855

Abstract: Thermoelectric materials have the ability to convert waste heat into electricity, offering a valuable solution for energy harvesting. However, their widespread use is hindered by low conversion efficiency, the reliance on expensive rare earth elements, and the environmental and regulatory concerns associated with lead-based materials. A fast and cost-effective way to identify highly efficient thermoelectric materials is through data-driven methods. These approaches rely on robust and comprehensive datasets to train models. Although there are several databases on thermoelectric materials, there is still a need to collect and integrate experimental data from peer-reviewed research articles to capture diverse compositions and properties of materials. Here we developed a comprehensive database of 7,123 thermoelectric compounds, containing key information such as chemical composition, structural detail, seebeck coefficient, electrical and thermal conductivity, power factor, and figure of merit (ZT). We used the GPTArticleExtractor workflow, powered by large language models (LLM), to extract and curate data automatically from the scientific literature published in Elsevier journals. This process enabled the creation of a structured database that addresses the challenges of manual data collection. The open access database could stimulate data-driven research and advance thermoelectric material analysis and discovery.

Tags: Large language model, Database, Thermoelectric materials

267. Leveraging generative models with periodicity-aware, invertible and invariant representations for crystalline materials design, Nature Computational Science (May 2025)

Authors: Zhilong Wang, Fengqi You

Volume & Issue: Volume 5, Issue 5

Pages: 365-376

Abstract: The inverse design of functional crystalline materials via generative models is a rapidly growing field, but one that faces challenges in representation and generation architectures. This Perspective systematically examines these limitations and explores strategies for future improvement.

Tags: None

266. Automated processing and transfer of two-dimensional materials with robotics, Nature Chemical Engineering (May 2025)

Authors: Yixuan Zhao, Junhao Liao, Saiyu Bu, Zhaoning Hu, Jingyi Hu, Qi Lu, Mingpeng Shang, Bingbing Guo, Ge Chen, Qian Zhao, Kaicheng Jia, Guorui Wang, Ethan Errington, Qin Xie, Yanfeng Zhang, Miao Guo, Boyang Mao, Li Lin, Zhongfan Liu

Volume & Issue: Volume 2, Issue 5

Pages: 296-308

Abstract: Robust, high-throughput processing of two-dimensional materials produced by chemical vapor deposition requires a reliable and scalable technique to transfer the materials to a target substrate. An automated system for transferring chemical-vapor-deposited two-dimensional materials using robotics is developed, demonstrating high production capability with uniformity and repeatability of the transferred materials.

Tags: None

265. Self-driving nanoparticle synthesis, Nature Chemical Engineering (May 2025)

Authors: Tong Zhao, Yan Zeng

Volume & Issue: Volume 2, Issue 5

Pages: 290-291

Abstract: The multidimensional chemical parameter space for nanoparticle synthesis is too extensive for traditional exploration, but integrating robotic automation, microfluidics and machine learning can accelerate discovery and improve synthesis controllability.

Tags: None

Authors: Yingheng Tang, Wenbin Xu, Jie Cao, Weilu Gao, Steve Farrell, Benjamin Erichson, Michael W. Mahoney, Andy Nonaka, Zhi Yao

Abstract: Understanding and predicting the properties of inorganic materials is crucial for accelerating advancements in materials science and driving applications in energy, electronics, and beyond. Integrating material structure data with language-based information through multi-modal large language models (LLMs) offers great potential to support these efforts by enhancing human-AI interaction. However, a key challenge lies in integrating atomic structures at full resolution into LLMs. In this work, we introduce MatterChat, a versatile structure-aware multi-modal LLM that unifies material structural data and textual inputs into a single cohesive model. MatterChat employs a bridging module to effectively align a pretrained machine learning interatomic potential with a pretrained LLM, reducing training costs and enhancing flexibility. Our results demonstrate that MatterChat significantly improves performance in material property prediction and human-AI interaction, surpassing general-purpose LLMs such as GPT-4. We also demonstrate its usefulness in applications such as more advanced scientific reasoning and step-by-step material synthesis.

Tags: Computer Science - Artificial Intelligence, Computer Science - Machine Learning

263. Towards AI-driven autonomous growth of 2D materials based on a graphene case study, Communications Physics (April 25, 2025)

Authors: Leonardo Sabattini, Annalisa Coriolano, Corneel Casert, Stiven Forti, Edward S. Barnard, Fabio Beltram, Massimiliano Pontil, Stephen Whitelam, Camilla Coletti, Antonio Rossi

Volume & Issue: Volume 8, Issue 1

Pages: 180

Abstract: The scalable synthesis of two-dimensional (2D) materials remains a key challenge for their integration into solid-state technology. While exfoliation techniques have driven much of the scientific progress, they are impractical for large-scale applications. Advances in artificial intelligence (AI) now offer new strategies for materials synthesis. This study explores the use of an artificial neural network (ANN) trained via evolutionary methods to optimize graphene growth. The ANN autonomously refines a time-dependent synthesis protocol without prior knowledge of effective recipes. The evaluation is based on Raman spectroscopy, where outcomes resembling monolayer graphene receive higher scores. This feedback mechanism enables iterative improvements in synthesis conditions, progressively enhancing sample quality. By integrating AI-driven optimization into material synthesis, this work contributes to the development of scalable approaches for 2D materials, demonstrating the potential of machine learning in guiding experimental processes.

Tags: Condensed-matter physics, Synthesis of graphene

262. Science acceleration and accessibility with self-driving labs, Nature Communications (April 24, 2025)

Authors: Richard B. Canty, Jeffrey A. Bennett, Keith A. Brown, Tonio Buonassisi, Sergei V. Kalinin, John R. Kitchin, Benji Maruyama, Robert G. Moore, Joshua Schrier, Martin Seifrid, Shijing Sun, Tejs Vegge, Milad Abolhasani

Volume & Issue: Volume 16, Issue 1

Pages: 3856

Abstract: In the evolving landscape of scientific research, the complexity of global challenges demands innovative approaches to experimental planning and execution. Self-Driving Laboratories (SDLs) automate experimental tasks in chemical and materials sciences and the design and selection of experiments to optimize research processes and reduce material usage. This perspective explores improving access to SDLs via centralized facilities and distributed networks. We discuss the technical and collaborative challenges in realizing SDLs’ potential to enhance human–machine and human–human collaboration, ultimately fostering a more inclusive research community and facilitating previously untenable research projects.

Tags: Education, Research management, Techniques and instrumentation

261. Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction, Preprint (April 23, 2025)

Authors: Xiang Fu, Brandon M. Wood, Luis Barroso-Luque, Daniel S. Levine, Meng Gao, Misko Dzamba, C. Lawrence Zitnick

Abstract: Machine learning interatomic potentials (MLIPs) have become increasingly effective at approximating quantum mechanical calculations at a fraction of the computational cost. However, lower errors on held out test sets do not always translate to improved results on downstream physical property prediction tasks. In this paper, we propose testing MLIPs on their practical ability to conserve energy during molecular dynamic simulations. If passed, improved correlations are found between test errors and their performance on physical property prediction tasks. We identify choices which may lead to models failing this test, and use these observations to improve upon highly-expressive models. The resulting model, eSEN, provides state-of-the-art results on a range of physical property prediction tasks, including materials stability prediction, thermal conductivity prediction, and phonon calculations.

Tags: Physics - Computational Physics, Computer Science - Machine Learning

260. Harnessing database-supported high-throughput screening for the design of stable interlayers in halide-based all-solid-state batteries, Nature Communications (April 17, 2025)

Authors: Longyun Shen, Zilong Wang, Shengjun Xu, Ho Mei Law, Yanguang Zhou, Francesco Ciucci

Volume & Issue: Volume 16, Issue 1

Pages: 3687

Abstract: All-solid-state Li metal batteries (ASSLMBs) promise superior safety and energy density compared to conventional Li-ion batteries. However, their widespread adoption is hindered by detrimental interfacial reactions between solid-state electrolytes (SSEs) and the Li negative electrode, compromising long-term cycling stability. The challenges in directly observing these interfaces impede a comprehensive understanding of reaction mechanisms, necessitating first-principle simulations for designing novel interlayer materials. To overcome these limitations, we develop a database-supported high-throughput screening (DSHTS) framework for identifying stable interlayer materials compatible with both Li and SSEs. Using Li3InCl6 as a model SSE, we identify Li3OCl as a potential interlayer material. Experimental validation demonstrates significantly improved electrochemical performance in both symmetric- and full-cell configurations. A Li|Li3OCl|Li3InCl6|LiCoO2 cell exhibits an initial discharge capacity of 154.4 mAh/g (1.09 mA/cm2, 2.5–4.2 V vs. Li/Li+, 303 K) with 76.36% capacity retention after 1000 cycles. Notably, a cell with a conventional In-Li6PS5Cl interlayer delivers only 132.4 mAh/g and fails after 760 cycles. An additional interlayer-containing battery with Li(Ni0.8Co0.1Mn0.1)O2 as the positive electrode achieves an initial discharge capacity of 151.3 mAh/g (3.84 mA/cm2, 2.5–4.2 V vs. Li/Li+, 303 K), maintaining stable operation over 1650 cycles. The results demonstrate the promise of the DSHTS framework for identifying interlayer materials.

Tags: Batteries

259. Data-driven discovery of biaxially strained single atoms array for hydrogen production, Nature Communications (April 17, 2025)

Authors: Tao Zhang, Qitong Ye, Yipu Liu, Qingyi Liu, Zengyu Han, Dongshuang Wu, Zhiming Chen, Yue Li, Hong Jin Fan

Volume & Issue: Volume 16, Issue 1

Pages: 3644

Abstract: The structure-performance relationship for single atom catalysts has remained unclear due to the averaged coordination information obtained from most single-atom catalysts. Periodic array of single atoms may provide a platform to tackle this inaccuracy. Here, we develop a data-driven approach by incorporating high-throughput density functional theory computations and machine learning to screen candidates based on a library of 1248 sites from single atoms array anchored on biaxial-strained transition metal dichalcogenides. Our screening results in Au atom anchored on biaxial-strained MoSe2 surface via Au-Se3 bonds. Machine learning analysis identifies four key structural features by classifying the ΔGH* data. We show that the average band center of the adsorption sites can be a predictor for hydrogen adsorption energy. This prediction is validated by experiments which show single-atom Au array anchored on biaxial-strained MoSe2 archives 1000 hour-stability at 800 mA cm-2 towards acidic hydrogen evolution. Moreover, active hotspot consisting of Au atoms array and the neighboring Se atoms is unraveled for enhanced activity.

Tags: Computational chemistry, Electrocatalysis

258. A Multiagent-Driven Robotic AI Chemist Enabling Autonomous Chemical Research On Demand, Journal of the American Chemical Society (April 16, 2025)

Authors: Tao Song, Man Luo, Xiaolong Zhang, Linjiang Chen, Yan Huang, Jiaqi Cao, Qing Zhu, Daobin Liu, Baicheng Zhang, Gang Zou, Guoqing Zhang, Fei Zhang, Weiwei Shang, Yao Fu, Jun Jiang, Yi Luo

Volume & Issue: Volume 147, Issue 15

Pages: 12534-12545

Abstract: The successful integration of large language models (LLMs) into laboratory workflows has demonstrated robust capabilities in natural language processing, autonomous task execution, and collaborative problem-solving. This offers an exciting opportunity to realize the dream of autonomous chemical research on demand. Here, we report a robotic AI chemist powered by a hierarchical multiagent system, ChemAgents, based on an on-board Llama-3.1-70B LLM, capable of executing complex, multistep experiments with minimal human intervention. It operates through a Task Manager agent that interacts with human researchers and coordinates four role-specific agents─Literature Reader, Experiment Designer, Computation Performer, and Robot Operator─each leveraging one of four foundational resources: a comprehensive Literature Database, an extensive Protocol Library, a versatile Model Library, and a state-of-the-art Automated Lab. We demonstrate its versatility and efficacy through six experimental tasks of varying complexity, ranging from straightforward synthesis and characterization to more complex exploration and screening of experimental parameters, culminating in the discovery and optimization of functional materials. Additionally, we introduce a seventh task, where ChemAgents is deployed in a new robotic chemistry lab environment to autonomously perform photocatalytic organic reactions, highlighting ChemAgents’s scalability and adaptability. Our multiagent-driven robotic AI chemist showcases the potential of on-demand autonomous chemical research to accelerate discovery and democratize access to advanced experimental capabilities across academic disciplines and industries.

Tags: None

257. Generative deep learning for predicting ultrahigh lattice thermal conductivity materials, npj Computational Materials (April 11, 2025)

Authors: Liben Guo, Yuanbin Liu, Zekun Chen, Hongao Yang, Davide Donadio, Bingyang Cao

Volume & Issue: Volume 11, Issue 1

Pages: 97

Abstract: Developing materials with ultrahigh thermal conductivity is crucial for thermal management and energy conversion. The recent development of generative models and machine learning (ML) holds great promise for predicting new functional materials. However, these data-driven methods are not tailored to identifying energetically stable structures and accurately predicting their thermal properties, as they lack physical constraints and information about the complexity of atomic many-body interactions. Here, we show how combining deep generative models of crystal structures with quantum-accurate, fast ML interatomic potentials can accelerate the prediction of materials with ultrahigh lattice thermal conductivity while ensuring energy optimality. We exploit structural symmetry and similarity metrics derived from atomic coordination environments to enable fast exploration of the structural space produced by the generative model. Additionally, we propose an active-learning-based protocol for the on-the-fly training of ML potentials to achieve high-fidelity predictions of stability and lattice thermal conductivity in prospective materials. Applying this method to carbon materials, we screen 100,000 candidates and identify 34 carbon polymorphs, approximately a quarter of which had not been previously predicted, to have lattice thermal conductivity above 800 W m−1 K−1, reaching up to 2,400 W m−1 K−1 aside from diamond. These findings provide a viable pathway toward the ML-assisted prediction of periodic materials with exceptional thermal properties.

Tags: Design, synthesis and processing, Materials science

256. Electronic Structure Guided Inverse Design Using Generative Models, Preprint (April 08, 2025)

Authors: Shuyi Jia, Panchapakesan Ganesh, Victor Fung

Abstract: The electronic structure of a material fundamentally determines its underlying physical, and by extension, its functional properties. Consequently, the ability to identify or generate materials with desired electronic properties would enable the design of tailored functional materials. Traditional approaches relying on human intuition or exhaustive computational screening of known materials remain inefficient and resource-prohibitive for this task. Here, we introduce DOSMatGen, the first instance of a machine learning method which generates crystal structures that match a given desired electronic density of states. DOSMatGen is an E(3)-equivariant joint diffusion framework, and utilizes classifier-free guidance to accurately condition the generated materials on the density of states. Our experiments find this approach can successfully yield materials which are both stable and match closely with the desired density of states. Furthermore, this method is highly flexible and allows for finely controlled generation which can target specific templates or even individual sites within a material. This method enables a more physics-driven approach to designing new materials for applications including catalysts, photovoltaics, and superconductors.

Tags: Condensed Matter - Materials Science

255. Leveraging data mining, active learning, and domain adaptation for efficient discovery of advanced oxygen evolution electrocatalysts, Science Advances (April 04, 2025)

Authors: Rui Ding, Jianguo Liu, Kang Hua, Xuebin Wang, Xiaoben Zhang, Minhua Shao, Yuxin Chen, Junhong Chen

Volume & Issue: Volume 11, Issue 14

Pages: eadr9038

Abstract: Developing advanced catalysts for acidic oxygen evolution reaction (OER) is crucial for sustainable hydrogen production. This study presents a multistage machine learning (ML) approach to streamline the discovery and optimization of complex multimetallic catalysts. Our method integrates data mining, active learning, and domain adaptation throughout the materials discovery process. Unlike traditional trial-and-error methods, this approach systematically narrows the exploration space using domain knowledge with minimized reliance on subjective intuition. Then, the active learning module efficiently refines element composition and synthesis conditions through iterative experimental feedback. The process culminated in the discovery of a promising Ru-Mn-Ca-Pr oxide catalyst. Our workflow also enhances theoretical simulations with domain adaptation strategy, providing deeper mechanistic insights aligned with experimental findings. By leveraging diverse data sources and multiple ML strategies, we demonstrate an efficient pathway for electrocatalyst discovery and optimization. This comprehensive, data-driven approach represents a paradigm shift and potentially benchmark in electrocatalysts research.

Tags: None

254. A high-throughput experimentation platform for data-driven discovery in electrochemistry, Science Advances (April 04, 2025)

Authors: Dian-Zhao Lin, Kai-Jui Pan, Yuyin Li, Lingyu Zhang, Krish N. Jayarapu, Tianchen Li, Jasmine Vy Tran, William A. Goddard, Zhengtang Luo, Yayuan Liu

Volume & Issue: Volume 11, Issue 14

Pages: eadu4391

Abstract: Automating electrochemical analyses combined with artificial intelligence is poised to accelerate discoveries in renewable energy sciences and technologies. This study presents an automated high-throughput electrochemical characterization (AHTech) platform as a cost-effective and versatile tool for rapidly assessing liquid analytes. The Python-controlled platform combines a liquid handling robot, potentiostat, and customizable microelectrode bundles for diverse, reproducible electrochemical measurements in microtiter plates, minimizing chemical consumption and manual effort. To showcase the capability of AHTech, we screened a library of 180 small molecules as electrolyte additives for aqueous zinc metal batteries, generating data for training machine learning models to predict Coulombic efficiencies. Key molecular features governing additive performance were elucidated using Shapley Additive exPlanations and Spearman’s correlation, pinpointing high-performance candidates like cis-4-hydroxy-d-proline, which achieved an average Coulombic efficiency of 99.52% over 200 cycles. The workflow established herein is highly adaptable, offering a powerful framework for accelerating the exploration and optimization of extensive chemical spaces across diverse energy storage and conversion fields.

Tags: None

253. Physics-informed, dual-objective optimization of high-entropy-alloy nanozymes by a robotic AI chemist, Matter (April 02, 2025)

Authors: Man Luo, Zikai Xie, Huirong Li, Baicheng Zhang, Jiaqi Cao, Yan Huang, Hang Qu, Qing Zhu, Linjiang Chen, Jun Jiang, Yi Luo

Volume & Issue: Volume 8, Issue 4

Abstract: No abstract available

Tags: MAP 2: Benchmark, high-entropy alloys, autonomous chemistry, large language model, multi-objective optimization, nanozymes, robotic chemist

252. Towards multimodal foundation models in molecular cell biology, Nature (April 2025)

Authors: Haotian Cui, Alejandro Tejada-Lapuerta, Maria Brbić, Julio Saez-Rodriguez, Simona Cristea, Hani Goodarzi, Mohammad Lotfollahi, Fabian J. Theis, Bo Wang

Volume & Issue: Volume 640, Issue 8059

Pages: 623-633

Abstract: The development of multimodal foundation models, pretrained on diverse omics datasets, to unravel the intricate complexities of molecular cell biology is envisioned.

Tags: None

251. Network renormalization, Nature Reviews Physics (April 2025)

Authors: Andrea Gabrielli, Diego Garlaschelli, Subodh P. Patil, M. Ángeles Serrano

Volume & Issue: Volume 7, Issue 4

Pages: 203-219

Abstract: The renormalization group (RG) is a theoretical framework to transform systems across scales and identify critical points of phase transitions. In recent years, efforts have extended RG to complex networks, which challenge traditional assumptions. This Technical Review covers key approaches and open challenges.

Tags: None

250. Applications of natural language processing and large language models in materials discovery, npj Computational Materials (March 24, 2025)

Authors: Xue Jiang, Weiren Wang, Shaohan Tian, Hao Wang, Turab Lookman, Yanjing Su

Volume & Issue: Volume 11, Issue 1

Pages: 79

Abstract: The transformative impact of artificial intelligence (AI) technologies on materials science has revolutionized the study of materials problems. By leveraging well-characterized datasets derived from the scientific literature, AI-powered tools such as Natural Language Processing (NLP) have opened new avenues to accelerate materials research. The advances in NLP techniques and the development of large language models (LLMs) facilitate the efficient extraction and utilization of information. This review explores the application of NLP tools in materials science, focusing on automatic data extraction, materials discovery, and autonomous research. We also discuss the challenges and opportunities associated with utilizing LLMs and outline the prospects and advancements that will propel the field forward.

Tags: Theory and computation, Techniques and instrumentation

249. Elemental numerical descriptions to enhance classification and regression model performance for high-entropy alloys, npj Computational Materials (March 18, 2025)

Authors: Yan Zhang, Cheng Wen, Pengfei Dang, Xue Jiang, Dezhen Xue, Yanjing Su

Volume & Issue: Volume 11, Issue 1

Pages: 75

Abstract: The machine learning-assisted design of new alloy compositions often relies on the physical and chemical properties of elements to describe the materials. In the present study, we propose a strategy based on an evolutionary algorithm to generate new elemental numerical descriptions for high-entropy alloys (HEAs). These newly defined descriptions significantly enhance classification accuracy, increasing it from 77% to ~97% for recognizing FCC, BCC, and dual phases, compared to traditional empirical features. Our experimental validation demonstrates that our classification model, utilizing these new elemental numerical descriptions, successfully predicted the phases of 8 out of 9 randomly selected alloys, outperforming the same model based on traditional empirical features, which correctly predicted 4 out of 9. By incorporating these descriptions derived from a simple logistic regression model, the performance of various classifiers improved by at least 15%. Moreover, these new numerical descriptions for phase classification can be directly applied to regression model predictions of HEAs, reducing the error by 22% and improving the R2 value from 0.79 to 0.88 in hardness prediction. Testing on six different materials datasets, including ceramics and functional alloys, demonstrated that the obtained numerical descriptions achieved higher prediction precision across various properties, indicating the broad applicability of our strategy.

Tags: Computational methods, Mechanical properties, Metals and alloys

248. Artificial Intelligence and Multiscale Modeling for Sustainable Biopolymers and Bioinspired Materials, Advanced Materials (March 10, 2025)

Authors: Xing Quan Wang, Zeqing Jin, Dharneedar Ravichandran, Grace X. Gu

Volume & Issue: Volume 37, Issue 22

Pages: 2416901

Abstract: Biopolymers and bioinspired materials contribute to the construction of intricate hierarchical structures that exhibit advanced properties. The remarkable toughness and damage tolerance of such multilevel materials are conferred through the hierarchical assembly of their multiscale (i.e., atomistic to macroscale) components and architectures. Here, the functionality and mechanisms of biopolymers and bio-inspired materials at multilength scales are explored and summarized, focusing on biopolymer nanofibril configurations, biocompatible synthetic biopolymers, and bio-inspired composites. Their modeling methods with theoretical basis at multiple lengths and time scales are reviewed for biopolymer applications. Additionally, the exploration of artificial intelligence-powered methodologies is emphasized to realize improvements in these biopolymers from functionality, biodegradability, and sustainability to their characterization, fabrication process, and superior designs. Ultimately, a promising future for these versatile materials in the manufacturing of advanced materials across wider applications and greater lifecycle impacts is foreseen.

Tags: machine learning, artificial intelligence, bioinspired materials, biopolymer, computational mechanics, multiscale modeling

247. Transformers and genome language models, Nature Machine Intelligence (March 2025)

Authors: Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

Volume & Issue: Volume 7, Issue 3

Pages: 346-362

Abstract: Micaela Consens et al. discuss and review the recent rise of transformer-based and large language models in genomics. They also highlight promising directions for genome language models beyond the transformer architecture.

Tags: None

246. The deep finite element method: A deep learning framework integrating the physics-informed neural networks with the finite element method, Computer Methods in Applied Mechanics and Engineering (March 01, 2025)

Authors: Wei Xiong, Xiangyun Long, Stéphane P. A. Bordas, Chao Jiang

Volume & Issue: Volume 436

Pages: 117681

Abstract: This paper proposes a deep finite element method (DFEM), which integrates physics-informed neural networks (PINNs) with the finite element method (FEM). The DFEM provides a versatile and robust framework that seamlessly combines observational data with physical laws. In this method, the outputs of the deep neural networks are utilized for approximating the displacements of grid nodes, while shape functions are used to construct trial functions within elements. A loss function is then designed by combining deep neural networks with stiffness matrices, avoiding the high-order derivative terms. Furthermore, the theoretical convergence stability of the proposed DFEM is analyzed. Finally, the training stability and predictive accuracy of the DFEM are validated through numerical examples. The results demonstrate that the proposed method greatly reduces the relative error compared to the deep energy method (DEM). Moreover, by integrating pre-training, the DFEM experiences a significant enhancement in training efficiency, reducing computation time by up to 7/8 compared to the FEM. This makes it a promising tool for fast computations and digital twin applications.

Tags: Deep energy method, Deep finite element method, Physics-informed neural networks, Solid elasticity mechanics, Three-dimensional structures

245. A generative model for inorganic materials design, Nature (March 2025)

Authors: Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie

Volume & Issue: Volume 639, Issue 8055

Pages: 624-632

Abstract: The design of functional materials with desired properties is essential in driving technological advances in areas such as energy storage, catalysis and carbon capture1–3. Generative models accelerate materials design by directly generating new materials given desired property constraints, but current methods have a low success rate in proposing stable crystals or can satisfy only a limited set of property constraints4–11. Here we present MatterGen, a model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. Compared with previous generative models4,12, structures produced by MatterGen are more than twice as likely to be new and stable, and more than ten times closer to the local energy minimum. After fine-tuning, MatterGen successfully generates stable, new materials with desired chemistry, symmetry and mechanical, electronic and magnetic properties. As a proof of concept, we synthesize one of the generated structures and measure its property value to be within 20% of our target. We believe that the quality of generated materials and the breadth of abilities of MatterGen represent an important advancement towards creating a foundational generative model for materials design.

Tags: Computer science, Theory and computation

244. CrystalFlow: A Flow-Based Generative Model for Crystalline Materials, Preprint (February 24, 2025)

Authors: Xiaoshan Luo, Zhenyu Wang, Qingchang Wang, Jian Lv, Lei Wang, Yanchao Wang, Yanming Ma

Abstract: Deep learning-based generative models have emerged as powerful tools for modeling complex data distributions and generating high-fidelity samples, offering a transformative approach to efficiently explore the configuration space of crystalline materials. In this work, we present CrystalFlow, a flow-based generative model specifically developed for the generation of crystalline materials. CrystalFlow constructs Continuous Normalizing Flows to model lattice parameters, atomic coordinates, and/or atom types, which are trained using Conditional Flow Matching techniques. Through an appropriate choice of data representation and the integration of a graph-based equivariant neural network, the model effectively captures the fundamental symmetries of crystalline materials, which ensures data-efficient learning and enables high-quality sampling. Our experiments demonstrate that CrystalFlow achieves state-of-the-art performance across standard generation benchmarks, and exhibits versatile conditional generation capabilities including producing structures optimized for specific external pressures or desired material properties. These features highlight the model’s potential to address realistic crystal structure prediction challenges, offering a robust and efficient framework for advancing data-driven research in condensed matter physics and material science.

Tags: Condensed Matter - Materials Science, Physics - Computational Physics

243. Genome modeling and design across all domains of life with Evo 2, Preprint (February 21, 2025)

Authors: Garyk Brixi, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg Brockman, Daniel Chang, Gabriel A. Gonzalez, Samuel H. King, David B. Li, Aditi T. Merchant, Mohsen Naghipourfar, Eric Nguyen, Chiara Ricci-Tam, David W. Romero, Gwanggyu Sun, Ali Taghibakshi, Anton Vorontsov, Brandon Yang, Myra Deng, Liv Gorton, Nam Nguyen, Nicholas K. Wang, Etowah Adams, Stephen A. Baccus, Steven Dillmann, Stefano Ermon, Daniel Guo, Rajesh Ilango, Ken Janik, Amy X. Lu, Reshma Mehta, Mohammad R. K. Mofrad, Madelena Y. Ng, Jaspreet Pannu, Christopher Ré, Jonathan C. Schmok, John St John, Jeremy Sullivan, Kevin Zhu, Greg Zynda, Daniel Balsam, Patrick Collison, Anthony B. Costa, Tina Hernandez-Boussard, Eric Ho, Ming-Yu Liu, Thomas McGrath, Kimberly Powell, Dave P. Burke, Hani Goodarzi, Patrick D. Hsu, Brian L. Hie

Abstract: All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation—from noncoding pathogenic mutations to clinically significant BRCA1 variants—without task-specific finetuning. Applying mechanistic interpretability analyses, we reveal that Evo 2 autonomously learns a breadth of biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. Beyond its predictive capabilities, Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Guiding Evo 2 via inference-time search enables controllable generation of epigenomic structure, for which we demonstrate the first inference-time scaling results in biology. We make Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.

Tags: None

242. Accelerating crystal structure search through active learning with neural networks for rapid relaxations, npj Computational Materials (February 20, 2025)

Authors: Stefaan S. P. Hessmann, Kristof T. Schütt, Niklas W. A. Gebauer, Michael Gastegger, Tamio Oguchi, Tomoki Yamashita

Volume & Issue: Volume 11, Issue 1

Pages: 44

Abstract: Global optimization of crystal compositions is a significant yet computationally intensive method to identify stable structures within chemical space. The specific physical properties linked to a three-dimensional atomic arrangement make this an essential task in the development of new materials. We present a method that efficiently uses active learning of neural network force fields for structure relaxation, minimizing the required number of steps in the process. This is achieved by neural network force fields equipped with uncertainty estimation, which iteratively guide a pool of randomly generated candidates toward their respective local minima. Using this approach, we are able to effectively identify the most promising candidates for further evaluation using density functional theory (DFT). Our method not only reliably reduces computational costs by up to two orders of magnitude across the benchmark systems Si16, Na8Cl8, Ga8As8 and Al4O6 but also excels in finding the most stable minimum for the unseen, more complex systems Si46 and Al16O24. Moreover, we demonstrate at the example of Si16 that our method can find multiple relevant local minima while only adding minor computational effort.

Tags: Computational methods, Atomistic models

241. Autonomous platform for solution processing of electronic polymers, Nature Communications (February 17, 2025)

Authors: Chengshi Wang, Yeon-Ju Kim, Aikaterini Vriza, Rohit Batra, Arun Baskaran, Naisong Shan, Nan Li, Pierre Darancet, Logan Ward, Yuzi Liu, Maria K. Y. Chan, Subramanian K. R. S. Sankaranarayanan, H. Christopher Fry, C. Suzanne Miller, Henry Chan, Jie Xu

Volume & Issue: Volume 16, Issue 1

Pages: 1498

Abstract: The manipulation of electronic polymers’ solid-state properties through processing is crucial in electronics and energy research. Yet, efficiently processing electronic polymer solutions into thin films with specific properties remains a formidable challenge. We introduce Polybot, an artificial intelligence (AI) driven automated material laboratory designed to autonomously explore processing pathways for achieving high-conductivity, low-defect electronic polymers films. Leveraging importance-guided Bayesian optimization, Polybot efficiently navigates a complex 7-dimensional processing space. In particular, the automated workflow and algorithms effectively explore the search space, mitigate biases, employ statistical methods to ensure data repeatability, and concurrently optimize multiple objectives with precision. The experimental campaign yields scale-up fabrication recipes, producing transparent conductive thin films with averaged conductivity exceeding 4500 S/cm. Feature importance analysis and morphological characterizations reveal key design factors. This work signifies a significant step towards transforming the manufacturing of electronic polymers, highlighting the potential of AI-driven automation in material science.

Tags: Electronic devices, Chemical engineering

240. Developing novel low-density high-entropy superalloys with high strength and superior creep resistance guided by automated machine learning, Acta Materialia (February 15, 2025)

Authors: Yancheng Li, Jingyu Pang, Zhen Li, Qing Wang, Zhenhua Wang, Jinlin Li, Hongwei Zhang, Zengbao Jiao, Chuang Dong, Peter K. Liaw

Volume & Issue: Volume 285

Pages: 120656

Abstract: Design of novel superalloys with low density, high strength, and great microstructural stability is a big challenge. This work used an automated machine learning (ML) model to explore high-entropy superalloys (HESAs) with coherent γ’ nanoprecipitates in the FCC-γ matrix. The database samples were firstly preprocessed via the domain-knowledge before ML. Both autogluon and genetic algorithm methods were applied to establish the relationship between the alloy composition and yield strength and to deal with the optimization problem in ML. Thus, the ML model cannot only predict the strength with a high accuracy (R2 > 95 %), but also design compositions efficiently with desired property in multi-component systems. Novel HESAs with targeted strengths and densities were predicted by ML and then validated by a series of experiments. It is found that the experimental results are well consistent with the predicted properties, as evidenced by the fact that the designed Ni-5.82Fe-15.34Co-2.53Al-2.99Ti-2.90Nb-15.97Cr-2.50Mo (wt.%) HESA has a yield strength of 1346 MPa at room temperature and 1061 MPa at 1023 K and a density of 7.98 g/cm3. Moreover, it exhibits superior creep resistance with a rupture lifetime of 149 h under 480 MPa at 1023 K, outperforming most conventional wrought superalloys. Additionally, the coarsening rate of γ’ nanoprecipitates in these alloys is extremely slow at 1023 K, showing a prominent microstructural stability. The strengthening and deformation mechanisms were further discussed. This framework provides a new pathway to realize the property-oriented composition design for high-performance complex alloys via ML.

Tags: Machine learning, Deformation mechanisms, High-entropy superalloys, Mechanical mechanisms, γ/γ’ microstructural stability

239. Machine Learning in Solid-State Hydrogen Storage Materials: Challenges and Perspectives, Advanced Materials (February 12, 2025)

Authors: Panpan Zhou, Qianwen Zhou, Xuezhang Xiao, Xiulin Fan, Yongjin Zou, Lixian Sun, Jinghua Jiang, Dan Song, Lixin Chen

Volume & Issue: Volume 37, Issue 6

Pages: 2413430

Abstract: Machine learning (ML) has emerged as a pioneering tool in advancing the research application of high-performance solid-state hydrogen storage materials (HSMs). This review summarizes the state-of-the-art research of ML in resolving crucial issues such as low hydrogen storage capacity and unfavorable de-/hydrogenation cycling conditions. First, the datasets, feature descriptors, and prevalent ML models tailored for HSMs are described. Specific examples include the successful application of ML in titanium-based, rare-earth-based, solid solution, magnesium-based, and complex HSMs, showcasing its role in exploiting composition–structure–property relationships and designing novel HSMs for specific applications. One of the representative ML works is the single-phase Ti-based HSM with superior cost-effective and comprehensive properties, tailored to fuel cell hydrogen feeding system at ambient temperature and pressure through high-throughput composition-performance scanning. More importantly, this review also identifies and critically analyzes the key challenges faced by ML in this domain, including poor data quality and availability, and the balance between model interpretability and accuracy, together with feasible countermeasures suggested to ameliorate these problems. In summary, this work outlines a roadmap for enhancing ML’s utilization in solid-state hydrogen storage research, promoting more efficient and sustainable energy storage solutions.

Tags: machine learning, high-throughput material design, hydrogen storage materials, mechanism mining

238. Machine Learning in Polymer Research, Advanced Materials (February 09, 2025)

Authors: Wei Ge, Ramindu De Silva, Yanan Fan, Scott A. Sisson, Martina H. Stenzel

Volume & Issue: Volume 37, Issue 11

Pages: 2413695

Abstract: Machine learning is increasingly being applied in polymer chemistry to link chemical structures to macroscopic properties of polymers and to identify chemical patterns in the polymer structures that help improve specific properties. To facilitate this, a chemical dataset needs to be translated into machine readable descriptors. However, limited and inadequately curated datasets, broad molecular weight distributions, and irregular polymer configurations pose significant challenges. Most off the shelf mathematical models often need refinement for specific applications. Addressing these challenges demand a close collaboration between chemists and mathematicians as chemists must formulate research questions in mathematical terms while mathematicians are required to refine models for specific applications. This review unites both disciplines to address dataset curation hurdles and highlight advances in polymer synthesis and modeling that enhance data availability. It then surveys ML approaches used to predict solid-state properties, solution behavior, composite performance, and emerging applications such as drug delivery and the polymer–biology interface. A perspective of the field is concluded and the importance of FAIR (findability, accessibility, interoperability, and reusability) data and the integration of polymer theory and data are discussed, and the thoughts on the machine–human interface are shared.

Tags: machine learning, chemical descriptors, FAIR data, polymers

237. ORGANA: A robotic assistant for automated chemistry experimentation and characterization, Matter (February 05, 2025)

Authors: Kourosh Darvish, Marta Skreta, Yuchi Zhao, Naruki Yoshikawa, Sagnik Som, Miroslav Bogdanovic, Yang Cao, Han Hao, Haoping Xu, Alán Aspuru-Guzik, Animesh Garg, Florian Shkurti

Volume & Issue: Volume 8, Issue 2

Pages: 101897

Abstract: Chemistry experiments can be resource- and labor-intensive, often requiring manual tasks like polishing electrodes in electrochemistry. Traditional lab automation infrastructure faces challenges adapting to new experiments. To address this, we introduce ORGANA, an assistive robotic system that automates diverse chemistry experiments using decision-making and perception tools. It makes decisions with chemists in the loop to control robots and lab devices. ORGANA interacts with chemists using large language models (LLMs) to derive experiment goals, handle disambiguation, and provide experiment logs. ORGANA plans and executes complex tasks with visual feedback while supporting scheduling and parallel task execution. We demonstrate ORGANA’s capabilities in solubility, pH measurement, recrystallization, and electrochemistry experiments. In electrochemistry, it executes a 19-step plan in parallel to characterize quinone derivatives for flow batteries. Our user study shows ORGANA reduces frustration and physical demand by over 50%, with users saving an average of 80.3% of their time when using it.

Tags: self-driving labs, large language models, assistive robotics, automatic report generation, computer vision, electrochemistry, task and motion planning with scheduling

236. Harnessing Large Language Models to Collect and Analyze Metal–Organic Framework Property Data Set, Journal of the American Chemical Society (February 05, 2025)

Authors: Yeonghun Kang, Wonseok Lee, Taeun Bae, Seunghee Han, Huiwon Jang, Jihan Kim

Volume & Issue: Volume 147, Issue 5

Pages: 3943-3958

Abstract: This research focused on the efficient collection of experimental metal–organic framework (MOF) data from scientific literature to address the challenges of accessing hard-to-find data and improving the quality of information available for machine learning studies in materials science. Utilizing a chain of advanced large language models (LLMs), we developed a systematic approach to extract and organize MOF data into a structured format. Our methodology successfully compiled information from more than 40,000 research articles, creating a comprehensive and ready-to-use data set. Specifically, data regarding MOF synthesis conditions and properties were extracted from both tables and text and then analyzed. Subsequently, we utilized the curated database to analyze the relationships between synthesis conditions, properties, and structure. Through machine learning, we identified the existence of a gap between simulation data and experimental data, and further analysis revealed the factors contributing to this discrepancy. Additionally, we leveraged the extracted synthesis condition data to develop a synthesis condition recommender system. This system suggests optimal synthesis conditions based on the provided precursors, offering a practical tool to refine synthesis strategies. This underscores the importance of experimental datasets in advancing MOF research.

Tags: None

235. Automating the practice of science: Opportunities, challenges, and implications, Proceedings of the National Academy of Sciences (February 04, 2025)

Authors: Sebastian Musslick, Laura K. Bartlett, Suyog H. Chandramouli, Marina Dubova, Fernand Gobet, Thomas L. Griffiths, Jessica Hullman, Ross D. King, J. Nathan Kutz, Christopher G. Lucas, Suhas Mahesh, Franco Pestilli, Sabina J. Sloman, William R. Holmes

Volume & Issue: Volume 122, Issue 5

Pages: e2401238121

Abstract: Automation transformed various aspects of our human civilization, revolutionizing industries and streamlining processes. In the domain of scientific inquiry, automated approaches emerged as powerful tools, holding promise for accelerating discovery, enhancing reproducibility, and overcoming the traditional impediments to scientific progress. This article evaluates the scope of automation within scientific practice and assesses recent approaches. Furthermore, it discusses different perspectives to the following questions: where do the greatest opportunities lie for automation in scientific practice?; What are the current bottlenecks of automating scientific practice?; and What are significant ethical and practical consequences of automating scientific practice? By discussing the motivations behind automated science, analyzing the hurdles encountered, and examining its implications, this article invites researchers, policymakers, and stakeholders to navigate the rapidly evolving frontier of automated scientific practice.

Tags: None

234. Exploration of Chemical Space Through Automated Reasoning, Angewandte Chemie (February 03, 2025)

Authors: Judith Clymo, Christopher M. Collins, Katie Atkinson, Matthew S. Dyer, Michael W. Gaultois, Vladimir V. Gusev, Matthew J. Rosseinsky, Sven Schewe

Volume & Issue: Volume 137, Issue 6

Pages: e202417657

Abstract: The vast size of composition space poses a significant challenge for materials chemistry: exhaustive enumeration of potentially interesting compositions is typically infeasible, hindering assessment of important criteria ranging from novelty and stability to cost and performance. We report a tool, Comgen, for the efficient exploration of composition space, which makes use of logical methods from computer science used for proving theorems. We demonstrate how these techniques, which have not previously been applied to materials discovery, can enable reasoning about scientific domain knowledge provided by human experts. Comgen accepts a variety of user-specified criteria, converts these into an abstract form, and utilises a powerful automated reasoning algorithm to identify compositions that satisfy these user requirements, or prove that the requirements cannot be simultaneously satisfied. In contrast to machine learning techniques, explicitly reasoning about domain knowledge, rather than making inferences from data, ensures that Comgen’s outputs are fully interpretable and provably correct. Users interact with Comgen through a high-level Python interface. We illustrate use of the tool with several case studies focused on the search for new ionic conductors. Further, we demonstrate the integration of Comgen into an end-to-end automated workflow to propose and evaluate candidate compositions quantitatively, prior to experimental investigation. This highlights the potential of automated formal reasoning in materials chemistry.

Tags: materials discovery, Artificial intelligence, automated reasoning, explainable models, materials chemistry

233. From text to insight: large language models for chemical data extraction, Chemical Society Reviews (February 03, 2025)

Authors: Mara Schilling-Wilhelmi, Martiño Ríos-García, Sherjeel Shabih, María Victoria Gil, Santiago Miret, Christoph T. Koch, José A. Márquez, Kevin Maik Jablonka

Abstract: No abstract available

Tags: None

232. Balancing autonomy and expertise in autonomous synthesis laboratories, Nature Computational Science (February 2025)

Authors: Xiaozhao Liu, Bin Ouyang, Yan Zeng

Volume & Issue: Volume 5, Issue 2

Pages: 92-94

Abstract: Autonomous synthesis laboratories promise to streamline the plan–make–measure–analyze iteration loop. Here, we comment on the barriers in the field, the promise of a human on-the-loop approach, and strategies for optimizing accessibility, accuracy, and efficiency of autonomous laboratories.

Tags: None

231. Knowledge-guided large language model for material science, Review of Materials Research (February 01, 2025)

Authors: Guanjie Wang, Jingjing Hu, Jian Zhou, Sen Liu, Qingjiang Li, Zhimei Sun

Volume & Issue: Volume 1, Issue 2

Pages: 100007

Abstract: With ChatGPT starting a storm of transformative applications worldwide, the advent of large language models (LLMs) has revolutionized the paradigm of scientific research, shifting from data-driven methods to AI-driven science. While LLMs have demonstrated significant promise in many fields of science, the development of material knowledge-guided, domain-specific LLMs remains challenges. In this review, the key milestones of LLMs are discussed, and guidelines for building LLMs are provided, including determining objectives, designing model architectures, data curation, and establishing training and evaluation frameworks. Furthermore, methodologies for creating domain-specific models through fine-tuning, retrieval-augmented generation, prompt engineering, and AI agents are explored. Additionally, the applications of LLMs in materials science are investigated, ranging from structured information extraction and property prediction to autonomous laboratories and robotics. Finally, challenges such as resource demands, dataset quality, benchmarking, hallucination mitigation, and AI safety are reported alongside emerging opportunities, positioning LLMs as a pivotal tool in advancing materials discovery and innovation.

Tags: Materials science, Large language model, AI agent, AI for science, Autonomous laboratory

230. Battery lifetime prediction across diverse ageing conditions with inter-cell deep learning, Nature Machine Intelligence (February 2025)

Authors: Han Zhang, Yuqi Li, Shun Zheng, Ziheng Lu, Xiaofan Gui, Wei Xu, Jiang Bian

Volume & Issue: Volume 7, Issue 2

Pages: 270-277

Abstract: Accurately predicting battery lifetime in early cycles holds tremendous value in real-world applications. However, this task poses significant challenges due to diverse factors influencing complex battery capacity degradation, such as cycling protocols, ambient temperatures and electrode materials. Moreover, cycling under specific conditions is both resource-intensive and time-consuming. Existing predictive models, primarily developed and validated within a restricted set of ageing conditions, thus raise doubts regarding their extensive applicability. Here we introduce BatLiNet, a deep learning framework tailored to predict battery lifetime reliably across a variety of ageing conditions. The distinctive design is integrating an inter-cell learning mechanism to predict the lifetime differences between two battery cells. This mechanism, when combined with conventional single-cell learning, enhances the stability of lifetime predictions for a target cell under varied ageing conditions. Our experimental results, derived from a broad spectrum of ageing conditions, demonstrate BatLiNet’s superior accuracy and robustness compared to existing models. BatLiNet also exhibits transferring capabilities across different battery chemistries, benefitting scenarios with limited resources. We expect this study could promote exploration of cross-cell insights and facilitate battery research across comprehensive ageing factors.

Tags: Computer science, Batteries

229. A guidance to intelligent metamaterials and metamaterials intelligence, Nature Communications (January 29, 2025)

Authors: Chao Qian, Ido Kaminer, Hongsheng Chen

Volume & Issue: Volume 16, Issue 1

Pages: 1154

Abstract: The bidirectional interactions between metamaterials and artificial intelligence have recently attracted immense interest to motivate scientists to revisit respective communities, giving rise to the proliferation of intelligent metamaterials and metamaterials intelligence. Owning to the strong nonlinear fitting and generalization ability, artificial intelligence is poised to serve as a materials-savvy surrogate electromagnetic simulator and a high-speed computing nucleus that drives numerous self-driving metamaterial applications, such as invisibility cloak, imaging, detection, and wireless communication. In turn, metamaterials create a versatile electromagnetic manipulator for wave-based analogue computing to be complementary with conventional electronic computing. In this Review, we stand from a unified perspective to review the recent advancements in these two nascent fields. For intelligent metamaterials, we discuss how artificial intelligence, exemplified by deep learning, streamline the photonic design, foster independent working manner, and unearth latent physics. For metamaterials intelligence, we particularly unfold three canonical categories, i.e., wave-based neural network, mathematical operation, and logic operation, all of which directly execute computation, detection, and inference task in physical space. Finally, future challenges and perspectives are pinpointed, including data curation, knowledge migration, and imminent practice-oriented issues, with a great vision of ushering in the free management of entire electromagnetic space.

Tags: Materials for optics, Optical materials and structures

228. InterPLM: Discovering Interpretable Features in Protein Language Models via Sparse Autoencoders, Preprint (January 28, 2025)

Authors: Elana Simon, James Zou

Abstract: Protein language models (PLMs) have demonstrated remarkable success in protein modeling and design, yet their internal mechanisms for predicting structure and function remain poorly understood. Here we present a systematic approach to extract and analyze interpretable features from PLMs using sparse autoencoders (SAEs). By training SAEs on embeddings from the PLM ESM-2, we identify thousands of human-interpretable latent features per layer that highlight hundreds of known biological concepts such as binding sites, structural motifs, and functional domains. In contrast, examining individual neurons in ESM-2 reveals significantly less conceptual alignment, suggesting that PLMs represent most concepts in superposition. We further demonstrate that a larger PLM (ESM-2 with 650M parameters) captures substantially more interpretable concepts than a smaller PLM (ESM-2 with 8M parameters). Beyond capturing known annotations, we show that ESM-2 learns coherent concepts that do not map onto existing annotations and propose a pipeline using language models to automatically interpret novel latent features learned by the SAEs. As practical applications, we demonstrate how these latent features can fill in missing annotations in protein databases and enable targeted steering of protein sequence generation. Our results demonstrate that PLMs encode rich, interpretable representations of protein biology and we propose a systematic framework to uncover and understand these latent features. In the process, we recover both known biology and potentially new protein motifs. As community resources, we introduce InterPLM (interPLM.ai), an interactive visualization platform for investigating learned PLM features, and release code for training and analysis at github.com/ElanaPearl/interPLM.

Tags: None

227. Probing out-of-distribution generalization in machine learning for materials, Communications Materials (January 11, 2025)

Authors: Kangming Li, Andre Niyongabo Rubungo, Xiangyun Lei, Daniel Persaud, Kamal Choudhary, Brian DeCost, Adji Bousso Dieng, Jason Hattrick-Simpers

Volume & Issue: Volume 6, Issue 1

Pages: 9

Abstract: Scientific machine learning (ML) aims to develop generalizable models, yet assessments of generalizability often rely on heuristics. Here, we demonstrate in the materials science setting that heuristic evaluations lead to biased conclusions of ML generalizability and benefits of neural scaling, through evaluations of out-of-distribution (OOD) tasks involving unseen chemistry or structural symmetries. Surprisingly, many tasks demonstrate good performance across models, including boosted trees. However, analysis of the materials representation space shows that most test data reside within regions well-covered by training data, while poorly-performing tasks involve data outside the training domain. For these challenging tasks, increasing training size or time yields limited or adverse effects, contrary to traditional neural scaling trends. Our findings highlight that most OOD tests reflect interpolation, not true extrapolation, leading to overestimations of generalizability and scaling benefits. This emphasizes the need for rigorously challenging OOD benchmarks.

Tags: Chemistry, Materials science

226. Integrating artificial intelligence with mechanistic epidemiological modeling: a scoping review of opportunities and challenges, Nature Communications (January 10, 2025)

Authors: Yang Ye, Abhishek Pandey, Carolyn Bawden, Dewan Md Sumsuzzman, Rimpi Rajput, Affan Shoukat, Burton H. Singer, Seyed M. Moghadas, Alison P. Galvani

Volume & Issue: Volume 16, Issue 1

Pages: 581

Abstract: Integrating prior epidemiological knowledge embedded within mechanistic models with the data-mining capabilities of artificial intelligence (AI) offers transformative potential for epidemiological modeling. While the fusion of AI and traditional mechanistic approaches is rapidly advancing, efforts remain fragmented. This scoping review provides a comprehensive overview of emerging integrated models applied across the spectrum of infectious diseases. Through systematic search strategies, we identified 245 eligible studies from 15,460 records. Our review highlights the practical value of integrated models, including advances in disease forecasting, model parameterization, and calibration. However, key research gaps remain. These include the need for better incorporation of realistic decision-making considerations, expanded exploration of diverse datasets, and further investigation into biological and socio-behavioral mechanisms. Addressing these gaps will unlock the synergistic potential of AI and mechanistic modeling to enhance understanding of disease dynamics and support more effective public health planning and response.

Tags: Computational models, Epidemiology, Infectious diseases

225. Transforming the synthesis of carbon nanotubes with machine learning models and automation, Matter (January 08, 2025)

Authors: Yue Li, Shurui Wang, Zhou Lv, Zhaoji Wang, Yunbiao Zhao, Ying Xie, Yang Xu, Liu Qian, Yaodong Yang, Ziqiang Zhao, Jin Zhang

Volume & Issue: Volume 8, Issue 1

Abstract: No abstract available

Tags: machine learning, artificial intelligence, carbon nanotubes, catalyst design, controlled synthesis, experimental automation, MAP 5: Improvement

224. Synthesis Strategies for High Entropy Nanoparticles, Advanced Materials (January 08, 2025)

Authors: Linlin Yang, Ren He, Jiali Chai, Xueqiang Qi, Qian Xue, Xiaoyu Bi, Jing Yu, Zixu Sun, Lu Xia, Kaiwen Wang, Nilotpal Kapuria, Junshan Li, Ahmad Ostovari Moghaddam, Andreu Cabot

Volume & Issue: Volume 37, Issue 1

Pages: 2412337

Abstract: Nanoparticles (NPs) of high entropy materials (HEMs) have attracted significant attention due to their versatility and wide range of applications. HEM NPs can be synthesized by fragmenting bulk HEMs or disintegrating and recrystallizing them. Alternatively, directly producing HEMs in NP form from atomic/ionic/molecular precursors presents a significant challenge. A widely adopted strategy involves thermodynamically driving HEM NP formation by leveraging the entropic contribution but incorporating strategies to limit NP growth at the elevated temperatures used for maximizing entropy. A second approach is to kinetically drive HEM NP formation by promoting rapid reactions of homogeneous reactant mixtures or using highly diluted precursor dissolutions. Additionally, experimental evidence suggests that enthalpy plays a significant role in driving HEM NP formation processes at moderate temperatures, with the high energy cost of generating additional surfaces and interfaces at the nanoscale stabilizing the HEM phase. This review critically assesses the various synthesis strategies developed for HEM NP preparation, highlighting key illustrative examples and offering insights into the underlying formation mechanisms. Such insights are critical for fine-tuning experimental conditions to achieve specific outcomes, ultimately enabling the effective synthesis of optimized generations of these advanced materials for both current and emerging applications across various scientific and technological fields.

Tags: nanoparticles, high entropy alloy, high entropy materials, nanocrystals

223. Development and validation of a real-time prediction model for acute kidney injury in hospitalized patients, Nature Communications (January 02, 2025)

Authors: Yuhui Zhang, Damin Xu, Jianwei Gao, Ruiguo Wang, Kun Yan, Hong Liang, Juan Xu, Youlu Zhao, Xizi Zheng, Lingyi Xu, Jinwei Wang, Fude Zhou, Guopeng Zhou, Qingqing Zhou, Zhao Yang, Xiaoli Chen, Yulan Shen, Tianrong Ji, Yunlin Feng, Ping Wang, Jundong Jiao, Li Wang, Jicheng Lv, Li Yang

Volume & Issue: Volume 16, Issue 1

Pages: 68

Abstract: Early prediction of acute kidney injury (AKI) may provide a crucial opportunity for AKI prevention. To date, no prediction model targeting AKI among general hospitalized patients in developing countries has been published. Here we show a simple, real-time, interpretable AKI prediction model for general hospitalized patients developed from a large tertiary hospital in China, which has been validated across five independent, geographically distinct, different tiered hospitals. The model containing 20 readily available variables demonstrates consistent, high levels of predictive discrimination in validation cohort, with AUCs for serum creatinine-based AKI and severe AKI within 48 h ranging from 0.74–0.85 and 0.83–0.90 for transported models and from 0.81–0.90 and 0.88–0.95 for refitted models, respectively. With optimal probability cutoffs, the refitted model could predict AKI at a median of 72 (24–198) hours in advance in internal validation, and 54–90 h in advance in external validation. Broad application of the model in the future may provide an effective, convenient and cost-effective approach for AKI prevention.

Tags: Acute kidney injury, Disease prevention, Risk factors

222. Machine learning for the physics of climate, Nature Reviews Physics (January 2025)

Authors: Annalisa Bracco, Julien Brajard, Henk A. Dijkstra, Pedram Hassanzadeh, Christian Lessig, Claire Monteleoni

Volume & Issue: Volume 7, Issue 1

Pages: 6-20

Abstract: Artificial intelligence techniques, specifically machine learning, are being increasingly applied to climate physics owing to the growing availability of big data and increasing computational power. This Review focuses on key results obtained with machine learning in reconstruction, sub-grid-scale parameterization, and weather or climate prediction.

Tags: None

221. Topological data analysis assisted machine learning for polar topological structures in oxide superlattices, Acta Materialia (January 01, 2025)

Authors: Guanshihan Du, Linming Zhou, Yuhui Huang, Yongjun Wu, Zijian Hong

Volume & Issue: Volume 282

Pages: 120467

Abstract: Ferroelectric topological phases and phase transitions have been extensively investigated recently due to the rich physical insights and potential applications in next-generation electronic devices. However, precisely predicting the topological phase transitions under different internal and external conditions in polar oxide superlattice systems is challenging due to the complex energy competitions and highly nonlinear kinetics involved. Herein, we adopted a state-of-the-art mathematical concept called “persistent homology” from topological data analysis to extract the essential topological features for the polarization data in various topological structures. By implementing the persistent image as the descriptor, support vector regression (SVR) based convolutional neural network (CNN) models are developed for the automated and high precision classification and regression of topological states based on high-dimensional phase-field simulation data of the PTO/STO superlattice. Using this method, we can automatically construct the strain and electric field phase diagrams in seconds with high throughput phase-field data. We hope to spur further interest in the integration of state-of-the-art mathematical tools, machine learning algorithms, and condensed matter physics for predictions of topological phase transitions.

Tags: Machine learning, Persistent homology, Polar topological structures, Topological data analysis

220. AI4Materials: Transforming the landscape of materials science and enigneering, Review of Materials Research (January 01, 2025)

Authors: Xue Jiang, Dezhen Xue, Yang Bai, William Yi Wang, Jianjun Liu, Mingli Yang, Yanjing Su

Volume & Issue: Volume 1, Issue 1

Pages: 100010

Abstract: New materials, crucial for economic and technological progress, are prioritized globally with strategies to accelerate their advancement through big data and AI. AI for Materials (AI4Mater) serves as an overall framework for integrating AI into Materials Science and Engineering, which is structured around three main elements: materials data infrastructure, AI4Mater techniques, and applications. This article reviews the development procedure and recent innovations in materials data infrastructure, machine learning in materials, autonomous experiment, intelligent computation, and intelligent manufacture. These efforts aim to foster open access to AI resources and enhance the collective advancement of materials science, ultimately accelerating breakthroughs and elevating the engineering application of new materials in a sustainable manner.

Tags: Machine learning, Autonomous experiment, Intelligent computation, Intelligent manufacture, Materials data infrastructure

219. Computational microscopy with coherent diffractive imaging and ptychography, Nature (January 2025)

Authors: Jianwei Miao

Volume & Issue: Volume 637, Issue 8045

Pages: 281-295

Abstract: This review highlights transformative advancements in computational microscopy, encompassing coherent diffractive imaging and ptychography, which unify microscopy and crystallography to achieve unparalleled resolution, precision, and large fields of view, enabling diverse applications and driving breakthroughs across multidisciplinary sciences.

Tags: None

218. Probabilistic weather forecasting with machine learning, Nature (January 2025)

Authors: Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R. Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, Matthew Willson

Volume & Issue: Volume 637, Issue 8044

Pages: 84-90

Abstract: Weather forecasts are fundamentally uncertain, so predicting the range of probable weather scenarios is crucial for important decisions, from warning the public about hazardous weather to planning renewable energy use. Traditionally, weather forecasts have been based on numerical weather prediction (NWP)1, which relies on physics-based simulations of the atmosphere. Recent advances in machine learning (ML)-based weather prediction (MLWP) have produced ML-based models with less forecast error than single NWP simulations2,3. However, these advances have focused primarily on single, deterministic forecasts that fail to represent uncertainty and estimate risk. Overall, MLWP has remained less accurate and reliable than state-of-the-art NWP ensemble forecasts. Here we introduce GenCast, a probabilistic weather model with greater skill and speed than the top operational medium-range weather forecast in the world, ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts4. GenCast is an ML weather prediction method, trained on decades of reanalysis data. GenCast generates an ensemble of stochastic 15-day global forecasts, at 12-h steps and 0.25° latitude–longitude resolution, for more than 80 surface and atmospheric variables, in 8 min. It has greater skill than ENS on 97.2% of 1,320 targets we evaluated and better predicts extreme weather, tropical cyclone tracks and wind power production. This work helps open the next chapter in operational weather forecasting, in which crucial weather-dependent decisions are made more accurately and efficiently.

Tags: Computer science, Natural hazards, Atmospheric dynamics

217. Role of the human-in-the-loop in emerging self-driving laboratories for heterogeneous catalysis, Nature Catalysis (January 2025)

Authors: Christoph Scheurer, Karsten Reuter

Volume & Issue: Volume 8, Issue 1

Pages: 13-19

Abstract: Uses of machine learning and automation are increasing and these techniques are becoming popular in catalysis research. This Perspective discusses how active learning workflows and human intervention should be optimized to ensure the most efficient progress for emerging self-driving laboratories performing heterogeneous catalysis research.

Tags: None

216. Data extraction from polymer literature using large language models, Communications Materials (December 19, 2024)

Authors: Sonakshi Gupta, Akhlak Mahmood, Pranav Shetty, Aishat Adeboye, Rampi Ramprasad

Volume & Issue: Volume 5, Issue 1

Pages: 269

Abstract: Automated data extraction from materials science literature at scale using artificial intelligence and natural language processing techniques is critical to advance materials discovery. However, this process for large spans of text continues to be a challenge due to the specific nature and styles of scientific manuscripts. In this study, we present a framework to automatically extract polymer-property data from full-text journal articles using commercially available (GPT-3.5) and open-source (LlaMa 2) large language models (LLM), in tandem with the named entity recognition (NER)-based MaterialsBERT model. Leveraging a corpus of ~ 2.4 million full text articles, our method successfully identified and processed around 681,000 polymer-related articles, resulting in the extraction of over one million records corresponding to 24 properties of over 106,000 unique polymers. We additionally conducted an extensive evaluation of the performance and associated costs of the LLMs used for data extraction, compared to the NER model. We suggest methodologies to optimize costs, provide insights on effective inference via in-context few-shots learning, and illuminate gaps and opportunities for future studies utilizing LLMs for natural language processing in polymer science. The extracted polymer-property data has been made publicly available for the wider scientific community via the Polymer Scholar website.

Tags: Polymers, Databases

215. De novo design of polymer electrolytes using GPT-based and diffusion-based generative models, npj Computational Materials (December 19, 2024)

Authors: Zhenze Yang, Weike Ye, Xiangyun Lei, Daniel Schweigert, Ha-Kyung Kwon, Arash Khajeh

Volume & Issue: Volume 10, Issue 1

Pages: 296

Abstract: Solid polymer electrolytes offer promising advancements for next-generation batteries, boasting superior safety, enhanced specific energy, and extended lifespans over liquid electrolytes. However, low ionic conductivity and the vast polymer space hinder commercialization. This study leverages generative AI for de novo polymer electrolyte design, comparing GPT-based and diffusion-based models with extensive hyperparameter tuning. We evaluate these models using various metrics and full-atom molecular dynamics simulations. Among 46 candidates tested, 17 exhibit superior ionic conductivity, surpassing existing polymers in our database, with some doubling the conductivity values. Additionally, by adopting pretraining and fine-tuning methodologies, we significantly enhance our generative models, achieving quicker convergence, better performance with limited data, and greater diversity. Our method efficiently generates a large number of novel, diverse, and valid polymers, with a high likelihood of synthesizability, enabling the identification of promising candidates with markedly improved efficiency and effectiveness for practical applications.

Tags: Computational methods, Polymers

214. Poseidon: Efficient Foundation Models for PDEs, Advances in Neural Information Processing Systems (December 16, 2024)

Authors: Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, Siddhartha Mishra

Volume & Issue: Volume 37

Pages: 72525-72624

Abstract: No abstract available

Tags: None

213. Navigating Chemical Space with Latent Flows, Advances in Neural Information Processing Systems (December 16, 2024)

Authors: Guanghao Wei, Yining Huang, Chenru Duan, Yue Song, Yuanqi Du

Volume & Issue: Volume 37

Pages: 58663-58697

Abstract: No abstract available

Tags: None

212. Generative Hierarchical Materials Search, Advances in Neural Information Processing Systems (December 16, 2024)

Authors: Sherry Yang, Simon Batzner, Ruiqi Gao, Muratahan Aykol, Alexander Gaunt, Brendan McMorrow, Danilo Rezende, Dale Schuurmans, Igor Mordatch, Ekin D. Cubuk

Volume & Issue: Volume 37

Pages: 38799-38819

Abstract: No abstract available

Tags: None

211. Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation, Advances in Neural Information Processing Systems (December 16, 2024)

Authors: Keqiang Yan, Xiner Li, Hongyi Ling, Kenna Ashen, Carl Edwards, Raymundo Arróyave, Marinka Zitnik, Heng Ji, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

Volume & Issue: Volume 37

Pages: 125050-125072

Abstract: No abstract available

Tags: None

210. Crystal structure generation with autoregressive large language modeling, Nature Communications (December 06, 2024)

Authors: Luis M. Antunes, Keith T. Butler, Ricardo Grau-Crespo

Volume & Issue: Volume 15, Issue 1

Pages: 10570

Abstract: The generation of plausible crystal structures is often the first step in predicting the structure and properties of a material from its chemical composition. However, most current methods for crystal structure prediction are computationally expensive, slowing the pace of innovation. Seeding structure prediction algorithms with quality generated candidates can overcome a major bottleneck. Here, we introduce CrystaLLM, a methodology for the versatile generation of crystal structures, based on the autoregressive large language modeling (LLM) of the Crystallographic Information File (CIF) format. Trained on millions of CIF files, CrystaLLM focuses on modeling crystal structures through text. CrystaLLM can produce plausible crystal structures for a wide range of inorganic compounds unseen in training, as demonstrated by ab initio simulations. Our approach challenges conventional representations of crystals, and demonstrates the potential of LLMs for learning effective models of crystal chemistry, which will lead to accelerated discovery and innovation in materials science.

Tags: Computational methods, Method development, Solid-state chemistry

209. Quantifying the use and potential benefits of artificial intelligence in scientific research, Nature Human Behaviour (December 2024)

Authors: Jian Gao, Dashun Wang

Volume & Issue: Volume 8, Issue 12

Pages: 2281-2292

Abstract: Gao and Wang develop a measurement framework that demonstrates the widespread use and benefits of AI in science. Nonetheless, there is a substantial gap between AI education and application across disciplines.

Tags: None

208. Multifunctional high-entropy materials, Nature Reviews Materials (December 2024)

Authors: Liuliu Han, Shuya Zhu, Ziyuan Rao, Christina Scheu, Dirk Ponge, Alfred Ludwig, Hongbin Zhang, Oliver Gutfleisch, Horst Hahn, Zhiming Li, Dierk Raabe

Volume & Issue: Volume 9, Issue 12

Pages: 846-865

Abstract: Entropy-related phase stabilization can allow compositionally complex solid solutions of multiple principal elements. The massive mixing approach was originally introduced for metals and has recently been extended to ionic, semiconductor, polymer and low-dimensional materials. Multielement mixing can leverage new types of random, weakly ordered clustering and precipitation states in bulk materials as well as at interfaces and dislocations. The many possible atomic configurations offer opportunities to discover and exploit new functionalities, as well as to create new local symmetry features, ordering phenomena and interstitial configurations. This opens up a huge chemical and structural space in which uncharted phase states, defect chemistries, mechanisms and properties, some previously thought to be mutually exclusive, can be reconciled in one material. Earlier research concentrated on mechanical properties such as strength, toughness, fatigue and ductility. This Review shifts the focus towards multifunctional property profiles, including electronic, electrochemical, mechanical, magnetic, catalytic, hydrogen-related, Invar and caloric characteristics. Disruptive design opportunities lie in combining several of these features, rendering high-entropy materials multifunctional without sacrificing their unique mechanical properties.

Tags: Metals and alloys, Materials for energy and catalysis

207. Learning spatiotemporal dynamics with a pretrained generative model, Nature Machine Intelligence (December 2024)

Authors: Zeyu Li, Wang Han, Yue Zhang, Qingfei Fu, Jingxuan Li, Lizi Qin, Ruoyu Dong, Hao Sun, Yue Deng, Lijun Yang

Volume & Issue: Volume 6, Issue 12

Pages: 1566-1579

Abstract: Reconstructing and predicting spatiotemporal dynamics from sparse sensor data is challenging, especially with limited sensors. Li et al. address this by using self-supervised pretraining of a generative model, improving accuracy and generalization.

Tags: None

206. Towards the holistic design of alloys with large language models, Nature Reviews Materials (December 2024)

Authors: Zongrui Pei, Junqi Yin, Jörg Neugebauer, Anubhav Jain

Volume & Issue: Volume 9, Issue 12

Pages: 840-841

Abstract: Large language models are very effective at solving general tasks, but can also be useful in materials design and extracting and using information from the scientific literature and unstructured corpora. In the domain of alloy design and manufacturing, they can expedite the materials design process and enable the inclusion of holistic criteria.

Tags: Theory and computation, Metals and alloys

205. Synthesis of high-entropy materials, Nature Synthesis (December 2024)

Authors: Yifan Sun, Sheng Dai

Volume & Issue: Volume 3, Issue 12

Pages: 1457-1470

Abstract: The emergence of high-entropy materials affords opportunities to harmonize precision and disorder for materials design. This Review highlights the synthesis principles and strategies towards controllable and predictive fabrication of high-entropy materials with complex chemical compositions, engineered microstructures and tailored atomic configurations.

Tags: None

204. An automatic end-to-end chemical synthesis development platform powered by large language models, Nature Communications (November 23, 2024)

Authors: Yixiang Ruan, Chenyin Lu, Ning Xu, Yuchen He, Yixin Chen, Jian Zhang, Jun Xuan, Jianzhang Pan, Qun Fang, Hanyu Gao, Xiaodong Shen, Ning Ye, Qiang Zhang, Yiming Mo

Volume & Issue: Volume 15, Issue 1

Pages: 10160

Abstract: The rapid emergence of large language model (LLM) technology presents promising opportunities to facilitate the development of synthetic reactions. In this work, we leveraged the power of GPT-4 to build an LLM-based reaction development framework (LLM-RDF) to handle fundamental tasks involved throughout the chemical synthesis development. LLM-RDF comprises six specialized LLM-based agents, including Literature Scouter, Experiment Designer, Hardware Executor, Spectrum Analyzer, Separation Instructor, and Result Interpreter, which are pre-prompted to accomplish the designated tasks. A web application with LLM-RDF as the backend was built to allow chemist users to interact with automated experimental platforms and analyze results via natural language, thus, eliminating the need for coding skills and ensuring accessibility for all chemists. We demonstrated the capabilities of LLM-RDF in guiding the end-to-end synthesis development process for the copper/TEMPO catalyzed aerobic alcohol oxidation to aldehyde reaction, including literature search and information extraction, substrate scope and condition screening, reaction kinetics study, reaction condition optimization, reaction scale-up and product purification. Furthermore, LLM-RDF’s broader applicability and versability was validated on various synthesis tasks of three distinct reactions (SNAr reaction, photoredox C-C cross-coupling reaction, and heterogeneous photoelectrochemical reaction).

Tags: Automation, Chemical engineering, Process chemistry

203. Sequence modeling and design from molecular to genome scale with Evo, Science (November 15, 2024)

Authors: Eric Nguyen, Michael Poli, Matthew G. Durrant, Brian Kang, Dhruva Katrekar, David B. Li, Liam J. Bartie, Armin W. Thomas, Samuel H. King, Garyk Brixi, Jeremy Sullivan, Madelena Y. Ng, Ashley Lewis, Aaron Lou, Stefano Ermon, Stephen A. Baccus, Tina Hernandez-Boussard, Christopher Ré, Patrick D. Hsu, Brian L. Hie

Volume & Issue: Volume 386, Issue 6723

Pages: eado9336

Abstract: The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism’s function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.

Tags: None

202. Learning the language of DNA, Science (November 15, 2024)

Authors: Christina V. Theodoris

Volume & Issue: Volume 386, Issue 6723

Pages: 729-730

Abstract: No abstract available

Tags: None

201. Deep learning generative model for crystal structure prediction, npj Computational Materials (November 12, 2024)

Authors: Xiaoshan Luo, Zhenyu Wang, Pengyue Gao, Jian Lv, Yanchao Wang, Changfeng Chen, Yanming Ma

Volume & Issue: Volume 10, Issue 1

Pages: 254

Abstract: Recent advances in deep learning generative models (GMs) have created high capabilities in accessing and assessing complex high-dimensional data, allowing superior efficiency in navigating vast material configuration space in search of viable structures. Coupling such capabilities with physically significant data to construct trained models for materials discovery is crucial to moving this emerging field forward. Here, we present a universal GM for crystal structure prediction (CSP) via a conditional crystal diffusion variational autoencoder (Cond-CDVAE) approach, which is tailored to allow user-defined material and physical parameters such as composition and pressure. This model is trained on an expansive dataset containing over 670,000 local minimum structures, including a rich spectrum of high-pressure structures, along with ambient-pressure structures in Materials Project database. We demonstrate that the Cond-CDVAE model can generate physically plausible structures with high fidelity under diverse pressure conditions without necessitating local optimization, accurately predicting 59.3% of the 3547 unseen ambient-pressure experimental structures within 800 structure samplings, with the accuracy rate climbing to 83.2% for structures comprising fewer than 20 atoms per unit cell. These results meet or exceed those achieved via conventional CSP methods based on global optimization. The present findings showcase substantial potential of GMs in the realm of CSP.

Tags: Computational methods, Structure of solids and liquids

200. Crystal Structure Determination from Powder Diffraction Patterns with Generative Machine Learning, Journal of the American Chemical Society (November 06, 2024)

Authors: Eric A. Riesel, Tsach Mackey, Hamed Nilforoshan, Minkai Xu, Catherine K. Badding, Alison B. Altman, Jure Leskovec, Danna E. Freedman

Volume & Issue: Volume 146, Issue 44

Pages: 30340-30348

Abstract: Powder X-ray diffraction (PXRD) is a cornerstone technique in materials characterization. However, complete structure determination from PXRD patterns alone remains time-consuming and is often intractable, especially for novel materials. Current machine learning (ML) approaches to PXRD analysis predict only a subset of the total information that comprises a crystal structure. We developed a pioneering generative ML model designed to solve crystal structures from real-world experimental PXRD data. In addition to strong performance on simulated diffraction patterns, we demonstrate full structure solutions over a large set of experimental diffraction patterns. Benchmarking our model, we predicted the structure for 134 experimental patterns from the RRUFF database and thousands of simulated patterns from the Materials Project on which our model achieves state-of-the-art 42 and 67% match rate, respectively. Further, we applied our model to determine the unreported structures of materials such as NaCu2P2, Ca2MnTeO6, ZrGe6Ni6, LuOF, and HoNdV2O8 from the Powder Diffraction File database. We extended this methodology to new materials created in our lab at high pressure with previously unsolved structures and found the new binary compounds Rh3Bi, RuBi2, and KBi3. We expect that our model will open avenues toward materials discovery under conditions which preclude single crystal growth and toward automated materials discovery pipelines, opening the door to new domains of chemistry.

Tags: None

199. A Machine Learning Study on High Thermal Conductivity Assisted to Discover Chalcogenides with Balanced Infrared Nonlinear Optical Performance, Advanced Materials (November 06, 2024)

Authors: Qingchen Wu, Lei Kang, Zheshuai Lin

Volume & Issue: Volume 36, Issue 6

Pages: 2309675

Abstract: Exploration of novel nonlinear optical (NLO) chalcogenides with high laser-induced damage thresholds (LIDT) is critical for mid-infrared (mid-IR) solid-state laser applications. High lattice thermal conductivity (κL) is crucial to increasing LIDT yet often neglected in the search for NLO crystals due to lack of accurate κL data. A machine learning (ML) approach to predict κL for over 6000 chalcogenides is hereby proposed. Combining ML-generated κL data and first-principles calculation, a high-throughput screening route is initiated, and ten new potential mid-IR NLO chalcogenides with optimal bandgap, NLO coefficients, and thermal conductivity are discovered, in which Li2SiS3 and AlZnGaS4 are highlighted. Big-data analysis on structural chemistry proves that the chalcogenides having dense and simple lattice structures with low anisotropy, light atoms, and strong covalent bonds are likely to possess higher κL. The four-coordinated motifs in which central cations show the bond valence sum of +2 to +3 and are from IIIA, IVA, VA, and IIB groups, such as those in diamond-like defect-chalcopyrite chalcogenides, are preferred to fulfill the desired structural chemistry conditions for balanced NLO and thermal properties. This work provides not only an efficient strategy but also interpretable research directions in the search for NLO crystals with high thermal conductivity.

Tags: balanced infrared nonlinear optical performance, high-throughput screening, machine learning, nonlinear optical chalcogenides, thermal conductivity

198. Physics-Informed Inverse Design of Programmable Metasurfaces, Advanced Science (November 06, 2024)

Authors: Yucheng Xu, Jia-Qi Yang, Kebin Fan, Sheng Wang, Jingbo Wu, Caihong Zhang, De-Chuan Zhan, Willie J. Padilla, Biaobing Jin, Jian Chen, Peiheng Wu

Volume & Issue: Volume 11, Issue 41

Pages: 2406878

Abstract: Emerging reconfigurable metasurfaces offer various possibilities for programmatically manipulating electromagnetic waves across spatial, spectral, and temporal domains, showcasing great potential for enhancing terahertz applications. However, they are hindered by limited tunability, particularly evident in relatively small phase tuning over 270°, due to the design constraints with time-intensive forward design methodologies. Here, a multi-bit programmable metasurface is demonstrated capable of terahertz beam steering facilitated by a developed physics-informed inverse design (PIID) approach. Through integrating a modified coupled mode theory (MCMT) into residual neural networks, the PIID algorithm not only significantly increases the design accuracy compared to conventional neural networks but also elucidates the intricate physical relations between the geometry and the modes. Without decreasing the reflection intensity, the method achieves the enhanced phase tuning as large as 300°. Additionally, the inverse-designed programmable beam steering metasurface is experimentally validated, which is adaptable across 1-bit, 2-bit, and tri-state coding schemes, yielding a deflection angle up to 68° and broadened steering coverage. The demonstration provides a promising pathway for rapidly exploring advanced metasurface devices, with potentially great impact on communication and imaging technologies.

Tags: deep learning, beam steering, inverse design, programmable metasurfaces

197. Reproducibility in automated chemistry laboratories using computer science abstractions, Nature Synthesis (November 2024)

Authors: Richard B. Canty, Milad Abolhasani

Volume & Issue: Volume 3, Issue 11

Pages: 1327-1339

Abstract: Digital workflow representations in automated and autonomous chemistry laboratories can achieve transferability by using abstract concepts. However, such abstractions must abide by certain rules to ensure reproducibility. Lessons learned from computer science for responsible abstraction are translated into an automated chemistry laboratory context to guide digital workflow development towards reproducibility.

Tags: None

196. Autonomous mobile robots for exploratory synthetic chemistry, Nature (November 2024)

Authors: Tianwei Dai, Sriram Vijayakrishnan, Filip T. Szczypiński, Jean-François Ayme, Ehsan Simaei, Thomas Fellowes, Rob Clowes, Lyubomir Kotopanov, Caitlin E. Shields, Zhengxue Zhou, John W. Ward, Andrew I. Cooper

Volume & Issue: Volume 635, Issue 8040

Pages: 890-897

Abstract: Autonomous laboratories can accelerate discoveries in chemical synthesis, but this requires automated measurements coupled with reliable decision-making1,2. Most autonomous laboratories involve bespoke automated equipment3–6, and reaction outcomes are often assessed using a single, hard-wired characterization technique7. Any decision-making algorithms8 must then operate using this narrow range of characterization data9,10. By contrast, manual experiments tend to draw on a wider range of instruments to characterize reaction products, and decisions are rarely taken based on one measurement alone. Here we show that a synthesis laboratory can be integrated into an autonomous laboratory by using mobile robots11–13 that operate equipment and make decisions in a human-like way. Our modular workflow combines mobile robots, an automated synthesis platform, a liquid chromatography–mass spectrometer and a benchtop nuclear magnetic resonance spectrometer. This allows robots to share existing laboratory equipment with human researchers without monopolizing it or requiring extensive redesign. A heuristic decision-maker processes the orthogonal measurement data, selecting successful reactions to take forward and automatically checking the reproducibility of any screening hits. We exemplify this approach in the three areas of structural diversification chemistry, supramolecular host–guest chemistry and photochemical synthesis. This strategy is particularly suited to exploratory chemistry that can yield multiple potential products, as for supramolecular assemblies, where we also extend the method to an autonomous function assay by evaluating host–guest binding properties.

Tags: Chemical synthesis, Supramolecular chemistry

195. Efficient Exploratory Synthesis of Quaternary Cesium Chlorides Guided by In Silico Predictions, Journal of the American Chemical Society (October 30, 2024)

Authors: Akira Miura, Muratahan Aykol, Shumma Kozaki, Chikako Moriyoshi, Shintaro Kobayashi, Shogo Kawaguchi, Chul-Ho Lee, Yongming Wang, Amil Merchant, Simon Batzner, Hiroshi Kageyama, Kiyoharu Tadanaga, Pushmeet Kohli, Ekin Dogus Cubuk

Volume & Issue: Volume 146, Issue 43

Pages: 29637-29644

Abstract: Exploratory synthesis of solids is essential for the advancement of materials science but is also highly time- and resource-intensive. Here, we demonstrate an efficient strategy to explore solid-state synthesis of quaternary cesium chlorides in the search space of CsnAIBCl6 (n = 2 or 3, A = Li, Na or K, and B = d or p-block metal), where the target compositions are selected from a pool of candidates based on computationally predicted stabilities and availability of viable precursor powders. Synthesizability of the targets is assessed by observing the evolution of starting phases upon heating under in situ synchrotron X-ray diffraction. Laboratory synthesis is attempted for promising targets, and resulting materials are characterized by powder X-ray and neutron diffraction and subsequent Rietveld refinement. We focus on how computational predictions can be bridged to experimental characterizations in exploratory synthesis and report on successful and failed synthesis attempts for compounds of type Cs2AIBIIICl6, revealing underexplored variants including new polymorphs of Cs2LiCrCl6 and Cs2LiRuCl6, and a new compound Cs2LiIrCl6.

Tags: None

194. Transforming science labs into automated factories of discovery, Science Robotics (October 23, 2024)

Authors: Angelos Angelopoulos, James F. Cahoon, Ron Alterovitz

Volume & Issue: Volume 9, Issue 95

Pages: eadm6991

Abstract: Laboratories in chemistry, biochemistry, and materials science are at the leading edge of technology, discovering molecules and materials to unlock capabilities in energy, catalysis, biotechnology, sustainability, electronics, and more. Yet, most modern laboratories resemble factories from generations past, with a large reliance on humans manually performing synthesis and characterization tasks. Robotics and automation can enable scientific experiments to be conducted faster, more safely, more accurately, and with greater reproducibility, allowing scientists to tackle large societal problems in domains such as health and energy on a shorter timescale. We define five levels of laboratory automation, from laboratory assistance to full automation. We also introduce robotics research challenges that arise when increasing levels of automation and when increasing the generality of tasks within the laboratory. Robots are poised to transform science labs into automated factories of discovery that accelerate scientific progress.

Tags: None

193. Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models, Preprint (October 16, 2024)

Authors: Luis Barroso-Luque, Muhammed Shuaibi, Xiang Fu, Brandon M. Wood, Misko Dzamba, Meng Gao, Ammar Rizvi, C. Lawrence Zitnick, Zachary W. Ulissi

Abstract: The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.

Tags: Condensed Matter - Materials Science, Physics - Computational Physics, Computer Science - Artificial Intelligence

192. MatGPT: A Vane of Materials Informatics from Past, Present, to Future, Advanced Materials (October 09, 2024)

Authors: Zhilong Wang, An Chen, Kehao Tao, Yanqiang Han, Jinjin Li

Volume & Issue: Volume 36, Issue 6

Pages: 2306733

Abstract: Combining materials science, artificial intelligence (AI), physical chemistry, and other disciplines, materials informatics is continuously accelerating the vigorous development of new materials. The emergence of “GPT (Generative Pre-trained Transformer) AI” shows that the scientific research field has entered the era of intelligent civilization with “data” as the basic factor and “algorithm + computing power” as the core productivity. The continuous innovation of AI will impact the cognitive laws and scientific methods, and reconstruct the knowledge and wisdom system. This leads to think more about materials informatics. Here, a comprehensive discussion of AI models and materials infrastructures is provided, and the advances in the discovery and design of new materials are reviewed. With the rise of new research paradigms triggered by “AI for Science”, the vane of materials informatics: “MatGPT”, is proposed and the technical path planning from the aspects of data, descriptors, generative models, pretraining models, directed design models, collaborative training, experimental robots, as well as the efforts and preparations needed to develop a new generation of materials informatics, is carried out. Finally, the challenges and constraints faced by materials informatics are discussed, in order to achieve a more digital, intelligent, and automated construction of materials informatics with the joint efforts of more interdisciplinary scientists.

Tags: artificial intelligence, AI for Science, materials informatics, materials science

191. Machine learning for data-centric epidemic forecasting, Nature Machine Intelligence (October 2024)

Authors: Alexander Rodríguez, Harshavardhan Kamarthi, Pulak Agarwal, Javen Ho, Mira Patel, Suchet Sapre, B. Aditya Prakash

Volume & Issue: Volume 6, Issue 10

Pages: 1122-1131

Abstract: Forecasting epidemic progression is a complex task influenced by various factors, including human behaviour, pathogen dynamics and environmental conditions. Rodríguez, Kamarthi and colleagues provide a review of machine learning methods for epidemic forecasting from a data-centric computational perspective.

Tags: None

190. Generative deep learning for the inverse design of materials, Preprint (September 27, 2024)

Authors: Teng Long, Yixuan Zhang, Hongbin Zhang

Abstract: In addition to the forward inference of materials properties using machine learning, generative deep learning techniques applied on materials science allow the inverse design of materials, i.e., assessing the composition-processing-(micro-)structure-property relationships in a reversed way. In this review, we focus on the (micro-)structure-property mapping, i.e., crystal structure-intrinsic property and microstructure-extrinsic property, and summarize comprehensively how generative deep learning can be performed. Three key elements, i.e., the construction of latent spaces for both the crystal structures and microstructures, generative learning approaches, and property constraints, are discussed in detail. A perspective is given outlining the challenges of the existing methods in terms of computational resource consumption, data compatibility, and yield of generation.

Tags: Condensed Matter - Materials Science, Physics - Computational Physics

189. Stable diffusion for the inverse design of microstructures, Preprint (September 27, 2024)

Authors: Yixuan Zhang, Teng Long, Hongbin Zhang

Abstract: In materials science, microstructures and their associated extrinsic properties are critical for engineering advanced structural and functional materials, yet their robust reconstruction and generation remain significant challenges. In this work, we developed a microstructure generation model based on the Stable Diffusion (SD) model, training it on a dataset of 576,000 2D synthetic microstructures containing both phase and grain orientation information. This model was applied to a range of tasks, including microstructure reconstruction, interpolation, inpainting, and generation. Experimental results demonstrate that our image-based approach can analyze and generate complex microstructural features with exceptional statistical and morphological fidelity. Additionally, by integrating the ControlNet fine-tuning model, we achieved the inverse design of microstructures based on specific properties. Compared to conventional methods, our approach offers greater accuracy, efficiency, and versatility, showcasing its generative potential in exploring previously uncharted microstructures and paving the way for data-driven development of advanced materials with tailored properties.

Tags: Condensed Matter - Materials Science, Physics - Data Analysis, Statistics and Probability

188. Are LLMs Ready for Real-World Materials Discovery?, Preprint (September 25, 2024)

Authors: Santiago Miret, N. M. Anoop Krishnan

Abstract: Large Language Models (LLMs) create exciting possibilities for powerful language processing tools to accelerate research in materials science. While LLMs have great potential to accelerate materials understanding and discovery, they currently fall short in being practical materials science tools. In this position paper, we show relevant failure cases of LLMs in materials science that reveal current limitations of LLMs related to comprehending and reasoning over complex, interconnected materials science knowledge. Given those shortcomings, we outline a framework for developing Materials Science LLMs (MatSci-LLMs) that are grounded in materials science knowledge and hypothesis generation followed by hypothesis testing. The path to attaining performant MatSci-LLMs rests in large part on building high-quality, multi-modal datasets sourced from scientific literature where various information extraction challenges persist. As such, we describe key materials science information extraction challenges which need to be overcome in order to build large-scale, multi-modal datasets that capture valuable materials science knowledge. Finally, we outline a roadmap for applying future MatSci-LLMs for real-world materials discovery via: 1. Automated Knowledge Base Generation; 2. Automated In-Silico Material Design; and 3. MatSci-LLM Integrated Self-Driving Materials Laboratories.

Tags: Condensed Matter - Materials Science, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Computation and Language

187. Electronic descriptors for dislocation deformation behavior and intrinsic ductility in bcc high-entropy alloys, Science Advances (September 20, 2024)

Authors: Pedro P. P. O. Borges, Robert O. Ritchie, Mark Asta

Volume & Issue: Volume 10, Issue 38

Pages: eadp7670

Abstract: Controlling the balance between strength and damage tolerance in high-entropy alloys (HEAs) is central to their application as structural materials. Materials discovery efforts for HEAs are therefore impeded by an incomplete understanding of the chemical factors governing this balance. Through first-principles calculations, this study explores factors governing intrinsic ductility of a crucial subset of HEAs—those with a body-centered cubic (bcc) crystal structure. Analyses of three sets of bcc HEAs comprising nine different compositions reveal that alloy chemistry profoundly influences screw dislocation core structure, dislocation vibrational properties, and intrinsic ductility parameters derived from unstable stacking fault and surface energies. Key features in the electronic structure are identified that correlate with these properties: the fraction of occupied bonding states and bimodality of the d-orbital density of states. The findings enhance the fundamental understanding of the origins of intrinsic ductility and establish an electronic structure–based framework for computationally accelerated materials discovery and design.

Tags: None

186. Scalable crystal structure relaxation using an iteration-free deep generative model with uncertainty quantification, Nature Communications (September 17, 2024)

Authors: Ziduo Yang, Yi-Ming Zhao, Xian Wang, Xiaoqing Liu, Xiuying Zhang, Yifan Li, Qiujie Lv, Calvin Yu-Chian Chen, Lei Shen

Volume & Issue: Volume 15, Issue 1

Pages: 8148

Abstract: In computational molecular and materials science, determining equilibrium structures is the crucial first step for accurate subsequent property calculations. However, the recent discovery of millions of new crystals and super large twisted structures has challenged traditional computational methods, both ab initio and machine-learning-based, due to their computationally intensive iterative processes. To address these scalability issues, here we introduce DeepRelax, a deep generative model capable of performing geometric crystal structure relaxation rapidly and without iterations. DeepRelax learns the equilibrium structural distribution, enabling it to predict relaxed structures directly from their unrelaxed ones. The ability to perform structural relaxation at the millisecond level per structure, combined with the scalability of parallel processing, makes DeepRelax particularly useful for large-scale virtual screening. We demonstrate DeepRelax’s reliability and robustness by applying it to five diverse databases, including oxides, Materials Project, two-dimensional materials, van der Waals crystals, and crystals with point defects. DeepRelax consistently shows high accuracy and efficiency, validated by density functional theory calculations. Finally, we enhance its trustworthiness by integrating uncertainty quantification. This work significantly accelerates computational workflows, offering a robust and trustworthy machine-learning method for material discovery and advancing the application of AI for science.

Tags: Theory and computation, Computational science, Method development, Structure of solids and liquids

185. De novo design of high-affinity protein binders with AlphaProteo, Preprint (September 12, 2024)

Authors: Vinicius Zambaldi, David La, Alexander E. Chu, Harshnira Patani, Amy E. Danson, Tristan O. C. Kwan, Thomas Frerix, Rosalia G. Schneider, David Saxton, Ashok Thillaisundaram, Zachary Wu, Isabel Moraes, Oskar Lange, Eliseo Papa, Gabriella Stanton, Victor Martin, Sukhdeep Singh, Lai H. Wong, Russ Bates, Simon A. Kohl, Josh Abramson, Andrew W. Senior, Yilmaz Alguel, Mary Y. Wu, Irene M. Aspalter, Katie Bentley, David L. V. Bauer, Peter Cherepanov, Demis Hassabis, Pushmeet Kohli, Rob Fergus, Jue Wang

Abstract: Computational design of protein-binding proteins is a fundamental capability with broad utility in biomedical research and biotechnology. Recent methods have made strides against some target proteins, but on-demand creation of high-affinity binders without multiple rounds of experimental testing remains an unsolved challenge. This technical report introduces AlphaProteo, a family of machine learning models for protein design, and details its performance on the de novo binder design problem. With AlphaProteo, we achieve 3- to 300-fold better binding affinities and higher experimental success rates than the best existing methods on seven target proteins. Our results suggest that AlphaProteo can generate binders “ready-to-use” for many research applications using only one round of medium-throughput screening and no further optimization.

Tags: Quantitative Biology - Biomolecules

184. ChemOS 2.0: An orchestration architecture for chemical self-driving laboratories, Matter (September 04, 2024)

Authors: Malcolm Sim, Mohammad Ghazi Vakili, Felix Strieth-Kalthoff, Han Hao, Riley J. Hickman, Santiago Miret, Sergio Pablo-García, Alán Aspuru-Guzik

Volume & Issue: Volume 7, Issue 9

Pages: 2959-2977

Abstract: No abstract available

Tags: automation, MAP 4: Demonstrate, materials discovery, orchestration, self-driving laboratories

183. Closed-loop transfer enables artificial intelligence to yield chemical knowledge, Nature (September 2024)

Authors: Nicholas H. Angello, David M. Friday, Changhyun Hwang, Seungjoo Yi, Austin H. Cheng, Tiara C. Torres-Flores, Edward R. Jira, Wesley Wang, Alán Aspuru-Guzik, Martin D. Burke, Charles M. Schroeder, Ying Diao, Nicholas E. Jackson

Volume & Issue: Volume 633, Issue 8029

Pages: 351-358

Abstract: Integration of closed-loop experiments with physics-based feature selection and supervised learning, denoted as closed-loop transfer, yields chemical insights in parallel with optimization of objective functions.

Tags: None

182. AI-driven research in pure mathematics and theoretical physics, Nature Reviews Physics (September 2024)

Authors: Yang-Hui He

Volume & Issue: Volume 6, Issue 9

Pages: 546-553

Abstract: Advances in artificial-intelligence-assisted mathematical investigations suggest that human–machine collaboration will be an integral part of future theoretical research.

Tags: None

181. Fine-tuning protein language models boosts predictions across diverse tasks, Nature Communications (August 28, 2024)

Authors: Robert Schmirler, Michael Heinzinger, Burkhard Rost

Volume & Issue: Volume 15, Issue 1

Pages: 7407

Abstract: Prediction methods inputting embeddings from protein language models have reached or even surpassed state-of-the-art performance on many protein prediction tasks. In natural language processing fine-tuning large language models has become the de facto standard. In contrast, most protein language model-based protein predictions do not back-propagate to the language model. Here, we compare the fine-tuning of three state-of-the-art models (ESM2, ProtT5, Ankh) on eight different tasks. Two results stand out. Firstly, task-specific supervised fine-tuning almost always improves downstream predictions. Secondly, parameter-efficient fine-tuning can reach similar improvements consuming substantially fewer resources at up to 4.5-fold acceleration of training over fine-tuning full models. Our results suggest to always try fine-tuning, in particular for problems with small datasets, such as for fitness landscape predictions of a single protein. For ease of adaptability, we provide easy-to-use notebooks to fine-tune all models used during this work for per-protein (pooling) and per-residue prediction tasks.

Tags: Machine learning, Computational models, Protein function predictions

180. Machine learning enables the discovery of 2D Invar and anti-Invar monolayers, Nature Communications (August 14, 2024)

Authors: Shun Tian, Ke Zhou, Wanjian Yin, Yilun Liu

Volume & Issue: Volume 15, Issue 1

Pages: 6977

Abstract: Materials demonstrating positive thermal expansion (PTE) or negative thermal expansion (NTE) are quite common, whereas those exhibiting zero thermal expansion (ZTE) are notably scarce. In this work, we identify the mechanical descriptors, namely in-plane tensile stiffness and out-of-plane bending stiffness, that can effectively classify PTE and NTE 2D crystals. By utilizing high throughput calculations and the state-of-the-art symbolic regression method, these descriptors aid in the discovery of ZTE or 2D Invar monolayers with the linear thermal expansion coefficient (LTEC) within ±2 × 10−6 K−1 in the middle range of temperatures. Additionally, the descriptors assist the discovery of large PTE and NTE 2D monolayers with the LTEC larger than ±15 × 10−6 K−1, which are so-called 2D anti-Invar monolayers. Advancing our understanding of materials with exceptionally low or high thermal expansion is of substantial scientific and technological interest, particularly in the development of next-generation electronics at the nanometer or even Ångstrom scale.

Tags: Structure of solids and liquids, Two-dimensional materials

179. Accurate prediction of protein function using statistics-informed graph networks, Nature Communications (August 04, 2024)

Authors: Yaan J. Jang, Qi-Qi Qin, Si-Yu Huang, Arun T. John Peter, Xue-Ming Ding, Benoît Kornmann

Volume & Issue: Volume 15, Issue 1

Pages: 6601

Abstract: Understanding protein function is pivotal in comprehending the intricate mechanisms that underlie many crucial biological activities, with far-reaching implications in the fields of medicine, biotechnology, and drug development. However, more than 200 million proteins remain uncharacterized, and computational efforts heavily rely on protein structural information to predict annotations of varying quality. Here, we present a method that utilizes statistics-informed graph networks to predict protein functions solely from its sequence. Our method inherently characterizes evolutionary signatures, allowing for a quantitative assessment of the significance of residues that carry out specific functions. PhiGnet not only demonstrates superior performance compared to alternative approaches but also narrows the sequence-function gap, even in the absence of structural information. Our findings indicate that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting both existing properties and new functionalities of proteins in research and biomedicine.

Tags: Machine learning, Computational models, Protein function predictions, Bioinformatics, Protein sequencing

178. Accelerated discovery of perovskite solid solutions through automated materials synthesis and characterization, Nature Communications (August 02, 2024)

Authors: Mojan Omidvar, Hangfeng Zhang, Achintha Avin Ihalage, Theo Graves Saunders, Henry Giddens, Michael Forrester, Sajad Haq, Yang Hao

Volume & Issue: Volume 15, Issue 1

Pages: 6554

Abstract: Accelerating perovskite solid solution discovery and sustainable synthesis is crucial for addressing challenges in wireless communication and biosensors. However, the vast array of chemical compositions and their dependence on factors such as crystal structure, and sintering temperature require time-consuming manual processes. To overcome these constraints, we introduce an automated materials discovery approach encompassing machine learning (ML) assisted material screening, robotic synthesis, and high-throughput characterization. Our proposed platform for rapid sintering and dielectric analysis streamlines the characterization of perovskites and the discovery of disordered materials. The setup has been successfully validated, demonstrating processing materials within minutes, in stark contrast to conventional procedures that can take hours or days. Following setup validation with established samples, we showcase synthesizing single-phase solid solutions within the barium family, such as (BaxSr1-x)CeO3, identified through ML-guided chemistry.

Tags: Characterization and analytical techniques, Design, Electronic devices, synthesis and processing

177. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins, Nature Chemical Biology (August 2024)

Authors: Vinayak Agarwal, Andrew C. McShan

Volume & Issue: Volume 20, Issue 8

Pages: 950-959

Abstract: This Perspective proposes practical guidance to the application of AlphaFold2 for structure prediction of different classes of proteins including rigid globular proteins, intrinsically disordered proteins and alternative conformational states. The use of evaluation metrics to predict reliability of the resulting models and their integration with experimental data are also discussed.

Tags: None

176. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization, Nature Methods (August 2024)

Authors: Gustaf Ahdritz, Nazim Bouatta, Christina Floristean, Sachin Kadyan, Qinghui Xia, William Gerecke, Timothy J. O’Donnell, Daniel Berenberg, Ian Fisk, Niccolò Zanichelli, Bo Zhang, Arkadiusz Nowaczynski, Bei Wang, Marta M. Stepniewska-Dziubinska, Shang Zhang, Adegoke Ojewole, Murat Efe Guney, Stella Biderman, Andrew M. Watkins, Stephen Ra, Pablo Ribalta Lorenzo, Lucas Nivon, Brian Weitzner, Yih-En Andrew Ban, Shiyang Chen, Minjia Zhang, Conglong Li, Shuaiwen Leon Song, Yuxiong He, Peter K. Sorger, Emad Mostaque, Zhao Zhang, Richard Bonneau, Mohammed AlQuraishi

Volume & Issue: Volume 21, Issue 8

Pages: 1514-1524

Abstract: OpenFold is a trainable open-source implementation of AlphaFold2. It is fast and memory efficient, and the code and training data are available under a permissive license.

Tags: None

175. Large-scale foundation model on single-cell transcriptomics, Nature Methods (August 2024)

Authors: Minsheng Hao, Jing Gong, Xin Zeng, Chiming Liu, Yucheng Guo, Xingyi Cheng, Taifeng Wang, Jianzhu Ma, Xuegong Zhang, Le Song

Volume & Issue: Volume 21, Issue 8

Pages: 1481-1491

Abstract: scFoundation, with 100 million parameters covering about 20,000 genes, pretrained on over 50 million single-cell transcriptomics profiles, is a foundation model for diverse tasks of single-cell analysis.

Tags: None

174. Sequential closed-loop Bayesian optimization as a guide for organic molecular metallophotocatalyst formulation discovery, Nature Chemistry (August 2024)

Authors: Xiaobo Li, Yu Che, Linjiang Chen, Tao Liu, Kewei Wang, Lunjie Liu, Haofan Yang, Edward O. Pyzer-Knapp, Andrew I. Cooper

Volume & Issue: Volume 16, Issue 8

Pages: 1286-1294

Abstract: Conjugated organic photoredox catalysts (OPCs) can promote a wide range of chemical transformations. It is challenging to predict the catalytic activities of OPCs from first principles, either by expert knowledge or by using a priori calculations, as catalyst activity depends on a complex range of interrelated properties. Organic photocatalysts and other catalyst systems have often been discovered by a mixture of design and trial and error. Here we report a two-step data-driven approach to the targeted synthesis of OPCs and the subsequent reaction optimization for metallophotocatalysis, demonstrated for decarboxylative sp3–sp2 cross-coupling of amino acids with aryl halides. Our approach uses a Bayesian optimization strategy coupled with encoding of key physical properties using molecular descriptors to identify promising OPCs from a virtual library of 560 candidate molecules. This led to OPC formulations that are competitive with iridium catalysts by exploring just 2.4% of the available catalyst formulation space (107 of 4,500 possible reaction conditions).

Tags: Computational chemistry, Cheminformatics, Organocatalysis, Photocatalysis

173. Compositional design of multicomponent alloys using reinforcement learning, Acta Materialia (August 01, 2024)

Authors: Yuehui Xian, Pengfei Dang, Yuan Tian, Xue Jiang, Yumei Zhou, Xiangdong Ding, Jun Sun, Turab Lookman, Dezhen Xue

Volume & Issue: Volume 274

Pages: 120017

Abstract: The design of alloys has typically involved adaptive experimental synthesis and characterization guided by machine learning models fitted to available data. A bottleneck for sequential design, be it for self-driven or manual synthesis, by Bayesian Global Optimization (BGO) for example, is that the search space becomes intractable as the number of alloy elements and its compositions exceed a threshold. Here we investigate how reinforcement learning (RL) performs in the compositional design of alloys within a closed loop with manual synthesis and characterization. We demonstrate this strategy by designing a phase change multicomponent alloy (Ti27.2Ni47Hf13.8Zr12) with the highest transformation enthalpy (ΔH) -37.1 J/g (-39.0 J/g with further calibration) within the TiNi-based family of alloys from a space of over 2 × 108 candidates, although the initial training is only on a compact dataset of 112 alloys. We show how the training efficiency is increased by employing acquisition functions containing uncertainties, such as expected improvement (EI), as the reward itself. Existing alloy data is often limited, however, if the agent is pretrained on experimental results prior to the training process, it can access regions of higher reward values more frequently. In addition, the experimental feedback enables the agent to gradually explore new regions with higher rewards, compositionally different from the initial dataset. Our approach directly applies to processing conditions where the actions would be performed in a given order. We also compare RL performance to BGO and the genetic algorithm on several test functions to gain insight on their relative strengths in materials design.

Tags: Compositional design, Multicomponent alloys, Phase change materials, Reinforcement learning, Transformational enthalpy

172. Higher-order equivariant neural networks for charge density prediction in materials, npj Computational Materials (July 26, 2024)

Authors: Teddy Koker, Keegan Quigley, Eric Taw, Kevin Tibbetts, Lin Li

Volume & Issue: Volume 10, Issue 1

Pages: 161

Abstract: The calculation of electron density distribution using density functional theory (DFT) in materials and molecules is central to the study of their quantum and macro-scale properties, yet accurate and efficient calculation remains a long-standing challenge. We introduce ChargE3Net, an E(3)-equivariant graph neural network for predicting electron density in atomic systems. ChargE3Net enables the learning of higher-order equivariant features to achieve high predictive accuracy and model expressivity. We show that ChargE3Net exceeds the performance of prior work on diverse sets of molecules and materials. When trained on the massive dataset of over 100K materials in the Materials Project database, our model is able to capture the complexity and variability in the data, leading to a significant 26.7% reduction in self-consistent iterations when used to initialize DFT calculations on unseen materials. Furthermore, we show that non-self-consistent DFT calculations using our predicted charge densities yield near-DFT performance on electronic and thermodynamic property prediction at a fraction of the computational cost. Further analysis attributes the greater predictive accuracy to improved modeling of systems with high angular variations. These results illuminate a pathway towards a machine learning-accelerated ab initio calculations for materials discovery.

Tags: Computational methods, Electronic properties and materials, Electronic structure

171. Large Language Models for Inorganic Synthesis Predictions, Journal of the American Chemical Society (July 24, 2024)

Authors: Seongmin Kim, Yousung Jung, Joshua Schrier

Volume & Issue: Volume 146, Issue 29

Pages: 19654-19659

Abstract: We evaluate the effectiveness of pretrained and fine-tuned large language models (LLMs) for predicting the synthesizability of inorganic compounds and the selection of precursors needed to perform inorganic synthesis. The predictions of fine-tuned LLMs are comparable to─and sometimes better than─recent bespoke machine learning models for these tasks but require only minimal user expertise, cost, and time to develop. Therefore, this strategy can serve both as an effective and strong baseline for future machine learning studies of various chemical applications and as a practical tool for experimental chemists.

Tags: None

170. Autonomous chemistry: Navigating self-driving labs in chemical and material sciences, Matter (July 03, 2024)

Authors: Oliver Bayley, Elia Savino, Aidan Slattery, Timothy Noël

Volume & Issue: Volume 7, Issue 7

Pages: 2382-2398

Abstract: No abstract available

Tags: machine learning, AI, automation, robotic synthesis, SDL, self-driving labs

169. Has generative artificial intelligence solved inverse materials design?, Matter (July 03, 2024)

Authors: Hyunsoo Park, Zhenzhu Li, Aron Walsh

Volume & Issue: Volume 7, Issue 7

Pages: 2355-2367

Abstract:

Summary

The directed design and discovery of compounds with pre-determined properties is a long-standing challenge in materials research. We provide a perspective on progress toward achieving this goal using generative models for chemical compositions and crystal structures based on a set of powerful statistical techniques drawn from the artificial intelligence community. We introduce the central concepts underpinning generative models of crystalline materials. Coverage is provided of early implementations for inorganic crystals based on generative adversarial networks and variational autoencoders through to ongoing progress involving autoregressive and diffusion models. The influence of the choice of chemical representation and the generative architecture is discussed, along with metrics for quantifying the quality of the hypothetical compounds produced. While further developments are required to enable realistic predictions drawn from richer structure and property datasets, generative artificial intelligence is already proving to be complementary to traditional materials design strategies.

Tags: None

168. Promising directions of machine learning for partial differential equations, Nature Computational Science (July 2024)

Authors: Steven L. Brunton, J. Nathan Kutz

Volume & Issue: Volume 4, Issue 7

Pages: 483-494

Abstract: Machine learning has enabled major advances in the field of partial differential equations. This Review discusses some of these efforts and other ongoing challenges and opportunities for development.

Tags: None

167. Prediction of DNA origami shape using graph neural network, Nature Materials (July 2024)

Authors: Chien Truong-Quoc, Jae Young Lee, Kyung Soo Kim, Do-Nyun Kim

Volume & Issue: Volume 23, Issue 7

Pages: 984-992

Abstract: Limited datasets hinder the accurate prediction of DNA origami structures. A data-driven and physics-informed approach for model training is presented using a graph neural network to facilitate the rapid virtual prototyping of DNA-based nanostructures.

Tags: None

166. From bulk effective mass to 2D carrier mobility accurate prediction via adversarial transfer learning, Nature Communications (June 25, 2024)

Authors: Xinyu Chen, Shuaihua Lu, Qian Chen, Qionghua Zhou, Jinlan Wang

Volume & Issue: Volume 15, Issue 1

Pages: 5391

Abstract: Data scarcity is one of the critical bottlenecks to utilizing machine learning in material discovery. Transfer learning can use existing big data to assist property prediction on small data sets, but the premise is that there must be a strong correlation between large and small data sets. To extend its applicability in scenarios with different properties and materials, here we develop a hybrid framework combining adversarial transfer learning and expert knowledge, which enables the direct prediction of carrier mobility of two-dimensional (2D) materials using the knowledge learned from bulk effective mass. Specifically, adversarial training ensures that only common knowledge between bulk and 2D materials is extracted while expert knowledge is incorporated to further improve the prediction accuracy and generalizability. Successfully, 2D carrier mobilities are predicted with the accuracy over 90% from only crystal structure, and 21 2D semiconductors with carrier mobilities far exceeding silicon and suitable bandgap are successfully screened out. This work enables transfer learning in simultaneous cross-property and cross-material scenarios, providing an effective tool to predict intricate material properties with limited data.

Tags: Computational methods, Two-dimensional materials

165. LLMatDesign: Autonomous Materials Discovery with Large Language Models, Preprint (June 19, 2024)

Authors: Shuyi Jia, Chao Zhang, Victor Fung

Abstract: Discovering new materials can have significant scientific and technological implications but remains a challenging problem today due to the enormity of the chemical space. Recent advances in machine learning have enabled data-driven methods to rapidly screen or generate promising materials, but these methods still depend heavily on very large quantities of training data and often lack the flexibility and chemical understanding often desired in materials discovery. We introduce LLMatDesign, a novel language-based framework for interpretable materials design powered by large language models (LLMs). LLMatDesign utilizes LLM agents to translate human instructions, apply modifications to materials, and evaluate outcomes using provided tools. By incorporating self-reflection on its previous decisions, LLMatDesign adapts rapidly to new tasks and conditions in a zero-shot manner. A systematic evaluation of LLMatDesign on several materials design tasks, in silico, validates LLMatDesign’s effectiveness in developing new materials with user-defined target properties in the small data regime. Our framework demonstrates the remarkable potential of autonomous LLM-guided materials discovery in the computational setting and towards self-driving laboratories in the future.

Tags: Condensed Matter - Materials Science, Computer Science - Artificial Intelligence, Computer Science - Computation and Language

164. Deep learning probability flows and entropy production rates in active matter, Proceedings of the National Academy of Sciences (June 18, 2024)

Authors: Nicholas M. Boffi, Eric Vanden-Eijnden

Volume & Issue: Volume 121, Issue 25

Pages: e2318106121

Abstract: Active matter systems, from self-propelled colloids to motile bacteria, are characterized by the conversion of free energy into useful work at the microscopic scale. They involve physics beyond the reach of equilibrium statistical mechanics, and a persistent challenge has been to understand the nature of their nonequilibrium states. The entropy production rate and the probability current provide quantitative ways to do so by measuring the breakdown of time-reversal symmetry. Yet, their efficient computation has remained elusive, as they depend on the system’s unknown and high-dimensional probability density. Here, building upon recent advances in generative modeling, we develop a deep learning framework to estimate the score of this density. We show that the score, together with the microscopic equations of motion, gives access to the entropy production rate, the probability current, and their decomposition into local contributions from individual particles. To represent the score, we introduce a spatially local transformer network architecture that learns high-order interactions between particles while respecting their underlying permutation symmetry. We demonstrate the broad utility and scalability of the method by applying it to several high-dimensional systems of active particles undergoing motility-induced phase separation (MIPS). We show that a single network trained on a system of 4,096 particles at one packing fraction can generalize to other regions of the phase diagram, including to systems with as many as 32,768 particles. We use this observation to quantify the spatial structure of the departure from equilibrium in MIPS as a function of the number of particles and the packing fraction.

Tags: None

163. Generative learning facilitated discovery of high-entropy ceramic dielectrics for capacitive energy storage, Nature Communications (June 10, 2024)

Authors: Wei Li, Zhong-Hui Shen, Run-Lin Liu, Xiao-Xiao Chen, Meng-Fan Guo, Jin-Ming Guo, Hua Hao, Yang Shen, Han-Xing Liu, Long-Qing Chen, Ce-Wen Nan

Volume & Issue: Volume 15, Issue 1

Pages: 4940

Abstract: Dielectric capacitors offer great potential for advanced electronics due to their high power densities, but their energy density still needs to be further improved. High-entropy strategy has emerged as an effective method for improving energy storage performance, however, discovering new high-entropy systems within a high-dimensional composition space is a daunting challenge for traditional trial-and-error experiments. Here, based on phase-field simulations and limited experimental data, we propose a generative learning approach to accelerate the discovery of high-entropy dielectrics in a practically infinite exploration space of over 1011 combinations. By encoding-decoding latent space regularities to facilitate data sampling and forward inference, we employ inverse design to screen out the most promising combinations via a ranking strategy. Through only 5 sets of targeted experiments, we successfully obtain a Bi(Mg0.5Ti0.5)O3-based high-entropy dielectric film with a significantly improved energy density of 156 J cm−3 at an electric field of 5104 kV cm−1, surpassing the pristine film by more than eight-fold. This work introduces an effective and innovative avenue for designing high-entropy dielectrics with drastically reduced experimental cycles, which could be also extended to expedite the design of other multicomponent material systems with desired properties.

Tags: Energy storage, Ferroelectrics and multiferroics

162. Machine learning-guided realization of full-color high-quantum-yield carbon quantum dots, Nature Communications (June 06, 2024)

Authors: Huazhang Guo, Yuhao Lu, Zhendong Lei, Hong Bao, Mingwan Zhang, Zeming Wang, Cuntai Guan, Bijun Tang, Zheng Liu, Liang Wang

Volume & Issue: Volume 15, Issue 1

Pages: 4843

Abstract: Carbon quantum dots (CQDs) have versatile applications in luminescence, whereas identifying optimal synthesis conditions has been challenging due to numerous synthesis parameters and multiple desired outcomes, creating an enormous search space. In this study, we present a novel multi-objective optimization strategy utilizing a machine learning (ML) algorithm to intelligently guide the hydrothermal synthesis of CQDs. Our closed-loop approach learns from limited and sparse data, greatly reducing the research cycle and surpassing traditional trial-and-error methods. Moreover, it also reveals the intricate links between synthesis parameters and target properties and unifies the objective function to optimize multiple desired properties like full-color photoluminescence (PL) wavelength and high PL quantum yields (PLQY). With only 63 experiments, we achieve the synthesis of full-color fluorescent CQDs with high PLQY exceeding 60% across all colors. Our study represents a significant advancement in ML-guided CQDs synthesis, setting the stage for developing new materials with multiple desired properties.

Tags: Computational methods, Materials chemistry, Quantum dots, Theoretical chemistry

161. Closed-Loop Multi-Objective Optimization for Cu–Sb–S Photo-Electrocatalytic Materials’ Discovery, Advanced Materials (June 04, 2024)

Authors: Yang Bai, Zi Hui Jonathan Khoo, Riko I Made, Huiqing Xie, Carina Yi Jing Lim, Albertus Denny Handoko, Vijila Chellappan, Jianwei Jayce Cheng, Fengxia Wei, Yee-Fun Lim, Kedar Hippalgaonkar

Volume & Issue: Volume 36, Issue 2

Pages: 2304269

Abstract: Copper antimony sulfides are regarded as promising catalysts for photo-electrochemical water splitting because of their earth abundance and broad light absorption. The unique photoactivity of copper antimony sulfides is dependent on their various crystalline structures and atomic compositions. Here, a closed-loop workflow is built, which explores Cu–Sb–S compositional space to optimize its photo-electrocatalytic hydrogen evolution from water, by integrating a high-throughput robotic platform, characterization techniques, and machine learning (ML) optimization workflow. The multi-objective optimization model discovers optimum experimental conditions after only nine cycles of integrated experiments–machine learning loop. Photocurrent testing at 0 V versus reversible hydrogen electrode (RHE) confirms the expected correlation between the materials’ properties and photocurrent. An optimum photocurrent of −186 µA cm−2 is observed on Cu–Sb–S in the ratio of 9:45:46 in the form of single-layer coating on F-doped SnO2 (FTO) glass with a corresponding bandgap of 1.85 eV and 63.2% Cu1+/Cu species content. The targeted intelligent search reveals a nonobvious CuSbS composition that exhibits 2.3 times greater activity than baseline results from random sampling.

Tags: machine learning, high-throughput experiments, photo-electrochemical water splitting

160. ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models, Nature Communications (June 03, 2024)

Authors: Yeonghun Kang, Jihan Kim

Volume & Issue: Volume 15, Issue 1

Pages: 4705

Abstract: ChatMOF is an artificial intelligence (AI) system that is built to predict and generate metal-organic frameworks (MOFs). By leveraging a large-scale language model (GPT-4, GPT-3.5-turbo, and GPT-3.5-turbo-16k), ChatMOF extracts key details from textual inputs and delivers appropriate responses, thus eliminating the necessity for rigid and formal structured queries. The system is comprised of three core components (i.e., an agent, a toolkit, and an evaluator) and it forms a robust pipeline that manages a variety of tasks, including data retrieval, property prediction, and structure generations. ChatMOF shows high accuracy rates of 96.9% for searching, 95.7% for predicting, and 87.5% for generating tasks with GPT-4. Additionally, it successfully creates materials with user-desired properties from natural language. The study further explores the merits and constraints of utilizing large language models (LLMs) in combination with database and machine learning in material sciences and showcases its transformative potential for future advancements.

Tags: Computational methods, Metal–organic frameworks, Structure prediction

159. Accurate structure prediction of biomolecular interactions with AlphaFold 3, Nature (June 2024)

Authors: Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, John M. Jumper

Volume & Issue: Volume 630, Issue 8016

Pages: 493-500

Abstract: The introduction of AlphaFold 21 has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design2–6. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein–ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein–nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody–antigen prediction accuracy compared with AlphaFold-Multimer v.2.37,8. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.

Tags: Machine learning, Drug discovery, Protein structure predictions, Structural biology

158. Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations, Nature Methods (June 2024)

Authors: Srinivas Niranj Chandrasekaran, Beth A. Cimini, Amy Goodale, Lisa Miller, Maria Kost-Alimova, Nasim Jamali, John G. Doench, Briana Fritchman, Adam Skepner, Michelle Melanson, Alexandr A. Kalinin, John Arevalo, Marzieh Haghighi, Juan C. Caicedo, Daniel Kuhn, Desiree Hernandez, James Berstler, Hamdah Shafqat-Abbasi, David E. Root, Susanne E. Swalley, Sakshi Garg, Shantanu Singh, Anne E. Carpenter

Volume & Issue: Volume 21, Issue 6

Pages: 1114-1121

Abstract: The identification of genetic and chemical perturbations with similar impacts on cell morphology can elucidate compounds’ mechanisms of action or novel regulators of genetic pathways. Research on methods for identifying such similarities has lagged due to a lack of carefully designed and well-annotated image sets of cells treated with chemical and genetic perturbations. Here we create such a Resource dataset, CPJUMP1, in which each perturbed gene’s product is a known target of at least two chemical compounds in the dataset. We systematically explore the directionality of correlations among perturbations that target the same protein encoded by a given gene, and we find that identifying matches between chemical and genetic perturbations is a challenging task. Our dataset and baseline analyses provide a benchmark for evaluating methods that measure perturbation similarities and impact, and more generally, learn effective representations of cellular state from microscopy images. Such advancements would accelerate the applications of image-based profiling of cellular states, such as uncovering drug mode of action or probing functional genomics.

Tags: Drug discovery, Image processing, High-throughput screening

157. Machine intelligence-accelerated discovery of all-natural plastic substitutes, Nature Nanotechnology (June 2024)

Authors: Tianle Chen, Zhenqian Pang, Shuaiming He, Yang Li, Snehi Shrestha, Joshua M. Little, Haochen Yang, Tsai-Chun Chung, Jiayue Sun, Hayden Christopher Whitley, I.-Chi Lee, Taylor J. Woehl, Teng Li, Liangbing Hu, Po-Yen Chen

Volume & Issue: Volume 19, Issue 6

Pages: 782-791

Abstract: One possible solution against the accumulation of petrochemical plastics in natural environments is to develop biodegradable plastic substitutes using natural components. However, discovering all-natural alternatives that meet specific properties, such as optical transparency, fire retardancy and mechanical resilience, which have made petrochemical plastics successful, remains challenging. Current approaches still rely on iterative optimization experiments. Here we show an integrated workflow that combines robotics and machine learning to accelerate the discovery of all-natural plastic substitutes with programmable optical, thermal and mechanical properties. First, an automated pipetting robot is commanded to prepare 286 nanocomposite films with various properties to train a support-vector machine classifier. Next, through 14 active learning loops with data augmentation, 135 all-natural nanocomposites are fabricated stagewise, establishing an artificial neural network prediction model. We demonstrate that the prediction model can conduct a two-way design task: (1) predicting the physicochemical properties of an all-natural nanocomposite from its composition and (2) automating the inverse design of biodegradable plastic substitutes that fulfils various user-specific requirements. By harnessing the model’s prediction capabilities, we prepare several all-natural substitutes, that could replace non-biodegradable counterparts as exhibiting analogous properties. Our methodology integrates robot-assisted experiments, machine intelligence and simulation tools to accelerate the discovery and design of eco-friendly plastic substitutes starting from building blocks taken from the generally-recognized-as-safe database.

Tags: Design, synthesis and processing, Composites

156. Machine learning-aided generative molecular design, Nature Machine Intelligence (June 2024)

Authors: Yuanqi Du, Arian R. Jamasb, Jeff Guo, Tianfan Fu, Charles Harris, Yingheng Wang, Chenru Duan, Pietro Liò, Philippe Schwaller, Tom L. Blundell

Volume & Issue: Volume 6, Issue 6

Pages: 589-604

Abstract: Data-driven generative methods have the potential to greatly facilitate molecular design tasks for drug design.

Tags: None

155. High-Entropy Photothermal Materials, Advanced Materials (June 2024)

Authors: Cheng-Yu He, Yang Li, Zhuo-Hao Zhou, Bao-Hua Liu, Xiang-Hu Gao

Volume & Issue: Volume 36, Issue 24

Pages: 2400920

Abstract: Abstract High-entropy (HE) materials, celebrated for their extraordinary chemical and physical properties, have garnered increasing attention for their broad applications across diverse disciplines. The expansive compositional range of these materials allows for nuanced tuning of their properties and innovative structural designs. Recent advances have been centered on their versatile photothermal conversion capabilities, effective across the full solar spectrum (300?2500 nm). The HE effect, coupled with hysteresis diffusion, imparts these materials with desirable thermal and chemical stability. These attributes position HE materials as a revolutionary alternative to traditional photothermal materials, signifying a transformative shift in photothermal technology. This review delivers a comprehensive summary of the current state of knowledge regarding HE photothermal materials, emphasizing the intricate relationship between their compositions, structures, light-absorbing mechanisms, and optical properties. Furthermore, the review outlines the notable advances in HE photothermal materials, emphasizing their contributions to areas, such as solar water evaporation, personal thermal management, solar thermoelectric generation, catalysis, and biomedical applications. The review culminates in presenting a roadmap that outlines prospective directions for future research in this burgeoning field, and also outlines fruitful ways to develop advanced HE photothermal materials and to expand their promising applications.

Tags: bandgap, high-entropy materials, photothermal applications, photothermal conversion, stability

154. Diffusion-based deep learning method for augmenting ultrastructural imaging and volume electron microscopy, Nature Communications (June 01, 2024)

Authors: Chixiang Lu, Kai Chen, Heng Qiu, Xiaojun Chen, Gu Chen, Xiaojuan Qi, Haibo Jiang

Volume & Issue: Volume 15, Issue 1

Pages: 4677

Abstract: Electron microscopy (EM) revolutionized the way to visualize cellular ultrastructure. Volume EM (vEM) has further broadened its three-dimensional nanoscale imaging capacity. However, intrinsic trade-offs between imaging speed and quality of EM restrict the attainable imaging area and volume. Isotropic imaging with vEM for large biological volumes remains unachievable. Here, we developed EMDiffuse, a suite of algorithms designed to enhance EM and vEM capabilities, leveraging the cutting-edge image generation diffusion model. EMDiffuse generates realistic predictions with high resolution ultrastructural details and exhibits robust transferability by taking only one pair of images of 3 megapixels to fine-tune in denoising and super-resolution tasks. EMDiffuse also demonstrated proficiency in the isotropic vEM reconstruction task, generating isotropic volume even in the absence of isotropic training data. We demonstrated the robustness of EMDiffuse by generating isotropic volumes from seven public datasets obtained from different vEM techniques and instruments. The generated isotropic volume enables accurate three-dimensional nanoscale ultrastructure analysis. EMDiffuse also features self-assessment functionalities on predictions’ reliability. We envision EMDiffuse to pave the way for investigations of the intricate subcellular nanoscale ultrastructure within large volumes of biological systems.

Tags: Imaging, Microscopy, Software

153. Large-Language-Model-Based AI Agent for Organic Semiconductor Device Research, Advanced Materials (May 30, 2024)

Authors: Qian Zhang, Yongxu Hu, Jiaxin Yan, Hengyue Zhang, Xinyi Xie, Jie Zhu, Huchao Li, Xinxin Niu, Liqiang Li, Yajing Sun, Wenping Hu

Volume & Issue: Volume 36, Issue 32

Pages: 2405163

Abstract: Large language models (LLMs) have attracted widespread attention recently, however, their application in specialized scientific fields still requires deep adaptation. Here, an artificial intelligence (AI) agent for organic field-effect transistors (OFETs) is designed by integrating the generative pre-trained transformer 4 (GPT-4) model with well-trained machine learning (ML) algorithms. It can efficiently extract the experimental parameters of OFETs from scientific literature and reshape them into a structured database, achieving precision and recall rates both exceeding 92%. Combined with well-trained ML models, this AI agent can further provide targeted guidance and suggestions for device design. With prompt engineering and human-in-loop strategies, the agent extracts sufficient information of 709 OFETs from 277 research articles across different publishers and gathers them into a standardized database containing more than 10 000 device parameters. Using this database, a ML model based on Extreme Gradient Boosting is trained for device performance judgment. Combined with the interpretation of the high-precision model, the agent has provided a feasible optimization scheme that has tripled the charge transport properties of 2,6-diphenyldithieno[3,2-b:2′,3′-d]thiophene OFETs. This work is an effective practice of LLMs in the field of organic optoelectronic devices and expands the research paradigm of organic optoelectronic materials and devices.

Tags: machine learning, large language models, accelerated design, organic field-effect transistors

152. Physics-informed neural network for lithium-ion battery degradation stable modeling and prognosis, Nature Communications (May 21, 2024)

Authors: Fujin Wang, Zhi Zhai, Zhibin Zhao, Yi Di, Xuefeng Chen

Volume & Issue: Volume 15, Issue 1

Pages: 4332

Abstract: Accurate state-of-health (SOH) estimation is critical for reliable and safe operation of lithium-ion batteries. However, reliable and stable battery SOH estimation remains challenging due to diverse battery types and operating conditions. In this paper, we propose a physics-informed neural network (PINN) for accurate and stable estimation of battery SOH. Specifically, we model the attributes that affect the battery degradation from the perspective of empirical degradation and state space equations, and utilize neural networks to capture battery degradation dynamics. A general feature extraction method is designed to extract statistical features from a short period of data before the battery is fully charged, enabling our method applicable to different battery types and charge/discharge protocols. Additionally, we generate a comprehensive dataset consisting of 55 lithium-nickel-cobalt-manganese-oxide (NCM) batteries. Combined with three other datasets from different manufacturers, we use a total of 387 batteries with 310,705 samples to validate our method. The mean absolute percentage error (MAPE) is 0.87%. Our proposed PINN has demonstrated remarkable performance in regular experiments, small sample experiments, and transfer experiments when compared to alternative neural networks. This study highlights the promise of physics-informed machine learning for battery degradation modeling and SOH estimation.

Tags: Batteries

151. MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures, Preprint (May 10, 2024)

Authors: Han Yang, Chenxi Hu, Yichi Zhou, Xixian Liu, Yu Shi, Jielan Li, Guanzhi Li, Zekun Chen, Shuizhou Chen, Claudio Zeni, Matthew Horton, Robert Pinsler, Andrew Fowler, Daniel Zügner, Tian Xie, Jake Smith, Lixin Sun, Qian Wang, Lingyu Kong, Chang Liu, Hongxia Hao, Ziheng Lu

Abstract: Accurate and fast prediction of materials properties is central to the digital transformation of materials design. However, the vast design space and diverse operating conditions pose significant challenges for accurately modeling arbitrary material candidates and forecasting their properties. We present MatterSim, a deep learning model actively learned from large-scale first-principles computations, for efficient atomistic simulations at first-principles level and accurate prediction of broad material properties across the periodic table, spanning temperatures from 0 to 5000 K and pressures up to 1000 GPa. Out-of-the-box, the model serves as a machine learning force field, and shows remarkable capabilities not only in predicting ground-state material structures and energetics, but also in simulating their behavior under realistic temperatures and pressures, signifying an up to ten-fold enhancement in precision compared to the prior best-in-class. This enables MatterSim to compute materials’ lattice dynamics, mechanical and thermodynamic properties, and beyond, to an accuracy comparable with first-principles methods. Specifically, MatterSim predicts Gibbs free energies for a wide range of inorganic solids with near-first-principles accuracy and achieves a 15 meV/atom resolution for temperatures up to 1000K compared with experiments. This opens an opportunity to predict experimental phase diagrams of materials at minimal computational cost. Moreover, MatterSim also serves as a platform for continuous learning and customization by integrating domain-specific data. The model can be fine-tuned for atomistic simulations at a desired level of theory or for direct structure-to-property predictions, achieving high data efficiency with a reduction in data requirements by up to 97%.

Tags: Condensed Matter - Materials Science

150. Robotic synthesis decoded through phase diagram mastery, Nature Synthesis (May 2024)

Authors: Jeffrey A. Bennett, Milad Abolhasani

Volume & Issue: Volume 3, Issue 5

Pages: 565-567

Abstract: Selection principles for precursors are decoded from phase diagrams and applied to the synthesis of inorganic oxide materials.

Tags: None

149. Navigating phase diagram complexity to guide robotic inorganic materials synthesis, Nature Synthesis (May 2024)

Authors: Jiadong Chen, Samuel R. Cross, Lincoln J. Miara, Jeong-Ju Cho, Yan Wang, Wenhao Sun

Volume & Issue: Volume 3, Issue 5

Pages: 606-614

Abstract: Efficient synthesis recipes are needed to streamline the manufacturing of complex materials and to accelerate the realization of theoretically predicted materials. Often, the solid-state synthesis of multicomponent oxides is impeded by undesired by-product phases, which can kinetically trap reactions in an incomplete non-equilibrium state. Here we report a thermodynamic strategy to navigate high-dimensional phase diagrams in search of precursors that circumvent low-energy, competing by-products, while maximizing the reaction energy to drive fast phase transformation kinetics. Using a robotic inorganic materials synthesis laboratory, we perform a large-scale experimental validation of our precursor selection principles. For a set of 35 target quaternary oxides, with chemistries representative of intercalation battery cathodes and solid-state electrolytes, our robot performs 224 reactions spanning 27 elements with 28 unique precursors, operated by 1 human experimentalist. Our predicted precursors frequently yield target materials with higher phase purity than traditional precursors. Robotic laboratories offer an exciting platform for data-driven experimental synthesis science, from which we can develop fundamental insights to guide both human and robotic chemists.

Tags: Computational methods, Design, synthesis and processing

148. Designing semiconductor materials and devices in the post-Moore era by tackling computational challenges with data-driven strategies, Nature Computational Science (May 2024)

Authors: Jiahao Xie, Yansong Zhou, Muhammad Faizan, Zewei Li, Tianshu Li, Yuhao Fu, Xinjiang Wang, Lijun Zhang

Volume & Issue: Volume 4, Issue 5

Pages: 322-333

Abstract: Discovering improved semiconductor materials is essential for optimal device fabrication. In this Perspective, data-driven computational frameworks for semiconductor discovery and device development are discussed, including the challenges and opportunities moving forward.

Tags: None

147. Synthetic Lagrangian turbulence by generative diffusion models, Nature Machine Intelligence (April 2024)

Authors: T. Li, L. Biferale, F. Bonaccorso, M. A. Scarpolini, M. Buzzicotti

Volume & Issue: Volume 6, Issue 4

Pages: 393-403

Abstract: Lagrangian turbulence lies at the core of numerous applied and fundamental problems related to the physics of dispersion and mixing in engineering, biofluids, the atmosphere, oceans and astrophysics. Despite exceptional theoretical, numerical and experimental efforts conducted over the past 30 years, no existing models are capable of faithfully reproducing statistical and topological properties exhibited by particle trajectories in turbulence. We propose a machine learning approach, based on a state-of-the-art diffusion model, to generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers, thereby bypassing the need for direct numerical simulations or experiments to obtain reliable Lagrangian data. Our model demonstrates the ability to reproduce most statistical benchmarks across time scales, including the fat-tail distribution for velocity increments, the anomalous power law and the increased intermittency around the dissipative scale. Slight deviations are observed below the dissipative scale, particularly in the acceleration and flatness statistics. Surprisingly, the model exhibits strong generalizability for extreme events, producing events of higher intensity and rarity that still match the realistic statistics. This paves the way for producing synthetic high-quality datasets for pretraining various downstream applications of Lagrangian turbulence.

Tags: Fluid dynamics, Statistical physics

146. Universal chemical programming language for robotic synthesis repeatability, Nature Synthesis (April 2024)

Authors: Robert Rauschen, Mason Guy, Jason E. Hein, Leroy Cronin

Volume & Issue: Volume 3, Issue 4

Pages: 488-496

Abstract: The use of a universal chemical programming language (χDL) to encode and execute synthesis procedures for a variety of chemical reactions is reported, including reductive amination, ring formation, esterification, carbon–carbon bond formation and amide coupling. These procedures are validated and repeated in two international laboratories and on three independent robots.

Tags: None

145. Autonomous closed-loop mechanistic investigation of molecular electrochemistry via automation, Nature Communications (March 30, 2024)

Authors: Hongyuan Sheng, Jingwen Sun, Oliver Rodríguez, Benjamin B. Hoar, Weitong Zhang, Danlei Xiang, Tianhua Tang, Avijit Hazra, Daniel S. Min, Abigail G. Doyle, Matthew S. Sigman, Cyrille Costentin, Quanquan Gu, Joaquín Rodríguez-López, Chong Liu

Volume & Issue: Volume 15, Issue 1

Pages: 2781

Abstract: Electrochemical research often requires stringent combinations of experimental parameters that are demanding to manually locate. Recent advances in automated instrumentation and machine-learning algorithms unlock the possibility for accelerated studies of electrochemical fundamentals via high-throughput, online decision-making. Here we report an autonomous electrochemical platform that implements an adaptive, closed-loop workflow for mechanistic investigation of molecular electrochemistry. As a proof-of-concept, this platform autonomously identifies and investigates an EC mechanism, an interfacial electron transfer (E step) followed by a solution reaction (C step), for cobalt tetraphenylporphyrin exposed to a library of organohalide electrophiles. The generally applicable workflow accurately discerns the EC mechanism’s presence amid negative controls and outliers, adaptively designs desired experimental conditions, and quantitatively extracts kinetic information of the C step spanning over 7 orders of magnitude, from which mechanistic insights into oxidative addition pathways are gained. This work opens opportunities for autonomous mechanistic discoveries in self-driving electrochemistry laboratories without manual intervention.

Tags: Analytical chemistry, Electrochemistry

144. Crystal Structure Assignment for Unknown Compounds from X-ray Diffraction Patterns with Deep Learning, Journal of the American Chemical Society (March 27, 2024)

Authors: Litao Chen, Bingxu Wang, Wentao Zhang, Shisheng Zheng, Zhefeng Chen, Mingzheng Zhang, Cheng Dong, Feng Pan, Shunning Li

Volume & Issue: Volume 146, Issue 12

Pages: 8098-8109

Abstract: Determining the structures of previously unseen compounds from experimental characterizations is a crucial part of materials science. It requires a step of searching for the structure type that conforms to the lattice of the unknown compound, which enables the pattern matching process for characterization data, such as X-ray diffraction (XRD) patterns. However, this procedure typically places a high demand on domain expertise, thus creating an obstacle for computer-driven automation. Here, we address this challenge by leveraging a deep-learning model composed of a union of convolutional residual neural networks. The accuracy of the model is demonstrated on a dataset of over 60,000 different compounds for 100 structure types, and additional categories can be integrated without the need to retrain the existing networks. We also unravel the operation of the deep-learning black box and highlight the way in which the resemblance between the unknown compound and a structure type is quantified based on both local and global characteristics in XRD patterns. This computational tool opens new avenues for automating structure analysis on materials unearthed in high-throughput experimentation.

Tags: None

143. Machine-Learning Assisted Screening Proton Conducting Co/Fe based Oxide for the Air Electrode of Protonic Solid Oxide Cell, Advanced Functional Materials (March 18, 2024)

Authors: Ning Wang, Baoyin Yuan, Fangyuan Zheng, Shanyun Mo, Xiaohan Zhang, Lei Du, Lixin Xing, Ling Meng, Lei Zhao, Yoshitaka Aoki, Chunmei Tang, Siyu Ye

Volume & Issue: Volume 34, Issue 12

Pages: 2309855

Abstract: Proton-conducting solid oxide cells (P-SOCs) as energy conversion devices for power generation and hydrogen production have attracted increasing attention recently. The lack of efficient proton-conducting air electrodes is a huge obstacle to developing high-performance P-SOCs. The currently widely used air electrode material is Co/Fe based perovskite oxide, however, there is still no systematic research on studying and comparing the roles of diversiform elements at the B site for Co/Fe based perovskite oxide. Here, a machine learning (ML) model with eXtreme Gradient Boosting (XGBoost) algorithm is built to quickly and accurately predict the proton absorption amount of Co/Fe based perovskite oxides with 27 elements dopant at B site. Hereafter, La(Co0.9Ni0.1)O3 (LCN91) is screened by a combination of the ML model and the density functional theory calculation. Finally, LCN91 is applied to the air electrode of P-SOC, and the cell exhibits excellent electrochemical performances in fuel cell and electrolysis modes. The current study not only provides a useful model for screening air electrodes of P-SOC, but also extends the application of ML in exploring the key materials for P-SOCs and other fuel cells/electrolyzers.

Tags: machine learning, air electrode, Co/Fe based oxide, protonic solid oxide cell, XGBoost

142. Autonomous reaction Pareto-front mapping with a self-driving catalysis laboratory, Nature Chemical Engineering (March 2024)

Authors: J. A. Bennett, N. Orouji, M. Khan, S. Sadeghi, J. Rodgers, M. Abolhasani

Volume & Issue: Volume 1, Issue 3

Pages: 240-250

Abstract: A self-driving catalysis laboratory, Fast-Cat, is presented for efficient high-throughput screening of high-pressure, high-temperature, gas–liquid reaction conditions using rhodium-catalyzed hydroformylation as a case study. Fast-Cat is used to Pareto map the reaction space and investigate the varying performance of several phosphorus-based hydroformylation ligands.

Tags: None

141. Digital twins in mechanical and aerospace engineering, Nature Computational Science (March 2024)

Authors: Alberto Ferrari, Karen Willcox

Volume & Issue: Volume 4, Issue 3

Pages: 178-183

Abstract: Digital twins bring value to mechanical and aerospace systems by speeding up development, reducing risk, predicting issues and reducing sustainment costs. Realizing these benefits at scale requires a structured and intentional approach to digital twin conception, design, development, operation and sustainment. To bring maximal value, a digital twin does not need to be an exquisite virtual replica but instead must be envisioned to be fit for purpose, where the determination of fitness depends on the capability needs and the cost–benefit trade-offs.

Tags: Mechanical engineering, Computational science, Applied mathematics, Aerospace engineering

140. Digital twins in medicine, Nature Computational Science (March 2024)

Authors: R. Laubenbacher, B. Mehrad, I. Shmulevich, N. Trayanova

Volume & Issue: Volume 4, Issue 3

Pages: 184-191

Abstract: The digital twin concept, while initially formulated and developed in industry and engineering, has compelling potential applications in medicine. There are, however, major challenges that need to be overcome to fully embrace digital twin technology in the medical context.

Tags: None

139. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics, Nature Machine Intelligence (March 2024)

Authors: Kyle Swanson, Gary Liu, Denise B. Catacutan, Autumn Arnold, James Zou, Jonathan M. Stokes

Volume & Issue: Volume 6, Issue 3

Pages: 338-353

Abstract: AI methods can discover new antibiotics but existing methods have limitations. Swanson et al. develop a generative AI model that learns to design molecules that are easy to synthesize. The authors apply the model to design and validate novel antibiotics against the bacterial pathogen Acinetobacter baumannii.

Tags: None

138. A comprehensive transformer-based approach for high-accuracy gas adsorption predictions in metal-organic frameworks, Nature Communications (March 01, 2024)

Authors: Jingqi Wang, Jiapeng Liu, Hongshuai Wang, Musen Zhou, Guolin Ke, Linfeng Zhang, Jianzhong Wu, Zhifeng Gao, Diannan Lu

Volume & Issue: Volume 15, Issue 1

Pages: 1904

Abstract: Gas separation is crucial for industrial production and environmental protection, with metal-organic frameworks (MOFs) offering a promising solution due to their tunable structural properties and chemical compositions. Traditional simulation approaches, such as molecular dynamics, are complex and computationally demanding. Although feature engineering-based machine learning methods perform better, they are susceptible to overfitting because of limited labeled data. Furthermore, these methods are typically designed for single tasks, such as predicting gas adsorption capacity under specific conditions, which restricts the utilization of comprehensive datasets including all adsorption capacities. To address these challenges, we propose Uni-MOF, an innovative framework for large-scale, three-dimensional MOF representation learning, designed for multi-purpose gas prediction. Specifically, Uni-MOF serves as a versatile gas adsorption estimator for MOF materials, employing pure three-dimensional representations learned from over 631,000 collected MOF and COF structures. Our experimental results show that Uni-MOF can automatically extract structural representations and predict adsorption capacities under various operating conditions using a single model. For simulated data, Uni-MOF exhibits remarkably high predictive accuracy across all datasets. Additionally, the values predicted by Uni-MOF correspond with the outcomes of adsorption experiments. Furthermore, Uni-MOF demonstrates considerable potential for broad applicability in predicting a wide array of other properties.

Tags: Computational methods, Scientific data

137. Automated synthesis of oxygen-producing catalysts from Martian meteorites by a robotic AI chemist, Nature Synthesis (March 2024)

Authors: Qing Zhu, Yan Huang, Donglai Zhou, Luyuan Zhao, Lulu Guo, Ruyu Yang, Zixu Sun, Man Luo, Fei Zhang, Hengyu Xiao, Xinsheng Tang, Xuchun Zhang, Tao Song, Xiang Li, Baochen Chong, Junyi Zhou, Yihan Zhang, Baicheng Zhang, Jiaqi Cao, Guozhen Zhang, Song Wang, Guilin Ye, Wanjun Zhang, Haitao Zhao, Shuang Cong, Huirong Li, Li-Li Ling, Zhe Zhang, Weiwei Shang, Jun Jiang, Yi Luo

Volume & Issue: Volume 3, Issue 3

Pages: 319-328

Abstract: Sustained Mars exploration requires in situ synthesis of vital chemicals such as oxygen. Now a data-driven platform for synthesizing oxygen-producing electrocatalysts from Martian meteorites using robotics and artificial intelligence is developed, allowing automated screening of the optimal catalyst formula. This approach demonstrates materials discovery under challenging circumstances and without human intervention.

Tags: None

136. Data-Driven Design for Metamaterials and Multiscale Systems: A Review, Advanced Materials (February 22, 2024)

Authors: Doksoo Lee, Wei (Wayne) Chen, Liwei Wang, Yu-Chin Chan, Wei Chen

Volume & Issue: Volume 36, Issue 8

Pages: 2305254

Abstract: Metamaterials are artificial materials designed to exhibit effective material parameters that go beyond those found in nature. Composed of unit cells with rich designability that are assembled into multiscale systems, they hold great promise for realizing next-generation devices with exceptional, often exotic, functionalities. However, the vast design space and intricate structure–property relationships pose significant challenges in their design. A compelling paradigm that could bring the full potential of metamaterials to fruition is emerging: data-driven design. This review provides a holistic overview of this rapidly evolving field, emphasizing the general methodology instead of specific domains and deployment contexts. Existing research is organized into data-driven modules, encompassing data acquisition, machine learning-based unit cell design, and data-driven multiscale optimization. The approaches are further categorized within each module based on shared principles, analyze and compare strengths and applicability, explore connections between different modules, and identify open research questions and opportunities.

Tags: machine learning, data-driven design, metamaterial, multiscale design, topology optimization

135. Extracting accurate materials data from research papers with conversational language models and prompt engineering, Nature Communications (February 21, 2024)

Authors: Maciej P. Polak, Dane Morgan

Volume & Issue: Volume 15, Issue 1

Pages: 1569

Abstract: There has been a growing effort to replace manual extraction of data from research papers with automated data extraction based on natural language processing, language models, and recently, large language models (LLMs). Although these methods enable efficient extraction of data from large sets of research papers, they require a significant amount of up-front effort, expertise, and coding. In this work, we propose the ChatExtract method that can fully automate very accurate data extraction with minimal initial effort and background, using an advanced conversational LLM. ChatExtract consists of a set of engineered prompts applied to a conversational LLM that both identify sentences with data, extract that data, and assure the data’s correctness through a series of follow-up questions. These follow-up questions largely overcome known issues with LLMs providing factually inaccurate responses. ChatExtract can be applied with any conversational LLMs and yields very high quality data extraction. In tests on materials data, we find precision and recall both close to 90% from the best conversational LLMs, like GPT-4. We demonstrate that the exceptional performance is enabled by the information retention in a conversational model combined with purposeful redundancy and introducing uncertainty through follow-up prompts. These results suggest that approaches similar to ChatExtract, due to their simplicity, transferability, and accuracy are likely to become powerful tools for data extraction in the near future. Finally, databases for critical cooling rates of metallic glasses and yield strengths of high entropy alloys are developed using ChatExtract.

Tags: Theory and computation, Databases

134. Structured information extraction from scientific text with large language models, Nature Communications (February 15, 2024)

Authors: John Dagdelen, Alexander Dunn, Sanghoon Lee, Nicholas Walker, Andrew S. Rosen, Gerbrand Ceder, Kristin A. Persson, Anubhav Jain

Volume & Issue: Volume 15, Issue 1

Pages: 1418

Abstract: Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.

Tags: Theory and computation, Databases, Materials science, Scientific data

133. A generative artificial intelligence framework based on a molecular diffusion model for the design of metal-organic frameworks for carbon capture, Communications Chemistry (February 14, 2024)

Authors: Hyun Park, Xiaoli Yan, Ruijie Zhu, Eliu A. Huerta, Santanu Chaudhuri, Donny Cooper, Ian Foster, Emad Tajkhorshid

Volume & Issue: Volume 7, Issue 1

Pages: 21

Abstract: Metal-organic frameworks (MOFs) exhibit great promise for CO2 capture. However, finding the best performing materials poses computational and experimental grand challenges in view of the vast chemical space of potential building blocks. Here, we introduce GHP-MOFassemble, a generative artificial intelligence (AI), high performance framework for the rational and accelerated design of MOFs with high CO2 adsorption capacity and synthesizable linkers. GHP-MOFassemble generates novel linkers, assembled with one of three pre-selected metal nodes (Cu paddlewheel, Zn paddlewheel, Zn tetramer) into MOFs in a primitive cubic topology. GHP-MOFassemble screens and validates AI-generated MOFs for uniqueness, synthesizability, structural validity, uses molecular dynamics simulations to study their stability and chemical consistency, and crystal graph neural networks and Grand Canonical Monte Carlo simulations to quantify their CO2 adsorption capacities. We present the top six AI-generated MOFs with CO2 capacities greater than 2m mol g−1, i.e., higher than 96.9% of structures in the hypothetical MOF dataset.

Tags: Computational methods, Computational chemistry, Metal–organic frameworks, Carbon capture and storage

132. Autonomous execution of highly reactive chemical transformations in the Schlenkputer, Nature Chemical Engineering (February 2024)

Authors: Nicola L. Bell, Florian Boser, Andrius Bubliauskas, Dominic R. Willcox, Victor Sandoval Luna, Leroy Cronin

Volume & Issue: Volume 1, Issue 2

Pages: 180-189

Abstract: We design a modular programmable inert-atmosphere Schlenkputer (Schlenk-line computer) for the synthesis and manipulation of highly reactive compounds, including those that are air and moisture sensitive or pyrophoric. Here, to do this, we constructed a programmable Schlenk line using the Chemputer architecture for the inertization of glassware that can achieve a vacuum line pressure of 1.5 × 10−3 mbar, and integrated a range of automated Schlenk glassware for the handling, storage and isolation of reactive compounds at sub-ppm levels of O2 and H2O. This has enabled automation of a range of common organometallic reaction types for the synthesis of four highly reactive compounds: [Cp2TiIII(MeCN)2]+, CeIII{N(SiMe3)2}3, B(C6F5)3 and {DippNacNacMgI}2, which are variously sensitive to temperature, pressure, water and oxygen. Automated crystallization, filtration and sublimation are demonstrated, along with analysis using inline nuclear magnetic resonance or reaction sampling for ultraviolet–visible spectroscopy. Finally, we demonstrate low-temperature reactivity down to −90 °C as well as safe handling and quenching of alkali metal reagents using dynamic feedback from an in situ temperature probe.

Tags: Chemical engineering, Organometallic chemistry

131. Automated self-optimization, intensification, and scale-up of photocatalysis in flow, Science (January 26, 2024)

Authors: Aidan Slattery, Zhenghui Wen, Pauline Tenblad, Jesús Sanjosé-Orduna, Diego Pintossi, Tim den Hartog, Timothy Noël

Volume & Issue: Volume 383, Issue 6681

Pages: eadj1817

Abstract: The optimization, intensification, and scale-up of photochemical processes constitute a particular challenge in a manufacturing environment geared primarily toward thermal chemistry. In this work, we present a versatile flow-based robotic platform to address these challenges through the integration of readily available hardware and custom software. Our open-source platform combines a liquid handler, syringe pumps, a tunable continuous-flow photoreactor, inexpensive Internet of Things devices, and an in-line benchtop nuclear magnetic resonance spectrometer to enable automated, data-rich optimization with a closed-loop Bayesian optimization strategy. A user-friendly graphical interface allows chemists without programming or machine learning expertise to easily monitor, analyze, and improve photocatalytic reactions with respect to both continuous and discrete variables. The system’s effectiveness was demonstrated by increasing overall reaction yields and improving space-time yields compared with those of previously reported processes.

Tags: None

130. A dynamic knowledge graph approach to distributed self-driving laboratories, Nature Communications (January 23, 2024)

Authors: Jiaru Bai, Sebastian Mosbach, Connor J. Taylor, Dogancan Karan, Kok Foong Lee, Simon D. Rihm, Jethro Akroyd, Alexei A. Lapkin, Markus Kraft

Volume & Issue: Volume 15, Issue 1

Pages: 462

Abstract: The ability to integrate resources and share knowledge across organisations empowers scientists to expedite the scientific discovery process. This is especially crucial in addressing emerging global challenges that require global solutions. In this work, we develop an architecture for distributed self-driving laboratories within The World Avatar project, which seeks to create an all-encompassing digital twin based on a dynamic knowledge graph. We employ ontologies to capture data and material flows in design-make-test-analyse cycles, utilising autonomous agents as executable knowledge components to carry out the experimentation workflow. Data provenance is recorded to ensure its findability, accessibility, interoperability, and reusability. We demonstrate the practical application of our framework by linking two robots in Cambridge and Singapore for a collaborative closed-loop optimisation for a pharmaceutically-relevant aldol condensation reaction in real-time. The knowledge graph autonomously evolves toward the scientist’s research goals, with the two robots effectively generating a Pareto front for cost-yield optimisation in three days.

Tags: Chemical engineering, Software, Information technology

129. Expanding the Horizons of Machine Learning in Nanomaterials to Chiral Nanostructures, Advanced Materials (January 19, 2024)

Authors: Vera Kuznetsova, Áine Coogan, Dmitry Botov, Yulia Gromova, Elena V. Ushakova, Yurii K. Gun’ko

Volume & Issue: Volume 36, Issue 18

Pages: 2308912

Abstract: Machine learning holds significant research potential in the field of nanotechnology, enabling nanomaterial structure and property predictions, facilitating materials design and discovery, and reducing the need for time-consuming and labor-intensive experiments and simulations. In contrast to their achiral counterparts, the application of machine learning for chiral nanomaterials is still in its infancy, with a limited number of publications to date. This is despite the great potential of machine learning to advance the development of new sustainable chiral materials with high values of optical activity, circularly polarized luminescence, and enantioselectivity, as well as for the analysis of structural chirality by electron microscopy. In this review, an analysis of machine learning methods used for studying achiral nanomaterials is provided, subsequently offering guidance on adapting and extending this work to chiral nanomaterials. An overview of chiral nanomaterials within the framework of synthesis–structure–property–application relationships is presented and insights on how to leverage machine learning for the study of these highly complex relationships are provided. Some key recent publications are reviewed and discussed on the application of machine learning for chiral nanomaterials. Finally, the review captures the key achievements, ongoing challenges, and the prospective outlook for this very important research field.

Tags: machine learning, artificial intelligence, chirality, nanomaterials, nanoparticles

128. Universal machine learning aided synthesis approach of two-dimensional perovskites in a typical laboratory, Nature Communications (January 02, 2024)

Authors: Yilei Wu, Chang-Feng Wang, Ming-Gang Ju, Qiangqiang Jia, Qionghua Zhou, Shuaihua Lu, Xinying Gao, Yi Zhang, Jinlan Wang

Volume & Issue: Volume 15, Issue 1

Pages: 138

Abstract: The past decade has witnessed the significant efforts in novel material discovery in the use of data-driven techniques, in particular, machine learning (ML). However, since it needs to consider the precursors, experimental conditions, and availability of reactants, material synthesis is generally much more complex than property and structure prediction, and very few computational predictions are experimentally realized. To solve these challenges, a universal framework that integrates high-throughput experiments, a priori knowledge of chemistry, and ML techniques such as subgroup discovery and support vector machine is proposed to guide the experimental synthesis of materials, which is capable of disclosing structure-property relationship hidden in high-throughput experiments and rapidly screening out materials with high synthesis feasibility from vast chemical space. Through application of our approach to challenging and consequential synthesis problem of 2D silver/bismuth organic-inorganic hybrid perovskites, we have increased the success rate of the synthesis feasibility by a factor of four relative to traditional approaches. This study provides a practical route for solving multidimensional chemical acceleration problems with small dataset from typical laboratory with limited experimental resources available.

Tags: Energy, Materials for energy and catalysis

127. Active learning guides discovery of a champion four-metal perovskite oxide for oxygen evolution electrocatalysis, Nature Materials (January 2024)

Authors: Junseok Moon, Wiktor Beker, Marta Siek, Jiheon Kim, Hyeon Seok Lee, Taeghwan Hyeon, Bartosz A. Grzybowski

Volume & Issue: Volume 23, Issue 1

Pages: 108-115

Abstract: Multi-metal and perovskite oxides are attractive as oxygen evolution electrocatalysts, and thus far the most promising candidates have emerged from experimental methodologies. Active-learning models supplemented by structural-characterization data and closed-loop experimentation can now identify a perovskite oxide with outstanding performance.

Tags: None

126. Self-driving laboratories to autonomously navigate the protein fitness landscape, Nature Chemical Engineering (January 2024)

Authors: Jacob T. Rapp, Bennett J. Bremer, Philip A. Romero

Volume & Issue: Volume 1, Issue 1

Pages: 97-107

Abstract: Protein engineering has nearly limitless applications across chemistry, energy and medicine, but creating new proteins with improved or novel functions remains slow, labor-intensive and inefficient. Here we present the Self-driving Autonomous Machines for Protein Landscape Exploration (SAMPLE) platform for fully autonomous protein engineering. SAMPLE is driven by an intelligent agent that learns protein sequence–function relationships, designs new proteins and sends designs to a fully automated robotic system that experimentally tests the designed proteins and provides feedback to improve the agent’s understanding of the system. We deploy four SAMPLE agents with the goal of engineering glycoside hydrolase enzymes with enhanced thermal tolerance. Despite showing individual differences in their search behavior, all four agents quickly converge on thermostable enzymes. Self-driving laboratories automate and accelerate the scientific discovery process and hold great potential for the fields of protein engineering and synthetic biology.

Tags: Protein design, Biotechnology

125. Learning skillful medium-range global weather forecasting, Science (December 22, 2023)

Authors: Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn Stott, Alexander Pritzel, Shakir Mohamed, Peter Battaglia

Volume & Issue: Volume 382, Issue 6677

Pages: 1416-1421

Abstract: Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy but does not directly use historical weather data to improve the underlying model. Here, we introduce GraphCast, a machine learning–based method trained directly from reanalysis data. It predicts hundreds of weather variables for the next 10 days at 0.25° resolution globally in under 1 minute. GraphCast significantly outperforms the most accurate operational deterministic systems on 90% of 1380 verification targets, and its forecasts support better severe event prediction, including tropical cyclone tracking, atmospheric rivers, and extreme temperatures. GraphCast is a key advance in accurate and efficient weather forecasting and helps realize the promise of machine learning for modeling complex dynamical systems.

Tags: None

124. Data-driven analysis and prediction of stable phases for high-entropy alloy design, Scientific Reports (December 18, 2023)

Authors: Iman Peivaste, Ericmoore Jossou, Ahmed A. Tiamiyu

Volume & Issue: Volume 13, Issue 1

Pages: 22556

Abstract: High-entropy alloys (HEAs) represent a promising class of materials with exceptional structural and functional properties. However, their design and optimization pose challenges due to the large composition-phase space coupled with the complex and diverse nature of the phase formation dynamics. In this study, a data-driven approach that utilizes machine learning (ML) techniques to predict HEA phases and their composition-dependent phases is proposed. By employing a comprehensive dataset comprising 5692 experimental records encompassing 50 elements and 11 phase categories, we compare the performance of various ML models. Our analysis identifies the most influential features for accurate phase prediction. Furthermore, the class imbalance is addressed by employing data augmentation methods, raising the number of records to 1500 in each category, and ensuring a balanced representation of phase categories. The results show that XGBoost and Random Forest consistently outperform the other models, achieving 86% accuracy in predicting all phases. Additionally, this work provides an extensive analysis of HEA phase formers, showing the contributions of elements and features to the presence of specific phases. We also examine the impact of including different phases on ML model accuracy and feature significance. Notably, the findings underscore the need for ML model selection based on specific applications and desired predictions, as feature importance varies across models and phases. This study significantly advances the understanding of HEA phase formation, enabling targeted alloy design and fostering progress in the field of materials science.

Tags: Computational methods, Materials science

123. Automated classification of big X-ray diffraction data using deep learning models, npj Computational Materials (December 04, 2023)

Authors: Jerardo E. Salgado, Samuel Lerman, Zhaotong Du, Chenliang Xu, Niaz Abdolrahim

Volume & Issue: Volume 9, Issue 1

Pages: 1-12

Abstract: In current in situ X-ray diffraction (XRD) techniques, data generation surpasses human analytical capabilities, potentially leading to the loss of insights. Automated techniques require human intervention, and lack the performance and adaptability required for material exploration. Given the critical need for high-throughput automated XRD pattern analysis, we present a generalized deep learning model to classify a diverse set of materials’ crystal systems and space groups. In our approach, we generate training data with a holistic representation of patterns that emerge from varying experimental conditions and crystal properties. We also employ an expedited learning technique to refine our model’s expertise to experimental conditions. In addition, we optimize model architecture to elicit classification based on Bragg’s Law and use evaluation data to interpret our model’s decision-making. We evaluate our models using experimental data, materials unseen in training, and altered cubic crystals, where we observe state-of-the-art performance and even greater advances in space group classification.

Tags: None

122. Autonomous chemical research with large language models, Nature (December 2023)

Authors: Daniil A. Boiko, Robert MacKnight, Ben Kline, Gabe Gomes

Volume & Issue: Volume 624, Issue 7992

Pages: 570-578

Abstract: Transformer-based large language models are making significant strides in various fields, such as natural language processing1–5, biology6,7, chemistry8–10 and computer programming11,12. Here, we show the development and capabilities of Coscientist, an artificial intelligence system driven by GPT-4 that autonomously designs, plans and performs complex experiments by incorporating large language models empowered by tools such as internet and documentation search, code execution and experimental automation. Coscientist showcases its potential for accelerating research across six diverse tasks, including the successful reaction optimization of palladium-catalysed cross-couplings, while exhibiting advanced capabilities for (semi-)autonomous experimental design and execution. Our findings demonstrate the versatility, efficacy and explainability of artificial intelligence systems like Coscientist in advancing research.

Tags: Computer science, Chemistry

121. Scaling deep learning for materials discovery, Nature (December 2023)

Authors: Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, Ekin Dogus Cubuk

Volume & Issue: Volume 624, Issue 7990

Pages: 80-85

Abstract: Novel functional materials enable fundamental breakthroughs across technological applications from clean energy to information processing1–11. From microchips to batteries and photovoltaics, discovery of inorganic crystals has been bottlenecked by expensive trial-and-error approaches. Concurrently, deep-learning models for language, vision and biology have showcased emergent predictive capabilities with increasing data and computation12–14. Here we show that graph networks trained at scale can reach unprecedented levels of generalization, improving the efficiency of materials discovery by an order of magnitude. Building on 48,000 stable crystals identified in continuing studies15–17, improved efficiency enables the discovery of 2.2 million structures below the current convex hull, many of which escaped previous human chemical intuition. Our work represents an order-of-magnitude expansion in stable materials known to humanity. Stable discoveries that are on the final convex hull will be made available to screen for technological applications, as we demonstrate for layered materials and solid-electrolyte candidates. Of the stable structures, 736 have already been independently experimentally realized. The scale and diversity of hundreds of millions of first-principles calculations also unlock modelling capabilities for downstream applications, leading in particular to highly accurate and robust learned interatomic potentials that can be used in condensed-phase molecular-dynamics simulations and high-fidelity zero-shot prediction of ionic conductivity.

Tags: Computer science, Scaling laws

120. An autonomous laboratory for the accelerated synthesis of novel materials, Nature (December 2023)

Authors: Nathan J. Szymanski, Bernardus Rendy, Yuxing Fei, Rishi E. Kumar, Tanjin He, David Milsted, Matthew J. McDermott, Max Gallant, Ekin Dogus Cubuk, Amil Merchant, Haegyeom Kim, Anubhav Jain, Christopher J. Bartel, Kristin Persson, Yan Zeng, Gerbrand Ceder

Volume & Issue: Volume 624, Issue 7990

Pages: 86-91

Abstract: To close the gap between the rates of computational screening and experimental realization of novel materials1,2, we introduce the A-Lab, an autonomous laboratory for the solid-state synthesis of inorganic powders. This platform uses computations, historical data from the literature, machine learning (ML) and active learning to plan and interpret the outcomes of experiments performed using robotics. Over 17 days of continuous operation, the A-Lab realized 41 novel compounds from a set of 58 targets including a variety of oxides and phosphates that were identified using large-scale ab initio phase-stability data from the Materials Project and Google DeepMind. Synthesis recipes were proposed by natural-language models trained on the literature and optimized using an active-learning approach grounded in thermodynamics. Analysis of the failed syntheses provides direct and actionable suggestions to improve current techniques for materials screening and synthesis design. The high success rate demonstrates the effectiveness of artificial-intelligence-driven platforms for autonomous materials discovery and motivates further integration of computations, historical knowledge and robotics.

Tags: Computational methods, Characterization and analytical techniques, Design, synthesis and processing

119. Uncovering developmental time and tempo using deep learning, Nature Methods (December 2023)

Authors: Nikan Toulany, Hernán Morales-Navarrete, Daniel Čapek, Jannis Grathwohl, Murat Ünalan, Patrick Müller

Volume & Issue: Volume 20, Issue 12

Pages: 2000-2010

Abstract: During animal development, embryos undergo complex morphological changes over time. Differences in developmental tempo between species are emerging as principal drivers of evolutionary novelty, but accurate description of these processes is very challenging. To address this challenge, we present here an automated and unbiased deep learning approach to analyze the similarity between embryos of different timepoints. Calculation of similarities across stages resulted in complex phenotypic fingerprints, which carry characteristic information about developmental time and tempo. Using this approach, we were able to accurately stage embryos, quantitatively determine temperature-dependent developmental tempo, detect naturally occurring and induced changes in the developmental progression of individual embryos, and derive staging atlases for several species de novo in an unsupervised manner. Our approach allows us to quantify developmental time and tempo objectively and provides a standardized way to analyze early embryogenesis.

Tags: Machine learning, Imaging, Embryogenesis, Embryology, Zebrafish

118. Vision-controlled jetting for composite systems and robots, Nature (November 2023)

Authors: Thomas J. K. Buchner, Simon Rogler, Stefan Weirich, Yannick Armati, Barnabas Gavin Cangan, Javier Ramos, Scott T. Twiddy, Davide M. Marini, Aaron Weber, Desai Chen, Greg Ellson, Joshua Jacob, Walter Zengerle, Dmitriy Katalichenko, Chetan Keny, Wojciech Matusik, Robert K. Katzschmann

Volume & Issue: Volume 623, Issue 7987

Pages: 522-530

Abstract: Recreating complex structures and functions of natural organisms in a synthetic form is a long-standing goal for humanity1. The aim is to create actuated systems with high spatial resolutions and complex material arrangements that range from elastic to rigid. Traditional manufacturing processes struggle to fabricate such complex systems2. It remains an open challenge to fabricate functional systems automatically and quickly with a wide range of elastic properties, resolutions, and integrated actuation and sensing channels2,3. We propose an inkjet deposition process called vision-controlled jetting that can create complex systems and robots. Hereby, a scanning system captures the three-dimensional print geometry and enables a digital feedback loop, which eliminates the need for mechanical planarizers. This contactless process allows us to use continuously curing chemistries and, therefore, print a broader range of material families and elastic moduli. The advances in material properties are characterized by standardized tests comparing our printed materials to the state-of-the-art. We directly fabricated a wide range of complex high-resolution composite systems and robots: tendon-driven hands, pneumatically actuated walking manipulators, pumps that mimic a heart and metamaterial structures. Our approach provides an automated, scalable, high-throughput process to manufacture high-resolution, functional multimaterial systems.

Tags: Polymers, Mechanical engineering, Electrical and electronic engineering, Composites

117. AI-driven robotic chemist for autonomous synthesis of organic molecules, Science Advances (November 2023)

Authors: Taesin Ha, Dongseon Lee, Youngchun Kwon, Min Sik Park, Sangyoon Lee, Jaejun Jang, Byungkwon Choi, Hyunjeong Jeon, Jeonghun Kim, Hyundo Choi, Hyung-Tae Seo, Wonje Choi, Wooram Hong, Young Jin Park, Junwon Jang, Joonkee Cho, Bosung Kim, Hyukju Kwon, Gahee Kim, Won Seok Oh, Jin Woo Kim, Joonhyuk Choi, Minsik Min, Aram Jeon, Yongsik Jung, Eunji Kim, Hyosug Lee, Youn-Suk Choi

Volume & Issue: Volume 9, Issue 44

Pages: eadj0461

Abstract: The automation of organic compound synthesis is pivotal for expediting the development of such compounds. In addition, enhancing development efficiency can be achieved by incorporating autonomous functions alongside automation. To achieve this, we developed an autonomous synthesis robot that harnesses the power of artificial intelligence (AI) and robotic technology to establish optimal synthetic recipes. Given a target molecule, our AI initially plans synthetic pathways and defines reaction conditions. It then iteratively refines these plans using feedback from the experimental robot, gradually optimizing the recipe. The system performance was validated by successfully determining synthetic recipes for three organic compounds, yielding that conversion rates that outperform existing references. Notably, this autonomous system is designed around batch reactors, making it accessible and valuable to chemists in standard laboratory settings, thereby streamlining research endeavors.

Tags: None

116. Machine learning-enabled constrained multi-objective design of architected materials, Nature Communications (October 19, 2023)

Authors: Bo Peng, Ye Wei, Yu Qin, Jiabao Dai, Yue Li, Aobo Liu, Yun Tian, Liuliu Han, Yufeng Zheng, Peng Wen

Volume & Issue: Volume 14, Issue 1

Pages: 6630

Abstract: Architected materials that consist of multiple subelements arranged in particular orders can demonstrate a much broader range of properties than their constituent materials. However, the rational design of these materials generally relies on experts’ prior knowledge and requires painstaking effort. Here, we present a data-efficient method for the high-dimensional multi-property optimization of 3D-printed architected materials utilizing a machine learning (ML) cycle consisting of the finite element method (FEM) and 3D neural networks. Specifically, we apply our method to orthopedic implant design. Compared to uniform designs, our experience-free method designs microscale heterogeneous architectures with a biocompatible elastic modulus and higher strength. Furthermore, inspired by the knowledge learned from the neural networks, we develop machine-human synergy, adapting the ML-designed architecture to fix a macroscale, irregularly shaped animal bone defect. Such adaptation exhibits 20% higher experimental load-bearing capacity than the uniform design. Thus, our method provides a data-efficient paradigm for the fast and intelligent design of architected materials with tailored mechanical, physical, and chemical properties.

Tags: Bioinspired materials, Biomedical engineering

115. Fine-Tuned Language Models Generate Stable Inorganic Materials as Text, International Conference on Learning Representations (October 13, 2023)

Authors: Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ward Ulissi

Abstract: We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90\% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49\% vs 28\%) of CDVAE, a competing diffusion model. Because of text prompting’s inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models’ ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.

Tags: None

114. Scalable Diffusion for Materials Generation, International Conference on Learning Representations (October 13, 2023)

Authors: Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

Abstract: Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in crystals), but generating structures can be difficult to scale to large and complex systems. Another challenge in generating materials is the mismatch between standard generative modeling metrics and downstream applications. For instance, common metrics such as the reconstruction error do not correlate well with the downstream goal of discovering novel stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. Our empirical results suggest that despite the lack of explicit structure modeling, UniMat can generate high fidelity crystal structures from larger and more complex chemical systems, outperforming previous graph-based approaches under various generative modeling metrics. To better connect the generation quality of materials to downstream applications, such as discovering novel stable materials, we propose additional metrics for evaluating generative models of materials, including per-composition formation energy and stability with respect to convex hulls through decomposition energy from Density Function Theory (DFT). Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystals structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.

Tags: None

113. Universal machine learning for the response of atomistic systems to external fields, Nature Communications (October 12, 2023)

Authors: Yaolong Zhang, Bin Jiang

Volume & Issue: Volume 14, Issue 1

Pages: 6424

Abstract: Machine learned interatomic interaction potentials have enabled efficient and accurate molecular simulations of closed systems. However, external fields, which can greatly change the chemical structure and/or reactivity, have been seldom included in current machine learning models. This work proposes a universal field-induced recursively embedded atom neural network (FIREANN) model, which integrates a pseudo field vector-dependent feature into atomic descriptors to represent system-field interactions with rigorous rotational equivariance. This “all-in-one” approach correlates various response properties like dipole moment and polarizability with the field-dependent potential energy in a single model, very suitable for spectroscopic and dynamics simulations in molecular and periodic systems in the presence of electric fields. Especially for periodic systems, we find that FIREANN can overcome the intrinsic multiple-value issue of the polarization by training atomic forces only. These results validate the universality and capability of the FIREANN method for efficient first-principles modeling of complicated systems in strong external fields.

Tags: Chemical physics, Molecular dynamics, Reaction kinetics and dynamics

112. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science, Journal of the American Chemical Society (October 11, 2023)

Authors: Joshua Schrier, Alexander J. Norquist, Tonio Buonassisi, Jakoah Brgoch

Volume & Issue: Volume 145, Issue 40

Pages: 21699-21716

Abstract: Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.

Tags: None

111. Finite-difference time-domain methods, Nature Reviews Methods Primers (October 05, 2023)

Authors: F. L. Teixeira, C. Sarris, Y. Zhang, D.-Y. Na, J.-P. Berenger, Y. Su, M. Okoniewski, W. C. Chew, V. Backman, J. J. Simpson

Volume & Issue: Volume 3, Issue 1

Pages: 1-19

Abstract: Time-domain solutions to Maxwell’s equations can be computed using the finite-difference time-domain (FDTD) method. This Primer explores how FDTD can be used to study electromagnetic fields in complex media, including a summary of FDTD models, extensions, outputs and applications across the electromagnetic spectrum.

Tags: None

110. Inverse design of chiral functional films by a robotic AI-guided system, Nature Communications (October 04, 2023)

Authors: Yifan Xie, Shuo Feng, Linxiao Deng, Aoran Cai, Liyu Gan, Zifan Jiang, Peng Yang, Guilin Ye, Zaiqing Liu, Li Wen, Qing Zhu, Wanjun Zhang, Zhanpeng Zhang, Jiahe Li, Zeyu Feng, Chutian Zhang, Wenjie Du, Lixin Xu, Jun Jiang, Xin Chen, Gang Zou

Volume & Issue: Volume 14, Issue 1

Pages: 6177

Abstract: Artificial chiral materials and nanostructures with strong and tuneable chiroptical activities, including sign, magnitude, and wavelength distribution, are useful owing to their potential applications in chiral sensing, enantioselective catalysis, and chiroptical devices. Thus, the inverse design and customized manufacturing of these materials is highly desirable. Here, we use an artificial intelligence (AI) guided robotic chemist to accurately predict chiroptical activities from the experimental absorption spectra and structure/process parameters, and generate chiral films with targeted chiroptical activities across the full visible spectrum. The robotic AI-chemist carries out the entire process, including chiral film construction, characterization, and testing. A machine learned reverse design model using spectrum embedded descriptors is developed to predict optimal structure/process parameters for any targeted chiroptical property. A series of chiral films with a dissymmetry factor as high as 1.9 (gabs ~ 1.9) are identified out of more than 100 million possible structures, and their feasible application in circular polarization-selective color filters for multiplex laser display and switchable circularly polarized (CP) luminescence is demonstrated. Our findings not only provide chiral films with the highest reported chiroptical activity, but also have great fundamental value for the inverse design of chiroptical materials.

Tags: Techniques and instrumentation, Materials for optics

109. Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks, Nature Communications (October 03, 2023)

Authors: Yu Wang, Chao Pang, Yuzhe Wang, Junru Jin, Jingjie Zhang, Xiangxiang Zeng, Ran Su, Quan Zou, Leyi Wei

Volume & Issue: Volume 14, Issue 1

Pages: 6155

Abstract: Automating retrosynthesis with artificial intelligence expedites organic chemistry research in digital laboratories. However, most existing deep-learning approaches are hard to explain, like a “black box” with few insights. Here, we propose RetroExplainer, formulizing the retrosynthesis task into a molecular assembly process, containing several retrosynthetic actions guided by deep learning. To guarantee a robust performance of our model, we propose three units: a multi-sense and multi-scale Graph Transformer, structure-aware contrastive learning, and dynamic adaptive multi-task learning. The results on 12 large-scale benchmark datasets demonstrate the effectiveness of RetroExplainer, which outperforms the state-of-the-art single-step retrosynthesis approaches. In addition, the molecular assembly process renders our model with good interpretability, allowing for transparent decision-making and quantitative attribution. When extended to multi-step retrosynthesis planning, RetroExplainer has identified 101 pathways, in which 86.9% of the single reactions correspond to those already reported in the literature. As a result, RetroExplainer is expected to offer valuable insights for reliable, high-throughput, and high-quality organic synthesis in drug development.

Tags: Cheminformatics, Synthetic chemistry methodology, Drug discovery and development

108. Accelerating science with human-aware artificial intelligence, Nature Human Behaviour (October 2023)

Authors: Jamshid Sourati, James A. Evans

Volume & Issue: Volume 7, Issue 10

Pages: 1682-1696

Abstract: Can human-aware artificial intelligence help accelerate science? In this article, the authors incorporate the distribution of human expertise by training unsupervised models on simulated inferences cognitively accessible to experts and show that this substantially improves the models’ predictions of future discoveries, but also enables AI to generate high-value alternatives that complement human discoveries.

Tags: None

107. A foundation model for generalizable disease detection from retinal images, Nature (October 2023)

Authors: Yukun Zhou, Mark A. Chia, Siegfried K. Wagner, Murat S. Ayhan, Dominic J. Williamson, Robbert R. Struyven, Timing Liu, Moucheng Xu, Mateo G. Lozano, Peter Woodward-Court, Yuka Kihara, Andre Altmann, Aaron Y. Lee, Eric J. Topol, Alastair K. Denniston, Daniel C. Alexander, Pearse A. Keane

Volume & Issue: Volume 622, Issue 7981

Pages: 156-163

Abstract: Medical artificial intelligence (AI) offers great potential for recognizing signs of health conditions in retinal images and expediting the diagnosis of eye diseases and systemic disorders1. However, the development of AI models requires substantial annotation and models are usually task-specific with limited generalizability to different clinical applications2. Here, we present RETFound, a foundation model for retinal images that learns generalizable representations from unlabelled retinal images and provides a basis for label-efficient model adaptation in several applications. Specifically, RETFound is trained on 1.6 million unlabelled retinal images by means of self-supervised learning and then adapted to disease detection tasks with explicit labels. We show that adapted RETFound consistently outperforms several comparison models in the diagnosis and prognosis of sight-threatening eye diseases, as well as incident prediction of complex systemic disorders such as heart failure and myocardial infarction with fewer labelled data. RETFound provides a generalizable solution to improve model performance and alleviate the annotation workload of experts to enable broad clinical AI applications from retinal imaging.

Tags: Cardiovascular diseases, Medical imaging, Prognosis, Retinal diseases, Translational research

106. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning approach, Nature Communications (September 21, 2023)

Authors: Gang Wang, Shinya Mine, Duotian Chen, Yuan Jing, Kah Wei Ting, Taichi Yamaguchi, Motoshi Takao, Zen Maeno, Ichigaku Takigawa, Koichi Matsushita, Ken-ichi Shimizu, Takashi Toyao

Volume & Issue: Volume 14, Issue 1

Pages: 5861

Abstract: Designing novel catalysts is key to solving many energy and environmental challenges. Despite the promise that data science approaches, including machine learning (ML), can accelerate the development of catalysts, truly novel catalysts have rarely been discovered through ML approaches because of one of its most common limitations and criticisms—the assumed inability to extrapolate and identify extraordinary materials. Herein, we demonstrate an extrapolative ML approach to develop new multi-elemental reverse water-gas shift catalysts. Using 45 catalysts as the initial data points and performing 44 cycles of the closed loop discovery system (ML prediction + experiment), we experimentally tested a total of 300 catalysts and identified more than 100 catalysts with superior activity compared to those of the previously reported high-performance catalysts. The composition of the optimal catalyst discovered was Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO2. Notably, niobium (Nb) was not included in the original dataset, and the catalyst composition identified was not predictable even by human experts.

Tags: Computational methods, Heterogeneous catalysis, Materials for energy and catalysis

105. Accurate proteome-wide missense variant effect prediction with AlphaMissense, Science (September 19, 2023)

Authors: Jun Cheng, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, Lai Hong Wong, Michal Zielinski, Tobias Sargeant, Rosalia G. Schneider, Andrew W. Senior, John Jumper, Demis Hassabis, Pushmeet Kohli, Žiga Avsec

Volume & Issue: Volume 381, Issue 6664

Pages: eadg7492

Abstract: The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.

Tags: None

104. Demonstration of an AI-driven workflow for autonomous high-resolution scanning microscopy, Nature Communications (September 07, 2023)

Authors: Saugat Kandel, Tao Zhou, Anakha V. Babu, Zichao Di, Xinxin Li, Xuedan Ma, Martin Holt, Antonino Miceli, Charudatta Phatak, Mathew J. Cherukara

Volume & Issue: Volume 14, Issue 1

Pages: 5501

Abstract: Modern scanning microscopes can image materials with up to sub-atomic spatial and sub-picosecond time resolutions, but these capabilities come with large volumes of data, which can be difficult to store and analyze. We report the Fast Autonomous Scanning Toolkit (FAST) that addresses this challenge by combining a neural network, route optimization, and efficient hardware controls to enable a self-driving experiment that actively identifies and measures a sparse but representative data subset in lieu of the full dataset. FAST requires no prior information about the sample, is computationally efficient, and uses generic hardware controls with minimal experiment-specific wrapping. We test FAST in simulations and a dark-field X-ray microscopy experiment of a WSe2 film. Our studies show that a FAST scan of <25% is sufficient to accurately image and analyze the sample. FAST is easy to adapt for any scanning microscope; its broad adoption will empower general multi-level studies of materials evolution with respect to time, temperature, or other parameters.

Tags: Image processing, Microscopy, Computational science, Techniques and instrumentation

103. A principal odor map unifies diverse tasks in olfactory perception, Science (September 2023)

Authors: Brian K. Lee, Emily J. Mayhew, Benjamin Sanchez-Lengeling, Jennifer N. Wei, Wesley W. Qian, Kelsie A. Little, Matthew Andres, Britney B. Nguyen, Theresa Moloy, Jacob Yasonik, Jane K. Parker, Richard C. Gerkin, Joel D. Mainland, Alexander B. Wiltschko

Volume & Issue: Volume 381, Issue 6661

Pages: 999-1006

Abstract: Mapping molecular structure to odor perception is a key challenge in olfaction. We used graph neural networks to generate a principal odor map (POM) that preserves perceptual relationships and enables odor quality prediction for previously uncharacterized odorants. The model was as reliable as a human in describing odor quality: On a prospective validation set of 400 out-of-sample odorants, the model-generated odor profile more closely matched the trained panel mean than did the median panelist. By applying simple, interpretable, theoretically rooted transformations, the POM outperformed chemoinformatic models on several other odor prediction tasks, indicating that the POM successfully encoded a generalized map of structure-odor relationships. This approach broadly enables odor prediction and paves the way toward digitizing odors.

Tags: None

102. Learning heterogeneous reaction kinetics from X-ray videos pixel by pixel, Nature (September 2023)

Authors: Hongbo Zhao, Haitao Dean Deng, Alexander E. Cohen, Jongwoo Lim, Yiyang Li, Dimitrios Fraggedakis, Benben Jiang, Brian D. Storey, William C. Chueh, Richard D. Braatz, Martin Z. Bazant

Volume & Issue: Volume 621, Issue 7978

Pages: 289-294

Abstract: Reaction rates at spatially heterogeneous, unstable interfaces are notoriously difficult to quantify, yet are essential in engineering many chemical systems, such as batteries1 and electrocatalysts2. Experimental characterizations of such materials by operando microscopy produce rich image datasets3–6, but data-driven methods to learn physics from these images are still lacking because of the complex coupling of reaction kinetics, surface chemistry and phase separation7. Here we show that heterogeneous reaction kinetics can be learned from in situ scanning transmission X-ray microscopy (STXM) images of carbon-coated lithium iron phosphate (LFP) nanoparticles. Combining a large dataset of STXM images with a thermodynamically consistent electrochemical phase-field model, partial differential equation (PDE)-constrained optimization and uncertainty quantification, we extract the free-energy landscape and reaction kinetics and verify their consistency with theoretical models. We also simultaneously learn the spatial heterogeneity of the reaction rate, which closely matches the carbon-coating thickness profiles obtained through Auger electron microscopy (AEM). Across 180,000 image pixels, the mean discrepancy with the learned model is remarkably small (<7%) and comparable with experimental noise. Our results open the possibility of learning nonequilibrium material properties beyond the reach of traditional experimental methods and offer a new non-destructive technique for characterizing and optimizing heterogeneous reactive surfaces.

Tags: Theory and computation, Applied mathematics, Energy modelling

101. Material symmetry recognition and property prediction accomplished by crystal capsule representation, Nature Communications (August 25, 2023)

Authors: Chao Liang, Yilimiranmu Rouzhahong, Caiyuan Ye, Chong Li, Biao Wang, Huashan Li

Volume & Issue: Volume 14, Issue 1

Pages: 5198

Abstract: Learning the global crystal symmetry and interpreting the equivariant information is crucial for accurately predicting material properties, yet remains to be fully accomplished by existing algorithms based on convolution networks. To overcome this challenge, here we develop a machine learning (ML) model, named symmetry-enhanced equivariance network (SEN), to build material representation with joint structure-chemical patterns, to encode important clusters embedded in the crystal structure, and to learn pattern equivariance in different scales via capsule transformers. Quantitative analyses of the intermediate matrices demonstrate that the intrinsic crystal symmetries and interactions between clusters have been exactly perceived by the SEN model and critically affect the prediction performances by reducing effective feature space. The mean absolute errors (MAEs) of 0.181 eV and 0.0161 eV/atom are obtained for predicting bandgap and formation energy in the MatBench dataset. The general and interpretable SEN model reveals the potential to design ML models by implicitly encoding feature relationship based on physical mechanisms.

Tags: Atomistic models, Electronic properties and materials

100. ChatGPT Chemistry Assistant for Text Mining and the Prediction of MOF Synthesis, Journal of the American Chemical Society (August 16, 2023)

Authors: Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

Volume & Issue: Volume 145, Issue 32

Pages: 18048-18062

Abstract: We use prompt engineering to guide ChatGPT in the automation of text mining of metal–organic framework (MOF) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT’s tendency to hallucinate information, an issue that previously made the use of large language models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different trade-offs among labor, speed, and accuracy. We deploy this system to extract 26 257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90–99%. Furthermore, with the data set built by text mining, we constructed a machine-learning model with over 87% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions about chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry subdisciplines.

Tags: None

99. Enhancing corrosion-resistant alloy design through natural language processing and deep learning, Science Advances (August 11, 2023)

Authors: Kasturi Narasimha Sasidhar, Nima Hamidi Siboni, Jaber Rezaei Mianroodi, Michael Rohwerder, Jörg Neugebauer, Dierk Raabe

Volume & Issue: Volume 9, Issue 32

Pages: eadg7992

Abstract: We propose strategies that couple natural language processing with deep learning to enhance machine capability for corrosion-resistant alloy design. First, accuracy of machine learning models for materials datasets is often limited by their inability to incorporate textual data. Manual extraction of numerical parameters from descriptions of alloy processing or experimental methodology inevitably leads to a reduction in information density. To overcome this, we have developed a fully automated natural language processing approach to transform textual data into a form compatible for feeding into a deep neural network. This approach has resulted in a pitting potential prediction accuracy substantially beyond state of the art. Second, we have implemented a deep learning model with a transformed-input feature space, consisting of a set of elemental physical/chemical property–based numerical descriptors of alloys replacing alloy compositions. This helped identification of those descriptors that are most critical toward enhancing their pitting potential. In particular, configurational entropy, atomic packing efficiency, local electronegativity differences, and atomic radii differences proved to be the most critical.

Tags: None

98. Applied machine learning as a driver for polymeric biomaterials design, Nature Communications (August 10, 2023)

Authors: Samantha M. McDonald, Emily K. Augustine, Quinn Lanners, Cynthia Rudin, L. Catherine Brinson, Matthew L. Becker

Volume & Issue: Volume 14, Issue 1

Pages: 4838

Abstract: Polymers are ubiquitous to almost every aspect of modern society and their use in medical products is similarly pervasive. Despite this, the diversity in commercial polymers used in medicine is stunningly low. Considerable time and resources have been extended over the years towards the development of new polymeric biomaterials which address unmet needs left by the current generation of medical-grade polymers. Machine learning (ML) presents an unprecedented opportunity in this field to bypass the need for trial-and-error synthesis, thus reducing the time and resources invested into new discoveries critical for advancing medical treatments. Current efforts pioneering applied ML in polymer design have employed combinatorial and high throughput experimental design to address data availability concerns. However, the lack of available and standardized characterization of parameters relevant to medicine, including degradation time and biocompatibility, represents a nearly insurmountable obstacle to ML-aided design of biomaterials. Herein, we identify a gap at the intersection of applied ML and biomedical polymer design, highlight current works at this junction more broadly and provide an outlook on challenges and future directions.

Tags: Computer science, Biomedical engineering, Biomaterials, Biomedical materials, Polymer synthesis

97. Machine Learning Descriptors for Data-Driven Catalysis Study, Advanced Science (August 04, 2023)

Authors: Li-Hui Mou, TianTian Han, Pieter E. S. Smith, Edward Sharman, Jun Jiang

Volume & Issue: Volume 10, Issue 22

Pages: 2301020

Abstract: Traditional trial-and-error experiments and theoretical simulations have difficulty optimizing catalytic processes and developing new, better-performing catalysts. Machine learning (ML) provides a promising approach for accelerating catalysis research due to its powerful learning and predictive abilities. The selection of appropriate input features (descriptors) plays a decisive role in improving the predictive accuracy of ML models and uncovering the key factors that influence catalytic activity and selectivity. This review introduces tactics for the utilization and extraction of catalytic descriptors in ML-assisted experimental and theoretical research. In addition to the effectiveness and advantages of various descriptors, their limitations are also discussed. Highlighted are both 1) newly developed spectral descriptors for catalytic performance prediction and 2) a novel research paradigm combining computational and experimental ML models through suitable intermediate descriptors. Current challenges and future perspectives on the application of descriptors and ML techniques to catalysis are also presented.

Tags: machine learning, high-throughput experiments, catalytic descriptors, heterogeneous catalysis, theoretical simulations

96. Machine-learning-assisted material discovery of oxygen-rich highly porous carbon active materials for aqueous supercapacitors, Nature Communications (August 01, 2023)

Authors: Tao Wang, Runtong Pan, Murillo L. Martins, Jinlei Cui, Zhennan Huang, Bishnu P. Thapaliya, Chi-Linh Do-Thanh, Musen Zhou, Juntian Fan, Zhenzhen Yang, Miaofang Chi, Takeshi Kobayashi, Jianzhong Wu, Eugene Mamontov, Sheng Dai

Volume & Issue: Volume 14, Issue 1

Pages: 4607

Abstract: Porous carbons are the active materials of choice for supercapacitor applications because of their power capability, long-term cycle stability, and wide operating temperatures. However, the development of carbon active materials with improved physicochemical and electrochemical properties is generally carried out via time-consuming and cost-ineffective experimental processes. In this regard, machine-learning technology provides a data-driven approach to examine previously reported research works to find the critical features for developing ideal carbon materials for supercapacitors. Here, we report the design of a machine-learning-derived activation strategy that uses sodium amide and cross-linked polymer precursors to synthesize highly porous carbons (i.e., with specific surface areas > 4000 m2/g). Tuning the pore size and oxygen content of the carbonaceous materials, we report a highly porous carbon-base electrode with 0.7 mg/cm2 of electrode mass loading that exhibits a high specific capacitance of 610 F/g in 1 M H2SO4. This result approaches the specific capacitance of a porous carbon electrode predicted by the machine learning approach. We also investigate the charge storage mechanism and electrolyte transport properties via step potential electrochemical spectroscopy and quasielastic neutron scattering measurements.

Tags: Porous materials, Synthetic chemistry methodology

95. Scientific discovery in the age of artificial intelligence, Nature (August 2023)

Authors: Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, Anima Anandkumar, Karianne Bergen, Carla P. Gomes, Shirley Ho, Pushmeet Kohli, Joan Lasenby, Jure Leskovec, Tie-Yan Liu, Arjun Manrai, Debora Marks, Bharath Ramsundar, Le Song, Jimeng Sun, Jian Tang, Petar Veličković, Max Welling, Linfeng Zhang, Connor W. Coley, Yoshua Bengio, Marinka Zitnik

Volume & Issue: Volume 620, Issue 7972

Pages: 47-60

Abstract: The advances in artificial intelligence over the past decade are examined, with a discussion on how artificial intelligence systems can aid the scientific process and the central issues that remain despite advances.

Tags: None

94. Fatigue database of complex metallic alloys, Scientific Data (July 12, 2023)

Authors: Zian Zhang, Haoxuan Tang, Zhiping Xu

Volume & Issue: Volume 10, Issue 1

Pages: 447

Abstract: The past few decades have witnessed rapid progresses in the research and development of complex metallic alloys such as metallic glasses and multi-principal element alloys, which offer new solutions to tackle engineering problems of materials such as the strength-toughness conflict and deployment in harsh environments and/or for long-term service. A fatigue database (FatigueData-CMA2022) is compiled from the literature by the end of 2022. Data for both metallic glasses and multi-principal element alloys are included and analyzed for their statistics and patterns. Automatic extraction and manual examination are combined in the workflow to improve the efficiency of processing, the quality of published data, and the reusability. The database contains 272 fatigue datasets of S-N (the stress-life relation), ε-N (the strain-life relation), and da/dN-ΔK (the relation between the fatigue crack growth rate and the stress intensity factor range) data, together with the information of materials, processing and testing conditions, and mechanical properties. The database and scripts are released in open repositories, which are designed in formats that can be continuously expanded and updated.

Tags: Metals and alloys, Industry

93. Encoding physics to learn reaction–diffusion processes, Nature Machine Intelligence (July 2023)

Authors: Chengping Rao, Pu Ren, Qi Wang, Oral Buyukozturk, Hao Sun, Yang Liu

Volume & Issue: Volume 5, Issue 7

Pages: 765-779

Abstract: Reaction–diffusion processes, which can be found in many fundamental spatiotemporal dynamical phenomena in chemistry, biology, geology, physics and ecology, can be modelled by partial differential equations (PDEs). Physics-informed deep learning approaches can accelerate the discovery of PDEs and Rao et al. improve interpretability and generalizability by strong encoding of the underlying physics structure in the neural network.

Tags: None

92. Skilful nowcasting of extreme precipitation with NowcastNet, Nature (July 2023)

Authors: Yuchen Zhang, Mingsheng Long, Kaiyuan Chen, Lanxiang Xing, Ronghua Jin, Michael I. Jordan, Jianmin Wang

Volume & Issue: Volume 619, Issue 7970

Pages: 526-532

Abstract: Extreme precipitation is a considerable contributor to meteorological disasters and there is a great need to mitigate its socioeconomic effects through skilful nowcasting that has high resolution, long lead times and local details1–3. Current methods are subject to blur, dissipation, intensity or location errors, with physics-based numerical methods struggling to capture pivotal chaotic dynamics such as convective initiation4 and data-driven learning methods failing to obey intrinsic physical laws such as advective conservation5. We present NowcastNet, a nonlinear nowcasting model for extreme precipitation that unifies physical-evolution schemes and conditional-learning methods into a neural-network framework with end-to-end forecast error optimization. On the basis of radar observations from the USA and China, our model produces physically plausible precipitation nowcasts with sharp multiscale patterns over regions of 2,048 km × 2,048 km and with lead times of up to 3 h. In a systematic evaluation by 62 professional meteorologists from across China, our model ranks first in 71% of cases against the leading methods. NowcastNet provides skilful forecasts at light-to-heavy rain rates, particularly for extreme-precipitation events accompanied by advective or convective processes that were previously considered intractable.

Tags: Computer science, Atmospheric science, Computational science

91. Using a physics-informed neural network and fault zone acoustic monitoring to predict lab earthquakes, Nature Communications (June 21, 2023)

Authors: Prabhav Borate, Jacques Rivière, Chris Marone, Ankur Mali, Daniel Kifer, Parisa Shokouhi

Volume & Issue: Volume 14, Issue 1

Pages: 3693

Abstract: Predicting failure in solids has broad applications including earthquake prediction which remains an unattainable goal. However, recent machine learning work shows that laboratory earthquakes can be predicted using micro-failure events and temporal evolution of fault zone elastic properties. Remarkably, these results come from purely data-driven models trained with large datasets. Such data are equivalent to centuries of fault motion rendering application to tectonic faulting unclear. In addition, the underlying physics of such predictions is poorly understood. Here, we address scalability using a novel Physics-Informed Neural Network (PINN). Our model encodes fault physics in the deep learning loss function using time-lapse ultrasonic data. PINN models outperform data-driven models and significantly improve transfer learning for small training datasets and conditions outside those used in training. Our work suggests that PINN offers a promising path for machine learning-based failure prediction and, ultimately for improving our understanding of earthquake physics and prediction.

Tags: Geophysics, Physics

90. Discovery of senolytics using machine learning, Nature Communications (June 10, 2023)

Authors: Vanessa Smer-Barreto, Andrea Quintanilla, Richard J. R. Elliott, John C. Dawson, Jiugeng Sun, Víctor M. Campa, Álvaro Lorente-Macías, Asier Unciti-Broceta, Neil O. Carragher, Juan Carlos Acosta, Diego A. Oyarzún

Volume & Issue: Volume 14, Issue 1

Pages: 3445

Abstract: Cellular senescence is a stress response involved in ageing and diverse disease processes including cancer, type-2 diabetes, osteoarthritis and viral infection. Despite growing interest in targeted elimination of senescent cells, only few senolytics are known due to the lack of well-characterised molecular targets. Here, we report the discovery of three senolytics using cost-effective machine learning algorithms trained solely on published data. We computationally screened various chemical libraries and validated the senolytic action of ginkgetin, periplocin and oleandrin in human cell lines under various modalities of senescence. The compounds have potency comparable to known senolytics, and we show that oleandrin has improved potency over its target as compared to best-in-class alternatives. Our approach led to several hundred-fold reduction in drug screening costs and demonstrates that artificial intelligence can take maximum advantage of small and heterogeneous drug screening data, paving the way for new open science approaches to early-stage drug discovery.

Tags: Machine learning, Phenotypic screening, Virtual screening

89. Precursor recommendation for inorganic synthesis by machine learning materials similarity from scientific literature, Science Advances (June 09, 2023)

Authors: Tanjin He, Haoyan Huo, Christopher J. Bartel, Zheren Wang, Kevin Cruse, Gerbrand Ceder

Volume & Issue: Volume 9, Issue 23

Pages: eadg8180

Abstract: Synthesis prediction is a key accelerator for the rapid design of advanced materials. However, determining synthesis variables such as the choice of precursor materials is challenging for inorganic materials because the sequence of reactions during heating is not well understood. In this work, we use a knowledge base of 29,900 solid-state synthesis recipes, text-mined from the scientific literature, to automatically learn which precursors to recommend for the synthesis of a novel target material. The data-driven approach learns chemical similarity of materials and refers the synthesis of a new target to precedent synthesis procedures of similar materials, mimicking human synthesis design. When proposing five precursor sets for each of 2654 unseen test target materials, the recommendation strategy achieves a success rate of at least 82%. Our approach captures decades of heuristic synthesis data in a mathematical form, making it accessible for use in recommendation engines and autonomous laboratories.

Tags: None

88. The rise of self-driving labs in chemical and materials sciences, Nature Synthesis (June 2023)

Authors: Milad Abolhasani, Eugenia Kumacheva

Volume & Issue: Volume 2, Issue 6

Pages: 483-492

Abstract: Accelerating the discovery of new molecules and materials, as well as developing green and sustainable ways to synthesize them, will help to address global challenges in energy, sustainability and healthcare. The recent growth of data science and automated experimentation techniques has resulted in the advent of self-driving labs (SDLs) via the integration of machine learning, lab automation and robotics. An SDL is a machine-learning-assisted modular experimental platform that iteratively operates a series of experiments selected by the machine learning algorithm to achieve a user-defined objective. These intelligent robotic assistants help researchers to accelerate the pace of fundamental and applied research through rapid exploration of the chemical space. In this Review, we introduce SDLs and provide a roadmap for their implementation by non-expert scientists. We present the status quo of successful SDL implementations in the field and discuss their current limitations and future opportunities to accelerate finding solutions for societal needs.

Tags: Design, synthesis and processing, Synthetic chemistry methodology, Techniques and instrumentation

87. Combinatorial synthesis for AI-driven materials discovery, Nature Synthesis (June 2023)

Authors: John M. Gregoire, Lan Zhou, Joel A. Haber

Volume & Issue: Volume 2, Issue 6

Pages: 493-504

Abstract: Combinatorial synthesis has historically been the cornerstone of high-throughput experimentation. In this Review, we discuss the evolution of combinatorial synthesis and envision a future for accelerated materials science through its integration with artificial intelligence. We also evaluate the key aspects of combinatorial synthesis with respect to workflow design.

Tags: None

86. Discovering small-molecule senolytics with deep neural networks, Nature Aging (June 2023)

Authors: Felix Wong, Satotaka Omori, Nina M. Donghia, Erica J. Zheng, James J. Collins

Volume & Issue: Volume 3, Issue 6

Pages: 734-750

Abstract: Senolytic compounds have shown promise for the treatment of aging-related diseases in animal models. Here, to discover new small molecule senolytics, Wong, Omori and colleagues introduce a graph neural network platform, identify structurally diverse compounds with favorable drug-like properties and confirm one compound’s in vivo activity in aged mice.

Tags: None

85. A robotic platform for the synthesis of colloidal nanocrystals, Nature Synthesis (June 2023)

Authors: Haitao Zhao, Wei Chen, Hao Huang, Zhehao Sun, Zijian Chen, Lingjun Wu, Baicheng Zhang, Fuming Lai, Zhuo Wang, Mukhtar Lawan Adam, Cheng Heng Pang, Paul K. Chu, Yang Lu, Tao Wu, Jun Jiang, Zongyou Yin, Xue-Feng Yu

Volume & Issue: Volume 2, Issue 6

Pages: 505-514

Abstract: Morphological control with broad tunability is a primary goal for the synthesis of colloidal nanocrystals with unique physicochemical properties. Here we develop a robotic platform as a substitute for trial-and-error synthesis and labour-intensive characterization to achieve this goal. Gold nanocrystals (with strong visible-light absorption) and double-perovskite nanocrystals (with photoluminescence) are selected as typical proof-of-concept nanocrystals for this platform. An initial choice of key synthesis parameters was acquired through data mining of the literature. Automated synthesis and in situ characterization with further ex situ validation was then carried out and controllable synthesis of nanocrystals with the desired morphology was accomplished. To achieve morphology-oriented inverse design, correlations between the morphologies and structure-directing agents are identified by machine-learning models trained on a continuously expanded experimental database. Thus, the developed robotic platform with a data mining–synthesis–inverse design framework is promising in data-driven robotic synthesis of nanocrystals and beyond.

Tags: Materials chemistry, Synthesis and processing

84. Data-driven design of new chiral carboxylic acid for construction of indoles with C-central and C–N axial chirality via cobalt catalysis, Nature Communications (May 31, 2023)

Authors: Zi-Jing Zhang, Shu-Wen Li, João C. A. Oliveira, Yanjun Li, Xinran Chen, Shuo-Qing Zhang, Li-Cheng Xu, Torben Rogge, Xin Hong, Lutz Ackermann

Volume & Issue: Volume 14, Issue 1

Pages: 3149

Abstract: Challenging enantio- and diastereoselective cobalt-catalyzed C–H alkylation has been realized by an innovative data-driven knowledge transfer strategy. Harnessing the statistics of a related transformation as the knowledge source, the designed machine learning (ML) model took advantage of delta learning and enabled accurate and extrapolative enantioselectivity predictions. Powered by the knowledge transfer model, the virtual screening of a broad scope of 360 chiral carboxylic acids led to the discovery of a new catalyst featuring an intriguing furyl moiety. Further experiments verified that the predicted chiral carboxylic acid can achieve excellent stereochemical control for the target C–H alkylation, which supported the expedient synthesis for a large library of substituted indoles with C-central and C–N axial chirality. The reported machine learning approach provides a powerful data engine to accelerate the discovery of molecular catalysis by harnessing the hidden value of the available structure-performance statistics.

Tags: Synthetic chemistry methodology, Asymmetric catalysis

83. A database of ultrastable MOFs reassembled from stable fragments with machine learning models, Matter (May 03, 2023)

Authors: Aditya Nandy, Shuwen Yue, Changhwan Oh, Chenru Duan, Gianmarco G. Terrones, Yongchul G. Chung, Heather J. Kulik

Volume & Issue: Volume 6, Issue 5

Pages: 1585-1603

Abstract: High-throughput screening of hypothetical metal-organic framework (MOF) databases can uncover new materials, but their stability in real-world applications is often unknown. We leverage community knowledge and machine learning (ML) models to identify MOFs that are thermally stable and stable upon activation. We separate these MOFs into their building blocks and recombine them to make a new hypothetical MOF database of over 50,000 structures with orders of magnitude more (1) connectivity nets and (2) inorganic building blocks than were present in prior databases. This database shows a 10-fold enrichment of ultrastable MOF structures that are stable upon activation and more than 1 standard deviation more thermally stable than the average experimentally characterized MOF. For nearly 10,000 ultrastable MOFs, we compute elastic moduli to confirm that these materials have good mechanical stability, and we report methane deliverable capacities. We identify privileged metal nodes in ultrastable MOFs that optimize gas storage and mechanical stability simultaneously.

Tags: machine learning, materials discovery, stability, computational materials design, elastic modulus, metal-organic framework, methane deliverable, MOF, shear modulus

82. High-throughput printing of combinatorial materials from aerosols, Nature (May 2023)

Authors: Minxiang Zeng, Yipu Du, Qiang Jiang, Nicholas Kempf, Chen Wei, Miles V. Bimrose, A. N. M. Tanvir, Hengrui Xu, Jiahao Chen, Dylan J. Kirsch, Joshua Martin, Brian C. Wyatt, Tatsunori Hayashi, Mortaza Saeidi-Javash, Hirotaka Sakaue, Babak Anasori, Lihua Jin, Michael D. McMurtrey, Yanliang Zhang

Volume & Issue: Volume 617, Issue 7960

Pages: 292-298

Abstract: The development of new materials and their compositional and microstructural optimization are essential in regard to next-generation technologies such as clean energy and environmental sustainability. However, materials discovery and optimization have been a frustratingly slow process. The Edisonian trial-and-error process is time consuming and resource inefficient, particularly when contrasted with vast materials design spaces1. Whereas traditional combinatorial deposition methods can generate material libraries2,3, these suffer from limited material options and inability to leverage major breakthroughs in nanomaterial synthesis. Here we report a high-throughput combinatorial printing method capable of fabricating materials with compositional gradients at microscale spatial resolution. In situ mixing and printing in the aerosol phase allows instantaneous tuning of the mixing ratio of a broad range of materials on the fly, which is an important feature unobtainable in conventional multimaterials printing using feedstocks in liquid–liquid or solid–solid phases4–6. We demonstrate a variety of high-throughput printing strategies and applications in combinatorial doping, functional grading and chemical reaction, enabling materials exploration of doped chalcogenides and compositionally graded materials with gradient properties. The ability to combine the top-down design freedom of additive manufacturing with bottom-up control over local material compositions promises the development of compositionally complex materials inaccessible via conventional manufacturing approaches.

Tags: Materials science, Mechanical engineering

81. Generative Models as an Emerging Paradigm in the Chemical Sciences, Journal of the American Chemical Society (April 26, 2023)

Authors: Dylan M. Anstine, Olexandr Isayev

Volume & Issue: Volume 145, Issue 16

Pages: 8736-8750

Abstract: Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.

Tags: None

80. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing, npj Computational Materials (April 05, 2023)

Authors: Pranav Shetty, Arunkumar Chitteth Rajan, Chris Kuenneth, Sonakshi Gupta, Lakshmi Prerana Panchumarti, Lauren Holm, Chao Zhang, Rampi Ramprasad

Volume & Issue: Volume 9, Issue 1

Pages: 52

Abstract: The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from literature. We used natural language processing methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, using 2.4 million materials science abstracts, which outperforms other baseline models in three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells to recover non-trivial insights. The data extracted through our pipeline is made available at polymerscholar.orgwhich can be used to locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with extracted material property information.

Tags: Computational methods, Polymers

79. Evolutionary-scale prediction of atomic-level protein structure with a language model, Science (March 17, 2023)

Authors: Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives

Volume & Issue: Volume 379, Issue 6637

Pages: 1123-1130

Abstract: Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.

Tags: None

78. AlphaFlow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning, Nature Communications (March 14, 2023)

Authors: Amanda A. Volk, Robert W. Epps, Daniel T. Yonemoto, Benjamin S. Masters, Felix N. Castellano, Kristofer G. Reyes, Milad Abolhasani

Volume & Issue: Volume 14, Issue 1

Pages: 1403

Abstract: Closed-loop, autonomous experimentation enables accelerated and material-efficient exploration of large reaction spaces without the need for user intervention. However, autonomous exploration of advanced materials with complex, multi-step processes and data sparse environments remains a challenge. In this work, we present AlphaFlow, a self-driven fluidic lab capable of autonomous discovery of complex multi-step chemistries. AlphaFlow uses reinforcement learning integrated with a modular microdroplet reactor capable of performing reaction steps with variable sequence, phase separation, washing, and continuous in-situ spectral monitoring. To demonstrate the power of reinforcement learning toward high dimensionality multi-step chemistries, we use AlphaFlow to discover and optimize synthetic routes for shell-growth of core-shell semiconductor nanoparticles, inspired by colloidal atomic layer deposition (cALD). Without prior knowledge of conventional cALD parameters, AlphaFlow successfully identified and optimized a novel multi-step reaction route, with up to 40 parameters, that outperformed conventional sequences. Through this work, we demonstrate the capabilities of closed-loop, reinforcement learning-guided systems in exploring and solving challenges in multi-step nanoparticle syntheses, while relying solely on in-house generated data from a miniaturized microfluidic platform. Further application of AlphaFlow in multi-step chemistries beyond cALD can lead to accelerated fundamental knowledge generation as well as synthetic route discoveries and optimization.

Tags: Design, synthesis and processing, Chemical engineering, Nanoparticles, Optical materials

77. Adaptively driven X-ray diffraction guided by machine learning for autonomous phase identification, npj Computational Materials (March 02, 2023)

Authors: Nathan J. Szymanski, Christopher J. Bartel, Yan Zeng, Mouhamad Diallo, Haegyeom Kim, Gerbrand Ceder

Volume & Issue: Volume 9, Issue 1

Pages: 1-8

Abstract: Machine learning (ML) has become a valuable tool to assist and improve materials characterization, enabling automated interpretation of experimental results with techniques such as X-ray diffraction (XRD) and electron microscopy. Because ML models are fast once trained, there is a key opportunity to bring interpretation in-line with experiments and make on-the-fly decisions to achieve optimal measurement effectiveness, which creates broad opportunities for rapid learning and information extraction from experiments. Here, we demonstrate such a capability with the development of autonomous and adaptive XRD. By coupling an ML algorithm with a physical diffractometer, this method integrates diffraction and analysis such that early experimental information is leveraged to steer measurements toward features that improve the confidence of a model trained to identify crystalline phases. We validate the effectiveness of an adaptive approach by showing that ML-driven XRD can accurately detect trace amounts of materials in multi-phase mixtures with short measurement times. The improved speed of phase detection also enables in situ identification of short-lived intermediate phases formed during solid-state reactions using a standard in-house diffractometer. Our findings showcase the advantages of in-line ML for materials characterization and point to the possibility of more general approaches for adaptive experimentation.

Tags: None

Authors: Yeonghun Kang, Hyunsoo Park, Berend Smit, Jihan Kim

Volume & Issue: Volume 5, Issue 3

Pages: 309-318

Abstract: Metal–organic frameworks are of high interest for a range of energy and environmental applications due to their stable gas storage properties. A new machine learning approach based on a pre-trained multi-modal transformer can be fine-tuned with small datasets to predict structure-property relationships and design new metal-organic frameworks for a range of specific tasks.

Tags: None

75. Accelerating the design of compositionally complex materials via physics-informed artificial intelligence, Nature Computational Science (March 2023)

Authors: Dierk Raabe, Jaber Rezaei Mianroodi, Jörg Neugebauer

Volume & Issue: Volume 3, Issue 3

Pages: 198-209

Abstract: Machine learning models have been widely applied to boost the computational efficiency of searching vast chemical space of compositionally complex materials. This Perspective summarizes the recent developments and proposes future opportunities, such as the physics-informed machine learning models.

Tags: None

74. Biological research and self-driving labs in deep space supported by artificial intelligence, Nature Machine Intelligence (March 2023)

Authors: Lauren M. Sanders, Ryan T. Scott, Jason H. Yang, Amina Ann Qutub, Hector Garcia Martin, Daniel C. Berrios, Jaden J. A. Hastings, Jon Rask, Graham Mackintosh, Adrienne L. Hoarfrost, Stuart Chalk, John Kalantari, Kia Khezeli, Erik L. Antonsen, Joel Babdor, Richard Barker, Sergio E. Baranzini, Afshin Beheshti, Guillermo M. Delgado-Aparicio, Benjamin S. Glicksberg, Casey S. Greene, Melissa Haendel, Arif A. Hamid, Philip Heller, Daniel Jamieson, Katelyn J. Jarvis, Svetlana V. Komarova, Matthieu Komorowski, Prachi Kothiyal, Ashish Mahabal, Uri Manor, Christopher E. Mason, Mona Matar, George I. Mias, Jack Miller, Jerry G. Myers, Charlotte Nelson, Jonathan Oribello, Seung-min Park, Patricia Parsons-Wingerter, R. K. Prabhu, Robert J. Reynolds, Amanda Saravia-Butler, Suchi Saria, Aenor Sawyer, Nitin Kumar Singh, Michael Snyder, Frank Soboczenski, Karthik Soman, Corey A. Theriot, David Van Valen, Kasthuri Venkateswaran, Liz Warren, Liz Worthey, Marinka Zitnik, Sylvain V. Costes

Volume & Issue: Volume 5, Issue 3

Pages: 208-219

Abstract: Deep space exploration missions will require new technologies that can support astronaut health systems, as well as biological monitoring and research systems that can function independently from Earth-based mission control centres. A NASA workshop explored how artificial intelligence advances could help address these challenges and, in this second of two Review articles based on the findings from the workshop, the intersection between artificial intelligence and space biology is discussed.

Tags: None

73. A Materials Acceleration Platform for Organic Laser Discovery, Advanced Materials (February 09, 2023)

Authors: Tony C Wu, Andrés Aguilar-Granda, Kazuhiro Hotta, Sahar Alasvand Yazdani, Robert Pollice, Jenya Vestfrid, Han Hao, Cyrille Lavigne, Martin Seifrid, Nicholas Angello, Fatima Bencheikh, Jason E. Hein, Martin Burke, Chihaya Adachi, Alán Aspuru-Guzik

Volume & Issue: Volume 35, Issue 6

Pages: 2207070

Abstract: Conventional materials discovery is a laborious and time-consuming process that can take decades from initial conception of the material to commercialization. Recent developments in materials acceleration platforms promise to accelerate materials discovery using automation of experiments coupled with machine learning. However, most of the automation efforts in chemistry focus on synthesis and compound identification, with integrated target property characterization receiving less attention. In this work, an automated platform is introduced for the discovery of molecules as gain mediums for organic semiconductor lasers, a problem that has been challenging for conventional approaches. This platform encompasses automated lego-like synthesis, product identification, and optical characterization that can be executed in a fully integrated end-to-end fashion. Using this workflow to screen organic laser candidates, discovered eight potential candidates for organic lasers is discovered. The lasing threshold of four molecules in thin-film devices and find two molecules with state-of-the-art performance is tested. These promising results show the potential of automated synthesis and screening for accelerated materials development.

Tags: autonomous laboratory, accelerated materials discovery, automated synthesis and analysis, organic laser

72. Accurate global machine learning force fields for molecules with hundreds of atoms, Science Advances (January 11, 2023)

Authors: Stefan Chmiela, Valentin Vassilev-Galindo, Oliver T. Unke, Adil Kabylda, Huziel E. Sauceda, Alexandre Tkatchenko, Klaus-Robert Müller

Volume & Issue: Volume 9, Issue 2

Pages: eadf0873

Abstract: Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.

Tags: None

71. Machine Learning-Assisted Synthesis of Two-Dimensional Materials, ACS Applied Materials & Interfaces (January 11, 2023)

Authors: Mingying Lu, Haining Ji, Yong Zhao, Yongxing Chen, Jundong Tao, Yangyong Ou, Yi Wang, Yan Huang, Junlong Wang, Guolin Hao

Volume & Issue: Volume 15, Issue 1

Pages: 1871-1878

Abstract: Two-dimensional (2D) materials have intriguing physical and chemical properties, which exhibit promising applications in the fields of electronics, optoelectronics, as well as energy storage. However, the controllable synthesis of 2D materials is highly desirable but remains challenging. Machine learning (ML) facilitates the development of insights and discoveries from a large amount of data in a short time for the materials synthesis, which can significantly reduce the computational costs and shorten the development cycles. Based on this, taking the 2D material MoS2 as an example, the parameters of successfully synthesized materials by chemical vapor deposition (CVD) were explored through four ML algorithms: XGBoost, Support Vector Machine (SVM), Naïve Bayes (NB), and Multilayer Perceptron (MLP). Recall, specificity, accuracy, and other metrics were used to assess the performance of these four models. By comparison, XGBoost was the best performing model among all the models, with an average prediction accuracy of over 88% and a high area under the receiver operating characteristic (AUROC) reaching 0.91. And these findings showed that the reaction temperature (T) had a crucial influence on the growth of MoS2. Furthermore, the importance of the features in the growth mechanism of MoS2 was optimized, such as the reaction temperature (T), Ar gas flow rate (Rf), reaction time (t), and so on. The results demonstrated that ML assisted materials preparation can significantly minimize the time spent on exploration and trial-and-error, which provided perspectives in the preparation of 2D materials.

Tags: None

70. Toward the design of ultrahigh-entropy alloys via mining six million texts, Nature Communications (January 04, 2023)

Authors: Zongrui Pei, Junqi Yin, Peter K. Liaw, Dierk Raabe

Volume & Issue: Volume 14, Issue 1

Pages: 54

Abstract: It has long been a norm that researchers extract knowledge from literature to design materials. However, the avalanche of publications makes the norm challenging to follow. Text mining (TM) is efficient in extracting information from corpora. Still, it cannot discover materials not present in the corpora, hindering its broader applications in exploring novel materials, such as high-entropy alloys (HEAs). Here we introduce a concept of “context similarity” for selecting chemical elements for HEAs, based on TM models that analyze the abstracts of 6.4 million papers. The method captures the similarity of chemical elements in the context used by scientists. It overcomes the limitations of TM and identifies the Cantor and Senkov HEAs. We demonstrate its screening capability for six- and seven-component lightweight HEAs by finding nearly 500 promising alloys out of 2.6 million candidates. The method thus brings an approach to the development of ultrahigh-entropy alloys and multicomponent materials.

Tags: Computational methods, Metals and alloys

69. On scientific understanding with artificial intelligence, Nature Reviews Physics (December 2022)

Authors: Mario Krenn, Robert Pollice, Si Yue Guo, Matteo Aldeghi, Alba Cervera-Lierta, Pascal Friederich, Gabriel dos Passos Gomes, Florian Häse, Adrian Jinich, AkshatKumar Nigam, Zhenpeng Yao, Alán Aspuru-Guzik

Volume & Issue: Volume 4, Issue 12

Pages: 761-769

Abstract: Scientific understanding is one of the main aims of science. This Perspective discusses how advanced computational systems, and artificial intelligence in particular, can contribute to driving scientific understanding.

Tags: None

68. A data-science approach to predict the heat capacity of nanoporous materials, Nature Materials (December 2022)

Authors: Seyed Mohamad Moosavi, Balázs Álmos Novotny, Daniele Ongari, Elias Moubarak, Mehrdad Asgari, Özge Kadioglu, Charithea Charalambous, Andres Ortega-Guerrero, Amir H. Farmahini, Lev Sarkisov, Susana Garcia, Frank Noé, Berend Smit

Volume & Issue: Volume 21, Issue 12

Pages: 1419-1425

Abstract: The heat capacity of a material is a fundamental property of great practical importance. For example, in a carbon capture process, the heat required to regenerate a solid sorbent is directly related to the heat capacity of the material. However, for most materials suitable for carbon capture applications, the heat capacity is not known, and thus the standard procedure is to assume the same value for all materials. In this work, we developed a machine learning approach, trained on density functional theory simulations, to accurately predict the heat capacity of these materials, that is, zeolites, metal–organic frameworks and covalent–organic frameworks. The accuracy of our prediction is confirmed with experimental data. Finally, for a temperature swing adsorption process that captures carbon from the flue gas of a coal-fired power plant, we show that for some materials, the heat requirement is reduced by as much as a factor of two using the correct heat capacity.

Tags: Theory and computation, Chemical engineering, Materials chemistry, Metal–organic frameworks

67. A universal graph deep learning interatomic potential for the periodic table, Nature Computational Science (November 2022)

Authors: Chi Chen, Shyue Ping Ong

Volume & Issue: Volume 2, Issue 11

Pages: 718-728

Abstract: A machine learning interatomic potential model is designed and trained on diverse crystal data comprising 89 elements, enabling materials discovery across a vast chemical space without retraining.

Tags: None

66. Into the Unknown: How Computation Can Help Explore Uncharted Material Space, Journal of the American Chemical Society (October 19, 2022)

Authors: Austin M. Mroz, Victor Posligua, Andrew Tarzia, Emma H. Wolpert, Kim E. Jelfs

Volume & Issue: Volume 144, Issue 41

Pages: 18730-18743

Abstract: Novel functional materials are urgently needed to help combat the major global challenges facing humanity, such as climate change and resource scarcity. Yet, the traditional experimental materials discovery process is slow and the material space at our disposal is too vast to effectively explore using intuition-guided experimentation alone. Most experimental materials discovery programs necessarily focus on exploring the local space of known materials, so we are not fully exploiting the enormous potential material space, where more novel materials with unique properties may exist. Computation, facilitated by improvements in open-source software and databases, as well as computer hardware has the potential to significantly accelerate the rational development of materials, but all too often is only used to postrationalize experimental observations. Thus, the true predictive power of computation, where theory leads experimentation, is not fully utilized. Here, we discuss the challenges to successful implementation of computation-driven materials discovery workflows, and then focus on the progress of the field, with a particular emphasis on the challenges to reaching novel materials.

Tags: None

65. The endless search for better alloys, Science (October 07, 2022)

Authors: Qing-Miao Hu, Rui Yang

Volume & Issue: Volume 378, Issue 6615

Pages: 26-27

Abstract: No abstract available

Tags: None

64. An artificial intelligence enabled chemical synthesis robot for exploration and optimization of nanomaterials, Science Advances (October 07, 2022)

Authors: Yibin Jiang, Daniel Salley, Abhishek Sharma, Graham Keenan, Margaret Mullin, Leroy Cronin

Volume & Issue: Volume 8, Issue 40

Pages: eabo2626

Abstract: We present an autonomous chemical synthesis robot for the exploration, discovery, and optimization of nanostructures driven by real-time spectroscopic feedback, theory, and machine learning algorithms that control the reaction conditions and allow the selective templating of reactions. This approach allows the transfer of materials as seeds between cycles of exploration, opening the search space like gene transfer in biology. The open-ended exploration of the seed-mediated multistep synthesis of gold nanoparticles (AuNPs) via in-line ultraviolet-visible characterization led to the discovery of five categories of nanoparticles by only performing ca. 1000 experiments in three hierarchically linked chemical spaces. The platform optimized nanostructures with desired optical properties by combining experiments and extinction spectrum simulations to achieve a yield of up to 95%. The synthetic procedure is outputted in a universal format using the chemical description language (χDL) with analytical data to produce a unique digital signature to enable the reproducibility of the synthesis.

Tags: None

63. Machine learning–enabled high-entropy alloy discovery, Science (October 07, 2022)

Authors: Ziyuan Rao, Po-Yen Tung, Ruiwen Xie, Ye Wei, Hongbin Zhang, Alberto Ferrari, T.P.C. Klaver, Fritz Körmann, Prithiv Thoudden Sukumar, Alisson Kwiatkowski da Silva, Yao Chen, Zhiming Li, Dirk Ponge, Jörg Neugebauer, Oliver Gutfleisch, Stefan Bauer, Dierk Raabe

Volume & Issue: Volume 378, Issue 6615

Pages: 78-85

Abstract: High-entropy alloys are solid solutions of multiple principal elements that are capable of reaching composition and property regimes inaccessible for dilute materials. Discovering those with valuable properties, however, too often relies on serendipity, because thermodynamic alloy design rules alone often fail in high-dimensional composition spaces. We propose an active learning strategy to accelerate the design of high-entropy Invar alloys in a practically infinite compositional space based on very sparse data. Our approach works as a closed-loop, integrating machine learning with density-functional theory, thermodynamic calculations, and experiments. After processing and characterizing 17 new alloys out of millions of possible compositions, we identified two high-entropy Invar alloys with extremely low thermal expansion coefficients around 2 × 10−6 per degree kelvin at 300 kelvin. We believe this to be a suitable pathway for the fast and automated discovery of high-entropy alloys with optimal thermal, magnetic, and electrical properties.

Tags: None

62. Multi-environment robotic transitions through adaptive morphogenesis, Nature (October 2022)

Authors: Robert Baines, Sree Kalyan Patiballa, Joran Booth, Luis Ramirez, Thomas Sipple, Andonny Garcia, Frank Fish, Rebecca Kramer-Bottiglio

Volume & Issue: Volume 610, Issue 7931

Pages: 283-289

Abstract: A design strategy termed ‘adaptive morphogenesis’ enables a robot inspired by aquatic and terrestrial turtles to adapt its limb morphology and gait to specialize for locomotion in different environments.

Tags: None

61. Autonomous optimization of non-aqueous Li-ion battery electrolytes via robotic experimentation and machine learning coupling, Nature Communications (September 27, 2022)

Authors: Adarsh Dave, Jared Mitchell, Sven Burke, Hongyi Lin, Jay Whitacre, Venkatasubramanian Viswanathan

Volume & Issue: Volume 13, Issue 1

Pages: 5454

Abstract: Developing high-energy and efficient battery technologies is a crucial aspect of advancing the electrification of transportation and aviation. However, battery innovations can take years to deliver. In the case of non-aqueous battery electrolyte solutions, the many design variables in selecting multiple solvents, salts and their relative ratios make electrolyte optimization time-consuming and laborious. To overcome these issues, we propose in this work an experimental design that couples robotics (a custom-built automated experiment named “Clio”) to machine-learning (a Bayesian optimization-based experiment planner named “Dragonfly”). An autonomous optimization of the electrolyte conductivity over a single-salt and ternary solvent design space identifies six fast-charging non-aqueous electrolyte solutions in two work-days and forty-two experiments. This result represents a six-fold time acceleration compared to a random search performed by the same automated experiment. To validate the practical use of these electrolytes, we tested them in a 220 mAh graphite∣∣LiNi0.5Mn0.3Co0.2O2 pouch cell configuration. All the pouch cells containing the robot-developed electrolytes demonstrate improved fast-charging capability against a baseline experiment that uses a non-aqueous electrolyte solution selected a priori from the design space.

Tags: Mechanical engineering, Software, Electrical and electronic engineering, Batteries, Materials for energy and catalysis

60. Data-Driven Materials Innovation and Applications, Advanced Materials (September 08, 2022)

Authors: Zhuo Wang, Zhehao Sun, Hang Yin, Xinghui Liu, Jinlan Wang, Haitao Zhao, Cheng Heng Pang, Tao Wu, Shuzhou Li, Zongyou Yin, Xue-Feng Yu

Volume & Issue: Volume 34, Issue 36

Pages: 2104113

Abstract: Owing to the rapid developments to improve the accuracy and efficiency of both experimental and computational investigative methodologies, the massive amounts of data generated have led the field of materials science into the fourth paradigm of data-driven scientific research. This transition requires the development of authoritative and up-to-date frameworks for data-driven approaches for material innovation. A critical discussion on the current advances in the data-driven discovery of materials with a focus on frameworks, machine-learning algorithms, material-specific databases, descriptors, and targeted applications in the field of inorganic materials is presented. Frameworks for rationalizing data-driven material innovation are described, and a critical review of essential subdisciplines is presented, including: i) advanced data-intensive strategies and machine-learning algorithms; ii) material databases and related tools and platforms for data generation and management; iii) commonly used molecular descriptors used in data-driven processes. Furthermore, an in-depth discussion on the broad applications of material innovation, such as energy conversion and storage, environmental decontamination, flexible electronics, optoelectronics, superconductors, metallic glasses, and magnetic materials, is provided. Finally, how these subdisciplines (with insights into the synergy of materials science, computational tools, and mathematics) support data-driven paradigms is outlined, and the opportunities and challenges in data-driven material innovation are highlighted.

Tags: machine learning, data-driven research, material applications, material informatics, material innovation

59. Imaging and computing with disorder, Nature Physics (September 2022)

Authors: Sylvain Gigan

Volume & Issue: Volume 18, Issue 9

Pages: 980-985

Abstract: Multiple scattering of light in complex and disordered media scrambles optical information. This Perspective showcases how this often detrimental physical mixing can be exploited to extract and process information for optical imaging and computing.

Tags: None

58. Deep-learning seismology, Science (August 12, 2022)

Authors: S. Mostafa Mousavi, Gregory C. Beroza

Volume & Issue: Volume 377, Issue 6607

Pages: eabm4470

Abstract: Seismic waves from earthquakes and other sources are used to infer the structure and properties of Earth’s interior. The availability of large-scale seismic datasets and the suitability of deep-learning techniques for seismic data processing have pushed deep learning to the forefront of fundamental, long-standing research investigations in seismology. However, some aspects of applying deep learning to seismology are likely to prove instructive for the geosciences, and perhaps other research areas more broadly. Deep learning is a powerful approach, but there are subtleties and nuances in its application. We present a systematic overview of trends, challenges, and opportunities in applications of deep-learning methods in seismology.

Tags: None

57. Strong yet ductile nanolamellar high-entropy alloys by additive manufacturing, Nature (August 2022)

Authors: Jie Ren, Yin Zhang, Dexin Zhao, Yan Chen, Shuai Guan, Yanfang Liu, Liang Liu, Siyuan Peng, Fanyue Kong, Jonathan D. Poplawsky, Guanhui Gao, Thomas Voisin, Ke An, Y. Morris Wang, Kelvin Y. Xie, Ting Zhu, Wen Chen

Volume & Issue: Volume 608, Issue 7921

Pages: 62-68

Abstract: An additive manufacturing strategy is used to produce dual-phase nanolamellar high-entropy alloys that show a combination of enhanced high yield strength and high tensile ductility.

Tags: None

56. Machine learning in the search for new fundamental physics, Nature Reviews Physics (June 2022)

Authors: Georgia Karagiorgi, Gregor Kasieczka, Scott Kravitz, Benjamin Nachman, David Shih

Volume & Issue: Volume 4, Issue 6

Pages: 399-412

Abstract: Owing to the growing volumes of data from high-energy physics experiments, modern deep learning methods are playing an increasingly important role in all aspects of data taking and analysis. This Review provides an overview of key developments, with a focus on the search for physics beyond the standard model.

Tags: None

55. Enhancing computational fluid dynamics with machine learning, Nature Computational Science (June 2022)

Authors: Ricardo Vinuesa, Steven L. Brunton

Volume & Issue: Volume 2, Issue 6

Pages: 358-366

Abstract: Machine learning has been used to accelerate the simulation of fluid dynamics. However, despite the recent developments in this field, there are still challenges to be addressed by the community, a fact that creates research opportunities.

Tags: None

54. Dataset of solution-based inorganic materials synthesis procedures extracted from the scientific literature, Scientific Data (May 25, 2022)

Authors: Zheren Wang, Olga Kononova, Kevin Cruse, Tanjin He, Haoyan Huo, Yuxing Fei, Yan Zeng, Yingzhi Sun, Zijian Cai, Wenhao Sun, Gerbrand Ceder

Volume & Issue: Volume 9, Issue 1

Pages: 231

Abstract: The development of a materials synthesis route is usually based on heuristics and experience. A possible new approach would be to apply data-driven approaches to learn the patterns of synthesis from past experience and use them to predict the syntheses of novel materials. However, this route is impeded by the lack of a large-scale database of synthesis formulations. In this work, we applied advanced machine learning and natural language processing techniques to construct a dataset of 35,675 solution-based synthesis procedures extracted from the scientific literature. Each procedure contains essential synthesis information including the precursors and target materials, their quantities, and the synthesis actions and corresponding attributes. Every procedure is also augmented with the reaction formula. Through this work, we are making freely available the first large dataset of solution-based inorganic materials synthesis procedures.

Tags: Computational methods, Design, synthesis and processing, Materials science, Cheminformatics

53. High-entropy nanoparticles: Synthesis-structure-property relationships and data-driven discovery, Science (April 08, 2022)

Authors: Yonggang Yao, Qi Dong, Alexandra Brozena, Jian Luo, Jianwei Miao, Miaofang Chi, Chao Wang, Ioannis G. Kevrekidis, Zhiyong Jason Ren, Jeffrey Greeley, Guofeng Wang, Abraham Anapolsky, Liangbing Hu

Volume & Issue: Volume 376, Issue 6589

Pages: eabn3103

Abstract: High-entropy nanoparticles have become a rapidly growing area of research in recent years. Because of their multielemental compositions and unique high-entropy mixing states (i.e., solid-solution) that can lead to tunable activity and enhanced stability, these nanoparticles have received notable attention for catalyst design and exploration. However, this strong potential is also accompanied by grand challenges originating from their vast compositional space and complex atomic structure, which hinder comprehensive exploration and fundamental understanding. Through a multidisciplinary view of synthesis, characterization, catalytic applications, high-throughput screening, and data-driven materials discovery, this review is dedicated to discussing the important progress of high-entropy nanoparticles and unveiling the critical needs for their future development for catalysis, energy, and sustainability applications.

Tags: None

52. Distributed representations of atoms and materials for machine learning, npj Computational Materials (March 18, 2022)

Authors: Luis M. Antunes, Ricardo Grau-Crespo, Keith T. Butler

Volume & Issue: Volume 8, Issue 1

Pages: 44

Abstract: The use of machine learning is becoming increasingly common in computational materials science. To build effective models of the chemistry of materials, useful machine-based representations of atoms and their compounds are required. We derive distributed representations of compounds from their chemical formulas only, via pooling operations of distributed representations of atoms. These compound representations are evaluated on ten different tasks, such as the prediction of formation energy and band gap, and are found to be competitive with existing benchmarks that make use of structure, and even superior in cases where only composition is available. Finally, we introduce an approach for learning distributed representations of atoms, named SkipAtom, which makes use of the growing information in materials structure databases.

Tags: Computational methods, Atomistic models

51. Autonomous platforms for data-driven organic synthesis, Nature Communications (February 28, 2022)

Authors: Wenhao Gao, Priyanka Raghavan, Connor W. Coley

Volume & Issue: Volume 13, Issue 1

Pages: 1075

Abstract: Achieving autonomous multi-step synthesis of novel molecular structures in chemical discovery processes is a goal shared by many researchers. In this Comment, we discuss key considerations of what an ideal platform may look like and the apparent state of the art. While most hardware challenges can be overcome with clever engineering, other challenges will require advances in both algorithms and data curation.

Tags: Automation, Chemical engineering, Cheminformatics, Organic chemistry

50. A self-driving laboratory advances the Pareto front for material properties, Nature Communications (February 22, 2022)

Authors: Benjamin P. MacLeod, Fraser G. L. Parlane, Connor C. Rupnow, Kevan E. Dettelbach, Michael S. Elliott, Thomas D. Morrissey, Ted H. Haley, Oleksii Proskurin, Michael B. Rooney, Nina Taherimakhsousi, David J. Dvorak, Hsi N. Chiu, Christopher E. B. Waizenegger, Karry Ocean, Mehrdad Mokhtari, Curtis P. Berlinguette

Volume & Issue: Volume 13, Issue 1

Pages: 995

Abstract: Useful materials must satisfy multiple objectives, where the optimization of one objective is often at the expense of another. The Pareto front reports the optimal trade-offs between these conflicting objectives. Here we use a self-driving laboratory, Ada, to define the Pareto front of conductivities and processing temperatures for palladium films formed by combustion synthesis. Ada discovers new synthesis conditions that yield metallic films at lower processing temperatures (below 200 °C) relative to the prior art for this technique (250 °C). This temperature difference makes possible the coating of different commodity plastic materials (e.g., Nafion, polyethersulfone). These combustion synthesis conditions enable us to to spray coat uniform palladium films with moderate conductivity (1.1 × 105 S m−1) at 191 °C. Spray coating at 226 °C yields films with conductivities (2.0 × 106 S m−1) comparable to those of sputtered films (2.0 to 5.8 × 106 S m−1). This work shows how a self-driving laboratoy can discover materials that provide optimal trade-offs between conflicting objectives.

Tags: Characterization and analytical techniques, Design, Electronic devices, synthesis and processing

49. Density of states prediction for materials discovery via contrastive learning from probabilistic embeddings, Nature Communications (February 17, 2022)

Authors: Shufeng Kong, Francesco Ricci, Dan Guevarra, Jeffrey B. Neaton, Carla P. Gomes, John M. Gregoire

Volume & Issue: Volume 13, Issue 1

Pages: 949

Abstract: Machine learning for materials discovery has largely focused on predicting an individual scalar rather than multiple related properties, where spectral properties are an important example. Fundamental spectral properties include the phonon density of states (phDOS) and the electronic density of states (eDOS), which individually or collectively are the origins of a breadth of materials observables and functions. Building upon the success of graph attention networks for encoding crystalline materials, we introduce a probabilistic embedding generator specifically tailored to the prediction of spectral properties. Coupled with supervised contrastive learning, our materials-to-spectrum (Mat2Spec) model outperforms state-of-the-art methods for predicting ab initio phDOS and eDOS for crystalline materials. We demonstrate Mat2Spec’s ability to identify eDOS gaps below the Fermi energy, validating predictions with ab initio calculations and thereby discovering candidate thermoelectrics and transparent conductors. Mat2Spec is an exemplar framework for predicting spectral properties of materials via strategically incorporated machine learning techniques.

Tags: Computational methods, Computer science, Thermoelectrics

48. Data-driven modeling and prediction of non-linearizable dynamics via spectral submanifolds, Nature Communications (February 15, 2022)

Authors: Mattia Cenedese, Joar Axås, Bastian Bäuerlein, Kerstin Avila, George Haller

Volume & Issue: Volume 13, Issue 1

Pages: 872

Abstract: We develop a methodology to construct low-dimensional predictive models from data sets representing essentially nonlinear (or non-linearizable) dynamical systems with a hyperbolic linear part that are subject to external forcing with finitely many frequencies. Our data-driven, sparse, nonlinear models are obtained as extended normal forms of the reduced dynamics on low-dimensional, attracting spectral submanifolds (SSMs) of the dynamical system. We illustrate the power of data-driven SSM reduction on high-dimensional numerical data sets and experimental measurements involving beam oscillations, vortex shedding and sloshing in a water tank. We find that SSM reduction trained on unforced data also predicts nonlinear response accurately under additional external forcing.

Tags: Mechanical engineering, Scientific data

47. Innovative Materials Science via Machine Learning, Advanced Functional Materials (February 03, 2022)

Authors: Chaochao Gao, Xin Min, Minghao Fang, Tianyi Tao, Xiaohong Zheng, Yangai Liu, Xiaowen Wu, Zhaohui Huang

Volume & Issue: Volume 32, Issue 1

Pages: 2108044

Abstract: Nowadays, the research on materials science is rapidly entering a phase of data-driven age. Machine learning, one of the most powerful data-driven methods, have been being applied to materials discovery and performances prediction with undoubtedly tremendous application foreground. Herein, the challenges and current progress of machine learning are summarized in materials science, the design strategies are classified and highlighted, and possible perspectives are proposed for the future development. It is hoped this review can provide important scientific guidance for innovating materials science and technology via machine learning in the future.

Tags: machine learning, materials discovery, materials science, data-driven, performances prediction, perspective

46. An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties, Matter (January 05, 2022)

Authors: Zekun Ren, Siyu Isaac Parker Tian, Juhwan Noh, Felipe Oviedo, Guangzong Xing, Jiali Li, Qiaohao Liang, Ruiming Zhu, Armin G. Aberle, Shijing Sun, Xiaonan Wang, Yi Liu, Qianxiao Li, Senthilnath Jayavelu, Kedar Hippalgaonkar, Yousung Jung, Tonio Buonassisi

Volume & Issue: Volume 5, Issue 1

Pages: 314-335

Abstract: Realizing general inverse design could greatly accelerate the discovery of new materials with user-defined properties. However, state-of-the-art generative models tend to be limited to a specific composition or crystal structure. Herein, we present a framework capable of general inverse design (not limited to a given set of elements or crystal structures), featuring a generalized invertible representation that encodes crystals in both real and reciprocal space, and a property-structured latent space from a variational autoencoder (VAE). In three design cases, the framework generates 142 new crystals with user-defined formation energies, bandgap, thermoelectric (TE) power factor, and combinations thereof. These generated crystals, absent in the training database, are validated by first-principles calculations. The success rates (number of first-principles-validated target-satisfying crystals/number of designed crystals) ranges between 7.1% and 38.9%. These results represent a significant step toward property-driven general inverse design using generative models, although practical challenges remain when coupled with experimental synthesis.

Tags: machine learning, thermoelectrics, general inverse design, generalized crystallographic representation, generative model, invertible crystallographic representation, property-structured latent space, solid-state materials, variational autoencoder

45. Applied Machine Learning for Developing Next-Generation Functional Materials, Advanced Functional Materials (December 16, 2021)

Authors: Filip Dinic, Kamalpreet Singh, Tony Dong, Milad Rezazadeh, Zhibo Wang, Ali Khosrozadeh, Tiange Yuan, Oleksandr Voznyy

Volume & Issue: Volume 31, Issue 51

Pages: 2104195

Abstract: Machine learning (ML) is a versatile technique to rapidly and efficiently generate insights from multidimensional data. It offers a much-needed avenue to accelerate the exploration and investigation of new materials to address time-sensitive global challenges such as climate change. The availability of large datasets in recent years has enabled the development of ML algorithms for various applications including experimental/device optimization and material discovery. This perspective provides a summary of the recent applications of ML in material discovery in a range of fields, from optoelectronics to batteries and electrocatalysis, as well as an overview of the methods behind these advances. The paper also attempts to summarize some key challenges and trends in current research methodologies.

Tags: machine learning, materials discovery, batteries, electrocatalysis, optoelectronics, solid-state electrolytes

44. Inverse design of two-dimensional materials with invertible neural networks, npj Computational Materials (December 09, 2021)

Authors: Victor Fung, Jiaxin Zhang, Guoxiang Hu, P. Ganesh, Bobby G. Sumpter

Volume & Issue: Volume 7, Issue 1

Pages: 200

Abstract: The ability to readily design novel materials with chosen functional properties on-demand represents a next frontier in materials discovery. However, thoroughly and efficiently sampling the entire design space in a computationally tractable manner remains a highly challenging task. To tackle this problem, we propose an inverse design framework (MatDesINNe) utilizing invertible neural networks which can map both forward and reverse processes between the design space and target property. This approach can be used to generate materials candidates for a designated property, thereby satisfying the highly sought-after goal of inverse design. We then apply this framework to the task of band gap engineering in two-dimensional materials, starting with MoS2. Within the design space encompassing six degrees of freedom in applied tensile, compressive and shear strain plus an external electric field, we show the framework can generate novel, high fidelity, and diverse candidates with near-chemical accuracy. We extend this generative capability further to provide insights regarding metal-insulator transition in MoS2 which are important for memristive neuromorphic applications, among others. This approach is general and can be directly extended to other materials and their corresponding design spaces and target properties.

Tags: Theory and computation, Electronic materials, Materials for devices

43. Machine Learning Driven Synthesis of Few-Layered WTe2 with Geometrical Control, Journal of the American Chemical Society (November 03, 2021)

Authors: Manzhang Xu, Bijun Tang, Yuhao Lu, Chao Zhu, Qianbo Lu, Chao Zhu, Lu Zheng, Jingyu Zhang, Nannan Han, Weidong Fang, Yuxi Guo, Jun Di, Pin Song, Yongmin He, Lixing Kang, Zhiyong Zhang, Wu Zhao, Cuntai Guan, Xuewen Wang, Zheng Liu

Volume & Issue: Volume 143, Issue 43

Pages: 18103-18113

Abstract: Reducing the lateral scale of two-dimensional (2D) materials to one-dimensional (1D) has attracted substantial research interest not only to achieve competitive electronic applications but also for the exploration of fundamental physical properties. Controllable synthesis of high-quality 1D nanoribbons (NRs) is thus highly desirable and essential for further study. Here, we report the implementation of supervised machine learning (ML) for the chemical vapor deposition (CVD) synthesis of high-quality quasi-1D few-layered WTe2 NRs. Feature importance analysis indicates that H2 gas flow rate has a profound influence on the formation of WTe2, and the source ratio governs the sample morphology. Notably, the growth mechanism of 1T′ few-layered WTe2 NRs is further proposed, which provides new insights for the growth of intriguing 2D and 1D tellurides and may inspire the growth strategies for other 1D nanostructures. Our findings suggest the effectiveness and capability of ML in guiding the synthesis of 1D nanostructures, opening up new opportunities for intelligent materials development.

Tags: None

42. Automating crystal-structure phase mapping by combining deep learning with constraint reasoning, Nature Machine Intelligence (September 2021)

Authors: Di Chen, Yiwei Bai, Sebastian Ament, Wenting Zhao, Dan Guevarra, Lan Zhou, Bart Selman, R. Bruce van Dover, John M. Gregoire, Carla P. Gomes

Volume & Issue: Volume 3, Issue 9

Pages: 812-822

Abstract: Incorporating prior knowledge in deep learning models can overcome the difficulties of supervised learning, including the need for large amounts of annotated data. An approach in this area called deep reasoning networks is applied to the complex task of mapping crystal structures from X-ray diffraction data for multi-element oxide structures, and identified 13 phases from 307 X-ray diffraction patterns in the previously unsolved Bi-Cu-V oxide system.

Tags: None

41. Accurate prediction of protein structures and interactions using a three-track neural network, Science (August 20, 2021)

Authors: Minkyung Baek, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, Qian Cong, Lisa N. Kinch, R. Dustin Schaeffer, Claudia Millán, Hahnbeom Park, Carson Adams, Caleb R. Glassman, Andy DeGiovanni, Jose H. Pereira, Andria V. Rodrigues, Alberdina A. van Dijk, Ana C. Ebrecht, Diederik J. Opperman, Theo Sagmeister, Christoph Buhlheller, Tea Pavkov-Keller, Manoj K. Rathinaswamy, Udit Dalwadi, Calvin K. Yip, John E. Burke, K. Christopher Garcia, Nick V. Grishin, Paul D. Adams, Randy J. Read, David Baker

Volume & Issue: Volume 373, Issue 6557

Pages: 871-876

Abstract: DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo–electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.

Tags: None

40. Highly accurate protein structure prediction with AlphaFold, Nature (August 2021)

Authors: John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis

Volume & Issue: Volume 596, Issue 7873

Pages: 583-589

Abstract: Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort1–4, the structures of around 100,000 unique proteins have been determined5, but this represents a small fraction of the billions of known protein sequences6,7. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’8—has been an important open research problem for more than 50 years9. Despite recent progress10–14, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14)15, demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

Tags: Machine learning, Protein structure predictions, Structural biology, Computational biophysics

39. Nanoparticle synthesis assisted by machine learning, Nature Reviews Materials (August 2021)

Authors: Huachen Tao, Tianyi Wu, Matteo Aldeghi, Tony C. Wu, Alán Aspuru-Guzik, Eugenia Kumacheva

Volume & Issue: Volume 6, Issue 8

Pages: 701-716

Abstract: Machine learning can be applied for the controlled synthesis of nanoparticles with precise properties. This Review discusses different machine learning approaches for the synthesis of semiconductor, metal, carbon-based and polymeric nanoparticles, and highlights key approaches for the collection of large datasets.

Tags: None

38. Physics-informed machine learning, Nature Reviews Physics (June 2021)

Authors: George Em Karniadakis, Ioannis G. Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, Liu Yang

Volume & Issue: Volume 3, Issue 6

Pages: 422-440

Abstract: The rapidly developing field of physics-informed learning integrates data and mathematical models seamlessly, enabling accurate inference of realistic and high-dimensional multiphysics problems. This Review discusses the methodology and provides diverse examples and an outlook for further developments.

Tags: None

37. Democratising deep learning for microscopy with ZeroCostDL4Mic, Nature Communications (April 15, 2021)

Authors: Lucas von Chamier, Romain F. Laine, Johanna Jukkala, Christoph Spahn, Daniel Krentzel, Elias Nehme, Martina Lerche, Sara Hernández-Pérez, Pieta K. Mattila, Eleni Karinou, Séamus Holden, Ahmet Can Solak, Alexander Krull, Tim-Oliver Buchholz, Martin L. Jones, Loïc A. Royer, Christophe Leterrier, Yoav Shechtman, Florian Jug, Mike Heilemann, Guillaume Jacquemet, Ricardo Henriques

Volume & Issue: Volume 12, Issue 1

Pages: 2276

Abstract: Deep Learning (DL) methods are powerful analytical tools for microscopy and can outperform conventional image processing pipelines. Despite the enthusiasm and innovations fuelled by DL technology, the need to access powerful and compatible resources to train DL networks leads to an accessibility barrier that novice users often find difficult to overcome. Here, we present ZeroCostDL4Mic, an entry-level platform simplifying DL access by leveraging the free, cloud-based computational resources of Google Colab. ZeroCostDL4Mic allows researchers with no coding expertise to train and apply key DL networks to perform tasks including segmentation (using U-Net and StarDist), object detection (using YOLOv2), denoising (using CARE and Noise2Void), super-resolution microscopy (using Deep-STORM), and image-to-image translation (using Label-free prediction - fnet, pix2pix and CycleGAN). Importantly, we provide suitable quantitative tools for each network to evaluate model performance, allowing model optimisation. We demonstrate the application of the platform to study multiple biological processes.

Tags: Machine learning, Cellular imaging, Image processing

36. Crystallography companion agent for high-throughput materials discovery, Nature Computational Science (April 2021)

Authors: Phillip M. Maffettone, Lars Banko, Peng Cui, Yury Lysogorskiy, Marc A. Little, Daniel Olds, Alfred Ludwig, Andrew I. Cooper

Volume & Issue: Volume 1, Issue 4

Pages: 290-297

Abstract: Crystallography companion agent enables autonomous characterization of powder X-ray diffraction data for organic and inorganic materials.

Tags: None

35. Materials design by synthetic biology, Nature Reviews Materials (April 2021)

Authors: Tzu-Chieh Tang, Bolin An, Yuanyuan Huang, Sangita Vasikaran, Yanyi Wang, Xiaoyu Jiang, Timothy K. Lu, Chao Zhong

Volume & Issue: Volume 6, Issue 4

Pages: 332-350

Abstract: Materials synthetic biology merges synthetic biology with materials science for the redesign of living systems into smart materials. This Review discusses how synthetic-biology tools can be applied for the engineering of self-organizing functional materials and programmable hybrid living materials.

Tags: None

34. Digital Transformation in Materials Science: A Paradigm Change in Material’s Development, Advanced Materials (February 24, 2021)

Authors: Julian Kimmig, Stefan Zechel, Ulrich S. Schubert

Volume & Issue: Volume 33, Issue 8

Pages: 2004940

Abstract: The ongoing digitalization is rapidly changing and will further revolutionize all parts of life. This statement is currently omnipresent in the media as well as in the scientific community; however, the exact consequences of the proceeding digitalization for the field of materials science in general and the way research will be performed in the future are still unclear. There are first promising examples featuring the potential to change discovery and development approaches toward new materials. Nevertheless, a wide range of open questions have to be solved in order to enable the so-called digital-supported material research. The current state-of-the-art, the present and future challenges, as well as the resulting perspectives for materials science are described.

Tags: machine learning, artificial intelligence, automation, combinatorial science, digital transformation

33. Bayesian reaction optimization as a tool for chemical synthesis, Nature (February 2021)

Authors: Benjamin J. Shields, Jason Stevens, Jun Li, Marvin Parasram, Farhan Damani, Jesus I. Martinez Alvarado, Jacob M. Janey, Ryan P. Adams, Abigail G. Doyle

Volume & Issue: Volume 590, Issue 7844

Pages: 89-96

Abstract: Bayesian optimization is applied in chemical synthesis towards the optimization of various organic reactions and is found to outperform scientists in both average optimization efficiency and consistency.

Tags: None

32. On-the-fly closed-loop materials discovery via Bayesian active learning, Nature Communications (November 24, 2020)

Authors: A. Gilad Kusne, Heshan Yu, Changming Wu, Huairuo Zhang, Jason Hattrick-Simpers, Brian DeCost, Suchismita Sarker, Corey Oses, Cormac Toher, Stefano Curtarolo, Albert V. Davydov, Ritesh Agarwal, Leonid A. Bendersky, Mo Li, Apurva Mehta, Ichiro Takeuchi

Volume & Issue: Volume 11, Issue 1

Pages: 5966

Abstract: Active learning—the field of machine learning (ML) dedicated to optimal experiment design—has played a part in science as far back as the 18th century when Laplace used it to guide his discovery of celestial mechanics. In this work, we focus a closed-loop, active learning-driven autonomous system on another major challenge, the discovery of advanced materials against the exceedingly complex synthesis-processes-structure-property landscape. We demonstrate an autonomous materials discovery methodology for functional inorganic compounds which allow scientists to fail smarter, learn faster, and spend less resources in their studies, while simultaneously improving trust in scientific results and machine learning tools. This robot science enables science-over-the-network, reducing the economic impact of scientists being physically separated from their labs. The real-time closed-loop, autonomous system for materials exploration and optimization (CAMEO) is implemented at the synchrotron beamline to accelerate the interconnected tasks of phase mapping and property optimization, with each cycle taking seconds to minutes. We also demonstrate an embodiment of human-machine interaction, where human-in-the-loop is called to play a contributing role within each cycle. This work has resulted in the discovery of a novel epitaxial nanocomposite phase-change memory material.

Tags: Characterization and analytical techniques, Information storage

31. Identifying domains of applicability of machine learning models for materials science, Nature Communications (September 04, 2020)

Authors: Christopher Sutton, Mario Boley, Luca M. Ghiringhelli, Matthias Rupp, Jilles Vreeken, Matthias Scheffler

Volume & Issue: Volume 11, Issue 1

Pages: 4428

Abstract: Although machine learning (ML) models promise to substantially accelerate the discovery of novel materials, their performance is often still insufficient to draw reliable conclusions. Improved ML models are therefore actively researched, but their design is currently guided mainly by monitoring the average model test error. This can render different models indistinguishable although their performance differs substantially across materials, or it can make a model appear generally insufficient while it actually works well in specific sub-domains. Here, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of models within a materials class. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides. We find that, despite having a mutually indistinguishable and unsatisfactory average error, the models have DAs with distinctive features and notably improved performance.

Tags: Computational methods, Computer science, Structure prediction

30. Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts, Nature Communications (July 14, 2020)

Authors: Baicheng Weng, Zhilong Song, Rilong Zhu, Qingyu Yan, Qingde Sun, Corey G. Grice, Yanfa Yan, Wan-Jian Yin

Volume & Issue: Volume 11, Issue 1

Pages: 3513

Abstract: Symbolic regression (SR) is an approach of interpretable machine learning for building mathematical formulas that best fit certain datasets. In this work, SR is used to guide the design of new oxide perovskite catalysts with improved oxygen evolution reaction (OER) activities. A simple descriptor, μ/t, where μ and t are the octahedral and tolerance factors, respectively, is identified, which accelerates the discovery of a series of new oxide perovskite catalysts with improved OER activity. We successfully synthesise five new oxide perovskites and characterise their OER activities. Remarkably, four of them, Cs0.4La0.6Mn0.25Co0.75O3, Cs0.3La0.7NiO3, SrNi0.75Co0.25O3, and Sr0.25Ba0.75NiO3, are among the oxide perovskite catalysts with the highest intrinsic activities. Our results demonstrate the potential of SR for accelerating the data-driven design and discovery of new materials with improved properties.

Tags: Computational methods, Electrocatalysis

29. A mobile robotic chemist, Nature (July 2020)

Authors: Benjamin Burger, Phillip M. Maffettone, Vladimir V. Gusev, Catherine M. Aitchison, Yang Bai, Xiaoyan Wang, Xiaobo Li, Ben M. Alston, Buyi Li, Rob Clowes, Nicola Rankin, Brandon Harris, Reiner Sebastian Sprick, Andrew I. Cooper

Volume & Issue: Volume 583, Issue 7815

Pages: 237-241

Abstract: A mobile robot autonomously operates analytical instruments in a wet chemistry laboratory, performing a photocatalyst optimization task much faster than a human would be able to.

Tags: None

28. Deep-Learning-Enabled Fast Optical Identification and Characterization of 2D Materials, Advanced Materials (June 09, 2020)

Authors: Bingnan Han, Yuxuan Lin, Yafang Yang, Nannan Mao, Wenyue Li, Haozhe Wang, Kenji Yasuda, Xirui Wang, Valla Fatemi, Lin Zhou, Joel I.-Jan Wang, Qiong Ma, Yuan Cao, Daniel Rodan-Legrain, Ya-Qing Bie, Efrén Navarro-Moratalla, Dahlia Klein, David MacNeill, Sanfeng Wu, Hikari Kitadai, Xi Ling, Pablo Jarillo-Herrero, Jing Kong, Jihao Yin, Tomás Palacios

Volume & Issue: Volume 32, Issue 29

Pages: 2000953

Abstract: Advanced microscopy and/or spectroscopy tools play indispensable roles in nanoscience and nanotechnology research, as they provide rich information about material processes and properties. However, the interpretation of imaging data heavily relies on the “intuition” of experienced researchers. As a result, many of the deep graphical features obtained through these tools are often unused because of difficulties in processing the data and finding the correlations. Such challenges can be well addressed by deep learning. In this work, the optical characterization of 2D materials is used as a case study, and a neural-network-based algorithm is demonstrated for the material and thickness identification of 2D materials with high prediction accuracy and real-time processing capability. Further analysis shows that the trained network can extract deep graphical features such as contrast, color, edges, shapes, flake sizes, and their distributions, based on which an ensemble approach is developed to predict the most relevant physical properties of 2D materials. Finally, a transfer learning technique is applied to adapt the pretrained network to other optical identification applications. This artificial-intelligence-based material characterization approach is a powerful tool that would speed up the preparation, initial characterization of 2D materials and other nanomaterials, and potentially accelerate new material discoveries.

Tags: machine learning, 2D materials, deep learning, material characterization, optical microscopy

27. Artificial Chemist: An Autonomous Quantum Dot Synthesis Bot, Advanced Materials (June 04, 2020)

Authors: Robert W. Epps, Michael S. Bowen, Amanda A. Volk, Kameel Abdel-Latif, Suyong Han, Kristofer G. Reyes, Aram Amassian, Milad Abolhasani

Volume & Issue: Volume 32, Issue 30

Pages: 2001626

Abstract: The optimal synthesis of advanced nanomaterials with numerous reaction parameters, stages, and routes, poses one of the most complex challenges of modern colloidal science, and current strategies often fail to meet the demands of these combinatorially large systems. In response, an Artificial Chemist is presented: the integration of machine-learning-based experiment selection and high-efficiency autonomous flow chemistry. With the self-driving Artificial Chemist, made-to-measure inorganic perovskite quantum dots (QDs) in flow are autonomously synthesized, and their quantum yield and composition polydispersity at target bandgaps, spanning 1.9 to 2.9 eV, are simultaneously tuned. Utilizing the Artificial Chemist, eleven precision-tailored QD synthesis compositions are obtained without any prior knowledge, within 30 h, using less than 210 mL of total starting QD solutions, and without user selection of experiments. Using the knowledge generated from these studies, the Artificial Chemist is pre-trained to use a new batch of precursors and further accelerate the synthetic path discovery of QD compositions, by at least twofold. The knowledge-transfer strategy further enhances the optoelectronic properties of the in-flow synthesized QDs (within the same resources as the no-prior-knowledge experiments) and mitigates the issues of batch-to-batch precursor variability, resulting in QDs averaging within 1 meV from their target peak emission energy.

Tags: machine learning, autonomous synthesis, microfluidics, perovskites, quantum dots

26. Coevolutionary search for optimal materials in the space of all possible compounds, npj Computational Materials (May 14, 2020)

Authors: Zahed Allahyari, Artem R. Oganov

Volume & Issue: Volume 6, Issue 1

Pages: 55

Abstract: Over the past decade, evolutionary algorithms, data mining, and other methods showed great success in solving the main problem of theoretical crystallography: finding the stable structure for a given chemical composition. Here, we develop a method that addresses the central problem of computational materials science: the prediction of material(s), among all possible combinations of all elements, that possess the best combination of target properties. This nonempirical method combines our new coevolutionary approach with the carefully restructured “Mendelevian” chemical space, energy filtering, and Pareto optimization to ensure that the predicted materials have optimal properties and a high chance to be synthesizable. The first calculations, presented here, illustrate the power of this approach. In particular, we find that diamond (and its polytypes, including lonsdaleite) are the hardest possible materials and that bcc-Fe has the highest zero-temperature magnetization among all possible compounds.

Tags: Computational methods, Mechanical properties, Magnetic properties and materials

25. Self-driving laboratory for accelerated discovery of thin-film materials, Science Advances (May 13, 2020)

Authors: B. P. MacLeod, F. G. L. Parlane, T. D. Morrissey, F. Häse, L. M. Roch, K. E. Dettelbach, R. Moreira, L. P. E. Yunker, M. B. Rooney, J. R. Deeth, V. Lai, G. J. Ng, H. Situ, R. H. Zhang, M. S. Elliott, T. H. Haley, D. J. Dvorak, A. Aspuru-Guzik, J. E. Hein, C. P. Berlinguette

Volume & Issue: Volume 6, Issue 20

Pages: eaaz8867

Abstract: Discovering and optimizing commercially viable materials for clean energy applications typically takes more than a decade. Self-driving laboratories that iteratively design, execute, and learn from materials science experiments in a fully autonomous loop present an opportunity to accelerate this research process. We report here a modular robotic platform driven by a model-based optimization algorithm capable of autonomously optimizing the optical and electronic properties of thin-film materials by modifying the film composition and processing conditions. We demonstrate the power of this platform by using it to maximize the hole mobility of organic hole transport materials commonly used in perovskite solar cells and consumer electronics. This demonstration highlights the possibilities of using autonomous laboratories to discover organic and inorganic materials relevant to materials sciences and clean energy technologies.

Tags: None

24. Accelerated discovery of CO2 electrocatalysts using active machine learning, Nature (May 2020)

Authors: Miao Zhong, Kevin Tran, Yimeng Min, Chuanhao Wang, Ziyun Wang, Cao-Thang Dinh, Phil De Luna, Zongqian Yu, Armin Sedighian Rasouli, Peter Brodersen, Song Sun, Oleksandr Voznyy, Chih-Shan Tan, Mikhail Askerka, Fanglin Che, Min Liu, Ali Seifitokaldani, Yuanjie Pang, Shen-Chuan Lo, Alexander Ip, Zachary Ulissi, Edward H. Sargent

Volume & Issue: Volume 581, Issue 7807

Pages: 178-183

Abstract: Machine learning predicts Cu-Al electrocatalysts provide better efficiency and productivity than copper when using intermittent renewable electricity to convert carbon dioxide to useful chemicals and fuels.

Tags: None

23. Improved protein structure prediction using potentials from deep learning, Nature (January 2020)

Authors: Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis

Volume & Issue: Volume 577, Issue 7792

Pages: 706-710

Abstract: AlphaFold predicts the distances between pairs of residues, is used to construct potentials of mean force that accurately describe the shape of a protein and can be optimized with gradient descent to predict protein structures.

Tags: None

22. Machine learning and the physical sciences, Reviews of Modern Physics (December 06, 2019)

Authors: Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, Lenka Zdeborová

Volume & Issue: Volume 91, Issue 4

Pages: 045002

Abstract: Machine learning encompasses a broad range of algorithms and modeling tools used for a vast array of data processing tasks, which has entered most scientific disciplines in recent years. We review in a selective way the recent research on the interface between machine learning and physical sciences. This includes conceptual developments in machine learning (ML) motivated by physical insights, applications of machine learning techniques to several domains in physics, and cross-fertilization between the two fields. After giving basic notion of machine learning methods and principles, we describe examples of how statistical physics is used to understand methods in ML. We then move to describe applications of ML methods in particle physics and cosmology, quantum many body physics, quantum computing, and chemical and material physics. We also highlight research and development into novel computing architectures aimed at accelerating ML. In each of the sections we describe recent successes as well as domain-specific methodology and challenges.

Tags: Physics - Computational Physics, Astrophysics - Cosmology and Nongalactic Astrophysics, Condensed Matter - Disordered Systems and Neural Networks, High Energy Physics - Theory, Quantum Physics

21. Data-Driven Materials Science: Status, Challenges, and Perspectives, Advanced Science (September 01, 2019)

Authors: Lauri Himanen, Amber Geurts, Adam Stuart Foster, Patrick Rinke

Volume & Issue: Volume 6, Issue 21

Pages: 1900808

Abstract: Data-driven science is heralded as a new paradigm in materials science. In this field, data is the new resource, and knowledge is extracted from materials datasets that are too big or complex for traditional human reasoning—typically with the intent to discover new or improved materials or materials phenomena. Multiple factors, including the open science movement, national funding, and progress in information technology, have fueled its development. Such related tools as materials databases, machine learning, and high-throughput methods are now established as parts of the materials research toolset. However, there are a variety of challenges that impede progress in data-driven materials science: data veracity, integration of experimental and computational data, data longevity, standardization, and the gap between industrial interests and academic efforts. In this perspective article, the historical development and current state of data-driven materials science, building from the early evolution of open science to the rapid expansion of materials data infrastructures are discussed. Key successes and challenges so far are also reviewed, providing a perspective on the future development of the field.

Tags: machine learning, artificial intelligence, materials science, data science, databases, materials, open innovation, open science

20. Unsupervised word embeddings capture latent knowledge from materials science literature, Nature (July 2019)

Authors: Vahe Tshitoyan, John Dagdelen, Leigh Weston, Alexander Dunn, Ziqin Rong, Olga Kononova, Kristin A. Persson, Gerbrand Ceder, Anubhav Jain

Volume & Issue: Volume 571, Issue 7763

Pages: 95-98

Abstract: Natural language processing algorithms applied to three million materials science abstracts uncover relationships between words, material compositions and properties, and predict potential new thermoelectric materials.

Tags: None

19. 2DMatPedia, an open computational database of two-dimensional materials from top-down and bottom-up approaches, Scientific Data (June 12, 2019)

Authors: Jun Zhou, Lei Shen, Miguel Dias Costa, Kristin A. Persson, Shyue Ping Ong, Patrick Huck, Yunhao Lu, Xiaoyang Ma, Yiming Chen, Hanmei Tang, Yuan Ping Feng

Volume & Issue: Volume 6, Issue 1

Pages: 86

Abstract: Two-dimensional (2D) materials have been a hot research topic in the last decade, due to novel fundamental physics in the reduced dimension and appealing applications. Systematic discovery of functional 2D materials has been the focus of many studies. Here, we present a large dataset of 2D materials, with more than 6,000 monolayer structures, obtained from both top-down and bottom-up discovery procedures. First, we screened all bulk materials in the database of Materials Project for layered structures by a topology-based algorithm and theoretically exfoliated them into monolayers. Then, we generated new 2D materials by chemical substitution of elements in known 2D materials by others from the same group in the periodic table. The structural, electronic and energetic properties of these 2D materials are consistently calculated, to provide a starting point for further material screening, data mining, data analysis and artificial intelligence applications. We present the details of computational methodology, data record and technical validation of our publicly available data (http://www.2dmatpedia.org/).

Tags: Theory and computation, Electronic properties and materials

18. Structure prediction drives materials discovery, Nature Reviews Materials (May 2019)

Authors: Artem R. Oganov, Chris J. Pickard, Qiang Zhu, Richard J. Needs

Volume & Issue: Volume 4, Issue 5

Pages: 331-348

Abstract: Recent breakthroughs in crystal structure prediction have enabled the discovery of new materials and of new physical and chemical phenomena. This Review surveys structure prediction methods and presents examples of results in different classes of materials.

Tags: None

17. Capturing chemical intuition in synthesis of metal-organic frameworks, Nature Communications (February 01, 2019)

Authors: Seyed Mohamad Moosavi, Arunraj Chidambaram, Leopold Talirz, Maciej Haranczyk, Kyriakos C. Stylianou, Berend Smit

Volume & Issue: Volume 10, Issue 1

Pages: 539

Abstract: We report a methodology using machine learning to capture chemical intuition from a set of (partially) failed attempts to synthesize a metal-organic framework. We define chemical intuition as the collection of unwritten guidelines used by synthetic chemists to find the right synthesis conditions. As (partially) failed experiments usually remain unreported, we have reconstructed a typical track of failed experiments in a successful search for finding the optimal synthesis conditions that yields HKUST-1 with the highest surface area reported to date. We illustrate the importance of quantifying this chemical intuition for the synthesis of novel materials.

Tags: Chemical synthesis, Cheminformatics, Energy, Metal–organic frameworks

16. Active learning for accelerated design of layered materials, npj Computational Materials (December 10, 2018)

Authors: Lindsay Bassman Oftelie, Pankaj Rajak, Rajiv K. Kalia, Aiichiro Nakano, Fei Sha, Jifeng Sun, David J. Singh, Muratahan Aykol, Patrick Huck, Kristin Persson, Priya Vashishta

Volume & Issue: Volume 4, Issue 1

Pages: 74

Abstract: Hetero-structures made from vertically stacked monolayers of transition metal dichalcogenides hold great potential for optoelectronic and thermoelectric devices. Discovery of the optimal layered material for specific applications necessitates the estimation of key material properties, such as electronic band structure and thermal transport coefficients. However, screening of material properties via brute force ab initio calculations of the entire material structure space exceeds the limits of current computing resources. Moreover, the functional dependence of material properties on the structures is often complicated, making simplistic statistical procedures for prediction difficult to employ without large amounts of data collection. Here, we present a Gaussian process regression model, which predicts material properties of an input hetero-structure, as well as an active learning model based on Bayesian optimization, which can efficiently discover the optimal hetero-structure using a minimal number of ab initio calculations. The electronic band gap, conduction/valence band dispersions, and thermoelectric performance are used as representative material properties for prediction and optimization. The Materials Project platform is used for electronic structure computation, while the BoltzTraP code is used to compute thermoelectric properties. Bayesian optimization is shown to significantly reduce the computational cost of discovering the optimal structure when compared with finding an optimal structure by building a regression model to predict material properties. The models can be used for predictions with respect to any material property and our software, including data preparation code based on the Python Materials Genomics (PyMatGen) library as well as python-based machine learning code, is available open source.

Tags: Computational methods, Chemical engineering

15. Molecular Dynamics Simulation for All, Neuron (September 19, 2018)

Authors: Scott A. Hollingsworth, Ron O. Dror

Volume & Issue: Volume 99, Issue 6

Pages: 1129-1143

Abstract: No abstract available

Tags: allostery, biomolecular simulation, conformational change, drug design, drug discovery, experimental design, MD simulations, protein, structural biology

14. Deep neural networks for accurate predictions of crystal stability, Nature Communications (September 18, 2018)

Authors: Weike Ye, Chi Chen, Zhenbin Wang, Iek-Heng Chu, Shyue Ping Ong

Volume & Issue: Volume 9, Issue 1

Pages: 3800

Abstract: Predicting the stability of crystals is one of the central problems in materials science. Today, density functional theory (DFT) calculations remain comparatively expensive and scale poorly with system size. Here we show that deep neural networks utilizing just two descriptors—the Pauling electronegativity and ionic radii—can predict the DFT formation energies of C3A2D3O12 garnets and ABO3 perovskites with low mean absolute errors (MAEs) of 7–10 meV atom−1 and 20–34 meV atom−1, respectively, well within the limits of DFT accuracy. Further extension to mixed garnets and perovskites with little loss in accuracy can be achieved using a binary encoding scheme, addressing a critical gap in the extension of machine-learning models from fixed stoichiometry crystals to infinite universe of mixed-species crystals. Finally, we demonstrate the potential of these models to rapidly transverse vast chemical spaces to accurately identify stable compositions, accelerating the discovery of novel materials with potentially superior properties.

Tags: Theory and computation, Density functional theory, Structural properties

13. Accelerated discovery of stable lead-free hybrid organic-inorganic perovskites via machine learning, Nature Communications (August 24, 2018)

Authors: Shuaihua Lu, Qionghua Zhou, Yixin Ouyang, Yilv Guo, Qiang Li, Jinlan Wang

Volume & Issue: Volume 9, Issue 1

Pages: 3405

Abstract: Rapidly discovering functional materials remains an open challenge because the traditional trial-and-error methods are usually inefficient especially when thousands of candidates are treated. Here, we develop a target-driven method to predict undiscovered hybrid organic-inorganic perovskites (HOIPs) for photovoltaics. This strategy, combining machine learning techniques and density functional theory calculations, aims to quickly screen the HOIPs based on bandgap and solve the problems of toxicity and poor environmental stability in HOIPs. Successfully, six orthorhombic lead-free HOIPs with proper bandgap for solar cells and room temperature thermal stability are screened out from 5158 unexplored HOIPs and two of them stand out with direct bandgaps in the visible region and excellent environmental stability. Essentially, a close structure-property relationship mapping the HOIPs bandgap is established. Our method can achieve high accuracy in a flash and be applicable to a broad class of functional material design.

Tags: Computational methods, Solar cells

12. Inverse molecular design using machine learning: Generative models for matter engineering, Science (July 27, 2018)

Authors: Benjamin Sanchez-Lengeling, Alán Aspuru-Guzik

Volume & Issue: Volume 361, Issue 6400

Pages: 360-365

Abstract: The discovery of new materials can bring enormous societal and technological progress. In this context, exploring completely the large space of potential materials is computationally intractable. Here, we review methods for achieving inverse design, which aims to discover tailored materials from the starting point of a particular desired functionality. Recent advances from the rapidly growing field of artificial intelligence, mostly from the subfield of machine learning, have resulted in a fertile exchange of ideas, where approaches to inverse molecular design are being proposed and employed at a rapid pace. Among these, deep generative models have been applied to numerous classes of materials: rational design of prospective drugs, synthetic routes to organic compounds, and optimization of photovoltaics and redox flow batteries, as well as a variety of other solid-state materials.

Tags: None

11. Insightful classification of crystal structures using deep learning, Nature Communications (July 17, 2018)

Authors: Angelo Ziletti, Devinder Kumar, Matthias Scheffler, Luca M. Ghiringhelli

Volume & Issue: Volume 9, Issue 1

Pages: 2775

Abstract: Computational methods that automatically extract knowledge from data are critical for enabling data-driven materials science. A reliable identification of lattice symmetry is a crucial first step for materials characterization and analytics. Current methods require a user-specified threshold, and are unable to detect average symmetries for defective structures. Here, we propose a machine learning-based approach to automatically classify structures by crystal symmetry. First, we represent crystals by calculating a diffraction image, then construct a deep learning neural network model for classification. Our approach is able to correctly classify a dataset comprising more than 100,000 simulated crystal structures, including heavily defective ones. The internal operations of the neural network are unraveled through attentive response maps, demonstrating that it uses the same landmarks a materials scientist would use, although never explicitly instructed to do so. Our study paves the way for crystal structure recognition of—possibly noisy and incomplete—three-dimensional structural data in big-data materials science.

Tags: Theory and computation, Applied mathematics, Structural materials

10. Machine learning for molecular and materials science, Nature (July 2018)

Authors: Keith T. Butler, Daniel W. Davies, Hugh Cartwright, Olexandr Isayev, Aron Walsh

Volume & Issue: Volume 559, Issue 7715

Pages: 547-555

Abstract: Recent progress in machine learning in the chemical sciences and future directions in this field are discussed.

Tags: None

9. ChemOS: Orchestrating autonomous experimentation, Science Robotics (June 20, 2018)

Authors: Loïc M. Roch, Florian Häse, Christoph Kreisbeck, Teresa Tamayo-Mendoza, Lars P. E. Yunker, Jason E. Hein, Alán Aspuru-Guzik

Volume & Issue: Volume 3, Issue 19

Pages: eaat5559

Abstract: ChemOS aims to catalyze the expansion of autonomous laboratories and to disrupt the conventional approach to experimentation.

Tags: None

8. Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments, Science Advances (April 13, 2018)

Authors: Fang Ren, Logan Ward, Travis Williams, Kevin J. Laws, Christopher Wolverton, Jason Hattrick-Simpers, Apurva Mehta

Volume & Issue: Volume 4, Issue 4

Pages: eaaq1566

Abstract: With more than a hundred elements in the periodic table, a large number of potential new materials exist to address the technological and societal challenges we face today; however, without some guidance, searching through this vast combinatorial space is frustratingly slow and expensive, especially for materials strongly influenced by processing. We train a machine learning (ML) model on previously reported observations, parameters from physiochemical theories, and make it synthesis method–dependent to guide high-throughput (HiTp) experiments to find a new system of metallic glasses in the Co-V-Zr ternary. Experimental observations are in good agreement with the predictions of the model, but there are quantitative discrepancies in the precise compositions predicted. We use these discrepancies to retrain the ML model. The refined model has significantly improved accuracy not only for the Co-V-Zr system but also across all other available validation data. We then use the refined model to guide the discovery of metallic glasses in two additional previously unreported ternaries. Although our approach of iterative use of ML and HiTp experiments has guided us to rapid discovery of three new glass-forming systems, it has also provided us with a quantitatively accurate, synthesis method–sensitive predictor for metallic glasses that improves performance with use and thus promises to greatly accelerate discovery of many new metallic glasses. We believe that this discovery paradigm is applicable to a wider range of materials and should prove equally powerful for other materials and properties that are synthesis path–dependent and that current physiochemical theories find challenging to predict.

Tags: None

7. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds, Nature Nanotechnology (March 2018)

Authors: Nicolas Mounet, Marco Gibertini, Philippe Schwaller, Davide Campi, Andrius Merkys, Antimo Marrazzo, Thibault Sohier, Ivano Eligio Castelli, Andrea Cepellotti, Giovanni Pizzi, Nicola Marzari

Volume & Issue: Volume 13, Issue 3

Pages: 246-252

Abstract: The largest available database of potentially exfoliable 2D materials has been obtained via high-throughput calculations using van der Waals density functional theory.

Tags: None

6. Accelerated Discovery of Large Electrostrains in BaTiO3-Based Piezoelectrics Using Active Learning, Advanced Materials (January 08, 2018)

Authors: Ruihao Yuan, Zhen Liu, Prasanna V. Balachandran, Deqing Xue, Yumei Zhou, Xiangdong Ding, Jun Sun, Dezhen Xue, Turab Lookman

Volume & Issue: Volume 30, Issue 7

Pages: 1702884

Abstract: A key challenge in guiding experiments toward materials with desired properties is to effectively navigate the vast search space comprising the chemistry and structure of allowed compounds. Here, it is shown how the use of machine learning coupled to optimization methods can accelerate the discovery of new Pb-free BaTiO3 (BTO-) based piezoelectrics with large electrostrains. By experimentally comparing several design strategies, it is shown that the approach balancing the trade-off between exploration (using uncertainties) and exploitation (using only model predictions) gives the optimal criterion leading to the synthesis of the piezoelectric (Ba0.84Ca0.16)(Ti0.90Zr0.07Sn0.03)O3 with the largest electrostrain of 0.23% in the BTO family. Using Landau theory and insights from density functional theory, it is uncovered that the observed large electrostrain is due to the presence of Sn, which allows for the ease of switching of tetragonal domains under an electric field.

Tags: machine learning, active learning, electrostrain, optimal experimental design, piezoelectric

5. Virtual screening of inorganic materials synthesis parameters with deep learning, npj Computational Materials (December 01, 2017)

Authors: Edward Kim, Kevin Huang, Stefanie Jegelka, Elsa Olivetti

Volume & Issue: Volume 3, Issue 1

Pages: 53

Abstract: Virtual materials screening approaches have proliferated in the past decade, driven by rapid advances in first-principles computational techniques, and machine-learning algorithms. By comparison, computationally driven materials synthesis screening is still in its infancy, and is mired by the challenges of data sparsity and data scarcity: Synthesis routes exist in a sparse, high-dimensional parameter space that is difficult to optimize over directly, and, for some materials of interest, only scarce volumes of literature-reported syntheses are available. In this article, we present a framework for suggesting quantitative synthesis parameters and potential driving factors for synthesis outcomes. We use a variational autoencoder to compress sparse synthesis representations into a lower dimensional space, which is found to improve the performance of machine-learning tasks. To realize this screening framework even in cases where there are few literature data, we devise a novel data augmentation methodology that incorporates literature synthesis data from related materials systems. We apply this variational autoencoder framework to generate potential SrTiO3 synthesis parameter sets, propose driving factors for brookite TiO2 formation, and identify correlations between alkali-ion intercalation and MnO2 polymorph selection.

Tags: Computational methods, Design, synthesis and processing, Materials science

4. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach, Nature Materials (October 2016)

Authors: Rafael Gómez-Bombarelli, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, David Duvenaud, Dougal Maclaurin, Martin A. Blood-Forsythe, Hyun Sik Chae, Markus Einzinger, Dong-Gwang Ha, Tony Wu, Georgios Markopoulos, Soonok Jeon, Hosuk Kang, Hiroshi Miyazaki, Masaki Numata, Sunghan Kim, Wenliang Huang, Seong Ik Hong, Marc Baldo, Ryan P. Adams, Alán Aspuru-Guzik

Volume & Issue: Volume 15, Issue 10

Pages: 1120-1127

Abstract: A high-throughput virtual screening approach is used to select molecules with efficient, thermally activated delayed fluorescence. The good performance of several selected emitters in organic LED applications has also been confirmed experimentally.

Tags: None

3. Machine-learning-assisted materials discovery using failed experiments, Nature (May 2016)

Authors: Paul Raccuglia, Katherine C. Elbert, Philip D. F. Adler, Casey Falk, Malia B. Wenny, Aurelio Mollo, Matthias Zeller, Sorelle A. Friedler, Joshua Schrier, Alexander J. Norquist

Volume & Issue: Volume 533, Issue 7601

Pages: 73-76

Abstract: Failed chemical reactions are rarely reported, even though they could still provide information about the bounds on the reaction conditions needed for product formation; here data from such reactions are used to train a machine-learning algorithm, which is subsequently able to predict reaction outcomes with greater accuracy than human intuition.

Tags: Computer science, Theory and computation, Solid-state chemistry

2. Accelerated search for materials with targeted properties by adaptive design, Nature Communications (April 15, 2016)

Authors: Dezhen Xue, Prasanna V. Balachandran, John Hogden, James Theiler, Deqing Xue, Turab Lookman

Volume & Issue: Volume 7, Issue 1

Pages: 11241

Abstract: Finding new materials with targeted properties has traditionally been guided by intuition, and trial and error. With increasing chemical complexity, the combinatorial possibilities are too large for an Edisonian approach to be practical. Here we show how an adaptive design strategy, tightly coupled with experiments, can accelerate the discovery process by sequentially identifying the next experiments or calculations, to effectively navigate the complex search space. Our strategy uses inference and global optimization to balance the trade-off between exploitation and exploration of the search space. We demonstrate this by finding very low thermal hysteresis (ΔT) NiTi-based shape memory alloys, with Ti50.0Ni46.7Cu0.8Fe2.3Pd0.2 possessing the smallest ΔT (1.84 K). We synthesize and characterize 36 predicted compositions (9 feedback loops) from a potential space of ∼800,000 compositions. Of these, 14 had smaller ΔT than any of the 22 in the original data set.

Tags: Metals and alloys, Condensed-matter physics

1. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies, npj Computational Materials (December 11, 2015)

Authors: Scott Kirklin, James E. Saal, Bryce Meredig, Alex Thompson, Jeff W. Doak, Muratahan Aykol, Stephan Rühl, Chris Wolverton

Volume & Issue: Volume 1, Issue 1

Pages: 15010

Abstract: The Open Quantum Materials Database (OQMD) is a high-throughput database currently consisting of nearly 300,000 density functional theory (DFT) total energy calculations of compounds from the Inorganic Crystal Structure Database (ICSD) and decorations of commonly occurring crystal structures. To maximise the impact of these data, the entire database is being made available, without restrictions, at www.oqmd.org/download. In this paper, we outline the structure and contents of the database, and then use it to evaluate the accuracy of the calculations therein by comparing DFT predictions with experimental measurements for the stability of all elemental ground-state structures and 1,670 experimental formation energies of compounds. This represents the largest comparison between DFT and experimental formation energies to date. The apparent mean absolute error between experimental measurements and our calculations is 0.096 eV/atom. In order to estimate how much error to attribute to the DFT calculations, we also examine deviation between different experimental measurements themselves where multiple sources are available, and find a surprisingly large mean absolute error of 0.082 eV/atom. Hence, we suggest that a significant fraction of the error between DFT and experimental formation energies may be attributed to experimental uncertainties. Finally, we evaluate the stability of compounds in the OQMD (including compounds obtained from the ICSD as well as hypothetical structures), which allows us to predict the existence of ~3,200 new compounds that have not been experimentally characterised and uncover trends in material discovery, based on historical data available within the ICSD.

Tags: Computational methods, Materials chemistry, Materials for energy and catalysis, Electronic structure

AI4(M)S Paper Collection

Update Time: 2025-12-09 17:08:52

📈 Publication Timeline

📝 Review/Survey (58 papers)

📚 Venue Index

ACS Applied Materials & Interfaces (1 papers)

Acta Materialia (7 papers)

Advanced Functional Materials (4 papers)

Advanced Materials (24 papers)

Advanced Science (4 papers)

Advances in Neural Information Processing Systems (4 papers)

Angewandte Chemie (2 papers)

Cell (1 papers)

Chemical Science (1 papers)

Chemical Society Reviews (1 papers)

Communications Chemistry (1 papers)

Communications Materials (2 papers)

Communications Physics (1 papers)

Computational Materials Science (1 papers)

Computer Methods in Applied Mechanics and Engineering (1 papers)

International Conference on Learning Representations (2 papers)

International Conference on Machine Learning (1 papers)

Journal of Chemical Theory and Computation (1 papers)

Journal of Molecular Biology (1 papers)

Journal of the American Chemical Society (16 papers)

Matter (11 papers)

Nature (43 papers)

Nature Aging (2 papers)

Nature Biomedical Engineering (3 papers)

Nature Biotechnology (3 papers)

Nature Catalysis (3 papers)

Nature Chemical Biology (2 papers)

Nature Chemical Engineering (6 papers)

Nature Chemistry (4 papers)

Nature Communications (78 papers)

Nature Computational Science (18 papers)

Nature Energy (1 papers)

Nature Human Behaviour (3 papers)

Nature Machine Intelligence (26 papers)

Nature Materials (6 papers)

Nature Medicine (1 papers)

Nature Methods (16 papers)

Nature Microbiology (1 papers)

Nature Nanotechnology (2 papers)

Nature Physics (1 papers)

Nature Reviews Bioengineering (1 papers)

Nature Reviews Chemistry (2 papers)

Nature Reviews Materials (7 papers)

Nature Reviews Methods Primers (1 papers)

Nature Reviews Physics (6 papers)

Nature Synthesis (10 papers)

Neuron (1 papers)

Preprint (39 papers)

Proceedings of the National Academy of Sciences (3 papers)

Review of Materials Research (2 papers)

Reviews of Modern Physics (1 papers)

Science (21 papers)

Science Advances (13 papers)

Science Immunology (1 papers)

Science Robotics (2 papers)

Scientific Data (4 papers)

Scientific Reports (4 papers)

Small (1 papers)

Sustainable Cities and Society (1 papers)

npj Computational Materials (25 papers)

📑 Papers (Chronological Order)

451. Real-space machine learning of correlation density functionals, Nature Communications (December 01, 2025)

450. A call to elevate the role of processing in AI-driven materials design, Nature Reviews Materials (December 2025)

449. Microstructure-agnostic deep learning for mechanistic discovery of corrosion-resistant Co-Cr-Fe-Ni MPEAs, Acta Materialia (December 01, 2025)

448. Accelerating science with AI, Science (November 25, 2025)

447. Leveraging spatial-angular redundancy for self-supervised denoising of 3D fluorescence imaging without temporal dependency, Nature Communications (November 24, 2025)

446. Discovering physical laws with parallel symbolic enumeration, Nature Computational Science (November 21, 2025)

445. Multimodal learning enables chat-based exploration of single-cell data, Nature Biotechnology (November 11, 2025)

444. Multilingualism protects against accelerated aging in cross-sectional and longitudinal analyses of 27 European countries, Nature Aging (November 10, 2025)

443. Emulating human-like adaptive vision for efficient and flexible machine visual perception, Nature Machine Intelligence (November 06, 2025)

442. Squidiff: predicting cellular development and responses to perturbations using a diffusion model, Nature Methods (November 03, 2025)

441. Language models cannot reliably distinguish belief from knowledge and fact, Nature Machine Intelligence (November 03, 2025)

440. Self-driving labs for biotechnology, Nature Computational Science (November 2025)

439. Enzyme specificity prediction using cross-attention graph neural networks, Nature (November 2025)

438. A multimodal whole-slide foundation model for pathology, Nature Medicine (November 2025)