Creating the world’s largest polymer property database

Creating the world’s largest polymer property database with RadonPy, an automated polymer property calculation system

We are building the world’s largest polymer property database using RadonPy, an automated polymer property calculation system based on molecular dynamics (MD) simulation that the representative’s group is developing. RadonPy runs its polymer MD calculations with LAMMPS. Figure 1 gives an overview of the system, which takes inputs such as the chemical structure of the polymer repeating unit.

In the latest version, 14 properties (thermal conductivity, specific heat, thermal expansion coefficient, Young’s modulus, etc.) of linear polymers (homopolymers and copolymers) can be calculated automatically. In this project, we will extend and refine the RadonPy system, deploy it on massively parallel computers, and use large-scale computing resources to cover 10^5 to 10^7 polymer backbones.

Figure 1. Overview of RadonPy, the automated polymer property calculation system

Polymer informatics and the continuing absence of a database

The purpose of this research is to create an academic foundation for data-driven polymer materials research (polymer informatics). The source of data-driven research is, of course, data. At present, however, there is no polymer property database that adequately supports data-driven research. Therefore, at least in the short term, polymer informatics is expected to proceed with the limited data that can be produced in university laboratories and companies.

Driving force 1 for achieving the mission: RadonPy

Automating MD calculations of polymer properties involves various technical difficulties, such as generating initial structures and the laborious verification of equilibration runs, and it had not previously been realized. Beyond these technical barriers to automation, the enormous computational cost is itself an impediment, and there has been no movement toward creating a comprehensive database, either in Japan or abroad. At present, RadonPy is the only open-source software in the world for automated MD calculation of polymer properties.

Driving force 2 for achieving the mission: Joint data production across the boundary between industry and academia

The other driving force of this project is an industry-academia consortium formed to overcome the wall of enormous computational cost. Because MD calculations of polymer properties are so computationally expensive, small and medium-sized groups cannot produce data at a scale that contributes to data-driven research. For this reason, led by the representative institution (the Data Science Center for Creative Design and Manufacturing at the Institute of Statistical Mathematics), the data are being produced jointly by a consortium of many companies, universities, and national research institutes.

Figure 2. Co-creation of the world’s largest polymer property database by the industry-academia consortium

Toward the world’s largest polymer property database

Within the project period, we will calculate the properties of 10^5 to 10^7 polymers and provide the results to society as an academic foundation for data-driven polymer research. The list of polymers to be calculated consists of existing polymers registered in public databases and a virtual library. The number of existing polymers is limited, so the main targets of calculation in this project are virtual polymers. Polymers are classified into 20 types, and a machine learning generative model is applied.
Here we introduce machine learning techniques, including deep generative models. A machine learning model is trained on the chemical structures of existing polymers to build a structure generator that imitates the patterns (fragments, bonding rules, etc.) appearing frequently in existing molecules. To train the model, we use PoLyInfo, the polymer database of the National Institute for Materials Science (NIMS). Polymer backbones are generated with methods developed by the representative’s research group (Ikebata et al. J Comput Aided Mol Des. 31 (4): 379-391 (2017); Yamada et al. ACS Cent Sci. 5 (10): 1717-1730 (2019); Wu et al. Mol Inform. 39:1900107 (2020), etc.). Virtual libraries of copolymers and of mixtures of small molecules and polymers will also be created. The library will contain at least 10^9 entries.
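
The idea of a generator that imitates frequent patterns in existing molecules can be illustrated with a deliberately simple stand-in: a character-level bigram model over SMILES-like strings. This is only a sketch, not the method of the cited papers. The training strings, the `^`/`$` boundary markers, and the function names are assumptions made for illustration, and the output is not checked for chemical validity.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # boundary markers (illustrative choice)

def train_bigram(smiles_list):
    """Count which character follows which in the training strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in smiles_list:
        tokens = [START] + list(s) + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, rng, max_len=40):
    """Sample one string by walking the bigram table from START until END."""
    out, current = [], START
    while len(out) < max_len:
        followers = counts[current]
        chars = list(followers)
        weights = [followers[c] for c in chars]
        # Draw the next character in proportion to its observed frequency.
        current = rng.choices(chars, weights=weights, k=1)[0]
        if current == END:
            break
        out.append(current)
    return "".join(out)

# Toy repeating units; "*" marks the linking points (made-up examples).
training = ["*CC*", "*CC(C)*", "*CC(Cl)*", "*CCO*", "*CC(C)(C)*"]
model = train_bigram(training)
rng = random.Random(0)
samples = [generate(model, rng) for _ in range(5)]
```

Every adjacent character pair in a sample was seen in the training set, which is the sense in which the generator imitates existing patterns. A practical generator would also need a validity filter, for example to reject unbalanced parentheses.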

Figure 3. Generation of virtual libraries by machine learning.

The figure below shows examples of generated virtual polyimides.

Integrating machine-learning molecular design with automated polymer property calculation

The work items of this study also include basic research on materials informatics for polymers. The basic workflow of materials informatics consists of a forward problem and an inverse problem. The purpose of the forward problem is to predict the output Y of a system for an input S. For example, the input variable corresponds to the structure of a material and the output to its properties. The inverse problem runs this predictive model backward to infer structures that give desired properties, and molecular design algorithms have been developed for this purpose (Ikebata et al. J Comput Aided Mol Des. 31 (4): 379-391 (2017); Wu et al. Mol Inform. 39:1900107 (2020)). Wu et al. (2019) successfully applied this molecular design algorithm to discover new polymers with high thermal conductivity (Wu et al. npj Comput Mater. 5:66 (2019)).
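
One common way to write down the forward and inverse problems is the Bayesian form below. It is offered as an assumed illustration rather than a statement of the project’s exact method. The target property region U and the prior p(S) over structures are symbols introduced here.

```latex
\begin{align}
\text{forward:}\quad & p(Y \mid S) \\
\text{inverse:}\quad & p(S \mid Y \in U) \propto p(Y \in U \mid S)\, p(S)
\end{align}
```

In this reading, the forward model scores how likely a structure S is to have properties in U, and the prior p(S) keeps sampled structures close to those seen in existing molecules.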
In this project, we will fuse this machine learning algorithm for polymer design with RadonPy, the automated polymer property calculation system (Figure 4: SPACIER, an algorithm that fuses automated MD calculation of polymer properties with machine learning). In general, no data exist in the neighborhood of innovative materials. The machine learning algorithm predicts data that have yet to be produced.
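
The coupling of a predictive model with automated calculation can be pictured as a closed loop: a surrogate proposes the next candidate, an expensive calculation evaluates it, and the result is added to the data. The sketch below is a toy version of such a loop and is not SPACIER’s actual algorithm. `simulate` is a hypothetical stand-in for an MD property calculation, and the nearest-neighbor surrogate with a distance bonus is an assumption chosen for brevity.

```python
def simulate(x):
    """Hypothetical stand-in for an expensive MD property calculation."""
    return -(x - 0.7) ** 2

def predict(x, observed):
    """Surrogate: property of the nearest evaluated candidate, and the distance to it."""
    nearest_x, nearest_y = min(observed, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)

def run_loop(candidates, initial, n_rounds, explore=1.0):
    """Alternate between surrogate-guided selection and simulated evaluation."""
    observed = [(x, simulate(x)) for x in initial]
    remaining = [x for x in candidates if x not in initial]
    for _ in range(n_rounds):
        if not remaining:
            break

        def score(x):
            # Predicted value plus a bonus for regions with no data yet.
            mean, dist = predict(x, observed)
            return mean + explore * dist

        best = max(remaining, key=score)
        remaining.remove(best)
        observed.append((best, simulate(best)))  # new data feed the next round
    return observed

# One-dimensional toy descriptor standing in for a polymer structure.
candidates = [i / 20 for i in range(21)]
history = run_loop(candidates, initial=[0.0, 1.0], n_rounds=8)
best_x, best_y = max(history, key=lambda p: p[1])
```

The distance bonus reflects the point made above: regions of chemical space with no data cannot be judged from existing observations alone, so the loop is rewarded for evaluating candidates far from what has already been computed.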

Figure 4. SPACIER, an algorithm that fuses automated MD calculation of polymer properties with machine learning

Academic contribution to polymer science and industry

Once the milestones are reached, we will be able to observe how properties are distributed over the vast chemical space spanned by 10^5 to 10^7 polymers, which will yield systematic knowledge of polymer science. In particular, by observing the joint distribution of multiple properties, we can locate the Pareto frontier in property space and identify the structural features of the polymers on that boundary. This alone should be a major academic contribution to polymer science.
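
As a minimal sketch of what locating the Pareto frontier means, the function below keeps the points that no other point beats on every property at once. It assumes that larger is better on every axis, and the property values are made up for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (larger is better)."""
    front = []
    for p in points:
        dominated = any(
            all(q_i >= p_i for q_i, p_i in zip(q, p))
            and any(q_i > p_i for q_i, p_i in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (property A, property B) pairs for five polymers.
properties = [(0.2, 0.9), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1), (0.3, 0.8)]
front = pareto_front(properties)
```

Here `(0.4, 0.4)` is excluded because `(0.5, 0.5)` is better on both properties, while the other four points each represent a different trade-off. This pairwise check costs quadratic time, so a database of millions of polymers would call for a sort-based or incremental method.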
In addition, combining the database with machine learning may lead to the discovery of new polymers with innovative properties. Using the data from this project, we will derive a machine learning model that predicts properties from structure. Using this surrogate model to evaluate the properties of a large virtual library, we can predict groups of polymers with innovative properties from a vast chemical space. Figure 5 shows the joint distribution of multiple properties revealed by screening the virtual polyimide library mentioned above.
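
Surrogate-based screening can be sketched as follows: fit a cheap model on computed data, predict the property for every virtual candidate, and keep the top of the ranking. The k-nearest-neighbor surrogate, the two-dimensional descriptors, and all names and values below are assumptions for illustration, not the project’s actual model or data.

```python
import math

def knn_predict(x, train_X, train_y, k=2):
    """Predict a property as the mean over the k nearest training descriptors."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def screen(library, train_X, train_y, top_n=2):
    """Rank virtual candidates by predicted property, highest first."""
    scored = [(knn_predict(x, train_X, train_y), name) for name, x in library.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]

# Made-up descriptor vectors and property values standing in for computed data.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
train_y = [0.1, 0.4, 0.3, 0.9]
virtual_library = {
    "cand_A": (0.9, 0.8),
    "cand_B": (0.2, 0.1),
    "cand_C": (0.1, 0.8),
}
shortlist = screen(virtual_library, train_X, train_y)
```

Because each prediction is cheap compared with an MD calculation, the same ranking step can in principle be applied to a very large library, with only the shortlisted candidates passed on for full calculation.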
In addition, the group at the Data Science Center for Creative Design and Manufacturing of the Institute of Statistical Mathematics is working, within the JST-CREST research area on thermal control, on the project “Materials Informatics for Thermophysical Properties of Polymers” (Representative: Junko Morikawa, Tokyo Institute of Technology; Principal collaborator: Yoshida, Institute of Statistical Mathematics), which studies the thermal conductivity of polymers, a property in broad demand in polymer research.

Figure 5. Joint distribution of multiple properties revealed by large-scale virtual screening of polyimides. The aim is to find candidate molecules that simultaneously satisfy the multiple requirements for polymer materials in 5G devices.