Development of coarse-grained potential of silica species

ABSTRACT Understanding the molecular transformation of silica aggregates during silica polymerisation is crucial to control the properties of final products such as zeolite, nanoparticles, mesoporous silica and other polymorphs. Nevertheless, molecular events during synthesis are challenging to probe using experimental techniques. Computational tools are beneficial to probe molecular length and time scales; however, the complex systems with significant degrees of freedom require significant computational resources. In this work, we have developed coarse-grained (CG) potentials for silica system wherein every silica tetrahedron (Si(OH) O ) having ‘z’ number of bridging oxygen is mapped to a unique CG silica bead ( ). Different composition of these CG silica beads represent a polymerised silica system with varying degrees of condensation. We performed molecular dynamics (MD) simulations of all-atom silica system to obtain reference radial distribution functions between various silica species ( ), which was used in Monte Carlo simulations of the CG system to obtain CG potentials. We iteratively refined the CG potentials using iterative Boltzmann inversion and inverse Monte Carlo techniques. We found two attractive minima in the CG potential between arising due to direct and indirect interactions of species representing first and second coordination shells. The strength of CG potential increases for both of minima with increasing number of bridging oxygen in species. We performed MD simulations of CG silica beads using developed CG potentials at two degree of condensations and found equilibrium structure of spherical silica nanoparticle. We found that and species are predominantly located at the centre of nanoparticles, whereas species are present uniformly. GRAPHICAL ABSTRACT


Introduction
Silica precursors interact with each other during polymerisation leading to their aggregation. Depending on the synthesis parameters, the aggregate transforms into nanoparticles, zeolite, mesoporous silica, amorphous material or other silica polymorphs [1][2][3][4][5]. The molecular mechanism of the aggregates still baffles the researchers, which is essential to design and control the properties of the desired end product. For example, the ZSM-5 zeolite was synthesised for the first time in 1972 [6], yet it is still an active area of research to probe the effect of various parameters on synthesis [7][8][9][10][11]. In a recent ZSM-5 synthesis study [12], it was observed that the crystallisation of ZSM-5 with significant porous structure occurs much earlier (at around 12 h) for the system containing a higher Si/ Al ratio (=100) compared to (48 h for) system with low Si/Al ratio (=20). The study claims that higher Al content in the system promotes the dissolution of silica nuclei formed, thus delaying the process of crystallisation [12]. In addition to Si/ Al ratio, several other parameters such as silica source, the concentration of structure-directing agent (SDA), pH value, water content, presence of seeds and temperature also affect the self-assembly rate and morphology of formed particles [1,2,7,9,[13][14][15][16][17][18][19][20][21][22][23]. The presence of multiple reaction pathways and transient intermediates makes silica systems experimentally challenging to probe at a molecular length scale and understand their behaviour [16,21]. Computational approaches are advantageous to resolve at molecular length and time scale. However, probing such systems with varied parameters requires accurate potential models, forcefield parameters, larger system sizes and probing at much larger time scales, which demands significant computational resources. Reducing the large degrees of freedom in these systems would facilitate simulations and analysis in reasonable time with current computing resources. One such approach is the coarse-graining technique, where an effective interaction potential between species of interest is obtained by integrating the solvent degrees of freedom, thus reducing the system sizes. In this work, we present a similar technique to obtain coarsegrained (CG) potential between various types of silica species using a combination of molecular dynamics (MD) and Monte Carlo (MC) techniques. We later use these CG potentials to study the self-assembly of silica species at various stages in silica polymerisation.
Polymerisation of silica precursors in the absence of SDA gives rise to gels (either alcogel [3,24], hydrogel [25][26][27] or aerogels [28,29] as per the synthesis protocol). In a detailed and prolonged NMR study, Devreux et al. [30] probed silica gel formation from tetraethoxysilane by extracting the concentration of B z species (i.e. silica tetrahedron, Si(OH) 4−z O z , having 'z' bridging oxygen). They found that the kinetics of gel formation is slow, and the mechanism of silica cluster formation follows the reaction-limited aggregation kinetics.
While such studies provide useful information on silica-particle synthesis, most of the mechanistic insights are indirectly concluded from the macroscopic model and properties of the resultant particles.
Computational methods can provide a molecular-level understanding of such complex systems; however, they have their own challenges, which are (i) accurate potential models, (ii) energetic of the reactions, (iii) forcefield parameters, (iv) large systems sizes and (v) longer simulation time. Earlier simulation work focused on obtaining the energetic of various reactions using quantum level calculations [31][32][33][34]; however, these simulations were limited to smaller silica clusters due to the significant computational costs. Further, these approaches could not provide the kinetics of aggregation of silica during the polymerisation process. MD techniques are capable of capturing the kinetics of polymerisation, but require accurate forcefields and longer simulation time. Further, due to the presence of explicit solvent molecules (water) in these simulations, the system contains around 10 5 atoms for a reasonable size 10 3 silica tetrahedrons system, which further increases exponentially for larger system sizes. Due to present limitations in computational resources, only a few nanoseconds can be simulated during which only oligomerisation of silica could be observed [35][36][37][38]. Hence, elevated temperatures were used to accelerate polymerisation kinetics so that significant extent of reactions could be probed during the simulations [39]. However, such an approach does not reproduce experimental findings. Another technique to sample larger phase space is by swapping of simulation systems running at different temperatures, as done in the replica-exchange parallel tempering technique. Swapping of systems facilitates overcoming of energy barriers that allow access to a large phase space [4,40]. However, the system leaps from one equilibrium structure to another; thus the evolution of the real system is lost in this process. The kinetic MC technique is used to study the evolution of periodic mesoscopic silica by performing reaction, displacement and swapping moves [41]. Such technique leads to the generation of the realistic structure of silica polymorph; however, it does not capture kinetics of the polymerisation due to no relation between simulation and real time. In our previous work [42][43][44][45], we used a simple silica model derived from normal mode analysis of silica polymorphs [46] and developed MC-based algorithms that include displacement, rotation and reaction moves. We appropriately scaled the movement (displacement and rotation) of the cluster with monomeric movement that enabled us to successfully capture the kinetics of silica polymerisation. The developed algorithm is capable of studying a large system size and for a longer polymerisation time. However, the model does not have a direct interaction between silica species in an implicit solvent; hence, the silica clusters did not form at dilute concentrations below 104 mg/cm 3 , whereas nanoparticles and zeolites are often synthesised even at much dilute concentrations in experiments [5,[47][48][49][50]. Such MC simulation needs to be run significantly longer with a larger system size to see nanoparticle formation in this concentrations domain, which incurs expensive computational costs. The primitive model used in MC simulation is improved here by developing an effective CG model for a silica system. Jorge et al. [51] studied the silica-surfactant system using CG representation of neutral silica, charged silica and surfactant molecule based on Martini forcefields, where linear and cyclic oligomers were modelled using CG beads, with parameters for the explicit solvent model obtained from detailed atomistic simulations. Their model was able to capture the self-assembly of silica-surfactant into a mesophase that occurs during the synthesis of nanoporous materials [52]. Molinero et al. [53] modelled silicon tetrahedron and SDA as CG beads and used Stillinger-Weber's (SW) potential to represent their interactions. By introducing frustration interaction between silicon and SDA, they were able to observe mesophase formation in their system. Different phases of zeolite can be observed in the simulation by varying the interaction strength between silicon and SDA beads along with the concentration of silicon beads [54]. This model was further extended to study the evolution of zeolite from mesophase and phase transitions [55,56]. However, this model lacks angle flexibility between silicon beads and the formation and breaking of bonds, which could result in more silica polymorphs.
In this work, each silica species (Si(OH) 4−z O z , z = 0 to 4) is represented by unique CG beads (B z ) with effective CG potentials obtained from all-atom (AA) simulations. In particular, we performed MD simulations of an AA system containing B i and B j species from where reference radial distribution functions (RDF) between species are extracted. MC simulations were then used to iteratively obtain and refine the interactive potentials (of corresponding B i and B j beads) through inverse Monte Carlo (IMC) and iterative Boltzmann inversion (IBI) schemes. Both, the AA and CG simulations, do not account for condensation reaction between two silica species and probe the self-assembly due to physical interactions alone. We studied the polymerised silica systems using obtained CG potentials at two different degree of condensations (DoCs) having a different distribution of B z species in the system. In the following section, we describe the details of our simulation methodology followed by results and discussion, and finally, we present a brief summary and conclusion.

All-Atom (AA) simulations
The objective of the AA simulations containing water and silica species (Si x (OH) y O z ) is to obtain the CG potential for various silica species observed in a real system. Five different types of silica species namely monomer (Si(OH) 4 ) to pentamer (Si 5 (OH) 12 O 4 ) are considered, as shown in Figure 1(a). Polymers of various sizes, shapes and isomers are observed during silica polymerisation and uniquely identifying all of them is impossible. Hence, the silica cluster is generally described by counting the number of bridging oxygen (BO) in each silica tetrahedral unit (Si(OH) 4−z O z ). In this manuscript, we use the notation B z to refer both silica tetrahedral unit (Si(OH) 4−z O z ) in AA simulations and CG bead in CG simulations. A similar strategy has been used in a previous experimental study [30] and as well as in our previous simulation study [42,44] where B z was referred by Q n notation. To illustrate, monomer (B 0 ) has no BO (z = 0), dimer has two tetrahedrons containing one BO (2B 1 ), trimer has 2B 1 and a B 2 , whereas pentamer has B 4 and 4B 1 tetrahedrons. All silica clusters formed during polymerisation are made of these B 1 to B 4 silica species, hence they were selected to obtain all the cross-interactions between them (i.e. B i vs. B j ).
The van der Waals and electrostatic interaction between two atomistic sites are represented using Lennard-Jones (LJ) and Coulomb potential as where s ij and e ij are the LJ parameters evaluated using Lorentz-Berthelot mixing rule, s ij = (s ii + s jj )/2, e ij = e ii e jj √ ; s ii , e ii and q i are the size, LJ interaction strength and charge of atom i, respectively; r ij is the distance between two atoms and 1 0 is the vacuum permittivity. The bonds and angles in the atomistic silica species (Figure 1a) are flexible, and their interactions are calculated using harmonic potentials as where K B and K A are strengths of bond and angle potential, respectively, u ijk is the angle formed between connected i, j and k atoms, r 0 and u 0 are equilibrium bond length and angle, respectively. Water molecules were modelled using the rigid SPC/E model, [57] whereas the silica species are modelled using Emami et al. [58], forcefields as given in Table 1.
We then performed 15 independent simulations varying the composition of the silica species, number of water molecules and box size such that the net concentration of silica in water is 104 mg/cm 3 . Details of these simulations are given in the supplementary information (SI). The choice of this silica concentration was motivated by our earlier simulation studies [45], where significant silica clusters were obtained above this concentration only. The MD simulations are performed using GROMACS 5.3 software [59], where the equation of motions for all atoms are integrated with a step size of 2 fs using the Leap-Frog algorithm. Periodic boundary conditions are applied in all three directions. The long-range electrostatic interactions are treated with particle mesh Ewald technique with Fourier grid spacing of 1.2 Å and accuracy of 10 −4 . The LJ and short-range part of Ewald summation are evaluated within a spherical cutoff of 12 Å. The rigid structure of water molecules is maintained using the SETTLE algorithm [60]. First, we performed short equilibration (100 ps) in an isothermal-isobaric ensemble to ensure that the density of the system is equilibrated using a Berendsen thermostat and barostat to maintain pressure and temperature at 1 bar and 300 K, respectively. Then the system is equilibrated for 1 ns in canonical (NVT) ensemble, followed by the production run in NVT ensemble for 2 ns during which trajectories are recorded for every 2 ps. We used the velocity re-scaling technique to maintain the temperature at 300 K during NVT simulations. Yellow, red, black and blue spheres represent silicon, oxygen, hydrogen and bridging oxygen (BO), respectively. (b) Coarse-grained (CG) representation of silica species. The notation B z is used to represent Si(OH) 4−z O z tetrahedron in the AA silica system and B z bead in the CG silica system. The Si-O-Si connectivity between two tetrahedron is represented by bond between two CG beads. Black, red, green, blue and yellow spheres represent B 0 -B 4 beads, respectively. Table 1. Force-field parameters for non-bonded and bonded potential used for AA simulation.

Molecule
Atom

Methodology to obtain CG potentials
As mentioned in the previous section, we have used five different silica oligomers (monomer to pentamer), which include B 0 to B 4 silica species. We used a simple mapping to represent each silica tetrahedron (Si(OH) 4−z O z ) by a CG bead of B z , and a bond between two CG beads represent the Si-O-Si bonds which occur experimentally during silica polymerisation. Figure 1 represents the AA representation and corresponding CG representation of the given silica species used in this study. For example, B 0 is a monomer represented by a single (non-bonded) sphere, whereas a pentamer has one B 4 connected with four B 1 beads. Using a different composition of CG beads, we can simulate a silica system with various DoC. In this work, no reactions are performed between silica species but intended to obtain effective interaction between different silica species, which accounts for silica-silica and silica-solvent interaction in the AA system. To generate the effective CG potential (U CG ) between various CG beads, we follow the algorithm illustrated in Figure 2.
Henderson's theorem [61] claim that a unique set of interatomic potential will generate a unique set of RDF (g(r)). Thus we first calculate pair-wise g(r) for various silica species from the AA simulations (where silicon atom is considered as the centre of the tetrahedron) and use that as reference RDF (g (r) ref ). Then an iterative technique is used to generate the U CG such that the RDF of CG beads (g(r) CG ) matches with the g (r) ref obtained from AA simulations (as described below). We use the potential of mean force between two beads as an initial guess for U CG (i.e. U CG 0 = −k B T ln g(r) ref ). MC simulations of the system containing only CG beads (in respective proportion as that of AA simulation) using the obtained U CG 0 potential is performed in the NVT ensemble. The box size and temperature of the system is similar to that of AA simulations resulting in same concentration for both the systems. We performed three MC moves: (i) displacement of the bead (individual B z ), (ii) displacement of the molecule (dimer to pentamer is a molecule) and (iii) rotation of the molecule with the probability of 0.6, 0.2, and 0.2, respectively. The maximum distance and rotation angle allowed in displacement and rotation moves, respectively, were dynamically changed such that the acceptance ratio of these MC moves is around 0.5 [62,63]. We performed 2 × 10 6 MC moves for equilibration of the system, followed by 3 × 10 6 MC moves of the production run during which g(r) CG was calculated.
The obtained g(r) CG is used to calculate the improvement factor of the CG potential, DU CG , for which two approaches, IBI and IMC techniques, have been used in this manuscript. In the IBI technique, the DU CG is evaluated from the logarithmic difference of RDFs as whereas in the IMC technique, it is evaluated from the Jacobian matrix of the potential as The calculated DU CG is used to update the CG potential iteratively as where χ is the relaxation factor used to control the evolution of U CG i and has a value between 0 and 1. The revised U CG i is used to perform MC simulations of the CG system as explained in the previous paragraph to obtain DU CG and improve U CG i iteratively. The IMC is a better technique with a faster convergence rate; however, it requires a better initial guess and Jacobian matrix (calculation of which adds extra computational expense). Thus we used a combination of both techniques first performing IBI iterations to provide a better estimate of U CG i later implementing the IMC technique. We performed a maximum of 45 iterations (25+20 iterations of IBI and IMC techniques), and if the U CG i converges within these iterations, then the iteration is stopped. We have used the Magic 2.1 open-source package [64] to perform the MC simulations of CG systems to obtain (DU CG ) and perform iterative updating of U CG i . We first optimised nonbonded potentials, and then a similar strategy was used to optimise bonded potential between beads.

Property calculations
The RDF between two particles (A and B) is calculated as where d(x) is the Dirac delta function evaluated numerically as the combination of the Heavyside step function defined as where Dx is the bin thickness, V is the volume of the simulation box, and N A , N B are the total number of particles A and B, respectively. The distribution of bond length (d Si ) and angle (u Si ) between two and three connected silicon atoms is respectively calculated as where N b , N a are the number of Si-Si bonds and Si-Si-Si angles in the system, respectively. We also calculated the radius of gyration (R g ) of the silica cluster formed in the simulation as where r i and r com are the coordinates of particle i and centre of mass (CoM) of the cluster.

Interaction between B i − B j species from AA simulations
The RDFs between various silica species (B 0 to B 4 ) could be obtained from one single AA simulation containing all species. However, such RDFs between a pair of species would also have an effect due to the presence of other species in the system. Hence, we carefully designed simulations such that species of interest are dominant in the system. For example, S0 and S4 systems (see Table S1 of SI) contain only monomers (B 0 ) and pentamer (B 4 and connected B 1 ) species, which were used to calculate B 0 − B 0 and B 4 − B 4 RDFs, respectively. Similarly, S04 and S34 systems contain monomers + pentamers and tetramer + pentamers from which B 0 − B 4 and B 3 − B 4 RDFs were calculated, respectively. These additional simulation systems add a significant computational expense; however, they allow for detailed analysis on understanding the effect of the third component on two-body interactions. Figure 3 shows the RDFs obtained between various silica species (B i − B j , i, j = 0 − 4), where each RDF was evaluated from a single simulation system. The RDF between monomer species (B 0 − B 0 ) exhibits a single peak at r = 5 Å and flattens radially outwards (Figure 3 a). The first peak corresponds to the position where two silicon monomers interact through hydrogen bonds between hydroxyl (OH) groups of the monomers, as highlighted in Figure 4(a). Feuston and Garofalini [35] also observed a similar low energy configuration between two silica monomers in their MD studies. Due to the presence of OH groups, the location of the first peak between silicon atoms is at a farther distance of 5 Å. To obtain the g(r) ref between B 0 and other species B z (z = 1, 4), we considered a binary mixture of S0z systems. The RDFs obtained from these simulations are shown in Figure 3(a), where two prominent peaks in RDFs are observed located approximately at r = 5 and 7.7 Å. We found that with an increase in the number of BOs (z) in the B z species, the intensity of the first peak in AA g(r) between B 0 and B z decreases, while the intensity of the second peak consistently increases (see inset of Figure 3 a). Further, the location of both the peaks remains almost constant with an increase in BO in B z species. The reason for the observed change in peak intensity in g(r) is shown in the analysis of atomic configurations (Figure 4). With an increase in the number of BOs (z), the central tetrahedron is surrounded by more B 1 species, which creates a steric hindrance for direct interactions (i.e. in the first neighbouring shell), whereas the interaction through intermediate tetrahedron (in-direct interaction) increases. For example, when B 0 species interacts with B 1 species, it can do so from all directions except one where BO is present whereas in the case of B 0 interacting with B 3 , the interaction is limited from the side where hydroxyl group is present. Conversely, the indirect interactions are more prone to establish between B 3 -B 0 and less between B 1 -B 0 species, resulting in higher and lesser intensity in the second peak in RDFs, respectively.
We have also obtained the RDF between B 1 and other species (B z , z = 1-4) from a system containing a mixture of species, as shown in Figure 3(b). The simulated system contains dimer and onwards, which have bonded tetrahedrons, so we have calculated only RDF between non-bonded species. For example, when RDF between B 1 species was calculated in the S1 system, the intramolecular bonded B 1 − B 1 pairs (in Figure 4b) were ignored, and only non-bonded intermolecular B 1 − B 1 pairs were considered. We found that due to tetrahedrons connected with each other, the shielding is significant; as a result, two peaks in RDFs at locations similar to B 0 − B z are observed for B 1 − B z as well. However, we observe that the location of the first peak slightly shifts towards larger distances with an increase in number of BOs in B z species. The qualitative behaviour of decrease and increase in the intensity of first and second peaks respectively was observed in B 1 − B z RDFs as well. The effect of this steric hindrance due to surrounding B 1 species is significant in the case of B 2 − B z and B 3 − B z RDFs, where the peak intensity decreases drastically and in the case of B 3 − B 4 it disappears at around 5 Å. This suggests that the direct interaction is limited due to the shielding of central species by surrounding B 1 species. For the case of B 4 − B 4 RDFs, we do not observe the first peak in RDF at the usual place of r = 5 Å, as observed in other cases, and only peaks at higher distances of r = 8 and 10.49 Å are observed due to in-direct interactions, as shown in Figure 4(e).

Non-bonded potential for CG silica beads
The refined CG potentials (U CG ) obtained between B i − B j beads using the iterative procedure described in Figure 2 are shown in Figure 5. The evolution of interaction potential and RDF between CG beads with respect to iteration for the sample B 0 system is shown in Figure S1 of SI. The interaction potential between two B 0 beads exhibits a minimum at around r = 4.98 Å and a very shallow second minimum at r = 7.7 Å (Figure 5 a). The interaction between B 0 and B z beads exhibits qualitatively similar behaviour, except that the strength of interaction potential at both minima locations increases with an increase in BO in the B z species. The observed trend of the strength of interaction potential is in contrast with the trend of the peak intensity of B 0 − B z RDFs, especially for the first peak (Figure 3 a). While the exact reason is unknown, we believe it could be due to intramolecular interactions in the dimer and other molecules. As B z species are shielded by peripheral B 1 species, interaction of B 0 species with central B z species will lead to possible overlap between B 0 and B 1 species. However, to avoid such overlap, changes in intramolecular configuration of B z + B 1 cluster may occur to provide direct interaction between B 0 and B z species. When we analysed snapshot of the B 0 − B 4 system (Figure 6), we indeed found that monomer (B 0 species) is present closer to central B 4 species of pentamer, where two adjoining B 1 species are separated, making a higher B 1 − B 4 − B 1 angle. This distortion of angle in the pentamer molecule leads to energy penalty in the configurational energy, which could be overall compensated by stronger attractive B 0 − B 4 interaction potential (i.e. first minimum). Figure 5(b) shows the interaction potential obtained between B 1 and other B z species of the system, consisting of two prominent minima with the strength of interaction potentials at both minima increasing with the number of BOs in B z species. The overall behaviour of the B 1 − B z interaction potential is similar to that of B 0 − B z potentials. The B 0 species are monomers (freely movable), whereas B 1 species, though bounded are present on the peripheral surface of the cluster, thus enjoys more degrees of freedom compared to B 2 , B 3 and B 4 species. Hence, the B 1 species directly interacts with the other B z species in a manner similar to B 0 species, leading to similar interaction potentials. The interaction potentials of B 2 and B 3 species with other B z species (Figure 5 c,d) also exhibit two minima, however, with an increase in the number of BOs of B z species, the attractive strength of the first minimum is unaffected whereas the second minimum increases. In comparison with B 0 and B 1 species, these B 2 and B 3 species are shielded by B 1 species, and their interaction with other shielded B z species (z ≥ 2) is thus limited. This leads to similar strength of interaction potential at around 1 kJ mol −1 for the first minima. The B 4 species is completely shielded from all side, and similar to the B 4 − B 4 RDF, the interaction potential also exhibits no minimum at around r = 5 Å and the only minimum is observed at around r = 8 Å.
As mentioned in the methodology section, we used each system to obtain unique B i − B j interaction potential. The S0 and S1 systems contain only B 0 and B 1 species from where B 0 − B 0 and B 1 − B 1 interaction potentials were calculated, respectively. In systems containing mixtures of species, all pair interactions could be evaluated. For example, the S04 system has B 0 , B 1 and B 4 species from where all self and crossinteraction potentials could be calculated, but our main focus was B 0 − B 4 interactions. As per the IBI and IMC techniques, we obtained other cross-interaction potentials and found no significant difference between them. Figure 7 shows the B 1 − B 1 RDFs and corresponding interaction potentials obtained from various systems. We chose B 1 as an example as it is present in all systems studied except the S0 system. No significant differences are present in the RDFs and interaction potentials obtained (Figure 7). However, when Yellow, red, black, and blue spheres represent silicon, oxygen, hydrogen and BO, respectively. Water molecules are not shown for clarity. As the number of BOs in B z species increases, the direct interaction (within r ≤ 5 Å) with non-bonded neighbouring species decreases, and in-direct interactions (for r>5 Å) occurring through mediating B 1 species increase. simulating a large system containing all species to obtain all fifteen interaction potentials, the algorithm in Figure 2 could not lead to converged CG potentials within significant iterations.

Bonded potential for CG silica beads
The bonded potential between two and three silica beads was obtained following a similar procedure as described for obtaining non-bonded CG potential. The Si-O-Si bond is represented as the bond connecting two silica beads, whereas the interaction between two silica (within a cluster) separated by two Si-O-Si bonds was described by the Si-Si-Si angle (u Si ). The bonded dihedral interactions in a polymer (1-2-3-4) is a four-body interaction between 1 and 4 CG beads separated by 3 bonds. In AA simulations, the interaction between corresponding silicon atoms is modelled as non-bonded interactions, hence it is captured by non-bonded interactions in the CG system as well. We, therefore, omit bonded dihedral interaction in our CG model. From the AA simulations appropriate bond and angle distributions, P(d Si ) and P(u Si ) in various systems are calculated (examples are shown in Figure 8). We observe that P(d Si ) exhibits a single peak at around d Si = 3.31 Å for dimers (B 1 − B 1 connected species), and the peak location shifts to smaller values for an increase in BO in B z species connected to B 1 . With more BO, the central B z species is pulled from all directions; thus the bond length decreases slightly. In the case of Si-Si-Si angles, we found that the geometry of molecules considered plays a significant  role. For example, a trimer (B 1 − B 2 − B 1 ) system having only one u Si per molecule exhibits a uni-model distribution of P(u Si ) with a maximum peak located at u Si = 80 degrees. However, the pentamer molecule (Figure 8 b, inset) has six u Si (between beads 152, 153, 154, 253, 254, 354 beads) where the central B 4 bead (bead 5) is pulled in all directions. As a result, a much broader distribution in P(u Si ) is observed with three peaks (Figure 8 b,d).
To obtain the bonded CG potential, we considered the initial guess as U CG,b 0 = −k B T ln (P(d Si ) * d Si ) for bond potential (where d Si = 3.3 is the average bond length) and U CG,a 0 = −k B T ln P(u Si ) for the angle potential. We performed potential refinement iterations, and the converged potentials for various systems are shown in Figure 8(c,d). Based on the nature of CG bond potential, we fitted harmonic potential 2 and found the K b parameter to vary between 783 and 843 kJ mol −1 Å −2 with an average value of K b = 813.5 kJ mol −1 Å −2 . Since the nature of angle potential did not conform to any regular mathematical function, thus we did not fit these potentials and rather used them as tabulated potentials in CG simulations.

CG simulations of large silica system
Since the developed CG potential does not capture reaction between two silica species, we cannot demonstrate the usefulness of this model in a non-polymerised silica system. Thus we chose already polymerised silica systems from our earlier MC simulations in which polymerisation reaction between silica species were incorporated as per the reaction ensemble approach [44,45]. We considered two system at extent of reaction (also referred as DoC) = 0.62 and 0.73 where the system makes one large connected cluster of 1000 silica system, though having different distribution of B z species (given in Table 2). Using the above developed CG potentials, we have simulated CG silica systems at these DoCs. To perform CG simulations, an initial configuration was generated by randomly inserting B z beads within the box, and the connection between various beads was established randomly such that all beads satisfy their bonds as per their type (i.e. B z bead has z bonds). We performed two independent simulations of the CG silica system at each DoC by generating two random initial configurations. All MD simulations were performed using GROMACS software (version 5.3 [59]), where the CG potential between various beads was provided as tabulated potential. The simulations were performed in an NVT ensemble where the temperature was maintained at 300 K by velocity re-scaling, and the Leapfrog algorithm was used for time integration. We performed a total of 6 × 10 6 MD steps, out of which the initial 10 6 number of steps were considered as equilibration, and the remaining steps were considered as production runs. During these MD simulations, no condensation reactions are performed; hence the B z distribution does not change with respect to time.
After performing the MD simulation, the systems form a large spherical cluster (connected silica beads) as shown in Figure 9. To analyse the structure of the CG silica system, we calculated R g and the radial density distribution of B z beads from the CoM of the cluster. The silica beads form a spherical cluster having R g of around 14.7 Å and 12.6 Å for systems with DoC = 0.62 and 0.73, respectively. The current observation is consistent with B z distribution, where the system with a higher number of B 4 beads having DoC = 0.73 has more silica-silica bonds resulting in a smaller-sized cluster. The density distributions of beads at DOC = 0.62 system (Figure 9a) is as follows: (i) B 1 are distributed uniformly throughout the particle, (ii) B 2 beads are present at all distances from the CoM of the cluster, predominantly at the periphery and lesser at the centre of the cluster, (iii) B 3 beads reside more in the central region of the cluster with their density continuously decreasing with an increasing radius of the cluster and (iv) B 4 beads mostly residing closer to the centre of the cluster. The density distribution of beads in replica systems ( Figure 9b) is similar, suggesting that equilibrium configurations are independent of the initial configuration. The silica system with a higher DoC of 0.73 (Figure 9 c,d) shows that the density of B 4 beads is higher in the central region due to higher concentration, while the distribution of rest of the beads is qualitatively similar to silica system at shifted for clarity and inset shows the curves without shift. The qualitative similarity of RDFs and U CG in different systems confirm that the strategy used to obtain CG potentials are robust and interaction between the species/beads are not affected by the presence of other silica species/beads. DoC = 0.62. We found that B 2 beads are present predominantly in the outer periphery of the cluster with smaller fractions of B 3 beads as well.

Summary
To better control the mechanism of polymerisation in sol-gel synthesis and properties of the resultant end product, a detailed molecular-level understanding is warranted. Molecular simulations using current forcefields are computationally intensive due to the requirement of larger system sizes and longer simulation times. We have developed a CG potential between various silica species formed during silica polymerisation by integrating the solvent degrees of freedom (implicit solvent model) to reduce the computational load. We performed MD simulations of AA simulations from where reference RDFs were calculated and performed MC simulations of a mapped CG system to obtain refined CG potentials using IMC and IBI techniques iteratively. First non-bonded potentials were obtained, followed by bonded potentials between various CG silica species.
The mapped CG silica beads (B z ) were categorised based on the number of BOs present in the silica tetrahedron (Si(OH)-4−z O z , z = 0 to 4). We found a single attractive minimum in the CG potentials between two B 0 (monomer) species, whereas two minima were observed for the rest of the cases. The first minimum is due to direct hydrogen bonded interaction between two B 0 species in AA configurations, whereas the second minimum is due to indirect interaction of the second coordination shell. We found that the attractive strength of CG potential between B 0 and B z increases at both minima locations with an increase in the number of BOs in B z species. Qualitatively similar behaviour of CG potential is observed between B i and B j species (for i, j ≥ 1 − 3). The CG potential between two B 4 species does not have a first minimum due to the steric hindrance of surrounding B 1 species. The bond stretching potential between two bonded silica species exhibits harmonic behaviour and it is independent of B z species involved. However, the angle potential exhibits multiple peaks depending upon the number of BOs present in the central silica species. We performed MD simulations of the CG    a and b (similarly c and d) show two different realisations for the silica system at DoC = 0.62 (and 0.73) studied by generating two different initial configurations. (a-d) Radial density distribution of CG beads (B i 's) and corresponding (e-h) snapshots of the CG silica system after 5 × 10 6 MD steps. Red, green, blue and yellow spheres in snapshots represent B 1 to B 4 CG beads, respectively. silica system using the developed CG potentials for systems at two different DoCs having different composition of B z beads. The equilibrium structure of the cluster represents a spherical silica nanoparticle. The distribution of B z species in these nanoparticles shows that B 3 and B 4 species are predominantly located at the centre of nanoparticles, whereas B 2 species are present uniformly. The developed potential is useful to study the self-assembly of the different species to obtain the equilibrium structure of silica polymorphs at different DoCs. The CG simulation could be repeated with external reaction events to mimic the polymerising system. We plan to couple reaction and their kinetics in future models.