Energy-Based Molecular Orbital Localization in a Specific Spatial Region

We present a novel energy-based localization procedure able to localize molecular orbitals into specific spatial regions. The method is applied to several cases including both conjugated and non-conjugated systems. The obtained localized molecular orbitals are used in a multiscale framework based on the multilevel Hartree-Fock approach. An almost perfect agreement with reference values is achieved for both ground state properties, such as dipole moments, and local excitation energies calculated at the coupled cluster level. The proposed approach is useful to extend the application range of high level electron correlation methods. In fact, the reduced number of molecular orbitals can lead to a large reduction in the computational cost of correlated calculations.


TOC graphics 1 Introduction
Many processes in chemistry take place in a specific spatial region of a molecular system.
To rationalize local phenomena, the concept of local occupied molecular orbitals (LMOs) is particularly useful in bridging chemical intuition and theoretical chemistry. 1 The LMOs are very convenient in describing electron correlation, as they can potentially reduce the computational cost of many-body methods. 2 Among the large variety of different localization procedures developed in the past, 3-14 only a few are able to localize MOs into a specific spatial region of a molecular system. 1,6,[13][14][15] However, such a localization procedure is important when dealing with phenomena taking place in a limited spatial location, for instance local electronic excitations. 15,16 This allows a fragmentation of the target moiety in (at least) two different parts: the active, where the phenomenon takes place, and the inactive, that indirectly influences the active part. Such a partitioning defines the so-called focused models. 17 Most focused models are formulated in terms of quantum mechanical (QM)/classical approaches in which the active-inactive interaction is usually limited to electrostatics. [18][19][20][21] In addition, several fragmentation approaches have been proposed in the last years, demonstrating their capability to treat large molecular systems. [22][23][24][25][26] However, all the developed methods are based on approximations, for instance, how fragments are defined, where covalent bonds are cut, how the vacant cap of each fragment is fixed. Also, they are commonly limited to the ground state energy, although some of them have been extended to treat excitation energies, [27][28][29] and the interaction between the monomers is usually treated at the electrostatic level. 23,30 In this paper, we are proposing a novel and rigorous method which can provide localized occupied MOs in a specific spatial region of the system. This is achieved by formulating a localization criterion entirely based on energetics. Therefore, our approach differs conceptually from common MO localization procedures, 1,3,4,31 and from projection-based approaches. 28,[32][33][34] To the best of our knowledge, this is the first time that an energy-based variational method is used to localize MOs in a specific region of the molecule. This also ensures the continuity of the potential energy surface (PES), which is not guaranteed for common localization procedure such as Boys.
The method is based on multilevel Hartree-Fock (MLHF) theory, 35,36 which is a rigorous method to partition a molecular system into two different fragments A (active) and B (inactive). Such a partitioning is performed by selecting the number of electrons belonging to the active part, and consequently the number of occupied orbitals is set. As previously reported in Ref. 35, MLHF method differs from most projection-based approaches, either developed in Density Functional Theory (DFT) or in Hartree-Fock (HF) frameworks. The starting point of projection-based approaches is commonly a Self Consistent Field (SCF) calculation on the entire system, and the optimized MOs are then assigned to active and inactive parts by using an priori orbital assignment. 28,32,33,[37][38][39][40][41][42][43][44] The quantum embedding Hamiltonian is then constructed by including an exact or approximated embedding operator. [39][40][41][42][43][44] In particular, we want to highlight that the selection of the active MOs is usually performed by using a predefined threshold metric, which however may cause the a wrong MO selection, as reported by Kallay and co-workers. 38 In MLHF, no approximations are introduced neither in the bond fragmentation or in the interaction energy between the active and inactive parts. In this paper, we show that our novel energy-based localization procedure can provided MOs that are well localized on the specified active fragment, without carring out an a priori orbital assignment. We demonstrate that our approach can provide not only accurate ground state properties, but also accurate local excitation energies (calculated at the coupled cluster (CC) level). Within this scheme, coupled cluster ground and excited state calculations are performed using the MOs of the active fragment only, thus the intrinsic computational cost of high level calculations is reduced, similarly to other multilevel methods. 45,46 For the same reason, the accuracy of the computed local excitations depends crucially on the quality of the LMOs.
The manuscript is organized as follows. In the next section, the MLHF theory is briefly summarized and the energy-based orbital localization is discussed. Then, the computational procedure and the numerical applications are presented, with particular emphasis on the spread of the obtained localized MOs and on the accuracy of the novel approach in predicting local properties such as dipole moments and excitation energies. Summary and conclusions of the present work end the manuscript.

Theory
The active-inactive partitioning in MLHF is realized by decomposing the density of the whole system into active and inactive densities (D = D A + D B ). The total HF energy can be written as: where h and G are the usual one and two-electron matrices, and h nuc is the nuclear repulsion.
The G(D) X with X = {A, B} matrix is defined as: The main idea of MLHF is to optimize the density of fragment A in the field generated by the density B, which is kept fixed. This procedure is performed by minimizing the energy (see Eq. 1) in the MO basis of the active part, reducing the dimensionality of the problem.
In MLHF, the Fock matrix in AO basis is expressed by differentiating Eq. 1 with respect to D A : In Eq. 3, the last term G µν (D B ) is a one-electron contribution, because the D B density is kept frozen during the SCF procedure.
Equation 1 is formally equal to the full HF energy when D is the converged SCF density for the entire system. However, Eq. 1 does not have an apparent physical interpretation because the different energy terms are not assigned to the individual fragments. Such a physical insight can be achieved by dividing the one-electron term into the kinetic (T) and the electron-nuclear attraction operators for the two parts (V A and V B ). Thus, Eq. 1 can be written as: where h A nuc , h B nuc and h AB nuc are nuclear repulsion terms; E A and E B are the energies of the two fragments, whereas E AB is the interaction energy. The latter term is composed of the electron-nuclear attraction between A and B and viceversa, and the coulomb and exchange interactions between the two fragments.
Although Eq. 4 is equivalent to Eq. 1, it permits the definition of our localization procedure in a fragment-based model such as MLHF. In fact, an additional SCF procedure can be performed to optimize the energy of part A and/or B, in the occupied space of both fragments, i.e. without changing the total energy. The procedure is general and can be performed on the basis of any density matrix D that is decomposed into two densities belonging to two fragments.
Two alternatives can be defined. First, the energy term E A (see Eq. 4) can be minimized (denoted MLHF-A). In such a case, the Fock matrix reads: In the second approach (called MLHF-AB), the sum of E A and E B is minimized.
where the last two terms depend on the total density only, and are therefore constant energy terms. The first three terms are instead similar to the MLHF energy contributions (see Eq. 1), because they are characterized by one-electron and two-electron contributions involving the active density only. The third term Tr D A G(D) is instead the two-electron interaction between the active and the constant total density D. The Fock matrix of the active part can be written as: which is again in the same form as Eq. 3, because it is characterized by one-electron con- , a two-electron contribution on the active density 2G µν (D A ), and a constant contribution due to the total density G µν (D).
Notice that minimizing the sum of A and B parts in MLHF-AB (see Eq. 6) is equivalent to maximizing the interaction energy E AB . Physically, this means that the repulsion between the two parts is maximized, and the occupied orbitals obtained by this scheme are those maximally located in the two fragments.

Computational Procedure
The two approaches are implemented in a development version of the electronic structure program e T , 47 and follow the computation protocol graphically depicted in Fig. 1: 1. Construction of the initial density by means of superposition of atomic densities (SAD), followed by a diagonalization of the initial Fock matrix.
2. Partitioning of the resulting density into A and B densities, using Cholesky decomposition for the active occupied orbitals and projected atomic orbitals (PAOs) for active virtual orbitals. 10,[48][49][50][51] We note that the Cholesky decomposition of the total density D into D A and D B is a mathematical method to decompose a matrix, which is unique if the same pivots are used. In this work, the Cholesky decomposition is performed by selecting the diagonals corresponding to the basis functions which are centered on the active atoms. 35,49 In particular, D A is written in the AO basis {α, β} as: 52 where I and J are the diagonal elements which are decomposed,D is the submatrix of

Numerical Applications
The capabilities of MLHF-A/AB are illustrated for four different molecular moieties, that have previously been studied theoretically and experimentally. 53 The orbital second central moment (orbital variance) is used to quantitatively characterize orbital locality. The second central moment µ p 2 of an MO ϕ p is defined as: 31 The orbital spread σ p is defined as the square root of µ p 2 . We also defined ξ as the average value of σ p , i.e. ξ is a measure of the mean locality of the considered set of MOs. 1 In this paper, MLHF-A/AB MOs are compared with canonical MLHF ones (named Cholesky because they are obtained through a Cholesky decomposition of the initial density matrix), that are also localized with the Boys procedure (Cholesky-Boys). 31 Notice that in Boys localization, the sum over p of µ p 2 in Eq. 9 is minimized, 31 and the obtained MOs can therefore be used as reference for both MLHF-A and MLHF-AB approaches.

MLHF-A/AB Localized MOs
The most delocalized MOs of ANS and graphene are depicted in Fig. 3, and the value of ξ and the maximum σ p are also reported (see Sec. S1.1 and S1.2 in Supporting Information -   between the two parts, i.e. to maximizing the repulsion between them. As a consequence, the active occupied orbitals calculated by MLHF-AB are more localized on the active part. The same does not apply to MLHF-A where the active energy is minimized in the occupiedoccupied space. Thus no constraints are imposed neither on the inactive energy or on the interaction energy. However, notice that a few MOs have tails in both methods (see Sec. S1.1 and S1.2 in SI), since MLHF-A/AB orbitals are orthogonal. 1 The tails can be reduced by further localizing MLHF-A/AB orbitals using standard localization procedures. Notice also that the MLHF-AB ξ and max{σ p } for ANS computed by using cc-pVTZ and aug-cc-pVDZ give very similar results, thus showing the consistency of our approach when diffuse functions are included (see Table S1 in the SI). The observations for ANS and graphene also apply to nicotine and PCP, whose MOs and corresponding spreads are reported in Sec. S1.3 and S1.4 in SI. For the latter systems, the ξ for Cholesky-Boys are lower than the corresponding MLHF-A/AB counterparts, but the MOs also spread in the inactive region. To illustrate the robustness of our approach, a different definition of active/inactive parts of ANS is also investigated (see Sec. S1.1.2 in the SI). The calculated results confirm the findings here discussed.
In Fig. 4, we report the local MOs belonging to the active fragment of graphene as calculated by using the MLHF-AB method. All the MLHF-AB local MOs are well-confined in the active part, and the symmetry of each orbital is evident. As a final comment, it is worth noticing that MLHF-AB orbitals may be further confined by using common localization procedures, such as Boys. The resulting local MOs will be more localized on the active atoms, than those obtained by performing an hypothetical localization on the MOs resulting from SCF procedure of the entire system.

Ground State Dipole Moments
The MLHF-A and MLHF-AB methods are also applied to calculate ground state properties.
We study the dipole moments of the active and inactive regions, together with the total dipole moments predicted by Cholesky, MLHF-A and MLHF-AB. We are not reporting the results using Cholesky-Boys because a rotation among the active occupied orbitals does not change the active density and the density-related properties, such as the dipole moment. The numerical values of active, inactive and total dipole moments for both nicotine and ANS are reported in Fig. 5, together a graphical representation of active (blue) and inactive (red) densities. Full HF densities and ground state dipole moments for both molecules are also given and used as reference. Dipole moments of both graphene and PCP structures are not reported because they are zero due to symmetry.
The MLHF (Cholesky) predicts large active/inactive dipole moments for both systems; for nicotine, they are almost 80 D, whereas for ANS almost 230 D. Such large dipole moments can be explained by investigating the spatial extension of active and inactive densities, which in both molecules are overlapping. This is due to the fact that the initial Cholesky decomposition defines an inactive density that overlaps with the active part and viceversa.
Such issues are solved by MLHF-A/AB. Although both methods start from the same densities as those obtained by MLHF (Cholesky), the occupied-occupied rotations make the active and inactive densities more confined in their specific spatial regions, with a partial overlap limited to the bonding regions. As a consequence, the calculated dipole moments are very similar, in particular for ANS.
In both molecules, the MLHF-AB dipole moments are much lower than the corresponding ones for MLHF-A (see Fig. 5). This is due to the maximization of the active-inactive repulsion in MLHF-AB. Thus, a further confinement of the two densities in their specific spatial regions takes place.This can be appreciated by inspecting the bonding regions in both nicotine and ANS (Fig. 5), showing that the overlap between active and inactive densities is lower in MLHF-AB than in MLHF-A. Notice also that the bonding electrons of nicotine are assigned to the active part. Therefore, the active (blue) density in Fig. 5 defines the bond, whereas for ANS the opposite applies. In case of both nicotine and ANS, the total dipole moments are in very good agreement with the full HF reference value, with the largest discrepancy given by MLHF-AB for nicotine (error = 37%). The numerical value of the total dipole moments can be improved by using a different initial guess density, as for instance superposition of molecular densities (obtained by means of molecular fractionation with conjugate caps 63 ).

MLHF-AB vs. projection-based approaches
In this section, the MLHF-AB model is compared to HF projection-based approach (called projected-HF). In particular, we compute ground state energies and dipole moments of the active fragment of nicotine and ANS molecules as a function of the elongation of the activeinactive covalent bond, which is cut by the partitioning into two fragments. The projected- we calculate the percentage (p A i ) of the i-th MO in the active part A as: where, the C iµ is the MO coefficient of the i-th MO in AO basis {µ}. The n o active MOs in projected-HF calculations are then selected as those having the highest percentage in the active atoms. It is worth pointing out that the active MOs in projected-HF models can also be selected as those having a percentage ≥ 50%, instead of fixing the number of active MOs to n o . However, when applied to PES studies, such a choice leads to unavoidable PES discontinuities because a different number of active MOs may be selected depending on the active-inactive distance. Also, different methods to calculate the MO percentage in A can be arbitrarily chosen, thus the results are not unique. For these reasons, we prefer to keep the number of active MOs fixed to n o . We notice that such an arbitrariness is almost absent in MLHF-AB calculations, which only depend on the active-inactive partitioning of the electrons in the studied system.
In projected-HF method, once the active MO coefficients are selected, the active density is constructed (D A µν = ij C µi C jν ), and the active energy is calculated as the E A term in Eq.
4.  Figure 6: MLHF-AB and projected-HF active ground state energy E A (left) and dipole moment of the active part (right) of nicotine as calculated by using aug-cc-pVDZ (a) or 6-31+G* (b) basis sets.
In Fig. 6, MLHF-AB and projected-HF methods are applied to the calculation of ground state energies and dipoles of the active fragment of nicotine as a function of the elongation of the single covalent bond connecting the active and inactive parts. In particular, two different basis sets, aug-cc-pVDZ (panel a) and 6-31+G* (panel b), are used. The equilibrium distance is 1.51Å, and the covalent is bond is varied from 1.00 to 2.1Å. In both approaches, the number of active occupied MOs n o is fixed to 21, because, as stated above, the bonding electrons are assigned to the active fragment.
The results reported in Fig. 6a, clearly show that by using the aug-cc-pVDZ basis set both MLHF-AB and projected-HF do not display any PES discontinuities (left panel). Also the energy difference between the two approaches rapidly decreases as the active-inactive distance is elongated. At the equilibrium geometry the MLHF-AB-projected-HF energy difference is of about 0.1 Hartree, with the MLHF-AB energy that is lower than projected-HF one at all the considered distances. This is not surprising and results from the minimization procedure in MLHF-AB (see Eq. 6). The dipole moment of the active part is reported in the right panel of Fig. 6a. Also in this case, the curves obtained by using both approaches do not display any discontinuities, and a difference of about 0.3 Debye is reported at the equilibrium geometry.
A different picture arises when the 6-31+G* basis set is used (Fig. 6b). In this case, the projected-HF PES clearly displays a large discontinuity at small active-inactive distances, both in the ground state energy (left) and in the dipole moment of the active part (right).
Such a discontinuity reflects a discontinuity in the Boys space, which is common in MO localization procedures as it has been reported in different contexts. 64 The discontinuity is completely absent in MLHF-AB. However, in the proximity of the equilibrium geometry, both approaches do not display any discontinuities. At the equilibrium geometry, the MLHF-AB energy is lower than the projected-HF by about 0.1 Hartree, and the MLHF-AB-projected-HF active dipole moment difference is about 0.05 Debye. The present analysis shows that the MLHF-AB PES is always continuous, whereas the projected-HF PES can display some discontinuities depending on the selected basis set. Notice that the results discussed for nicotine also apply to the case of ANS molecule (see Fig. S27 given as SI). For both nicotine and ANS, the average spread (ξ) of the LMOs used in projected-HF is smaller than in MLHF-AB (1.71 a.u. vs. 2.11 a.u. (nicotine) and 1.70 a.u. vs. 2.68 a.u. (ANS)). This is expected as the Boys localization minimizes the orbital spread ξ, and therefore provides the LMOs with the lowest ξ. Also, the MLHF-AB is not intended to provide the most localized MOs overall, but the most localized MOs in a specific spatial region. In passing, we note that MLHF-AB PES can display discontinuities depending on the initial Cholesky decomposition. However, this can be avoided by selecting the same pivots during the Cholesky decomposition of the initial density.

Coupled Cluster absorption energies
As a final application of MLHF-A and MLHF-AB, we select two local transitions, i.e. occurring in the selected active parts, exhibited by nicotine and PCP (for which we investigate a though-space charge transfer excitation, 58 see Fig. 7).
Local excitations are a perfect test case for demonstrating the capabilities of both approaches proposed here. In fact, the quality of the localized orbitals is crucial for obtaining a reliable excitation energy. In this work, the excitation energies are computed using CC2 65

Summary and Conclusions
To summarize, we have presented a novel energy-based criterion to localize MOs in specific spatial regions of a molecular system. In particular, this approach is based on a MLHF partitioning of the system. Differently from other fragmentation methods, it is entirely based on HF theory, thus no approximations are introduced in the interaction energy. The prospects of our approach are demonstrated for four selected systems, characterized by both conjugated and non-conjugated skeletons. In particular, we have shown that MLHF-AB approach provides continuous PES, thus solving the discontinuity issues that can arise by exploiting common localization procedures in projection-based approaches.
The accuracy of our approach is then shown for ground state properties (dipole moments) and excitation energies calculated at the full CC2 level. The computational cost is reduced due to the partitioning of the system in active and inactive fragments. Both MLHF-A and MLHF-AB are able to reduce the discrepancy between MLHF and reference full coupled cluster values, in this way demonstrating their reliability in describing local excitations.
Notice that in the present study the procedure is applied to relatively small molecules in order to allow a direct comparison with full coupled cluster results. However, the model has the potential to be applied to very large systems. A detailed benchmark of the performances of MLHF-A/AB on excitation energies will be the topic of future communications.
To conclude, the MO localization provided by our approach can have different applications, ranging from those illustrated in this work (i.e. local ground state properties 66,67 and local excitations 16,68,69 ) to the accurate calculation of interaction and reaction energies of molecular systems in large biological matrices or adsorbed on nanomaterials. 6,70,71 In addition, the local MOs obtained through our procedure may be used to define the different fragment densities in fragmentation approaches, 22 and different boundaries in the cap regions for QM/MM approaches when covalent bonds are cut. 72

Supporting Information
Data related to Figs. 2-5 and 7.