PsiAPI Tutorial: Using Psi4 as a Python Module

transcribed by D. A. Sirianni

Note: Psithon and PsiAPI refer to two modes of interacting with Psi4. In Psithon mode, you write an input file in our domain-specific language (not quite Python) where commands don’t have psi4. in front, then submit it to the executable psi4 which processes the Psithon into pure Python and runs it internally. In PsiAPI mode, you write a pure Python script with import psi4 at the top, and commands are behind the psi4. namespace, then submit it to the python interpreter. Both modes are equally powerful. This tutorial covers the PsiAPI mode.

Note: PsiAPI mode has been available since late 2016 and the v1.1 release. While we aim to provide deprecation warnings and upgrade helpers, the API should not be considered completely stable. Most importantly, as we someday deprecate the last of the global variables, options will be added to the method calls (e.g., energy('scf', molecule=mol, options=opt))

Note: Consult How to run Psi4 as Python module after compilation or How to run Psi4 as a Python module from conda installation for assistance in setting up Psi4.

Here, we will explore the basics of using Psi4 in the interactive PsiAPI style where it is loaded directly as a Python module by reproducing the section A Psi4 Tutorial from the Psi4 manual in an interactive Jupyter Notebook.

Note: If the newest version of Psi4 (v.1.1a2dev42 or newer) is in your path, feel free to execute each cell as you read along by pressing Shift+Enter when the cell is selected.

I. Basic Input Structure

Psi4 is now a Python module; so, we need to import it into our Python environment:

[1]:
# Ignore this block -- it's for the documentation build
try:
    import os, sys
    sys.path.insert(1, os.path.abspath('/psi/gits/hrw-demo/objdir_docs311cf_make/stage//usr/local/psi4/lib/'))
except ImportError:
    pass

# This is the important part
import psi4

Psi4 is now able to be controlled directly from Python. By default, Psi4 will print any output to the screen; this can be changed by giving a file name (with path if not in the current working directory) to the function psi4.core.set_output_file() :psicode:[API] <psi4manual/master/api/psi4.core.set_output_file> , as a string:

[2]:
psi4.core.set_output_file('output.dat', False)

Additionally, output may be suppressed by instead setting psi4.core.be_quiet() :psicode:[API] <psi4manual/master/api/psi4.core.be_quiet> .

II. Running a Basic Hartree-Fock Calculation

In our first example, we will consider a Hartree-Fock SCF computation for the water molecule using a cc-pVDZ basis set. First, we will set the available memory for Psi4 to use with the psi4.set_memory() API function, which takes either a string like '30 GB' (with units!) or an integer number of bytes of memory as its argument. Next, our molecular geometry is passed as a string into psi4.geometry() API. We may input this geometry in either Z-matrix or Cartesian format; to allow the string to break over multiple lines, use Python’s triple-quote """string""" syntax. Finally, we will compute the Hartree-Fock SCF energy with the cc-pVDZ basis set by passing the method/basis set as a string ('scf/cc-pvdz') into the function psi4.energy() API:

[3]:
#! Sample HF/cc-pVDZ H2O Computation

psi4.set_memory('500 MB')

h2o = psi4.geometry("""
O
H 1 0.96
H 1 0.96 2 104.5
""")

psi4.energy('scf/cc-pvdz')
[3]:
-76.02663273488399

If everything goes well, the computation should complete and should report a final restricted Hartree-Fock energy in the output file output.dat in a section like this:

Energy converged.

@DF-RHF Final Energy:     -76.02663273486682

By default, the energy should be converged to about \(1.0\times 10^{-6}\), so agreement is only expected for about the first 6 digits after the decimal. If the computation does not complete, there is probably a problem with the compilation or installation of the program (see the installation instructions in the main Psi4 manual section :psicode:[Compiling and Installing from Source] <psi4manual/master/build_planning.html> .

This very simple input is sufficient to run the requested information. Notice we didn’t tell the program some otherwise useful information like the charge of the molecule (0, it’s neutral), the spin multiplicity (1 for a closed-shell molecule with all electrons paired), or the reference wavefunction to use (restricted Hartree-Fock, or RHF, is usually appropriate for a closed-shell molecule). The program correctly guessed all of these options for us. We can change the default behavior through additional keywords.

Let’s consider what we would do for an open-shell molecule, where not all the electrons are paired. For example, let’s run a computation on methylene (\(\text{CH}_2\)), whose ground electronic state has two unpaired electrons (triplet electronic state, or a spin multiplicity \(2S + 1 = 3\)). In this case, the default spin multiplicity (1) is not correct, so we need to tell the program the true value (3). Like many programs, Psi4 can get the charge and multiplicity as the first two integers in the Z-matrix (note the line with 0 3 at the beginning of the molecule specification below). In this example, we will also specify the bond length and bond angle as variables (\(R\) and \(A\)), whose values are first stored and then inserted into the geometry specification using Python 3 string formatting.

[4]:
#! Sample UHF/6-31G** CH2 Computation

R = 1.075
A = 133.93

ch2 = psi4.geometry("""
0 3
C
H 1 {0}
H 1 {0} 2 {1}
""".format(R, A)
)

psi4.set_options({'reference': 'uhf'})
psi4.energy('scf/6-31g**')
[4]:
-38.925334628859886

Executing this cell should yield the final energy as

@DF-UHF Final Energy:   -38.92533462887677

Notice the new command, psi4.set_options() API, to the input. This function takes a Python dictionary as its argument, which is a key-value list which associates a Psi4 keyword with its user-defined value. For open shell molecules, we have a choice of unrestricted orbitals (unrestricted Hartree-Fock, or UHF) or restricted orbitals (restricted open-shell Hartree-Fock, or ROHF). Usually, UHF is a little easier to converge (although it may be more susceptible to spin contamination than ROHF).

III. Geometry Optimization and Vibrational Frequency Analysis

The above examples were simple single-point energy computations (as specified by the psi4.energy() API function). Of course there are other kinds of computations to perform, such as geometry optimizations and vibrational frequency computations. These can be specified by replacing psi4.energy() API with psi4.optimize() API or psi4.frequency() API, respectively.

Let’s take a look at an example of optimizing the H\(_2\)O molecule using Hartree-Fock with a cc-pVDZ basis set.

Now, here comes the real beauty of running Psi4 interactively: above, when we computed the energy of H\(_2\)O with HF/cc-pVDZ, we defined the Psi4 molecule object h2o. Since we’re still in the Python shell, as long as you executed that block of code, we can reuse the h2o molecule object in our optimization without redefining it, by adding the molecule=h2o argument to the psi4.optimize() API function:

[5]:
psi4.set_options({'reference': 'rhf'})
psi4.optimize('scf/cc-pvdz', molecule=h2o)
Optimizer: Optimization complete!
[5]:
-76.02703272937504

This should perform a series of gradient computations. The gradient points which way is downhill in energy, and the optimizer then modifies the geometry to follow the gradient. After a few cycles, the geometry should converge with a message like Optimization complete!. As indicated in the following table (printed by the optimizer at the end of the computation and grep-able with ~), the energy decreases with each step, and the maximum force on each atom also decreases with each step (in principle, these numbers could increase in some iterations, but here they do not).

--------------------------------------------------------------------------------------------------------------- ~  Step         Total Energy             Delta E       MAX Force       RMS Force        MAX Disp        RMS Disp  ~ --------------------------------------------------------------------------------------------------------------- ~     1     -76.026632734908    -76.026632734908      0.01523518      0.01245755      0.02742222      0.02277530  ~     2     -76.027022666011     -0.000389931104      0.00178779      0.00142946      0.01008137      0.00594928  ~     3     -76.027032729374     -0.000010063363      0.00014019      0.00008488      0.00077463      0.00044738  ~ --------------------------------------------------------------------------------------------------------------- ~

To get harmonic vibrational frequencies, it’s important to keep in mind that the values of the vibrational frequencies are a function of the molecular geometry. Therefore, it’s important to obtain the vibrational frequencies AT THE OPTIMIZED GEOMETRY. Luckily, Psi4 updates the molecule with optimized geometry as it is being optimized. So, the optimized geometry for H\(_2\)O is stored inside the h2o molecule object, which we can access! To compute the frequencies, all we need to do is to again pass the molecule=h2o argument to the psi4.frequency() API function:

[6]:
scf_e, scf_wfn = psi4.frequency('scf/cc-pvdz', molecule=h2o, return_wfn=True)
 6 displacements needed.
 1 2 3 4 5 6

Executing this cell will prompt Psi4 to compute the Hessian (second derivative matrix) of the electronic energy with respect to nuclear displacements. From this, it can obtain the harmonic vibrational frequencies, given below (roundoff errors of around \(0.1\) cm\(^{-1}\) may exist):

  Irrep      Harmonic Frequency
                  (cm-1)
-----------------------------------------------
     A1         1775.6478
     A1         4113.3795
     B2         4212.1814
-----------------------------------------------

Notice the symmetry type of the normal modes is specified (A1, A1, B2). The program also print out the normal modes in terms of Cartesian coordinates of each atom. For example, the normal mode at \(1776\) cm\(^{-1}\) is:

 Frequency:       1775.65
 Force constant:   0.1193
           X       Y       Z           mass
O        0.000   0.000  -0.068      15.994915
H        0.000   0.416   0.536       1.007825
H        0.000  -0.416   0.536       1.007825

where the table shows the displacements in the X, Y, and Z dimensions for each atom along the normal mode coordinate. (This information could be used to animate the vibrational frequency using visualization software.)

Because the vibrational frequencies are available, a thermodynamics analysis is automatically performed at the end of the computation. You can see this in the next section of the output file output.dat. The vibrational frequencies are sufficient to obtain vibrational contributions to enthalpy (H), entropy (S), and Gibbs free energy (G). Similarly, the molecular geometry is used to obtain rotational constants, which are then used to obtain rotational contributions to H, S, and G.

Note: Psi4 has several synonyms for the functions called in this example. For instance, psi4.frequency() API will compute molecular vibrational frequencies, and psi4.optimize() API will perform a geometry optimization.

IV. Analysis of Intermolecular Interactions

Now let’s consider something a little more interesting. Psi4 contains code to analyze the nature of intermolecular interactions between two molecules, via symmetry-adapted perturbation theory (SAPT) (Jeziorski:1994:1887). This kind of analysis gives a lot of insight into the nature of intermolecular interactions, and Psi4 makes these computations easier than ever.

For a SAPT computation, the input needs to provide information on two distinct molecules. This is very easy, we just give a Z-matrix or set of Cartesian coordinates for each molecule, and separate the two with two dashes, like this:

[7]:
# Example SAPT computation for ethene*ethyne (*i.e.*, ethylene*acetylene).
# Test case 16 from S22 Database

dimer = psi4.geometry("""
0 1
C   0.000000  -0.667578  -2.124659
C   0.000000   0.667578  -2.124659
H   0.923621  -1.232253  -2.126185
H  -0.923621  -1.232253  -2.126185
H  -0.923621   1.232253  -2.126185
H   0.923621   1.232253  -2.126185
--
0 1
C   0.000000   0.000000   2.900503
C   0.000000   0.000000   1.693240
H   0.000000   0.000000   0.627352
H   0.000000   0.000000   3.963929
units angstrom
""")

Here’s the second half of the input, where we specify the computation options:

[8]:
psi4.set_options({'scf_type': 'df',
                  'freeze_core': 'true'})

psi4.energy('sapt0/jun-cc-pvdz', molecule=dimer)
[8]:
-0.0022355825227244703

All of the options we have currently set using psi4.set_options() API are “global” options (meaning that they are visible to all parts of the program). Most common Psi4 options can be set like this. If an option needs to be visible only to one part of the program (e.g., we only want to increase the energy convergence in the SCF code, but not the rest of the code), it can be set by with the psi4.set_module_options() API function, psi4.set_module_options('scf', {'e_convergence': '1e-8'}).

Note: The arguments to the functions we’ve used so far, like psi4.set_options() API, psi4.set_module_options() API, psi4.energy() API, psi4.optimize() API, psi4.frequency() API, etc., are case-insensitive.

Back to our SAPT example, we have found that for basic-level SAPT computations (i.e., SAPT0), a good error cancellation is found (Hohenstein:2012:WIREs) with the jun-cc-pVDZ basis (this is the usual aug-cc-pVDZ basis, but without diffuse functions on hydrogen and without diffuse \(d\) functions on heavy atoms) (Papajak:2011:10). So, we’ll use that as our standard basis set. The SAPT code is designed to use density fitting techniques, because they introduce minimal errors while providing much faster computations (Hohenstein:2010:184111,Hohenstein:2010:014101). Since we’re using density fitting for the SAPT, we might as well also use it for the Hartree-Fock computations that are performed as part of the SAPT. We can specify that by adding 'scf_type': 'df' to the dictionary passed to psi4.set_options().

Density fitting procedures require the use of auxiliary basis sets that pair with the primary basis set. Fortunately, Psi4 is usually smart enough to figure out what auxiliary basis sets are needed for a given computation. In this case, jun-cc-pVDZ is a standard enough basis set (just a simple truncation of the very popular aug-cc-pVDZ basis set) that Psi4 correctly guesses that we want the jun-cc-pVDZ-JKFIT auxiliary basis for the Hartree-Fock, and the jun-cc-pVDZ-RI basis set for the SAPT procedure.

To speed up the computation a little, we also tell the SAPT procedure to freeze the core electrons by adding 'freeze_core': 'true' to the dictionary passed to psi4.set_options(). The SAPT procedure is invoked by psi4.energy('sapt0/jun-cc-pvdz', molecule=dimer). Here, Psi4 knows to automatically run two monomer computations and a dimer computation and then use these results to perform the SAPT analysis. The various energy components are printed at the end of the output, in addition to the total SAPT0 interaction energy. An explanation of the various energy components can be found in the review by Jeziorski, Moszynski, and Szalewicz (Jeziorski:1994:1887), and this is discussed in more detail in the SAPT section of the Psi4 manual.

For now, we’ll note that most of the SAPT energy components are negative; this means those are attractive contributions (the zero of energy in a SAPT computation is defined as non-interacting monomers). The exchange contributions are positive (repulsive). In this example, the most attractive contribution between ethylene and acetylene is an electrostatic term of \(-2.12\) kcal/mol (Elst10,r where the 1 indicates the first-order perturbation theory result with respect to the intermolecular interaction, and the 0 indicates zeroth-order with respect to the intramolecular electron correlation). The next most attractive contribution is the Disp20 term (second order intermolecular dispersion, which looks like MP2 in which one excitation is placed on each monomer), contributing an attraction of \(-1.21\) kcal/mol. It is not surprising that the electrostatic contribution is dominant, because the geometry chosen for this example has the acetylene perpendicular to the ethylene, with the acetylene hydrogen pointing directly at the double bond in ethylene; this will be attractive because the H atoms in acetylene bear a partial positive charge, while the electron rich double bond in ethylene bears a partial negative charge. At the same time, the dispersion interaction should be smaller because the perpendicular geometry does not allow much overlap between the monomers. Hence, the SAPT analysis helps clarify (and quantify) our physical understanding about the interaction between the two monomers.

V. Potential Surface Scans and Counterpoise Correction Made Easy

Finally, let’s consider an example which highlights the advantages of being able to interact with Psi4 directly with Python.

Suppose you want to do a limited potential energy surface scan, such as computing the interaction energy between two neon atoms at various interatomic distances. One simple but unappealing way to do this is to generate separate geometries for each distance to be studied. Instead, we can leverage Python loops and string formatting to make our lives simpler. Additionally, let’s suppose you want to do counterpoise (CP) correction to compute interaction energies. Counterpoise correction involves computing the dimer energy and then subtracting out the energies of the two monomers, each evaluated in the dimer basis. Again, each of these computations could be run in a separate input file, but because counterpoise correction is a fairly standard procedure for intermolecular interactions, Psi4 knows about it and has a built-in routine to perform counterpoise correction. It only needs to know what method you want to do the couterpoise correction on (here, let’s consider CCSD(T)), and it needs to know what’s monomer A and what’s monomer B. This last issue of specifying the monomers separately was already dealt with in the previous SAPT example, where we saw that two dashes in the psi4.geometry() string can be used to separate monomers.

So, we’re going to do counterpoise-corrected CCSD(T) energies for Ne\(_2\) at a series of different interatomic distances. And let’s print out a table of the interatomic distances we’ve considered, and the CP-corrected CCSD(T) interaction energies (in kcal/mol) at each geometry:

[9]:
#! Example potential energy surface scan and CP-correction for Ne2

ne2_geometry = """
Ne
--
Ne 1 {0}
"""

Rvals = [2.5, 3.0, 4.0]

psi4.set_options({'freeze_core': 'true'})

# Initialize a blank dictionary of counterpoise corrected energies
# (Need this for the syntax below to work)

ecp = {}

for R in Rvals:
    ne2 = psi4.geometry(ne2_geometry.format(R))
    ecp[R] = psi4.energy('ccsd(t)/aug-cc-pvdz', bsse_type='cp', molecule=ne2)

# Prints to screen
print("CP-corrected CCSD(T)/aug-cc-pVDZ Interaction Energies\n\n")
print("          R [Ang]                 E_int [kcal/mol]       ")
print("---------------------------------------------------------")
for R in Rvals:
    e = ecp[R] * psi4.constants.hartree2kcalmol
    print("            {:3.1f}                        {:1.6f}".format(R, e))

# Prints to output.dat
psi4.core.print_out("CP-corrected CCSD(T)/aug-cc-pVDZ Interaction Energies\n\n")
psi4.core.print_out("          R [Ang]                 E_int [kcal/mol]       \n")
psi4.core.print_out("---------------------------------------------------------\n")
for R in Rvals:
    e = ecp[R] * psi4.constants.hartree2kcalmol
    psi4.core.print_out("            {:3.1f}                        {:1.6f}\n".format(R, e))

CP-corrected CCSD(T)/aug-cc-pVDZ Interaction Energies


          R [Ang]                 E_int [kcal/mol]
---------------------------------------------------------
            2.5                        0.758605
            3.0                        0.015968
            4.0                        -0.016215

First, you can see the geometry string ne2_geometry has a two dashes to separate the monomers from each other. Also note we’ve used a Z-matrix to specify the geometry, and we’ve used a variable (R) as the interatomic distance. We have not specified the value of R in the ne2_geometry string like we normally would. That’s because we are going to vary it during the scan across the potential energy surface, by using a Python loop over the list of interatomic distances Rvals. Before we are able to pass our molecule to Psi4, we need to do two things. First, we must set the value of the intermolecular separation in our Z-matrix (by using Python 3 string formatting) to the particular value of R. Second, we need to turn the Z-matrix string into a Psi4 molecule, by passing it to `psi4.geometry() <http://psicode.org/psi4manual/master/api/psi4.driver.geometry.html#psi4.driver.geometry>`__. The argument bsse_type='cp' tells Psi4 to perform counterpoise (CP) correction on the dimer to compute the CCSD(T)/aug-cc-pVDZ interaction energy, which is stored in our ecp dictionary at each iteration of our Python loop. Note that we didn’t need to specify ghost atoms, and we didn’t need to call the monomer and dimer computations separately. Psi4 does it all for us, automatically.

Near the very end of the output file output.dat, the counterpoise correction Python function will print a nice summary of the results of the counterpoise computation (the energies of the dimer, of monomer 1 with the ghost functions of monomer 2, of monomer 2 with the ghost functions of monomer 1, and the overall counterpoise corrected interaction energy):

N-Body: Computing complex (1/2) with fragments (2,) in the basis of fragments (1, 2).
...
N-Body: Complex Energy (fragments = (2,), basis = (1, 2):  -128.70932405488924)
...
N-Body: Computing complex (2/2) with fragments (1,) in the basis of fragments (1, 2).
...
N-Body: Complex Energy (fragments = (1,), basis = (1, 2):  -128.70932405488935)
...
N-Body: Computing complex (1/1) with fragments (1, 2) in the basis of fragments (1, 2).
...
N-Body: Complex Energy (fragments = (1, 2), basis = (1, 2):  -257.41867403127321)
...
==> N-Body: Counterpoise Corrected (CP)  energies <==

n-Body     Total Energy [Eh]       I.E. [kcal/mol]      Delta [kcal/mol]
     1     -257.418648109779        0.000000000000        0.000000000000
     2     -257.418674031273       -0.016265984132       -0.016265984132

And that’s it! The only remaining part of the example is a little table of the different R values and the CP-corrected CCSD(T) energies, converted from atomic units (Hartree) to kcal mol\(^{-1}\) by multiplying by the automatically-defined conversion factor psi4.constants.hartree2kcalmol. Psi4 provides several built-in physical constants and conversion factors, as described in the Psi4 manual section Physical Constants. The table can be printed either to the screen, by using standard Python ``print()` syntax <https://docs.python.org/3/whatsnew/3.0.html#print-is-a-function>`__, or to the designated output file output.dat using Psi4’s built-in function psi4.core.print_out() :psicode:[API] <psi4manual/master/api/psi4.core.print_out> (C style printing).

As we’ve seen so far, the combination of Psi4 and Python creates a unique, interactive approach to quantum chemistry. The next section will explore this synergistic relationship in greater detail, describing how even very complex tasks can be done very easily with Psi4.

[ ]: