Automated transition state theory calculations for high-throughput kinetics

Pierre L. Bhoorasingh, Belinda L. Slakman, Fariba Seyedzadeh Khanshan, Jason Y. Cain, Richard H. West.

Analysis for the manuscript with the above name.

In [1]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
import matplotlib

The first analysis uses kinetics generated for an earlier version of the manuscript, stored in kinetics.csv, and focuses on the H Abstraction reactions.

In [2]:
kinetics = pd.read_csv('kinetics.csv')
In [3]:
h_abs = kinetics[kinetics['family']=='H_Abstraction']
In [4]:
h_abs.head()
Out[4]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary
0 2 H_Abstraction [OH] [H][H] [H] O 3.106270e+07 NaN 1.437547e+06 362821.827253 oh+h2=h+h2o ... Exact match found for rate rule (H2;O_pri_rad)
1 4 H_Abstraction [O][O] [H][H] [H] [O]O 6.780722e-06 NaN 2.091163e-05 0.000030 h2+o2=h+ho2 ... Exact match found for rate rule (H2;O2b)
2 8 H_Abstraction [H] OO [O]O [H][H] 6.799489e+05 NaN 1.049946e+06 228891.466188 h2o2+h=h2+ho2 ... Average of (O/H/NonDeC;H_rad + O/H/NonDeN;H_ra...
3 9 H_Abstraction [O] OO [O]O [OH] 1.923345e+06 NaN 1.295325e+06 996711.005718 h2o2+o=oh+ho2 ... Average of (O/H/NonDeC;O_atom_triplet + O/H/No...
4 17 H_Abstraction C=O [O]OC=O O=COO [CH]=O 1.149955e+04 NaN 5.631393e+03 3824.023252 ch2o+o2cho=hco+ho2cho ... Estimated using template (CO_pri;O_rad/NonDeO)...
In [5]:
h_abs.plot.scatter('AutoTST_fwd', 'Rate Rules', loglog=True, 
                   xlim=(1e-1,1e7), ylim=(1e-1,1e7), figsize=(6,6))
plt.gca().set_aspect('equal')

Note the significant banding, where many reactions are given the same rate estimate. Lets parse the "comment" that RMG puts on the reaction estimate, to see how each was estimated:

In [6]:
def getRMGSource(rule):
    rule = rule.lower().strip()
    if 'average' in rule:
        return 'Average'
    elif 'estimated' in rule:
        return 'Other estimate'
    elif 'exact' in rule:
        return 'Exact match'
    else:
        return 'Unknown'
In [7]:
h_abs = h_abs.dropna(subset=['Rule summary'])
h_abs['RMGSource'] = h_abs.apply(lambda row: getRMGSource(row['Rule summary']), axis=1)
In [8]:
#sns.set_style('ticks')
fg = sns.FacetGrid(data=h_abs, hue='RMGSource', size=6, aspect=1.0, xlim=(1e-1,1e7), ylim=(1e-1,1e7))
ax = fg.map(plt.scatter, 'AutoTST_fwd', 'Rate Rules')
ax.set(xscale="log", yscale="log")
ax.axes[0][0].plot((0.1, 1e7), (0.1, 1e7), ls='--', label='Parity', color='gray')
plt.legend()
plt.show()

It looks like the "Other estimate" causes most of the banding, and causes Rate Rules to over-estimate the rate relative to AutoTST. Lets calculate some discrepancy histograms, showing how AutoTST varies with respect to the other source (Rate Rules or LLNL)

In [9]:
h_abs['DifferenceRMG'] = np.log(h_abs['AutoTST_fwd']/h_abs['Rate Rules'])
h_abs.head()
Out[9]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary RMGSource DifferenceRMG
0 2 H_Abstraction [OH] [H][H] [H] O 3.106270e+07 NaN 1.437547e+06 362821.827253 oh+h2=h+h2o ... Exact match found for rate rule (H2;O_pri_rad) Exact match 4.449851
1 4 H_Abstraction [O][O] [H][H] [H] [O]O 6.780722e-06 NaN 2.091163e-05 0.000030 h2+o2=h+ho2 ... Exact match found for rate rule (H2;O2b) Exact match -1.498202
2 8 H_Abstraction [H] OO [O]O [H][H] 6.799489e+05 NaN 1.049946e+06 228891.466188 h2o2+h=h2+ho2 ... Average of (O/H/NonDeC;H_rad + O/H/NonDeN;H_ra... Average 1.088770
3 9 H_Abstraction [O] OO [O]O [OH] 1.923345e+06 NaN 1.295325e+06 996711.005718 h2o2+o=oh+ho2 ... Average of (O/H/NonDeC;O_atom_triplet + O/H/No... Average 0.657360
4 17 H_Abstraction C=O [O]OC=O O=COO [CH]=O 1.149955e+04 NaN 5.631393e+03 3824.023252 ch2o+o2cho=hco+ho2cho ... Estimated using template (CO_pri;O_rad/NonDeO)... Other estimate 1.101005
In [10]:
fg = sns.FacetGrid(data=h_abs, hue='RMGSource', size=6, hue_kws={'alpha': [1.0, 0.7, 0.5]})
ax = fg.map(plt.hist, 'DifferenceRMG', bins=np.arange(-10.5, 11.5)).add_legend()

"Exact match" matches closest (rate rule and AutoTST), and for "Other estimate" the AutoTST is biased to be slower than the Rate Rules (probably the Rate Rules estimate is on average too fast). Now compare AutoTST with the LLNL (Sarathy) model:

In [11]:
h_abs['DifferenceSarathy'] = np.log(h_abs['AutoTST_fwd']/h_abs['Sarathy'])
h_abs.head()
Out[11]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary RMGSource DifferenceRMG DifferenceSarathy
0 2 H_Abstraction [OH] [H][H] [H] O 3.106270e+07 NaN 1.437547e+06 362821.827253 oh+h2=h+h2o ... Exact match found for rate rule (H2;O_pri_rad) Exact match 4.449851 3.073069
1 4 H_Abstraction [O][O] [H][H] [H] [O]O 6.780722e-06 NaN 2.091163e-05 0.000030 h2+o2=h+ho2 ... Exact match found for rate rule (H2;O2b) Exact match -1.498202 -1.126222
2 8 H_Abstraction [H] OO [O]O [H][H] 6.799489e+05 NaN 1.049946e+06 228891.466188 h2o2+h=h2+ho2 ... Average of (O/H/NonDeC;H_rad + O/H/NonDeN;H_ra... Average 1.088770 -0.434477
3 9 H_Abstraction [O] OO [O]O [OH] 1.923345e+06 NaN 1.295325e+06 996711.005718 h2o2+o=oh+ho2 ... Average of (O/H/NonDeC;O_atom_triplet + O/H/No... Average 0.657360 0.395304
4 17 H_Abstraction C=O [O]OC=O O=COO [CH]=O 1.149955e+04 NaN 5.631393e+03 3824.023252 ch2o+o2cho=hco+ho2cho ... Estimated using template (CO_pri;O_rad/NonDeO)... Other estimate 1.101005 0.713951
In [12]:
h_abs.hist('DifferenceSarathy', bins=np.arange(-10.5, 11.5))
Out[12]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x117780f50>]], dtype=object)

A slight overall bias, with AutoTST usually thinking things should be slower than Sarathy thinks they should be. Compare that with a similar plot for AutoTST vs RMG rate rules:

In [13]:
h_abs.hist('DifferenceRMG', bins=np.arange(-10.5, 11.5))
Out[13]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x119298050>]], dtype=object)

Still shows that AutoTST thinks things are slower than RMG thinks. Slightly less of a bias, suggesting AutoTST is closer to RMG estimates than Sarathy's estimates? Let's compare RMG to Sarathy directly:

In [14]:
h_abs['DifferenceSarathyRMG'] = np.log(h_abs['Rate Rules']/h_abs['Sarathy'])
h_abs.hist('DifferenceSarathyRMG',bins=np.arange(-10.5, 11.5))
Out[14]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x118ef5bd0>]], dtype=object)

Start again with new CSV

An updated version of RMG now calculates the nodal distance between the most specific node in the hierarchical decision tree, and the node that was actually used to provide the estimate. If the tree was full of data, this would be zero, but when you have to climb the tree looking for data, it is greater than zero, and you presumably have a less accurate rate (or at least a less precisely defined estimate).

However, the updated version of RMG also changed how the nodal discance information is used to decide which rules to average, to give (hopefully) better estimates. Therefore the "RMG rate rule" estimates have changed. We repeat with the new analysis, stored in the spreadsheet kineticsNEW.csv:

In [15]:
new_kinetics = pd.read_csv('kineticsNEW.csv')
In [16]:
new_kinetics.head()
Out[16]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
0 2 H_Abstraction [OH] [H][H] [H] O 3.106270e+07 NaN 1.437547e+06 362821.827253 oh+h2=h+h2o ... Exact match found for rate rule (H2;O_pri_rad) 362821.827253 Exact match found for rate rule [H2;O_pri_rad] 0.0
1 4 H_Abstraction [O][O] [H][H] [H] [O]O 6.780722e-06 NaN 2.091163e-05 0.000030 h2+o2=h+ho2 ... Exact match found for rate rule (H2;O2b) 0.000030 Exact match found for rate rule [H2;O2b] 0.0
2 8 H_Abstraction [H] OO [O]O [H][H] 6.799489e+05 NaN 1.049946e+06 228891.466188 h2o2+h=h2+ho2 ... Average of (O/H/NonDeC;H_rad + O/H/NonDeN;H_ra... 6575.656253 Average of [H2O2;O_rad + H2O2;Cd_rad + H2O2;CO... 1.0
3 9 H_Abstraction [O] OO [O]O [OH] 1.923345e+06 NaN 1.295325e+06 996711.005718 h2o2+o=oh+ho2 ... Average of (O/H/NonDeC;O_atom_triplet + O/H/No... 80956.957436 Average of [Average of [O/H/NonDeC;O_atom_trip... 2.0
4 17 H_Abstraction C=O [O]OC=O O=COO [CH]=O 1.149955e+04 NaN 5.631393e+03 3824.023252 ch2o+o2cho=hco+ho2cho ... Estimated using template (CO_pri;O_rad/NonDeO)... 3824.023252 Estimated using template [CO_pri;O_rad/NonDeO]... 1.0
In [17]:
# The 'AutoTST_rev' column contains some strings, so the numbers are also typed as strings by Pandas.
# Convert them to numbers when possible.
def float_if_possible(k):
    "Like `float()` but returns the original input if it can't be converted."
    try:
        return float(k)
    except:
        return k

new_kinetics.ix[:,'AutoTST_rev'] = map(float_if_possible,new_kinetics.ix[:,'AutoTST_rev'])
new_kinetics.ix[8:10]
Out[17]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
8 24 H_Abstraction [O]O C=O [CH]=O OO 1507.849432 NaN 7.186491e+03 3824.023252 ch2o+ho2=hco+h2o2 ... Exact match found for rate rule (CO_pri;O_rad/... 3824.023252 Exact match found for rate rule [CO_pri;O_rad/... 0.0
9 26 R_Addition_MultipleBond O=CO [H] [O]CO NaN 76105.289734 3.3034e+10 5.542124e+10 751515.021321 hoch2o=hocho+h ... Exact match found for rate rule (CO-NdH_O;HJ) 751515.021321 Exact match found for rate rule [CO-NdH_O;HJ] 0.0
10 28 H_Abstraction OO [O]C=O [O]O O=CO 43606.343014 NaN 1.565882e+04 25.911332 ocho+h2o2=hocho+ho2 ... H2O2;O_rad/Cd\H_Cd\H\CsExact match found for r... 25.911332 Average of [H2O2;O_rad/Cd]. Estimated using an... 0.0

Let's see how many of the rate rule estimates have actually changed with the new version of RMG

In [18]:
print len(new_kinetics[new_kinetics['Rate Rules'] == new_kinetics['Rate Rules (New)']]), " the same"
print len(new_kinetics[new_kinetics['Rate Rules'] != new_kinetics['Rate Rules (New)']]), " have changed"
508  the same
270  have changed

Unit conversion

In RMG (and thus the CSV files) the units are combinations of [m,mol,s], so to plot in cm3/mol/s, as expected by most combustion chemists, we must multiply the rate coefficients for the bimolecular reactions (H_Abstraction and R_Addition_MultipleBond) by 106

In [19]:
# Hydrogen abstraction. All rates need converting.
h_abs = new_kinetics['family']=='H_Abstraction'
rate_columns = 'AutoTST_fwd', 'Sarathy', 'Rate Rules', 'Rate Rules (New)'
for col in rate_columns:
    new_kinetics.loc[h_abs,col] *= 1e6
new_kinetics.loc[h_abs].head(3)
Out[19]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
0 2 H_Abstraction [OH] [H][H] [H] O 3.106270e+13 NaN 1.437547e+12 3.628218e+11 oh+h2=h+h2o ... Exact match found for rate rule (H2;O_pri_rad) 3.628218e+11 Exact match found for rate rule [H2;O_pri_rad] 0.0
1 4 H_Abstraction [O][O] [H][H] [H] [O]O 6.780722e+00 NaN 2.091163e+01 3.033449e+01 h2+o2=h+ho2 ... Exact match found for rate rule (H2;O2b) 3.033449e+01 Exact match found for rate rule [H2;O2b] 0.0
2 8 H_Abstraction [H] OO [O]O [H][H] 6.799489e+11 NaN 1.049946e+12 2.288915e+11 h2o2+h=h2+ho2 ... Average of (O/H/NonDeC;H_rad + O/H/NonDeN;H_ra... 6.575656e+09 Average of [H2O2;O_rad + H2O2;Cd_rad + H2O2;CO... 1.0
In [20]:
# Radical addition to a multiple bond.
# Only the values in the bimolecular direction need converting. 
# For now, leave alone the Sarathy column.
r_addition = new_kinetics['family']=='R_Addition_MultipleBond'
rate_columns = 'AutoTST_fwd', 'Rate Rules', 'Rate Rules (New)'
for col in rate_columns:
    new_kinetics.loc[r_addition,col] *= 1e6
new_kinetics.loc[r_addition].head(3)
Out[20]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
9 26 R_Addition_MultipleBond O=CO [H] [O]CO NaN 7.610529e+10 3.3034e+10 5.542124e+10 7.515150e+11 hoch2o=hocho+h ... Exact match found for rate rule (CO-NdH_O;HJ) 7.515150e+11 Exact match found for rate rule [CO-NdH_O;HJ] 0.0
11 30 R_Addition_MultipleBond [H] C=O C[O] NaN 1.676554e+12 6.87309e+07 6.947078e+06 2.117622e+12 ch3o(+M)=ch2o+h(+M) ... Exact match found for rate rule (CO-HH_O;HJ) 2.117622e+12 Exact match found for rate rule [CO-HH_O;HJ] 0.0
13 33 R_Addition_MultipleBond C=O [H] [CH2]O NaN 2.562862e+11 not reverse 3.499867e+05 2.588827e+11 ch2o+h(+M)=ch2oh(+M) ... Exact match found for rate rule (Od_CO-HH;HJ) 2.588827e+11 Exact match found for rate rule [Od_CO-HH;HJ] 0.0
In [21]:
# These rates Sarathy had in the recombination (bimolecular) direction too
# (the others he had as beta scission, and we leave them alone)
r_addition_sarathy = new_kinetics['AutoTST_rev']=='not reverse'
new_kinetics.loc[r_addition_sarathy,'Sarathy'] *= 1e6
new_kinetics.loc[r_addition_sarathy].head(2)
Out[21]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
13 33 R_Addition_MultipleBond C=O [H] [CH2]O NaN 2.562862e+11 not reverse 3.499867e+11 2.588827e+11 ch2o+h(+M)=ch2oh(+M) ... Exact match found for rate rule (Od_CO-HH;HJ) 2.588827e+11 Exact match found for rate rule [Od_CO-HH;HJ] 0.0
38 75 R_Addition_MultipleBond [CH3] C=O CC[O] NaN 2.757500e+09 not reverse 1.237140e+10 2.594071e+10 ch3+ch2o=c2h5o ... Exact match found for rate rule (CO-HH_O;CsJ-HHH) 2.594071e+10 Exact match found for rate rule [CO-HH_O;CsJ-H... 0.0
In [22]:
new_kinetics.head(10)
Out[22]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
0 2 H_Abstraction [OH] [H][H] [H] O 3.106270e+13 NaN 1.437547e+12 3.628218e+11 oh+h2=h+h2o ... Exact match found for rate rule (H2;O_pri_rad) 3.628218e+11 Exact match found for rate rule [H2;O_pri_rad] 0.0
1 4 H_Abstraction [O][O] [H][H] [H] [O]O 6.780722e+00 NaN 2.091163e+01 3.033449e+01 h2+o2=h+ho2 ... Exact match found for rate rule (H2;O2b) 3.033449e+01 Exact match found for rate rule [H2;O2b] 0.0
2 8 H_Abstraction [H] OO [O]O [H][H] 6.799489e+11 NaN 1.049946e+12 2.288915e+11 h2o2+h=h2+ho2 ... Average of (O/H/NonDeC;H_rad + O/H/NonDeN;H_ra... 6.575656e+09 Average of [H2O2;O_rad + H2O2;Cd_rad + H2O2;CO... 1.0
3 9 H_Abstraction [O] OO [O]O [OH] 1.923345e+12 NaN 1.295325e+12 9.967110e+11 h2o2+o=oh+ho2 ... Average of (O/H/NonDeC;O_atom_triplet + O/H/No... 8.095696e+10 Average of [Average of [O/H/NonDeC;O_atom_trip... 2.0
4 17 H_Abstraction C=O [O]OC=O O=COO [CH]=O 1.149955e+10 NaN 5.631393e+09 3.824023e+09 ch2o+o2cho=hco+ho2cho ... Estimated using template (CO_pri;O_rad/NonDeO)... 3.824023e+09 Estimated using template [CO_pri;O_rad/NonDeO]... 1.0
5 21 H_Abstraction C=O [H] [CH]=O [H][H] 2.478174e+12 NaN 7.245994e+12 4.808223e+12 ch2o+h=hco+h2 ... CO/H/NonDe;H_rad. Estimated using template (CO... 5.802955e+10 Average of [Average of [CO_sec;H_rad] + Averag... 1.0
6 22 H_Abstraction [O] C=O [OH] [CH]=O 1.649824e+13 NaN 5.657989e+12 2.659981e+12 ch2o+o=hco+oh ... Exact match found for rate rule (CO_pri;O_atom... 2.659981e+12 Exact match found for rate rule [CO_pri;O_atom... 0.0
7 23 H_Abstraction [CH3] C=O [CH]=O C 1.905433e+10 NaN 5.258136e+10 2.880153e+10 ch2o+ch3=hco+ch4 ... Exact match found for rate rule (CO_pri;C_methyl) 2.880153e+10 Exact match found for rate rule [CO_pri;C_meth... 0.0
8 24 H_Abstraction [O]O C=O [CH]=O OO 1.507849e+09 NaN 7.186491e+09 3.824023e+09 ch2o+ho2=hco+h2o2 ... Exact match found for rate rule (CO_pri;O_rad/... 3.824023e+09 Exact match found for rate rule [CO_pri;O_rad/... 0.0
9 26 R_Addition_MultipleBond O=CO [H] [O]CO NaN 7.610529e+10 3.3034e+10 5.542124e+10 7.515150e+11 hoch2o=hocho+h ... Exact match found for rate rule (CO-NdH_O;HJ) 7.515150e+11 Exact match found for rate rule [CO-NdH_O;HJ] 0.0

OK, we're ready for some plots. First set some style parameters.

In [23]:
plt.rc('font', size=16)          # controls default text sizes
plt.rc('axes', titlesize=16)     # fontsize of the axes title
plt.rc('axes', labelsize=16)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=16)    # fontsize of the tick labels
plt.rc('ytick', labelsize=16)    # fontsize of the tick labels
plt.rc('legend', fontsize=16)    # legend fontsize
plt.rc('figure', titlesize=16)  # fontsize of the figure title
In [24]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15,5))

new_kinetics[new_kinetics['family'] == 'H_Abstraction'].plot.scatter(
                    'AutoTST_fwd', 'Rate Rules (New)',  
                    c= 'Nodal distance', cmap = 'plasma',
                    loglog=True, 
                    xlim=(1e5,1e15),
                    ylim=(1e5,1e15),
                    s=50, vmin=0, vmax=3.0, 
                    edgecolor='black', linewidth=0.5, 
                    ax=axes[0], colorbar=False,
                    title = u"Hydrogen abstraction\n$k$(1000K) / cm$^3$mol$^{-1}$s$^{-1}$")

new_kinetics[new_kinetics['family'] == 'R_Addition_MultipleBond'].plot.scatter(
                    'AutoTST_fwd', 'Rate Rules (New)',  
                    c= 'Nodal distance', cmap = 'plasma', 
                    loglog=True, 
                    xlim=(1e3,1e14),
                    ylim=(1e3,1e14), 
                    s=50, vmin=0, vmax=3.0, 
                    edgecolor='black', linewidth=0.5, 
                    ax=axes[1], colorbar=False,
                    title = u"R addition\n$k$(1000K) / cm$^3$mol$^{-1}$s$^{-1}$")

new_kinetics[new_kinetics['family'] == 'intra_H_migration'].plot.scatter(
                    'AutoTST_fwd', 'Rate Rules (New)',  
                    c= 'Nodal distance', cmap = 'plasma', 
                    loglog=True, 
                    xlim=(1e-2,1e10),
                    ylim=(1e-2,1e10), 
                    s=50, vmin=0, vmax=3.0, 
                    edgecolor='black', linewidth=0.5, 
                    ax=axes[2], colorbar=False,
                    title = u"Intramolecular H migration\n$k$(1000K) / s$^{-1}$")

fig.subplots_adjust(wspace=0.05, hspace=0.4, right=0.8)

im = plt.gca().get_children()[0]
cax = fig.add_axes([1.0,0.2,0.02,0.7]) 
fig.colorbar(im, cax=cax, label='Nodal distance')

for ax in axes:
    ax.set(aspect='equal')
    ax.set_xlabel(u'AutoTST')
    ax.set_ylabel(u'RMG Rate Rules')
    
    startx, endx = ax.get_xlim()
    xticks = [10**i for i in np.arange(np.log10(startx), np.log10(endx)+1, 2)]
    ax.set_xticks(xticks)
    starty, endy = ax.get_ylim()
    yticks = [10**i for i in np.arange(np.log10(starty), np.log10(endy)+1, 2)]
    ax.set_yticks(yticks) 
    
    ax.plot(ax.get_xlim(),ax.get_xlim(), ls='--', color='gray', zorder=0)
    
plt.tight_layout()

fig.savefig('RateRulesvsAutoTST.pdf', bbox_inches='tight')
/Users/rwest/anaconda/lib/python2.7/site-packages/matplotlib/figure.py:1743: UserWarning: This figure includes Axes that are not compatible with tight_layout, so its results might be incorrect.
  warnings.warn("This figure includes Axes that are not "

For the LLNL plots we need to deal with Radical Addition and Beta Scission reactions carefully. They're in the same Sarathy column, but mean different things. The indication is whether the AutoTST_rev column has "not reverse" or a value.

In [25]:
r_add = new_kinetics[(new_kinetics['family'] == 'R_Addition_MultipleBond') &
                     (new_kinetics['AutoTST_rev'] == 'not reverse')]
beta_scission = new_kinetics[(new_kinetics['family'] == 'R_Addition_MultipleBond') &
                             (new_kinetics['AutoTST_rev'] != 'not reverse')]
print len(r_add), "radical addition to double bond"
print len(beta_scission), "beta scission"
108 radical addition to double bond
23 beta scission
In [26]:
# First, just do radical addition direction (ignore the beta scission)

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15,5))

new_kinetics[new_kinetics['family'] == 'H_Abstraction'].plot.scatter(
                    'AutoTST_fwd', 'Sarathy',
                    loglog=True, 
                    xlim=(1e5,1e14), ylim=(1e5,1e14), 
                    s=50, edgecolor='black', linewidth=0.5, 
                    ax=axes[0], 
                    title="Hydrogen abstraction")
 
r_add.plot.scatter(
                    'AutoTST_fwd', 'Sarathy', 
                    loglog=True, 
                    xlim=(1e3,1e14), ylim=(1e3,1e14), 
                    s=50, edgecolor='black', linewidth=0.5, 
                    ax=axes[1], 
                    title = r"R addition to multiple bond")

new_kinetics[new_kinetics['family'] == 'intra_H_migration'].plot.scatter(
                    'AutoTST_fwd', 'Sarathy', 
                    loglog=True, 
                    xlim=(1e-2,1e10), ylim=(1e-2,1e10), 
                    s=50, edgecolor='black', linewidth=0.5, 
                    ax=axes[2], 
                    title = "Intramolecular H migration")

for ax, units in zip(axes, ['cm$^3$mol$^{-1}$s$^{-1}$']*2+['s$^{-1}$']):
    x_limits = ax.get_xlim()
    y_limits = ax.get_ylim()
    limits = min(x_limits[0], y_limits[0]), max(x_limits[1], y_limits[1])
    ax.plot(limits, limits, ls='--', color='gray', zorder=0)
    ax.set_xlim(limits)
    ax.set_ylim(limits)
    ax.set(aspect='equal')
    ax.set_xlabel(u'AutoTST')
    ax.set_ylabel(u'LLNL Model')
    ax.set_title( ax.title.get_text() + '\n$k$(1000K) / {}'.format(units) )

    locator = matplotlib.ticker.LogLocator()
    ax.xaxis.set_major_locator( locator )
    ax.yaxis.set_major_locator( locator )

    
plt.tight_layout()

plt.show()

Now try just the beta scission reactions

In [27]:
plt.figure(figsize=(6,6))

ax = beta_scission.plot.scatter(
                    'AutoTST_rev', 'Sarathy', 
                    loglog=True, 
                    s=50, edgecolor='black', linewidth=0.5, 
                    color='lightcoral',
                    title = r"$\beta$-scission")
ax.plot(ax.get_xlim(), ax.get_xlim(), ls='--', color='gray')

units = 's$^{-1}$'
ax.set(aspect='equal')
ax.set_xlabel(u'AutoTST')
ax.set_ylabel(u'LLNL Model')
ax.set_title( ax.title.get_text() + '\n$k$(1000K) / {}'.format(units) )
locator = matplotlib.ticker.LogLocator()
ax.xaxis.set_major_locator( locator )
ax.yaxis.set_major_locator( locator )
    
plt.show()
<matplotlib.figure.Figure at 0x1190bc790>

What's that outlier that LLNL has at 1e-6?

In [28]:
r = new_kinetics.loc[ new_kinetics['Sarathy']<1e-4 ]
k =  r['SarathySource']
print k.values[0]
r
ho2ch2co=ch2co+ho2                                  1.000e+00 0.000     0.000    
    PLOG/ 0.010     1.120e+07 -3.760    21.680   /
    PLOG/ 0.100     1.100e+08 -3.760    21.680   /
    PLOG/ 1.000     9.200e+08 -3.730    21.630   /
    PLOG/ 10.000    2.090e+09 -3.550    21.220   /
Out[28]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
64 109 R_Addition_MultipleBond C=C=O [O]O O=[C]COO NaN 2.509610e+07 1.3635e+09 0.000001 4.010655e+07 ho2ch2co=ch2co+ho2 ... Average of (Average of (Cds-HH_Cds-HH;OJ_pri +... 5.302642e+06 Average of [CO_O;OJ-Os + Cds_Cds;OJ-Os]. Estim... 3.0

First, check the 10 atm parameters do give the 1e-6 number we are using and it's not an evaluation problem.

In [29]:
T = 1000.
A, n, Ea = (2.09e9, -3.550, 21.220)
k = A * T**n * np.exp(-Ea / (0.00198720359*T))
print "k =",k
print "log10(k) = ", np.log10(k)
k = 1.07797806745e-06
log10(k) =  -5.96739007522

So that's correct. According to some other chemkin files these parameters appear in, they came from this paper: Lee, J.; Bozzelli, J. W. Thermochemical and Kinetic Analysis of the Formyl Methyl Radical + O2 Reaction System. J. Phys. Chem. A 2003, 107 (19), 3778–3791 DOI: 10.1021/jp030001o.

We can confirm they match the 10 atm results from Table 8, Reaction 5.

However, the actual High Pressure Limit rate that went in to the QRRK calculation is provided in Table 7, Reaction 5. These parameters give:

In [30]:
# Actual High P limit that went in to the QRRK calculation, table 7, reaction 5
A, n, Ea = (1.47e10, 0.65, 23.82)
k = A * T**n * np.exp(-Ea / (0.00198720359*T))
print "k =",k
print "log10(k) = ", np.log10(k)
k = 8157603.05826
log10(k) =  6.91156256911

which is much closer, and would not appear an outlier.

We can update the table and re-plot it like so:

In [31]:
new_kinetics.loc[r.index,'Sarathy'] = k
new_kinetics.loc[r.index]
Out[31]:
number family reactant1 reactant2 product1 product2 AutoTST_fwd AutoTST_rev Sarathy Rate Rules SarathySource Rule summary Rate Rules (New) Rule summary (New) Nodal distance
64 109 R_Addition_MultipleBond C=C=O [O]O O=[C]COO NaN 2.509610e+07 1.3635e+09 8.157603e+06 4.010655e+07 ho2ch2co=ch2co+ho2 ... Average of (Average of (Cds-HH_Cds-HH;OJ_pri +... 5.302642e+06 Average of [CO_O;OJ-Os + Cds_Cds;OJ-Os]. Estim... 3.0
In [32]:
beta_scission = new_kinetics[(new_kinetics['family'] == 'R_Addition_MultipleBond') & 
                             (new_kinetics['AutoTST_rev'] != 'not reverse')]

plt.figure(figsize=(6,6))
ax = beta_scission.plot.scatter(
                    'AutoTST_rev', 'Sarathy', 
                    loglog=True,
                    s=50, edgecolor='black', linewidth=0.5, 
                    color='lightcoral',
                    title = r"$\beta$-scission")
ax.plot(ax.get_xlim(), ax.get_xlim(), ls='--', color='gray')

ax.plot(new_kinetics.loc[r.index,'AutoTST_rev'], new_kinetics.loc[r.index,'Sarathy'], 'bo')
    
units = 's$^{-1}$'
ax.set(aspect='equal')
ax.set_xlabel(u'AutoTST')
ax.set_ylabel(u'LLNL Model')
ax.set_title( ax.title.get_text() + '\n$k$(1000K) / {}'.format(units) )
locator = matplotlib.ticker.LogLocator()
ax.xaxis.set_major_locator( locator )
ax.yaxis.set_major_locator( locator )
plt.show()
<matplotlib.figure.Figure at 0x119d7e510>

Now let's experiment with plotting both forward and reverse rates on the same plot. Also, let's check all the other PLOG rates look sensible, by marking them with a red outline.

In [33]:
r_add = new_kinetics[(new_kinetics['family'] == 'R_Addition_MultipleBond') & 
                     (new_kinetics['AutoTST_rev'] == 'not reverse')]

beta_scission = new_kinetics[(new_kinetics['family'] == 'R_Addition_MultipleBond') & 
                             (new_kinetics['AutoTST_rev'] != 'not reverse')]

plt.figure(figsize=(5,5))

ax = r_add.plot.scatter(
                    'AutoTST_fwd', 'Sarathy', 
                    loglog=True, 
                    s=50, edgecolor='black', linewidth=0.5, 
                    label = r"R addition $k$(1000K) / cm$^3$mol$^{-1}$s$^{-1}$")

# red line around the PLOG rates
r_add[r_add['SarathySource'].str.contains('PLOG') ].plot.scatter(
                    'AutoTST_fwd', 'Sarathy', 
                    loglog=True, 
                    s=50, edgecolor='red', linewidth=0.5, 
                    ax=ax, 
                   )

beta_scission.plot.scatter(
                    'AutoTST_rev', 'Sarathy', 
                    loglog=True, 
                    s=50, edgecolor='black', linewidth=0.5, 
                    color='yellow',
                    ax=ax, 
                    label=r"$\beta$-scission $k$(1000K) / s$^{-1}$",
)

# red line around the PLOG rates
beta_scission[beta_scission['SarathySource'].str.contains('PLOG') ].plot.scatter(
                    'AutoTST_rev', 'Sarathy', 
                    loglog=True, 
                    s=50, edgecolor='red', linewidth=0.5, 
                    color='yellow',
                    ax=ax, )

x_limits = ax.get_xlim()
ax.plot(x_limits, x_limits, ls='--', color='gray', zorder=0)
ax.set_xlim(x_limits)
ax.set_ylim(x_limits)

ax.set(aspect='equal')
ax.set_xlabel(u'AutoTST')
ax.set_ylabel(u'LLNL Model')
ax.legend(loc=3, bbox_to_anchor=(0., 1.02, 1., .102))
ax
plt.show()
<matplotlib.figure.Figure at 0x114d01050>

OK, now we're ready to make the final figure for our manuscript.

In [34]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15,5))

new_kinetics[new_kinetics['family'] == 'H_Abstraction'].plot.scatter(
                    'AutoTST_fwd', 'Sarathy',
                    loglog=True, 
                    xlim=(1e5,1e15),
                    ylim=(1e5,1e15), 
                    s=50, edgecolor='black', linewidth=0.5, 
                    ax=axes[0], 
                    title=u"Hydrogen abstraction\n(cm$^3$mol$^{-1}$s$^{-1}$)")
 
r_add.plot.scatter(
                    'AutoTST_fwd', 'Sarathy', 
                    loglog=True, 
                    xlim=(1e3,1e14),
                    ylim=(1e3,1e14), 
                    s=50, edgecolor='black', linewidth=0.5, 
                    ax=axes[1], 
                    label = r"R addition (cm$^3$mol$^{-1}$s$^{-1}$)")
beta_scission.plot.scatter(
                    'AutoTST_rev', 'Sarathy', 
                    loglog=True, 
                    xlim=(1e3,1e14),
                    ylim=(1e3,1e14), 
                    s=50, edgecolor='black', linewidth=0.5, 
                    color='lightcoral',
                    ax=axes[1], 
                    label=r"$\beta$-scission (s$^{-1}$)",
)
axes[1].legend(loc=3, bbox_to_anchor=(0., 1., 1., .1))

new_kinetics[new_kinetics['family'] == 'intra_H_migration'].plot.scatter(
                    'AutoTST_fwd', 'Sarathy', 
                    loglog=True, 
                    xlim=(1e-2,1e10), ylim=(1e-2,1e10), 
                    s=50, edgecolor='black', linewidth=0.5, 
                    ax=axes[2], 
                    title = u"Intramolecular H migration\n(s$^{-1}$)")


for ax in axes:
    x_limits = ax.get_xlim()
    y_limits = ax.get_ylim()
    limits = min(x_limits[0], y_limits[0]), max(x_limits[1], y_limits[1])
    ax.plot(limits, limits, ls='--', color='gray', zorder=0)
    ax.set_xlim(limits)
    ax.set_ylim(limits)

    ax.set(aspect='equal')
    ax.set_xlabel(u'AutoTST')
    ax.set_ylabel(u'LLNL Model')
    
    locator = matplotlib.ticker.LogLocator()
    ax.xaxis.set_major_locator( locator )
    ax.yaxis.set_major_locator( locator )
    
    startx, endx = ax.get_xlim()
    xticks = [10**i for i in np.arange(np.log10(startx), np.log10(endx)+1, 2)]
    ax.set_xticks(xticks)
    starty, endy = ax.get_ylim()
    yticks = [10**i for i in np.arange(np.log10(starty), np.log10(endy)+1, 2)]
    ax.set_yticks(yticks) 


plt.tight_layout()

fig.savefig('SarathyvsAutoTST.pdf', bbox_inches='tight')

Looking at clustering

This is to see how many times the same rate is used for different reactions within the Sarathy model

In [35]:
print "Radical addition"
r_add.groupby('Sarathy').count().sort_values(by='number', ascending=False)['number'].head()
Radical addition
Out[35]:
Sarathy
2.507910e+08    21
2.266478e+12    10
2.217331e+10     6
6.157821e+09     5
1.974001e+09     4
Name: number, dtype: int64
In [36]:
print "Hydrogen abstraction"
new_kinetics[new_kinetics['family']=='H_Abstraction'].groupby(['Sarathy']).count().sort_values(by='number', ascending=False)['number'].head()
Hydrogen abstraction
Out[36]:
Sarathy
5.741346e+08    18
2.984991e+09    16
7.758973e+08    14
1.565882e+10    11
3.852665e+08     7
Name: number, dtype: int64
In [37]:
print "Intramolecular H migration"
new_kinetics[new_kinetics['family']=='intra_H_migration'].groupby(['Sarathy']).count().sort_values(by='number', ascending=False)['number'].head()
Intramolecular H migration
Out[37]:
Sarathy
213277.964942    3
523177.616371    2
453491.729853    2
348785.077580    2
74803.173728     2
Name: number, dtype: int64

Now let's examine clustering in the RMG rate rule estimates

In [38]:
print "Radical addition"
new_kinetics[new_kinetics['family']=='R_Addition_MultipleBond'].groupby(['Rate Rules (New)']).count().sort_values(by='number', ascending=False)['number'].head()
Radical addition
Out[38]:
Rate Rules (New)
1.225595e+10    6
8.063886e+08    5
7.899464e+11    4
3.649628e+12    4
4.198434e+11    4
Name: number, dtype: int64
In [39]:
print "Hydrogen abstraction"
new_kinetics[new_kinetics['family']=='H_Abstraction'].groupby(['Rate Rules (New)']).count().sort_values(by='number', ascending=False)['number'].head()
Hydrogen abstraction
Out[39]:
Rate Rules (New)
9.576051e+07    52
3.756674e+08    23
7.145156e+08    22
1.140144e+09    19
3.819155e+08    18
Name: number, dtype: int64
In [40]:
print "Intramolecular H migration"
new_kinetics[new_kinetics['family']=='intra_H_migration'].groupby(['Rate Rules (New)']).count().sort_values(by='number', ascending=False)['number'].head()
Intramolecular H migration
Out[40]:
Rate Rules (New)
2.573338e+06    6
7.618256e+03    4
5.759316e+06    4
2.122009e+06    4
5.561535e+05    4
Name: number, dtype: int64

Let's see how common each nodal distance is:

In [41]:
groups = new_kinetics.groupby('Nodal distance').count()['number']
groups
Out[41]:
Nodal distance
0.000000    282
1.000000    253
1.414214    101
2.000000     91
2.236068     37
2.828427      2
3.000000     11
3.162278      1
Name: number, dtype: int64

Finally, let's see how much the rate rule estimates have changed between the earlier version of RMG and the newer

In [42]:
#sns.set_style('white')
fg = sns.FacetGrid(data=new_kinetics, hue='family', size=8, aspect=1.0, )
ax = fg.map(plt.scatter, 'Rate Rules', 'Rate Rules (New)')
ax.set(xscale="log", yscale="log")
plt.legend()
Out[42]:
<matplotlib.legend.Legend at 0x11a5176d0>
In [43]:
new_kinetics['Rate Rule Change'] = np.log(new_kinetics['Rate Rules (New)']/new_kinetics['Rate Rules'])

new_kinetics.hist('Rate Rule Change', bins=np.arange(-10.5, 11.5))
Out[43]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x11a4dc9d0>]], dtype=object)
In [ ]: