Determining Proton Affinities using psi4

This is the third post in a series outlining a workflow that uses freely available computational chemistry resources with python interfaces to evaluate properties of gas-phase ions. A cursory search shows that a variety of computational packages offer a direct python interface but, interestingly, not all of these packages are current. PySCF appears to be a solid choice; however, some of the documentation/examples do not provide a direct means to calculate thermochemistry. GAMESS is another option, but the python wrapper for this system has not been updated in almost a year and appears compatible only with select python 2.7 installations. After testing all of these options, it became clear that psi4 provided a tractable approach to optimize the geometry of molecules followed by a detailed thermochemical and frequency evaluation. The ipynb notebook illustrates how to not only optimize the geometry of water, but also determine its proton affinity. This latter property remains essential for describing the ionization behavior of target molecules along with a host of other chemical properties. Many literature reports present a more detailed treatment of the energy terms; however, as a first pass this workflow yields a result that is in good agreement with the literature value for water.
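
As a rough illustration of the arithmetic behind a proton affinity (a sketch, not the notebook's code): the proton affinity is the negative enthalpy change of A + H+ -> AH+, so it can be assembled from the computed enthalpies of the neutral and protonated species plus the 5/2 RT enthalpy of the bare proton treated as an ideal gas. The function name, argument names, and the enthalpy values in the usage lines are hypothetical placeholders, not psi4 results.

```python
# Sketch: proton affinity from two computed enthalpies (in hartree).
# PA = -dH for A + H+ -> AH+  =  H(A) + H(H+) - H(AH+),
# where H(H+) = 5/2 RT for an ideal-gas proton (it has no electronic energy).

HARTREE_TO_KJ_MOL = 2625.4996   # kJ/mol per hartree
R_KJ_MOL_K = 8.314462618e-3     # gas constant, kJ/(mol K)


def proton_affinity_kj_mol(h_neutral, h_protonated, temperature=298.15):
    """Return the proton affinity in kJ/mol (hypothetical helper)."""
    proton_enthalpy = 2.5 * R_KJ_MOL_K * temperature
    return (h_neutral - h_protonated) * HARTREE_TO_KJ_MOL + proton_enthalpy


# Placeholder enthalpies (hartree), for illustration only
pa = proton_affinity_kj_mol(h_neutral=-76.0, h_protonated=-76.25)
print("Proton affinity: %.1f kJ/mol" % pa)
```

In practice the two enthalpies would come from the thermochemistry output of a frequency calculation on each optimized structure, evaluated at the same level of theory.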

Geometry Optimization in Python

This is the second post in a series aimed at generating a range of candidate structures for molecular modeling in the field of ion mobility spectrometry. In a previous post, the use of rdkit to generate structures was introduced. However, closer inspection of that code highlights a few function calls aimed at optimizing the conformer structures. Given that the tetraalkylammonium ions were the focus of that effort, the optimization step was quite rapid, which raised the question of whether any geometry optimization was being performed at all. In the following jupyter notebook, ibuprofen generated from SMILES input is optimized using the same function call as found in the previous post. This degree of optimization does not reach the level needed for more advanced calculations but can be a decent start when trying to group the different conformers into structural families.

Required python modules include: rdkit

Optional modules: pymol and an instance of this program running as a server.
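
For readers without the notebook at hand, the optimization step described above might look roughly like the following sketch. It assumes the MMFF94 force field and a small conformer count; the notebook may well use a different rdkit optimizer or different settings.

```python
# Sketch: embed a few ibuprofen conformers and relax them with a force field.
# MMFF94 is an assumed choice; the notebook may call a different optimizer.
from rdkit import Chem
from rdkit.Chem import AllChem

smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"  # ibuprofen
mol = Chem.AddHs(Chem.MolFromSmiles(smiles))  # explicit hydrogens for 3D work

# Generate a small, reproducible set of starting conformers
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=5, randomSeed=42))

# Optimize every conformer; each result is (not_converged, energy in kcal/mol)
results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)

for conf_id, (not_converged, energy) in zip(conf_ids, results):
    status = "converged" if not_converged == 0 else "not converged"
    print("conformer %d: %.2f kcal/mol (%s)" % (conf_id, energy, status))
```

Checking the convergence flag and comparing energies before and after optimization is one simple way to confirm that the optimizer actually did some work.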

Conformational Searching using Python

This is the first of a series examining the use of python to generate candidate structures of molecules. These conformations may serve a variety of functions, though our particular purpose is to identify candidates for additional optimization and ultimate use in ion mobility modeling experiments. After considering a range of tools (e.g. Avogadro or ChemDraw), it was apparent that a more automated, open-source workflow was needed. In full disclosure, there are surely other mechanisms to make this happen but the following jupyter notebook is a reasonable approach. Visualization of the conformers can be accomplished using pymol if that module is installed and a server instance is running in the background (i.e. pymol -R).

Savitzky-Golay Smoothing GUI

Simple Smoothing GUI

In an effort to create a set of simple tools that are useful for data processing and realtime analysis of data we’ve been exploring a range of options.  Granted there are a number of canned solutions in existence (e.g. National Instruments); however, to avoid the long-term challenges of compatibility we are looking for tools that can better serve our research goals.  Two packages that we’ve begun to lean on more heavily are pyqtgraph and guidata.  Both use PyQt4 and are compatible with PySide for GUI rendering and construction.  Matplotlib is quite mature but it has been our experience that pyqtgraph is quite a bit faster for plotting data in realtime.

The code below integrates pyqtgraph directly into the guidata framework.  This is not a huge stretch as the pyqtgraph widgets integrate directly with the QWidget class in PyQt4.  For those looking for an example, the following code illustrates very simply how to integrate one of these plots and update it using simulated data, along with the ability to alter the smoothing parameters of the raw data on the fly.  One might envision the use of this approach to capture data from a streaming device (more on that later). It should be noted that the file loading feature has been disabled but it wouldn’t be a huge stretch to re-enable this functionality for single spectra.

# -*- coding: utf-8 -*-
# Adapted from guidata examples:
# Copyright © 2009-2010 CEA
# Pierre Raybaut
# Licensed under the terms of the CECILL License
# (see guidata/ for details)
# Adapted by Brian Clowers

"""
DataSetEditGroupBox and DataSetShowGroupBox demo

These group box widgets are intended to be integrated in a GUI application
layout, showing read-only parameter sets or allowing to edit parameter values.
"""

SHOW = True # Show test in GUI-based test launcher

import tempfile, atexit, shutil, datetime, numpy as N

from guidata.qt.QtGui import QMainWindow, QSplitter
from guidata.qt.QtCore import SIGNAL, QTimer
from guidata.qt import QtCore

from guidata.dataset.datatypes import (DataSet, BeginGroup, EndGroup, BeginTabGroup, EndTabGroup)
from guidata.dataset.dataitems import (FloatItem, IntItem, BoolItem, ChoiceItem, MultipleChoiceItem, ImageChoiceItem, FilesOpenItem, StringItem, TextItem, ColorItem, FileSaveItem, FileOpenItem, DirectoryItem, FloatArrayItem, DateItem, DateTimeItem)
from guidata.dataset.qtwidgets import DataSetShowGroupBox, DataSetEditGroupBox
from guidata.configtools import get_icon
from guidata.qthelpers import create_action, add_actions, get_std_icon

# Local test import:
from guidata.tests.activable_dataset import ExampleDataSet

import sys, os
import pyqtgraph as PG

def simpleSmooth(fileName, polyOrder, pointLength, plotSmoothed = False, saveSmoothed = True):
    if not os.path.isfile(fileName):
        return False
    rawArray = get_ascii_data(fileName)
    #savitzky_golay(data, kernel = 11, order = 4)
    smoothArray = savitzky_golay(rawArray, kernel = pointLength, order = polyOrder)
    if plotSmoothed:
        plot_smoothed(smoothArray, rawArray, True)

    if saveSmoothed:
        # Reuse the input file name without its extension
        newFileName = fileName.split(".")[0]
        N.savetxt(newFileName, smoothArray, delimiter = ',', fmt = '%.4f')

    return smoothArray


def get_ascii_data(filename):
    data_spectrum=N.loadtxt(filename,delimiter = ',', skiprows=0)##remember to change this depending on file format
    return data_spectrum

def savitzky_golay(data, kernel = 11, order = 4):
    """
    applies a Savitzky-Golay filter
    input parameters:
    - data => data as a 1D numpy array
    - kernel => a positive integer > 2*order giving the kernel size
    - order => order of the polynomial
    returns smoothed data as a numpy array

    invoke like:
    smoothed = savitzky_golay(<rough>, [kernel = value], [order = value])

    From scipy website
    """
    try:
        kernel = abs(int(kernel))
        order = abs(int(order))
    except ValueError:
        raise ValueError("kernel and order have to be of type int (floats will be converted).")
    if kernel % 2 != 1 or kernel < 1:
        raise TypeError("kernel size must be a positive odd number, was: %d" % kernel)
    if kernel < order + 2:
        raise TypeError("kernel is too small for the polynomials\nshould be > order + 2")

    # a second order polynomial has 3 coefficients
    order_range = range(order+1)
    half_window = (kernel -1) // 2
    b = N.array([[k**i for i in order_range] for k in range(-half_window, half_window+1)])
    # row [0] since we don't want the derivative, else choose [1] or [2], respectively
    m = N.linalg.pinv(b)[0]
    window_size = len(m)
    half_window = (window_size-1) // 2

    # precompute the offset values for better performance
    # (a list, so it can be iterated once per data point)
    offsets = range(-half_window, half_window+1)
    offset_data = list(zip(offsets, m))

    smooth_data = list()

    # temporary data, with padded first/last values (since we want the same length after smoothing)
    firstval = data[0]
    lastval = data[-1]
    data = N.concatenate((N.zeros(half_window)+firstval, data, N.zeros(half_window)+lastval))

    for i in range(half_window, len(data) - half_window):
        value = 0.0
        for offset, weight in offset_data:
            value += weight * data[i + offset]
        smooth_data.append(value)
    return N.array(smooth_data)


def first_derivative(y_data):
    """
    calculates the derivative
    """
    y = (y_data[1:]-y_data[:-1])
    dy = y/2#((x_data[1:]-x_data[:-1])/2)

    return dy

class SmoothGUI(DataSet):
    """
    Simple Smoother
    A simple application for smoothing a 1D text file at this stage.
    Follows the KISS principle.
    """
    fname = FileOpenItem("Open file", ("txt", "csv"), "")

    kernel = FloatItem("Smooth Point Length", default=7, min=1, max=101, step=2, slider=True)
    order = IntItem("Polynomial Order", default=3, min=3, max=17, slider=True)
    saveBool = BoolItem("Save Plot Output", default = True)
    plotBool = BoolItem("Plot Smoothed", default = True).set_pos(col=1)
    #color = ColorItem("Color", default="red")

class MainWindow(QMainWindow):
    def __init__(self):
        QMainWindow.__init__(self)
        self.setWindowTitle("Simple Smoother")
        # NOTE: several statements in this method were truncated in this listing.
        # The signal receiver, the splitter contents, the Quit trigger, and the
        # timer interval below are reconstructed assumptions.

        # Instantiate dataset-related widgets:
        self.smoothGB = DataSetEditGroupBox("Smooth Parameters",
                                            SmoothGUI, comment='')

        self.connect(self.smoothGB, SIGNAL("apply_button_clicked()"),
                     self.update_window)

        self.fileName = ''

        self.kernel = 15
        self.order = 3
        self.pw = PG.PlotWidget(name='Plot1')
        self.pw.showGrid(x = True, y = True)

        self.p1 = self.pw.plot()
        self.p1.setPen('g', alpha = 1.0)#Does alpha even do anything?
        self.p2 = self.pw.plot(pen = 'y')
        self.pw.setLabel('left', 'Value', units='V')
        self.pw.setLabel('bottom', 'Time', units='s')

        splitter = QSplitter(QtCore.Qt.Vertical, parent = self)
        splitter.addWidget(self.smoothGB)
        splitter.addWidget(self.pw)
        self.setCentralWidget(splitter)

        self.setContentsMargins(10, 5, 10, 5)
        # File menu
        file_menu = self.menuBar().addMenu("File")
        quit_action = create_action(self, "Quit",
                                    tip="Quit application",
                                    triggered=self.close)
        add_actions(file_menu, (quit_action, ))
        ## Start a timer to rapidly update the plot in pw
        self.t = QTimer()
        self.t.timeout.connect(self.updateData)
        self.t.start(1000)  # interval in ms (assumed value)

    def rand(self,n):
        # Simulated noisy trace with a few raised regions and a spike
        data = N.random.random(n)
        data[int(n*0.1):int(n*0.23)] += .5
        data[int(n*0.18):int(n*0.25)] += 1
        data[int(n*0.1):int(n*0.13)] *= 2.5
        data[int(n*0.18)] *= 2
        data *= 1e-12
        return data, N.arange(n, n+len(data)) / float(n)

    def updateData(self):
        yd, xd = self.rand(100)
        ydSmooth = savitzky_golay(yd, kernel = self.kernel, order = self.order)
        if self.smoothGB.dataset.plotBool:
            # Show the smoothed trace with the raw trace mirrored below it
            self.p2.setData(y=ydSmooth, x = xd, clear = True)
            self.p1.setData(y=yd*-1, x=xd, clear = True)
        else:
            # Raw trace only; collapse the smoothed curve to a single point
            self.p1.setData(y=yd, x=xd, clear = True)
            self.p2.setData(y=[yd[0]], x = [xd[0]], clear = True)

        if self.smoothGB.dataset.saveBool:
            if os.path.isfile(self.fileName):
                newFileName = self.fileName.split(".")[0]
            else:
                newFileName = "test"
            N.savetxt(newFileName, ydSmooth, delimiter = ',')#, fmt = '%.4f')

    def update_window(self):
        # Copy the values from the parameter panel once "Apply" is clicked
        dataset = self.smoothGB.dataset
        self.order = dataset.order
        self.kernel = dataset.kernel
        self.fileName = dataset.fname

if __name__ == '__main__':
    from guidata.qt.QtGui import QApplication
    app = QApplication(sys.argv)
    window = MainWindow()
    # Show the window and hand control to the Qt event loop
    window.show()
    sys.exit(app.exec_())
Gantt Charts in Matplotlib

Love it or hate it, the lack of tractable options to create Gantt charts warrants frustration at times.  A recent post on Bitbucket provides a nice implementation using matplotlib and python as a platform.  To expand the basic functionality, a few modifications add a set of features that highlight the relative contributions of the team participants.  In the example provided above the broad tasks are indicated in yellow while the two inset bars (red:student and blue:PI) illustrate the percent effort.  See the source below for the details.

"""
Creates a simple Gantt chart
Adapted from
BHC 2014
"""

import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import matplotlib.dates
from matplotlib.dates import MONTHLY, DateFormatter, rrulewrapper, RRuleLocator

from pylab import *

def create_date(month,year):
    """Creates the date"""

    date = dt.datetime(int(year), int(month), 1)
    mdate = matplotlib.dates.date2num(date)

    return mdate

# Data

pos = arange(0.5,4.5,0.5) # one tick position per task (8 tasks below)

ylabels = []
ylabels.append('Hardware Design & Review')
ylabels.append('Hardware Construction')
ylabels.append('Integrate and Test Laser Source')
ylabels.append('Objective #1')
ylabels.append('Objective #2')
ylabels.append('Present at ASMS')
ylabels.append('Present Data at Gordon Conference')
ylabels.append('Manuscripts and Final Report')

effort = []
effort.append([0.2, 1.0])
effort.append([0.2, 1.0])
effort.append([0.2, 1.0])
effort.append([0.3, 0.75])
effort.append([0.25, 0.75])
effort.append([0.3, 0.75])
effort.append([0.5, 0.5])
effort.append([0.7, 0.4])

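# One entry per task, built with create_date(): [start, end] for the first
# task and [start, mid, end] for the rest (entries not shown here)
customDates = []

task_dates = {}
for i,task in enumerate(ylabels):
    task_dates[task] = customDates[i]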
# task_dates['Climatology'] = [create_date(5,2014),create_date(6,2014),create_date(10,2013)]
# task_dates['Structure'] = [create_date(10,2013),create_date(3,2014),create_date(5,2014)]
# task_dates['Impacts'] = [create_date(5,2014),create_date(12,2014),create_date(2,2015)]
# task_dates['Thesis'] = [create_date(2,2015),create_date(5,2015)]

# Initialise plot

fig = plt.figure()
# ax = fig.add_axes([0.15,0.2,0.75,0.3]) #[left,bottom,width,height]
ax = fig.add_subplot(111)

# Plot the data

start_date,end_date = task_dates[ylabels[0]]
ax.barh(0.5, end_date - start_date, left=start_date, height=0.3, align='center', color='blue', alpha = 0.75)
ax.barh(0.45, (end_date - start_date)*effort[0][0], left=start_date, height=0.1, align='center', color='red', alpha = 0.75, label = "PI Effort")
ax.barh(0.55, (end_date - start_date)*effort[0][1], left=start_date, height=0.1, align='center', color='yellow', alpha = 0.75, label = "Student Effort")
for i in range(0,len(ylabels)-1):
    labels = ['Analysis','Reporting'] if i == 1 else [None,None]
    start_date,mid_date,end_date = task_dates[ylabels[i+1]]
    piEffort, studentEffort = effort[i+1]
    ax.barh((i*0.5)+1.0, mid_date - start_date, left=start_date, height=0.3, align='center', color='blue', alpha = 0.75)
    ax.barh((i*0.5)+1.0-0.05, (mid_date - start_date)*piEffort, left=start_date, height=0.1, align='center', color='red', alpha = 0.75)
    ax.barh((i*0.5)+1.0+0.05, (mid_date - start_date)*studentEffort, left=start_date, height=0.1, align='center', color='yellow', alpha = 0.75)
    # ax.barh((i*0.5)+1.0, end_date - mid_date, left=mid_date, height=0.3, align='center',label=labels[1], color='yellow')

# Format the y-axis

locsy, labelsy = yticks(pos,ylabels)
plt.setp(labelsy, fontsize = 14)

# Format the x-axis

ax.set_ylim(bottom = -0.1, top = 4.5) # ymin/ymax keywords are rejected by newer matplotlib
ax.grid(color = 'g', linestyle = ':')

ax.xaxis_date() #Tell matplotlib that these are dates...

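rule = rrulewrapper(MONTHLY, interval=1)
loc = RRuleLocator(rule)
formatter = DateFormatter("%b '%y")
# Apply the monthly locator and date format to the x-axis
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(formatter)

labelsx = ax.get_xticklabels()
plt.setp(labelsx, rotation=30, fontsize=12)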

# Format the legend

font = font_manager.FontProperties(size='small')
ax.legend(prop=font)

# Finish up

plt.show()

XKCD-style Plots in Matplotlib

Now incorporated directly into the latest version of matplotlib (v1.3), XKCD-style plotting is a great alternative that brings some style to your plotting routines. I haven’t tried it out on plots with a huge number of points but I imagine it should work just fine.  Below are some simple examples.  It is as simple as matplotlib.pyplot.xkcd()…
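
A minimal sketch of what that call looks like in practice, assuming a pseudo-random walk as the data and an off-screen backend so it runs without a display. The output file name is a placeholder.

```python
# Sketch: draw a pseudo-random walk with the XKCD style applied.
import matplotlib
matplotlib.use("Agg")  # assumed: render off-screen so no display is required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)     # fixed seed so the figure is repeatable
walk = np.cumsum(rng.randn(200))   # pseudo-random sequence

with plt.xkcd():                   # style applies only inside this block
    fig, ax = plt.subplots()
    ax.plot(walk)
    ax.set_title("Pseudo-random sequence")
    ax.set_xlabel("step")
    ax.set_ylabel("value")
    fig.savefig("xkcd_walk.png")   # placeholder output file name
```

Using xkcd() as a context manager keeps the hand-drawn style confined to one figure; calling it on its own line applies the style to every plot that follows.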

Pseudo-Random Sequence with XKCD:




Cheers Jake Vanderplas:

More Examples: