Sunday, December 30, 2012

CTR: resolving dependencies in Groovy with Grapes

+Mark Fortner commented on Google+ that my script was missing a @Grab statement. I had seen that mentioned before, but never looked into it. It turns out to be very useful, and it makes Groovy scripts standalone. That is, it will resolve the missing dependencies, using Maven repositories. Fortunately, CDK modules are available from repositories, e.g. the one at Plovdiv University, and I gave it a try. My first attempt went bad, but +Nina Jeliazkova explained to me what Maven mistake I was making, and now I have a working setup:


@GrabResolver(
  name='idea',
  root='http://ambit.uni-plovdiv.bg:8083/nexus/content/repositories/thirdparty/'
)
@Grapes([
  @Grab(
    group='org.openscience.cdk',
    module='cdk-io', version='1.4.11'
  ),
  @Grab(
    group='org.openscience.cdk',
    module='cdk-silent', version='1.4.11' 
  )
])

So, depending on the exact script, something like the above will remove any need for setting CLASSPATHs. The above Groovy code is for counting heavy atoms.

Saturday, December 29, 2012

CTR #3: Report the similarity between two structures

The next CTR I picked is not particularly hard either, given the functionality provided by the CDK. In fact, the fingerprinting functionality I will use for this CTR is actually one of the most used and oldest features of the CDK. CiteULike has a list of 26 papers using the CDK fingerprinting functionality. The CDK 1.4.x API returns a Java BitSet and we can use the Tanimoto class to calculate the matching similarity values with it:
import org.openscience.cdk.fingerprint.*;
import org.openscience.cdk.smiles.*;
import org.openscience.cdk.silent.*;
import org.openscience.cdk.similarity.*;

smilesParser = new SmilesParser(
  SilentChemObjectBuilder.getInstance()
);
smiles1 = "CC(C)C=CCCCCC(=O)NCc1ccc(c(c1)OC)O"
smiles2 = "COC1=C(C=CC(=C1)C=O)O"
mol1 = smilesParser.parseSmiles(smiles1)
mol2 = smilesParser.parseSmiles(smiles2)
fingerprinter = new HybridizationFingerprinter()
bitset1 = fingerprinter.getFingerprint(mol1)
bitset2 = fingerprinter.getFingerprint(mol2)
tanimoto = Tanimoto.calculate(bitset1, bitset2)
println "Tanimoto: $tanimoto"
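
For readers who want to see what the number means: the Tanimoto coefficient is the count of bits set in both fingerprints divided by the count of bits set in either one. The following is a minimal plain-Java sketch on java.util.BitSet, not the CDK's own Tanimoto class. The class name TanimotoSketch, the toy fingerprints, and the choice to return 0.0 for two empty fingerprints are assumptions made for illustration.

```java
import java.util.BitSet;

public class TanimotoSketch {

    // Tanimoto coefficient: |A AND B| / |A OR B|, counted over the set bits.
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        if (either.cardinality() == 0) {
            // Assumption of this sketch: two empty fingerprints score 0.0.
            return 0.0;
        }
        return (double) common.cardinality() / either.cardinality();
    }

    public static void main(String[] args) {
        // Toy fingerprints, not real molecules.
        BitSet fp1 = new BitSet(8);
        fp1.set(0); fp1.set(2); fp1.set(5);
        BitSet fp2 = new BitSet(8);
        fp2.set(2); fp2.set(5); fp2.set(7);
        // Two shared bits, four bits set in total.
        System.out.println("Tanimoto: " + tanimoto(fp1, fp2)); // prints Tanimoto: 0.5
    }
}
```

The inputs are cloned before and() and or() because both BitSet methods modify the set they are called on.
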

CTR #2: Depict a compound as an image

This one was relatively easy, and roughly based on the first CDK-JChemPaint tutorial. Key aspects are the SMILES parsing and the 2D coordinate generation with the StructureDiagramGenerator. The solution does not render the structure's title yet. I do not have a solution for that right now (the CDK may; I am not sure).

import java.util.List;
import java.awt.*;
import java.awt.image.*;
import javax.imageio.*;
import org.openscience.cdk.silent.*;
import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.layout.*;
import org.openscience.cdk.renderer.*;
import org.openscience.cdk.renderer.font.*;
import org.openscience.cdk.renderer.generators.*;
import org.openscience.cdk.renderer.visitor.*;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.templates.*;
import org.openscience.cdk.renderer.generators.BasicSceneGenerator.Margin;
import org.openscience.cdk.renderer.generators.BasicSceneGenerator.ZoomFactor;

smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
int WIDTH = 200
int HEIGHT = 250
Rectangle drawArea = new Rectangle(WIDTH, HEIGHT);
Image image = new BufferedImage(
  WIDTH, HEIGHT, BufferedImage.TYPE_INT_RGB
);
smilesParser = new SmilesParser(
  SilentChemObjectBuilder.getInstance()
)
molecule = smilesParser.parseSmiles(smiles)
StructureDiagramGenerator sdg =
  new StructureDiagramGenerator();
sdg.setMolecule(molecule);
sdg.generateCoordinates();
molecule = sdg.getMolecule();
List generators =
  new ArrayList();
generators.add(new BasicSceneGenerator());
generators.add(new BasicBondGenerator());
generators.add(new BasicAtomGenerator());
AtomContainerRenderer renderer =
  new AtomContainerRenderer(
    generators, new AWTFontManager()
  );
renderer.setup(molecule, drawArea);
model = renderer.getRenderer2DModel();
model.set(ZoomFactor.class, (double)0.9);
Graphics2D g2 = (Graphics2D)image.getGraphics();
g2.setColor(Color.WHITE);
g2.fillRect(0, 0, WIDTH, HEIGHT);
renderer.paint(molecule, new AWTDrawVisitor(g2));
ImageIO.write(
  (RenderedImage)image, "PNG",
  new File("CTR2.png")
);

CTR #1: Heavy atom counts from an SD file

The first Chemistry Toolkit Rosetta task is to count the number of heavy atoms in the structures given in an MDL SD file. This task starts with an SD file and counts for each structure in the file the number of heavy atoms (non-hydrogen atoms). Because we simply handle the structures one by one, the solution uses the IteratingMDLReader reader. The input file (benzodiazepine.sdf.gz) is a gzipped file, which we handle by using a GZIPInputStream. Because we want to make sure the input file does not have any unexpected content, we use the STRICT mode. The input file turns out not to have non-standard features, so that we do not have to worry about D and T element symbols.

The solution lists all heavy atom counts:

import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.io.*;
import org.openscience.cdk.io.iterator.*;
import org.openscience.cdk.silent.*;
import org.openscience.cdk.tools.manipulator.*;
import java.util.zip.GZIPInputStream;

iterator = new IteratingMDLReader(
  new GZIPInputStream(
    new File("benzodiazepine.sdf.gz")
      .newInputStream()
  ),
  SilentChemObjectBuilder.getInstance()
)
iterator.setReaderMode(
  IChemObjectReader.Mode.STRICT
)
while (iterator.hasNext()) {
  mol = iterator.next()
  heavyAtomCount = 0
  for (atom in mol.atoms()) {
    if (1 == atom.atomicNumber ||
        "H".equals(atom.symbol)) {
      // do not count hydrogens
    } else {
      heavyAtomCount++
    }
  }
  println heavyAtomCount
}
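
If a file did contain the D and T shorthands, a symbol-based count would have to treat them as hydrogen isotopes too. The following is a minimal plain-Java sketch of that idea. It works on a bare list of element symbols rather than on CDK atoms, and the class name HeavyAtomSketch and the methanol example are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HeavyAtomSketch {

    // Symbols treated as hydrogen in this sketch: H plus the isotope
    // shorthands D (deuterium) and T (tritium).
    private static final Set<String> HYDROGEN_SYMBOLS =
        new HashSet<>(Arrays.asList("H", "D", "T"));

    // Counts every symbol that is not a hydrogen (isotope) symbol.
    public static int countHeavyAtoms(List<String> symbols) {
        int count = 0;
        for (String symbol : symbols) {
            if (!HYDROGEN_SYMBOLS.contains(symbol)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Methanol with one deuterium: C, O, three H, one D.
        List<String> symbols = Arrays.asList("C", "O", "H", "H", "H", "D");
        System.out.println(countHeavyAtoms(symbols)); // prints 2
    }
}
```
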

Chemistry Toolkit Rosetta: common cheminformatics tasks

The Chemistry Toolkit Rosetta wiki was set up some time ago by Andrew Dalke to demonstrate how certain basic cheminformatics tasks are done in the various cheminformatics toolkits around. I think it is a great idea, but never found enough time to do much with it, unfortunately. But it is holiday now, which is a time to take your mind off your work, and then some random hacking with the CDK is what I like to do.

In a series of blog posts I will attempt to solve all 18 CTR tasks, and use this post to keep track of the coverage. The currently listed tasks are given in the list below, and link to each task's description wiki page, where I will upload my solution. As I will use the solutions also as a chapter in my Groovy Cheminformatics with the Chemistry Development Kit book, I will use Groovy as the programming language. I will also create one blog post for each solution, giving details.

The CTR tasks:

Tuesday, December 25, 2012

Groovy Cheminformatics with the CDK: 7th edition

This is already the 7th edition of my Groovy Cheminformatics with the Chemistry Development Kit book (and PDF eBook). It has been almost two years since the first release, and it has grown from an initial 72 pages to 212 pages today. There is still a lot I want to write about, but I only have time for it during the holidays. New content includes:
  • Chapter 6. Reactions
  • Chapter 7. From IChemObject to IChemFile
  • Section 17.1.2. Stereoisomerism (in InChIs)
  • Rewrote Chapter 20. Documentation
  • Appendix D.1. The Readers and Writers (e.g. listing all IO options)
The latter looks like this section:


I also like to stress that all literature references in the book are freely available from CiteULike, and that page looks like:


The group's RSS feed gives a good idea on what I am writing about.


Tuesday, November 27, 2012

Triples, stores, and SPARQL in R

For some time I have been stealing an hour here and there for the rrdf package for R. This package is based on Apache Jena and allows reading and writing of RDF triples, as well as doing local and remote SPARQL querying. BTW, rrdf is not the only R package to provide SPARQL functionality, and another package will be demoed at SWAT4LS.

It took me some time to get around to it, but I finally set up a vignette with Sweave for this package. Here it is, explaining some basic functionality of the package, just in time for #swat4ls:



So, within a week or so, I have used <iframe> in the blog a few times now. The above one is being served by Google Drive.

Friday, November 23, 2012

A Mendeley group for @Open_PHACTS

The past few months have seen an increasing paper trail for our Open PHACTS project. Lots of cool stuff is ongoing, and more and more is getting openly available. There is a steep learning curve within the project on being Open, and the project makes sure it is done properly. But it takes time. With the Open Standards and Open Source getting out now, I think we have a reasonable start.

Yesterday, I created a Mendeley group with our paper trail:


The purpose was to get some #altmetrics on the impact of our project (I blogged about that a year ago). We haven't started tagging the papers, and comments on useful tags are most welcome. Should we tag with matching Example Application, with consortium partner, both? Something else? Let me know.

Of course, Mendeley is also an attractive platform, with Word/OpenOffice/LibreOffice plugins for reference management, and it provides nice web pages for papers with possible direct links to OA PDFs and bookmark statistics. For example, for this paper:


We can see the paper detail in the middle, and get added value on the right. We see a thumbnail of the paper, a list of authors with Mendeley profiles (here only Rob Hooft), and then the reader statistics, and learn that 129 people have taken the time to put this paper in the reference database.

In fact, that is quite a lot. You can believe me, or you can look up the numbers. That is what #altmetrics is about. We can use altmetric.com and find these numbers:


Not only do such #altmetrics give us a number, they actually tell us who, what, why, and how. Much more informative than, let's say, a journal impact factor. This is scientific communication in action.

What this page does not tell us is whether 46 is high, though the page does comment that this paper "is one of the highest ever scores in this journal (ranked #6 of 1,586)". Now, this is a Nature Genetics paper, and more than 1500 Nature Genetics papers got a value, and this paper is ranked #6! Yes, that is impact.

Total Impact gives further detail, but is called ImpactStory now. I ran this on output from our project, papers but also software and slides. One neat and really useful feature of this webpage is that it provides percentiles on data, in more detail than the comment from altmetric.com:


We get a detailed view of where the impact is found, and the percentile information. Here too, we learn that this paper has a relatively high impact, compared to peers: it is in the top 3% of papers by impact. Interestingly, the Mendeley reader count was not picked up (update: this was tracked down as a data glitch in the Mendeley database). Mind you, the percentiles for 2012 are not yet available; we have to wait a month or two for those.

And, all counts are linked. Just click on, for example, "72 tweets" (using Topsy) and you get the actual tweets, and learn what people have to say about this paper:


Once more, this is scientific communication in action!

But to do full justice to Euan Adie's altmetric.com work, that site captures the blogosphere pretty well (not surprisingly, given it is Euan). Just check the screenshot above again.

Sunday, November 18, 2012

DHSs and histone modifications: methylation, acetylation, citrullination, and phosphorylation

One day on, and still struggling with the chemistry behind gene regulation. Let no biologist ever tell me again not to use acronyms (yes, I am looking at you!). But it is interesting. I learned a lot about ChIP, histone modifications, etc, etc. This is an amazing world, where specific histone complex protein residues get methylated, acetylated, citrullinated, and phosphorylated. Of course, all this is in the context of the ENCODE meeting we have tomorrow at BiGCaT, where I will try to cover a paper by Thurman et al.

In that paper, Thurman studies the links between DNase I hypersensitive sites (DHSs) and markers of regulation. These DHSs are areas between histones where the DNA is free of histone proteins. There are remarkable images around showing histones as beads on a string, and the distances in nucleotides between histones are in fact not that large. In fact, a histone, despite being a large complex that sterically hinders 50% of the DNA access, does not stop transcription; the transcription complexes apparently have no trouble passing the histones, as described by Felsenfeld et al. Quite amazing!

Now, those histones are chemically modified with acetyl, methyl, phosphate, and other groups, at well-described residues, and each modification readily regulates other modification steps. And everything regulates gene expression. Oh, and as we saw yesterday, all that is regulated by metabolites, which in turn... Lovely. Try modeling that mathematically :) Here's what Abcam has to say about it:

Acetylation is generally linked to gene activation. Acetylation on Lys-10 (H3K9ac) impairs methylation at Arg-9 (H3R8me2s). Acetylation on Lys-19 (H3K18ac) and Lys-24 (H3K24ac) favors methylation at Arg-18 (H3R17me). Citrullination at Arg-9 (H3R8ci) and/or Arg-18 (H3R17ci) by PADI4 impairs methylation and represses transcription. Asymmetric dimethylation at Arg-18 (H3R17me2a) by CARM1 is linked to gene activation. Symmetric dimethylation at Arg-9 (H3R8me2s) by PRMT5 is linked to gene repression. Asymmetric dimethylation at Arg-3 (H3R2me2a) by PRMT6 is linked to gene repression and is mutually exclusive with H3 Lys-5 methylation (H3K4me2 and H3K4me3). H3R2me2a is present at the 3' of genes regardless of their transcription state and is enriched on inactive promoters, while it is absent on active promoters. Methylation at Lys-5 (H3K4me), Lys-37 (H3K36me) and Lys-80 (H3K79me) are linked to gene activation. Methylation at Lys-5 (H3K4me) facilitates subsequent acetylation of H3 and H4. ... ...

And that goes on for a while. Ambitiously, I started converting things I read into a WikiPathways:


I think that will keep me busy for a while. I won't even attempt to complete it further tonight. I gave up on that about an hour ago. In fact, I returned to the paper by Thurman, as I still have to figure out how their experimental methods work. In fact, how does one even detect the chemical modification of a histone, and to which DNA sequence on any of the chromosomes it belongs?? I mean, that's not AFM or STM, I say...

No, it's ChIP. ChIP on a chip, in fact. They have antibodies that stick specifically to histones with one particular modification. That is how I actually ended up on that Abcam web page in the first place. Check out this nice western blot. With a huge antibody detecting whether there is an acetyl modification. Wicked!

Well, earlier I learned that proteins detect methylated CpG bases not because of the methyl group (which amazed me already), but by a distorted hydration in the major groove due to MeCP2 binding. Seriously! Eat that, organic chemist friends!

So, Thurman and friends find distal DHSs and relate these to cis-regulatory elements. To some extent, that is puzzling, because the above tells us that a lot of regulatory work is happening outside those DHSs. But then again, I did read today about DNA methylation triggering histone modifications. It seems there are so many interactions going on that it resembles a melting pot. Oh wait, that makes sense; it's one big one-pot synthesis anyway.

The paper discusses an enormous amount of experimental work, and I cannot seem to be able to make sense of it all. There are striking aspects to it, which I will touch upon momentarily. But I cannot help but mention that I am not sure they could either. Their Discussion section leaves something to be desired, like an actual discussion. Instead, they just summarize the paper.

They used ChIP with Cell Signaling's 9751 antibody recognizing H3K4me3, with formaldehyde-induced crosslinking. It actually turns out that the peaks for this modification are right on top of the DNA part from which the transcript is made, in line with Felsenfeld's observation. Upstream of that, where the promoters are expected, that is where DNase I signals are found. That is, I think this means that the DHS upstream of the histone where transcription starts is where the promoter regulation happens. With transcription factors (TFs), of course. And in those DHS regions, that is where DNA methylation happens, and Thurman finds DNA methylation in those regions, inhibiting TF binding, because the already mentioned MeCP2 already takes that place.

Now, then they make a jump from this low-level chemistry to a genome-wide landscape. Well, they actually start with that, but as a chemist, I am more of a bottom-up guy (that is an IT method). They report that most DHSs are found in introns and at distal locations. The first is striking: the ratio between intron/exon is >99. Does that imply that exons basically are always DNA wrapped around histones?? Does that actually then tell me that transcription actually sort of requires steric hindrance of the histone?? Ha, then those diagrams by biologists would be even more misleading than they have been to me (don't ask me how long it took me to learn that there are some 10-40 mitochondria per cell! and I still do not know if all copies in the cell have the same DNA, or if they are more like a population like your microbiome).

Now, distal DHSs are the second largest group, and capture some 40-45% of all DHSs. Distal means typically more than 2.5 kb away from the TSS (transcriptional start sites). Most of them are somewhere between 10 and 50 kb away. Now, isn't that something? That is distant indeed!

What? Still with me? Let's do some math. It's hard, and I hope to get it right. A human has about 3 billion base pairs (I'll take the Wikipedia count). The paper finds almost 3 million DHSs. That means that the average distance between DHSs is about 1 kb. Compare that to their diagram 1b, outlined in the previous paragraph. That means that the DHSs must be very densely placed around the transcribed genes. Indeed, they report ratios of up to and above a 100 fold increase. It must be like that, because otherwise, you cannot get those distances for distal DHSs.
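
The back-of-the-envelope division above can be written out as a tiny Java sketch. The class and method names are made up for illustration, and the inputs are the rounded figures from the text (about 3 billion base pairs, almost 3 million DHSs), not exact values from the paper.

```java
public class DhsSpacingSketch {

    // Average spacing if the sites were spread evenly over the genome.
    public static long averageSpacing(long genomeBasePairs, long siteCount) {
        return genomeBasePairs / siteCount;
    }

    public static void main(String[] args) {
        long genomeBasePairs = 3_000_000_000L; // about 3 billion base pairs
        long dhsCount = 3_000_000L;            // almost 3 million DHSs
        // 1000 bp equals 1 kb
        System.out.println(averageSpacing(genomeBasePairs, dhsCount) + " bp"); // prints 1000 bp
    }
}
```

An even spread would put a DHS every 1 kb, while most distal DHSs are reported at 10 to 50 kb from the TSS, so the sites cannot be evenly spread and must cluster somewhere.
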

Now, another interesting aspect of the paper is that they find different DHSs for different cell types. That, in fact, increases the average distance between DHSs: those 3 million they find is for 125 cell lines, and more DHSs are found in fewer than 20 cell lines. Only promoter-related DHSs seem to be more persistent between cell lines. This implies that different cell lines have different genes unfolded in nucleosome/DHS rich areas (defining the chromatin accessibility), triggering different gene expression. That all makes sense, and is rather exciting too. As such, it seems to me that this map effectively gives a predictive model, indicating which genes are expressed in which cell types.

A further question they ask is whether DNA (not histone) methylation is the cause or the result of DHSs. They confirm the earlier found correlation between DNA methylation and gene silencing. They basically question whether things like MeCP2 binding happen because no transcription factor is in the way, or whether the TF cannot bind because MeCP2 is there. Chemically, these are perhaps equivalent: they have competing binding affinities. Except that the methylation must happen at some point too. They suggest that that may be due to DNA getting randomly methylated, perhaps not unlike passive demethylation. Chemically, that does not make sense to me. I would guess there are many chemical species in the cell that would get more easily methylated... They believe to have found evidence for passive deposition, but also find positive correlation between methylation and gene expression. I would say, the answer is still out there.

OK, that's about how far I got now. The last two pages I have to read again, and see what papers I need to read to make sense of that. And I will try to see what others have been saying about this paper. One hooray for #altmetrics!

Thurman, R., Rynes, E., Humbert, R., Vierstra, J., Maurano, M., Haugen, E., Sheffield, N., Stergachis, A., Wang, H., Vernot, B., Garg, K., John, S., Sandstrom, R., Bates, D., Boatman, L., Canfield, T., Diegel, M., Dunn, D., Ebersol, A., Frum, T., Giste, E., Johnson, A., Johnson, E., Kutyavin, T., Lajoie, B., Lee, B., Lee, K., London, D., Lotakis, D., Neph, S., Neri, F., Nguyen, E., Qu, H., Reynolds, A., Roach, V., Safi, A., Sanchez, M., Sanyal, A., Shafer, A., Simon, J., Song, L., Vong, S., Weaver, M., Yan, Y., Zhang, Z., Zhang, Z., Lenhard, B., Tewari, M., Dorschner, M., Hansen, R., Navas, P., Stamatoyannopoulos, G., Iyer, V., Lieb, J., Sunyaev, S., Akey, J., Sabo, P., Kaul, R., Furey, T., Dekker, J., Crawford, G., & Stamatoyannopoulos, J. (2012). The accessible chromatin landscape of the human genome. Nature, 489 (7414), 75-82. DOI: 10.1038/nature11232

Felsenfeld, G., Boyes, J., Chung, J., Clark, D., & Studitsky, V. (1996). Chromatin structure and gene expression. Proceedings of the National Academy of Sciences of the United States of America, 93 (18), 9384-8. PMID: 8790338

Saturday, November 17, 2012

The chemistry of DNA modifications for gene regulation

I have started learning about epigenetics, and particularly the regulatory effects of DNA methylation and histone acetylation. It's cool, it's hot, it's everything we hope will explain genetics, because genes certainly did not.

The chemistry behind this involves interesting pathways, and involves storage of information that passes from one generation to another... epigenetic effects down to the grandchild generation have repeatedly been shown now. Likely candidates are mRNAs that persist beyond the cell division, which trigger modifications again. Well, that is cool chemistry indeed! So, the chemist in me asks: so where are residues actually methylated then? I am learning here, and trying to get the facts together. But the bases seem to be one place, blocking interactions with DNA-binding proteins, which can show beautiful residue/base pair interactions at the sides of the bases. Second year students at Maastricht University in Biomedical Sciences had this as part of their practical last year.

But for that genetic information to pass around and persist, and for gene regulation in general, there are brilliant pathways, which may involve metabolites, like butyrate, which acts as energy source in certain systems. Donohoe et al. report work around a pathway for histone acetylation, where they found an interaction with the Warburg effect. While in both cases butyrate triggers an increased acetylation, the mechanism is different. They propose this pathway, which I am making available on WikiPathways (CC-BY):


The page on WikiPathways is not complete yet, but then I haven't finished reading the full paper yet. I wonder how many of these pathways are known. Do you know one? Leave a DOI/PubMed ID in the comments, or add the pathway to WikiPathways yourself.

Donohoe, D., Collins, L., Wali, A., Bigler, R., Sun, W., & Bultman, S. (2012). The Warburg Effect Dictates the Mechanism of Butyrate-Mediated Histone Acetylation and Cell Proliferation. Molecular Cell. DOI: 10.1016/j.molcel.2012.08.033

Sunday, November 11, 2012

CDK 1.4.15: the changes, the authors, and the reviewers

At some point I had thought that I could finally concentrate on master. We have enough regressions there, of various kinds, some 40-50 unit tests that did pass properly in the past. Various core changes that increase the accuracy of our library have the nasty side effect that they uncover certain assumptions. But let's not talk about master yet, and focus on the 1.4.15 release (download here). Contrary to what I had hoped, a lot changed since the 1.4.14 release. On the bright side, CDK 1.4 is getting more and more reliable with every minor release.

A major addition in this release is that of a data model for double bond stereochemistry, making the CDK now handle the two most common forms of stereochemistry for small molecules. It must be stressed that not all IO classes are reading data into this data model yet. The interface looks like this (the full JavaDoc is found here):

  public interface IDoubleBondStereochemistry
  extends IStereoElement {
    public enum Conformation {
        TOGETHER,  //  as in Z-but-2-ene
        OPPOSITE   //  as in E-but-2-ene
    }
    public IBond[] getBonds();
    public IBond getStereoBond();
    public Conformation getStereo();
  }

Other new functionality includes an alternative aromaticity checker, which is happy to mark rings aromatic even if the ring has double bonds pointing outside the ring (e.g. benzoquinone). That means we now have two algorithms in the CDK to perceive aromaticity.

Otherwise, there is a truck load of fixes. One really important one is the fix that ensures that stereochemistry is also cloned by clone(). Other fixes include minor atom typing work, including new selenium atom types, the further generalization of the IO accepts() methods, and a fix in the SDG code to not delete bridging hydrogens before doing structure clean up. There are many more small fixes and tunes, and as always, the full list is given below.

The changes
  • Fixed a bug present in many readers: it would not accept a subclass if ChemFile (e.g. NNChemFile) even if ChemFile itself was accepted bc30798
  • Fixed loading of the right class when reporting possible alternative constructors 420533e
  • [bug:1275] added check to ensure that when String.substring is called the string is long enough 5a7baa4
  • [bug:1274] added conditional to ensure that when multiple bond stereo is specified as attributes and characters only one is used. This is achieved by using the existing flag to determine if a stereo bond value has already been provided. 0e97eff
  • Updated Gilleain's code to hook in with the other two new selenium atom types 9fb2dd3
  • Missing Se.2 atom type and test case 21186c0
  • Added finally cause to ensure the file is closed a50c303
  • - added unit test to demonstrate the bug a6d3d6e
  • added unit test to demonstrate bug and correct bug id's for two recents tests 0ea90e8
  • Adding CMLReaderTest of io module test suite fefc82d
  • Resolved NoNotify fails on AtomParity. Error was due to subclassing of AtomParity. Also the assertEquals params were swapped as the assertion was the wrong way. 46ffbb4
  • Added unit tests for bugs 1270 (removalAllElements should remove stereo elements) and 1273 (double bond stereo chemistry constructor should throw an exception on wrong input). Added @cdk.bug tag for bug 1264 (stereo element cloning) 938306a
  • Used return covariance on clone() to provide cleaner front-end API c3d4af0
  • Added deep cloning of stereo element to atom containers and polymers (atom container subclass) a46545d
  • Added stereo element shallow copy 7a1f243
  • Added a 'map' method on all IStereoElements. The map method allows a stereo element on one container/molecule to mapped to a stereo element on another. This mapping is achieved using two symbol tables, one for atoms and one for bonds. All methods are null safe and the mapping will not fail if any content in the stereo elements is null. The mapping simplifies the cloning of molecules/atom containers but could also be used when comparing isomorphic graphs. 267ec79
  • Reworked AtomContainer.clone() so it is clear what is going on. We now use a HashMap between the original and cloned atoms to avoid to a linear search each time the atom mapping is needed. This is also useful when we add StereoElement cloning (not yet implemented). We also store a bond mapping as well - we will need the bond mapping for double bond stereo chemistry. The stereo elements in the clone need to be set to an empty array on the clone so we don't remove elements from the original (cloning odditity). It was also clear we need to change the clone() method on Polymer which currently undoes all the cloning work we do in the AtomContainer. For all clone instances I added some code to correctly create HashMaps that won't need to resized. The default HashMap implementation works best at 0.75 capacity - we therefore need to do some simple arithmetic to ensure we don't get a resize. The implemented method is what is used in the Guava library. f98b04a
  • Added unit tests for DoubleBond and AtomParity cloning a7e4255
  • Added ability to setStereoElements - this was required due to clone() being shall on List. We need to be able to set a new array when have cloned a AtomContainer 84f8f0e
  • Added unit test for tetrahedral chirality stereo element 83333d7
  • Fixed closing (fixes #1265) a9a27ed
  • Added cdk-silent dependency for test-renderextra 2afae5c
  • Added renderextra to dist-large and test-all targets 8178890
  • Removed redundant code from ChemObj clone - the existing code did exactly what the copy constructor of HashMap does and thus provides a cleaner implementation 8f7c0ab
  • Added removal of stereo elements in 'removeAllElements()' - documentation has been updated d1e8fa6
  • Added check to ensure a DoubleBondStereochemistry is never created with more then 2 bonds - this would cause errors with some methods. f8f98fe
  • Removed print to standard out from ChemObjectBuilders ab6c308
  • Removed redundant code - we don't need to check whether the bond is already in the container as we create a new instance. We also don't need to check the array size as this is done by addBond(IBond) c7786c9
  • Moved TetrahedralChirality from data to core. 0e41b05
  • Added unit test and @TestMehtod annotaitons for new 'isEmpty' mehtods aa0f969
  • add isEmpty() to classes/interfaces ChemModel, AtomContainerSet, ReactionSet, Crystal and AtomContainer f0c14fe
  • Be more informative when the test fails 8b8a848
  • Added missing test annotation bbf00f6
  • Documenting new method and extended unit test 74d206e
  • Removed SVN tags, as suggested by the reviewer 4e35feb
  • Removed cdk.create dates, as suggested by the reviewer 49b6541
  • Testing that benzoquinone is perceived as aromatic using the alternative detection method. 702d8ba
  • Because the placement of double bonds is not deterministic, we cannot be sure we always get them at the same location. Better is to just test that all carbons are perceived as C.sp2 and that they are aromatic. 77a778f
  • Added an alternative aromaticity perception model, which is happy about double bonds pointing outwards from aromatic rings b5bf695
  • Added the missing S.2minus atom type for selenide 367ff4f
  • There is no Se.2 atom type in the CDK; the perception seems to match Se atoms with two neighbors; I added two unit tests for the changed code, assuming one and two implicit hydrogens 48e6060
  • Added a missing import and dependency f400a5e
  • Added aromaticity-based perception: N.planar3 67c27bd
  • Added a null check and return immediately (fixes #1260) 0a981a4
  • Ignore this failing test case; it was one of the original points at which we decided new tools were needed 8adfd90
  • Added double bond stereo 5b644f9
  • Added a data model for double bond stereochemistry 76577b4
  • Added similar testing to reader and writers to fix four further unit tests for support of matching against some IChemObject interface class 47f35b9
  • Fixed the readers and writers to also accept the matching interfaces (fixes #3553780) bfc674a
  • Test that the reader and writers also "accept" the interfaces they support, see bug #3553780 f30f9a3
  • Added a unit test for the JCP bug report for the SDG about briding hydrogens 4d3db46
  • simplify by calling getConnectedBondsCount() 320a21d
  • only delete non-multibond H's; fixes JCP issue 8 d0d785c
  • Fixed the unit test, similar to commit d1da5276dae4a21a4c45d9fa41816be5eb646b4aa: the compound is aromatic. 81d7be7
  • Adds a unit test and fix for the loading of atom pair descriptors. a6ab39c
The authors

26  Egon Willighagen
25  John May
 3  Ralf Stephan
 2  Stephan Beisken

The reviewers

25  Egon Willighagen 
17  John May 
 2  Rajarshi Guha

Brushing my biology: cool diagram of chromosomes in the nucleus

I already mentioned this ENCODE discuthon we have next week. As I have to discuss stuff about hypersensitive DNA regions, I have to seriously brush up my biology. Brush up?? That suggests there was a decent basis. Well, think again. I thought history was much more interesting!

I am a chemist. When people talked about the DNA in the cell, I always pictured a single molecule. Until I learned there are 46 chromosomes. So, each cell actually has 92 DNA molecules. That was a revelation I had somewhere in my second or third year at the university. Remember, I did not have biology in secondary school.

Anyway, no book ever showed me what that really looks like. Yeah, schema. Cell diagrams with one mitochondrion in the cell. Well, bloody yes, the cell has very many of them, thank you very much. It just did not fit the diagram, I guess...

So, I just ran into this way cool figure from the Three-Dimensional Maps of All Chromosomes in Human Male Fibroblast Nuclei and Prometaphase Rosettes paper by Bolzer et al. (doi:10.1371/journal.pbio.0030157). I actually ran into the Wikipedia version of it, in the chromosome article. It's an adaptation, but the original is even better.



I can just wish they had added a Jmol applet with the 3D rods, rather than these static images.

But I find the flatness at 90° weird... what is the story behind that? Is their method not really or not fully 3D? I guess I will have to find some time to read up on their Methods section...

Saturday, November 10, 2012

Java Puzzlers and FindBugs. Running them on the CDK silent module

The CDK has been using PMD for quite a while now, but there is another tool, called FindBugs. I had seen this before, but until I watched two Java Puzzler videos John sent me (here is one), I had not used it much. There is a nice Eclipse plugin, and you can run it on any Java package.

As I was procrastinating anyway (I should be preparing my core teaching qualifications portfolio and a presentation on the accessibility of chromatin in the human genome for an upcoming ENCODE discuthon. Mind you, you have a lot of individual DNA molecules in your cells! Just your core 92 to start with, and then your mitochondrial DNA (do all mitochondria in one cell have the same DNA??), and the fact that the DNA in your toes is unlikely to be the same as that in your nose (I learned that at the DiXA meeting in Berlin), or that your microbiome with its own DNA has a huge influence on your well-being?? Well, that was our dinner table discussion anyway)... (still breathing...)

... so, while I was procrastinating, I ran FindBugs on the org.openscience.cdk.silent package. I mean, what could possibly go wrong? ...


Thirteen possible problems! I mean, seriously, this is the core of the CDK!
Here are three unit tests to uncover three of the found issues. If you like, take them as "CDK Puzzlers"...

  @Test public void testCompare_MassNumberIntegers() {
    Isotope iso = new Isotope(Elements.CARBON);
    iso.setMassNumber(new Integer(12));
    Isotope iso2 = new Isotope(Elements.CARBON);
    iso2.setMassNumber(new Integer(12));
    Assert.assertTrue(iso.compare(iso2));
  }

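So, we create two 12C isotopes, which should be the same. Of course, this test failed. The culprit is a == comparison in the compare() method's code: on two Integer objects, == compares references rather than values, and two objects created with new Integer(12) are never the same object. Doing the same starting with ints goes better, even after casting, because autoboxing reuses cached Integer objects for small values, and the next test does not fail: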

  @Test public void testCompare_MassNumber() {
    Isotope iso = new Isotope(Elements.CARBON);
    iso.setMassNumber(12);
    Isotope iso2 = new Isotope(Elements.CARBON);
    iso2.setMassNumber((int)12.0);
    Assert.assertTrue(iso.compare(iso2));
  }

And, indeed, using Integer.valueOf() helps as well, something that PMD is keen on suggesting; the next unit test runs fine:

  @Test public void testCompare_MassNumberIntegers_ValueOf() {
    Isotope iso = new Isotope(Elements.CARBON);
    iso.setMassNumber(Integer.valueOf(12));
    Isotope iso2 = new Isotope(Elements.CARBON);
    iso2.setMassNumber(Integer.valueOf(12));
    Assert.assertTrue(iso.compare(iso2));
  }
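
For readers who want to try this outside the CDK, here is a minimal, self-contained sketch of the same effect. The class and method names (BoxedIntegerDemo, sameReference, sameValue, fresh) are made up for this illustration and are not CDK API; the sketch only assumes that compare() boils down to a == between two Integer objects, as described above.

```java
import java.util.Objects;

// Hypothetical demo class, not part of the CDK.
class BoxedIntegerDemo {

    // Mimics a == comparison on boxed values: true only for the very same object.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    // Compares the numeric values instead, and tolerates nulls.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    // new Integer(int) is deprecated in recent JDKs; it is used here on purpose,
    // because it always creates a fresh object, just like in the failing test.
    @SuppressWarnings({"deprecation", "removal"})
    static Integer fresh(int value) {
        return new Integer(value);
    }

    public static void main(String[] args) {
        // Two separately created objects: different references.
        System.out.println(sameReference(fresh(12), fresh(12)));
        // valueOf() returns the cached object for small values: same reference.
        System.out.println(sameReference(Integer.valueOf(12), Integer.valueOf(12)));
        // Comparing values works no matter how the objects were created.
        System.out.println(sameValue(fresh(12), fresh(12)));
    }
}
```

Note that Integer.valueOf() only promises to cache values from -128 to 127, so relying on == is fragile even when it happens to work for a mass number of 12; comparing with equals() is the more robust choice.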

The other two issues I wrote tests for share the same underlying problem, causing these two tests to fail. But note that now we do not even need to pass in objects ourselves; plain double literals are enough:

  @Test public void testCompare_ExactMass() {
    Isotope iso = new Isotope(Elements.CARBON);
    iso.setExactMass(12.000000);
    Isotope iso2 = new Isotope(Elements.CARBON);
    iso2.setExactMass(12.0);
    Assert.assertTrue(iso.compare(iso2));
  }

  @Test public void testCompare_NaturalAbundance() {
    Isotope iso = new Isotope(Elements.CARBON);
    iso.setNaturalAbundance(12.000000);
    Isotope iso2 = new Isotope(Elements.CARBON);
    iso2.setNaturalAbundance(12.0);
    Assert.assertTrue(iso.compare(iso2));
  }
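
A possible way out for these two cases is sketched below, assuming the setters box the double values into Double objects (which the failing tests suggest, but which is not shown here). The class and method names are hypothetical and not CDK API. The sketch compares by value instead of by reference, and adds a tolerance-based variant for values that result from a calculation.

```java
import java.util.Objects;

// Hypothetical demo class, not part of the CDK.
class BoxedDoubleDemo {

    // Value comparison of two boxed doubles; null-safe.
    static boolean sameValue(Double a, Double b) {
        return Objects.equals(a, b);
    }

    // Comparison with a tolerance, for values that come out of a calculation.
    static boolean closeEnough(Double a, Double b, double epsilon) {
        if (a == null || b == null) {
            return a == b; // equal only when both are missing
        }
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        Double mass1 = 12.000000; // autoboxed literal
        Double mass2 = 12.0;      // autoboxed literal
        // Same numeric value, so a value comparison succeeds.
        System.out.println(sameValue(mass1, mass2));
        // Rounding makes 0.1 + 0.2 differ from 0.3 in the last bits...
        System.out.println(sameValue(0.1 + 0.2, 0.3));
        // ...which a small epsilon absorbs.
        System.out.println(closeEnough(0.1 + 0.2, 0.3, 1e-9));
    }
}
```

Whether an exact or a tolerance-based comparison is the right semantics for exact masses and natural abundances is a design decision for the library; the sketch only shows the two options.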

The tests are filed as a patch here.

I think my peer review has just become a bit tougher...

Wednesday, November 07, 2012

The #OpenScience Working Group needs you

For some time now I have been a member of the Open Science working group of the Open Knowledge Foundation. As such, I organized lunch meetings in Stockholm about Open Science (join this mailing list) and participated in working group efforts, such as running Is It Open Data (RIP) on the HCLS LODD data sets (many of them turned out to not be Open at all). Also, I would love to have Open Science lunch meetings in Maastricht (and/or in Eindhoven), and if you do too, join this mailing list.

But the working group does a lot more, and this week Jenny Molloy sent out a call for participation. There are a lot of possibilities; she wrote:
    Dear All,
    
    As we have grown to over 400 people on the mailing
    list and activities have expanded it would be great
    to form a committee with a few more people on board to
    make sure the working group is as effective as
    possible in achieving our mission of opening up
    science and scientific research outputs.
    
    If you'd be interested in getting more involved in
    running the group and our activities and projects,
    get in touch! It would also be great to have
    representatives in local areas willing to act as
    open science champions in their own countries or
    cities.
    
    The types of roles we'd like to fill are below, as
    well as a reminder of some of the projects we've got
    on the go. The time commitment will be flexible and
    relatively low, but it will make a big difference to
    have someone keeping an eye on specific areas! If you
    know anyone not on the list who might be interested
    in getting involved, please forward this message to
    them.
    
    - Working group coordinator (working with Jenny)
    - Blog Editor
    - Tech/Dev Lead
    - Event Organiser
    - Designer
    
    Active Projects:
    
    - Panton Principles and Panton Fellowships
    - Content Mining Manifesto
    - pyBOSSA
    - Open Research Data Handbook
    - Open Science blog
    
    Tools and activities from other working groups of
    special interest:
    
    - BibServer (developed by Open Bibliography)
    - Open Access Index (@ccess and Open Bibliography)
    - DataHub (CKAN Team)
    - Who Needs Access? (@ccess)
    
    If you have any questions, please let me know - I
    look forward to hearing from you!
    
    Jenny
    
Really, there is a lot you can do, and as Peter Murray-Rust replied to Jenny's call, you do not have to be paid as a scientist to join!

Tuesday, November 06, 2012

What online services support InChI and REST?

The adoption of InChI is increasing, despite its limitations. But one thing I find greatly missing is chemical databases supporting access to entries via the appropriate InChI. I know there are resolvers around, but that is different. They do a search, and give me multiple links to individual structures that may match to a certain extent. I am not interested in that in this context.

What I want instead is to be able to deep link to a particular entry in ChemSpider, PubChem, HMDB, or whatever database using the InChI instead. The only service currently supporting this that I am aware of is rdf.openmolecules.net. It uses a URI pattern like http://rdf.openmolecules.net/?$INCHI. For example, the entry for methane is http://rdf.openmolecules.net/?InChI=1/CH4/h1H4 and this URI deep links to the entry of methane, rather than a search result list.
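
To make the URI pattern concrete, here is a small sketch that builds such a deep link from an InChI string. The class and method names are hypothetical, and this is not an official client for rdf.openmolecules.net or any other service; it just fills in the pattern given above.

```java
// Hypothetical helper, not an official client for any of the services mentioned.
class InChIDeepLink {

    // URI prefix taken from the pattern http://rdf.openmolecules.net/?$INCHI
    static final String PREFIX = "http://rdf.openmolecules.net/?";

    // Builds the deep link for a full InChI string such as "InChI=1/CH4/h1H4".
    static String forInChI(String inchi) {
        if (inchi == null || !inchi.startsWith("InChI=")) {
            throw new IllegalArgumentException("Not an InChI: " + inchi);
        }
        // Assumption: the InChI is appended as-is, as in the methane example;
        // whether more complex InChIs need percent-encoding is not covered here.
        return PREFIX + inchi;
    }

    public static void main(String[] args) {
        // Prints the methane deep link from the text.
        System.out.println(forInChI("InChI=1/CH4/h1H4"));
    }
}
```

The point of the pattern is that the InChI itself is the key: no search step and no database-specific identifier is needed to construct the link.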

So, the core requirement is that the database URI tells me, either: "yes, this is the one and only entry matching 100% this InChI", or "no, I do not have data for this structure".

What other databases support deep linking using the InChI? And what would the URI look like?

Sunday, October 07, 2012

CDK 1.5.1: the changes, the authors, and the reviewers

Before you continue reading this post, please be aware that the CDK 1.5.x series is not a new stable release, but the current unstable development release, where all the API changes happen. For stable releases, only look at 1.4.x, such as the just released CDK 1.4.14.

The first alpha release from the cdk-1.5.x series (we do not have a special branch for that; the matching branch is master) was release 1.5.0, which was released in January!

It took me some effort to remove all patches in the cdk-1.4.x branches, but I think the below list shows all changes since 1.5.0. And since that first alpha version, the changes in the releases 1.4.8 through 1.4.14 have been included. Therefore, you may also want to read the changelogs for 1.4.8, 1.4.9, 1.4.10, 1.4.11, 1.4.12, and 1.4.14.

Significant changes in this release include that IO settings now use an enum, as well as a matching new implementation for IO readers and writers to handle settings (done by John). The getHillString() API has been improved, to make it more consistent with the matching getString(). The iterating SD file reader has been renamed from IteratingMDLReader to IteratingSDFReader, and the Elements static fields for the elements in the periodic table now use a final class called NaturalElement, independent from any data interfaces implementation. Daniel worked a lot on the 3D structure builder, improving the code significantly, and Jonathan revamped the fingerprint stack, and introduced two new interfaces, IBitFingerprint and ICountFingerprint, making the framework more uniform. On top of that, there is a new ShortestPathFingerprinter and new IO classes for the Mopac 7 input and output formats.

All in all, quite a lot, but that was to be expected after 8 months. Mind you, like 1.5.0, this release too shows an increased number of failing unit tests on Nightly. Nothing severe, though, so if you work in a development branch, and with the first betas a few months from now, it may be tempting to migrate.

The changes
  • s/Molecule/AtomContainer/ to fix a compile issue with the port of the SDG patch to master 4d8be8a
  • Added the missing MDL molfile :( 96382dc
  • Set bond order 4 bonds to UNSET e65ef9e
  • Introducing a new flag: SINGLE_OR_DOUBLE 55b1494
  • Minor cleanup for the migration of the SINGLE_OR_DOUBLE code to master 3429e61
  • - Simplified getMaximumBondOrder for bonds - Added exception when both bond orders are unset - Added null check for getMaximumBondOrder - Added additional unit test for getMaximumBondOrder 7734cb4
  • Removed NoNotification (deprecated) from test cases - these were calling failures as the NoNotify module is no longer a dependency on cdk-core a7c54a5
  • Finds the location of double bonds. 6e04d3a
  • Added testing that the new AT properties are set e70786f
  • Refactored the atom types to have bonding patterns explicit 71929f3
  • Restore original bond order sum and max bond order properties, as descriptors are not allowed to change the data structure 58cfaad
  • Additional testing, based on bond order information, now possible 8e00a75
  • Split the C.sp atom type into C.sp for (-C#) and C.allene for (=C=) 001f488
  • Introduced a more explicit way to define the number of connections and allowed bond orders 492a9cd
  • Added a few convenience methods to get the max bond order 4e45c7d
  • Marked the atoms, bonds and molecule with the SINGLE_OR_DOUBLE-flag if needed and tested it 6408139
  • Added a new flag: SINGLE_OR_DOUBLE 46b1e9e
  • Two more tests, reflecting the assumptions that the Number is different for different flags, and the same when the same flag is set ed7a3fc
  • Added another getFlagValue() method, here testing non-zero when a flag is set 96032d8
  • Added a test for a default 0 value of getFlagValue() 255e25c
  • Added missing tests, and particular one about a default 0 value of getFlagValue() 42db307
  • Added a new jar d7d9003
  • added ShortestPathFingerprinter with recommended changes Signed-off-by:Syed Asad Rahman ecfccea
  • added ShortestPathFingerprinter with recommended changes Signed-off-by:Syed Asad Rahman 29aabd3
  • added test case to test module Signed-off-by:Syed Asad Rahman ca4bd8d
  • added test case Signed-off-by:Syed Asad Rahman eff9e23
  • added code dependencies for the Shortest path fingerprinter Signed-off-by:Syed Asad Rahman b356d8c
  • added commons math library Signed-off-by:Syed Asad Rahman 5763df2
  • Implemented new flag storage on IChemObject implementations. Flags are now stored on a single numeric value (currently a short) and flags are accessed/mutated via bit shifting of this value. This implementation provides space saving over using a boolean array however getFlags() and setFlags() now have a overhead due to conversion from the array to the numeric value. Usage however indicates the singular setFlag/getFlag is used >1000 times where as the setFlags/getFlags is used ~50. 1a1b03d
  • Converted flags from incremental integers to bit masks 0352d30
  • Normalised all getFlag and setFlag usages to use the CDKConstants 2db7a94
  • Added unit testing for the dict module bac4e07
  • Removed unused class FixedSizeStack 3f37d8c
  • Removed local variable declarations d5c9a77
  • Replaced field declaration of Vector with List d796b4a
  • Removed output to STDOUT f97b493
  • Added a missing test, and updated a test for the getBitFingerprint method 47e2f1d
  • Point to the master branch on GitHub 56caf17
  • new junit test for @cdk.bug #3526295 d618d98
  • new test method added to test class 29366f0
  • Added the @cdk.module annotation and the class to the builder3d test suite 1ffe232
  • new builder3d template handler test class 21b9941
  • some import deletions to clean up test code dfb1e46
  • bug report and unit test 8dadc75
  • modified license header 6367939
  • Added the test classes to the appropriate test suites 5a27f24
  • new bug unit tests and bug references 5ae0bfe
  • new bug unit tests and bug references 069f06b
  • patch for @cdk.bug #3525144; see sourceforge for bug report 373d6ec
  • Patch for @cdk.bug 3523247 deebfab
  • Patch for @cdk.bug 3515122 by danielszisz 0b31c15
  • Added the new test class to the module test suite 428623d
  • changed license header with correct author and email dd45524
  • Header and modifications to MMFF94BasedParameterSetReaderTest.java e39a647
  • New MMFFBasedParameterSetReaderTest 3a33765
  • Added another thing to ignore: SWITCH_TABLE stuff 37628f0
  • Updated for the new FP api by Jonathan 8939f04
  • cleaned up imports 2abf5a1
  • changed variable name to all caps 2686a16
  • added missing dot in javadoc 1c7d281
  • whitespace fixes 227280e
  • Fixed JavaDoc fe466f1
  • fixed dependencies 2558411
  • Made test compile after rebase 1d839e0
  • s/Molecule/IAtomContainer/g 718fa78
  • updated to correspond to version of Junit jar f03d22c
  • added method for studying hashes in FP eafbd8c
  • made fingerprint Serializable e0a77e9
  • clean up imports c67a7ff
  • made 0 param constructor public 8bdb66d
  • constructor for creating the FP from an int array 3db0a18
  • changed to IBitFingerprint f0e45bf
  • changed to IBitFingerprint b1ccac6
  • added method for getting array of set bits 73a1be8
  • support IBitFingerprint instead of BitSet 0a1257d
  • provided general from-IBitFingerprint-constructor b6e92dc
  • use IntArrayFingerprint if fp to compare is that bce579c
  • fixed wrong index 91d0590
  • added equals and hashcode to the bit fingerprints 7ffa16e
  • tests and solution for bitfp tanimoto trouble 9b713be
  • let's use double for tanimoto coefficients 66a2e2a
  • fix for tanimoto method 2 + test c6e85e0
  • fixed cdk modules and deps f2b2067
  • Added second tanimoto method 2420a70
  • added merge method for count fingerprints f3f941f
  • tanimototests 0b41afa
  • cardinality should be an int 9891a77
  • light-weight binary version of signature fp 8597974
  • Introduced IBitFingerprint and ICountFingerprint 7004511
  • Also updated the test classes for the UIT change b8e6e1e
  • Make it compatible with the new UniversalIsomorphismTester. 299ce67
  • Make it compatible with the new UniversalIsomorphismTester. 5ac4197
  • Make it compatible with the new UniversalIsomorphismTester. 7439b08
  • Make it compatible with the new UniversalIsomorphismTester. 0a70c86
  • Make it compatible with the new UniversalIsomorphismTester. 542b3b8
  • Make it compatible with the new UniversalIsomorphismTester. 3802b28
  • Make it compatible with the new UniversalIsomorphismTester. cb2c58e
  • Make it compatible with the new UniversalIsomorphismTester. 4bec0a4
  • Make it compatible with the new UniversalIsomorphismTester. 99cc828
  • Make it compatible with the new UniversalIsomorphismTester. 48ba7e3
  • Make it compatible with the new UniversalIsomorphismTester. 2dca75a
  • Modify to make UniversalIsomorphismTester usable in a threaded environment. Remove keyword static from variables start and timeout. Remove keyword static from methods isIsomorph, getIsomorphMap, getIsomorphAtomsMap, getIsomorphMaps, getSubgraphMaps, getSubgraphMap, getSubgraphAtomsMaps, getSubgraphAtomsMap, isSubgraph, getOverlaps, search, getMaximum, setTimeout. Added constructor. c241461
  • Fixed the same problem as in DefaultChemObjectBuilder (see commit 3d0b0e5f329e9256638ce18e4b5024e2d348474a and 2a2aecc077add716309591e2fae9832dfcfc64cf) 510a753
  • Corrected method call in test 28b0629
  • Added Test[Class|Method] annotation 075d518
  • Added @link's to documentation 102bad6
  • spelling dc90dd2
  • Taking advantage of new has2DCoordinates behaviour 32fda31
  • Changed behaviour of has2DCoordinates to mirror has3DCoordinates 19be6ca
  • Revert "Patched bpol descriptor with cheminf ID" (fixes #3541366) 6a7674c
  • updated entry for bpol to be in sync with other descriptors 1be5396
  • Added a missing test method for getFingerPrint() 591623f
  • Double comparisons needs an epsilon d648b3e
  • Resynched the test method name e9555ce
  • Read the test MDL V2000 molfile with the appropriate reader dd39f24
  • Fixed the name of the test class 862f478
  • Added a missing dependency d98db46
  • Fixed a typo in test name (Whit -> With) 27cb6fc
  • Marked the getHillString() method as deprecated, because it is a duplicate of the getString() method. feebefe
  • Added the option to specify element order as a parameter of getHTML() 6df66fb
  • Added two more recent authors 76d32b9
  • Added a new author 1457148
  • Update for a patch from cdk-1.4.x: s/NoNotification/Silent/ 3179394
  • fixes #3525144; from danielszisz 6626a2d
  • Simple period put at the end of the comment line of JavaDoc of method getHybridisationState() 1cfe4c1
  • Simple JavaDoc @param for molecule added to method zmatrixChainToCartesian(). eca9603
  • This commit puts a simple period at the end of sentence 'Method assigns 3Dcoordinates...' at the JavaDoc of placeAliphaticHeavyChain() method. 1a03972
  • A closing period is given to the JavaDoc of the method markPlaced(). 7afb7c9
  • Another missing @param added to findHeavyAtomsInChain() method into line 80. 3238e26
  • Patches for @cdk.bug #3524092 and #3524093 ec6d92d
  • s/IMolecule/IAtomContainer/ fbfab7b
  • Return a default array instead of null, fixing NullPointerExceptions caused otherwise when instantiating via e.g. IAtom(Elements.CARBON) 19bb74a
  • Updated to not use IMolecule 53aa18c
  • Replaced IMolecule* by IAtomContainer* 50536d4
  • It turns out that Java creates an inner class for switch blocks, for which we do not require testing; therefore, this additional exemption 2026a25
  • Set the electron counts when a bond order is set 7c6506c
  • Added a unit test that sets electron counts when a bond order is set 1faf986
  • Some minor changes to ForceFieldConfiguratorTest by danielszisz 4494af4
  • additions for AtomPlacer3D 36db127
  • Updated and corrected new AtomPlacer3D Test class by danielszisz ea2b55e
  • Further AtomPlacer3DTest corrections d83c937
  • New AtomPlacer3DTest class bc83829
  • TestClass annotation for AtomPlacer3D 08e6b58
  • Fixed 'dist-large' by removing non-existing include 7351e8e
  • Missing annotations are now given to the ForceFieldConfigurator class after new test class has been added f7654cb
  • Use interfaces instead of implementations 829da6c
  • Now independent from the data interface implementation b2838d5
  • Moved to the pdb module 4bcc074
  • Removed redundant dependencies. 026d9d2
  • The qsaratomic module is now also independent from the data module 1a2e93a
  • Made qsarmolecular independent from data, by creating a new fragment module, allowing to not depend on extra, and cleaning up some code to not use data module classes 646c37b
  • Moved Elements to standard, making it independent from the data module, by introducing a read-only IElement implementation 941a459
  • Removed a non-existing dependency 546b737
  • updated ForceFieldConfiguratorTest with bug annotations d30874c
  • New ModelBuilder3D package test 07dbddb
  • Makes the reaction module independent from the data module a21ccc2
  • Added some checks for null specification references. Code is now more robust to ill-formed dictionary elements faa5404
  • Removed the unneeded catching of CDKException 9fbf219
  • Added a new dependency 0040b00
  • Added test case to check for UIT failure when matching symmetric query 8e67dca
  • Some minor modification to the code to make it a little more readable a23d689
  • Implemented new dynamic settings for all usages d5e718a
  • Dynamic Settings 175bb47
  • Use the interface instead of the implementation 6955c25
  • Added a bit of missing test method annotation d9b1b08
  • Corrected JUnit version 751c1b9
  • Renamed the IteratingMDLReader to a IteratingSDFReader, matching the format name 08a06c0
  • Write a CHARGE command when the entity has a non-zero charge 5e81816
  • Added a unit test for charged entities b635613
  • Removed two static fields that are already provided by RingSizeComparator aec7fb3
  • Updated to CDK coding standards: 44764da
  • Updated to match the CDK hierarchy and classes; LEGO building block style: no processing, just writing 6391477
  • Added a Mopac 7.10 test file 4b57b57
  • Added code from Ambit2 for reading Mopac output files ee33e4e
  • Added code from Ambit2 for writing Mopac input files 04b687a
  • Added missing @TestMethod annotation, and two test methods cf7bf06
  • Added missing test classes for the POVRay, SVG, and CDKSourceCode formats 444af43
  • Fixed API: the setResourceFormat() should accept a IResourceFormat 9403d1c
  • Added missing TestMethod annotations for the matching methods 8392519
  • Added a missing and corrected two other test class annotations c11a1e3
  • Fixed extending the most specific test class 74a7dc4
  • Updated unit test for commit #3093241, where null's are always larger than an actual object ab8aa48
  • Removed two static fields that are already provided by RingSizeComparator 5e7772a
  • Very basic tests for the setWriter() methods (it cannot test if something is really written, as we do not know what objects are supported by a random reader; therefore, we just expect that no exception is thrown) 3cc4232
  • Use an enum, instead of ints 1cdd680
  • Updated the copyright range f878785
  • In Eclipse, use the Eclipse way to find JUnit db6b021
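
One of the changes in the list above moves the flags from a boolean array to bit masks on a single short. A minimal sketch of that idea follows; the class name, the constant names and their values are made up for this illustration and do not claim to match CDKConstants or the actual CDK implementation.

```java
// Hypothetical sketch of bit-mask flags; names and values are illustrative
// and do not claim to match CDKConstants or the CDK implementation.
class BitFlagsDemo {

    // Each flag gets its own bit.
    static final int IS_AROMATIC      = 1 << 0; // 0x1
    static final int IS_IN_RING       = 1 << 1; // 0x2
    static final int SINGLE_OR_DOUBLE = 1 << 2; // 0x4

    // All flags packed into one short instead of a boolean array.
    private short flags = 0;

    // Sets or clears a single flag; cheap bit operations.
    void setFlag(int mask, boolean value) {
        if (value) {
            flags |= mask;
        } else {
            flags &= ~mask;
        }
    }

    // Tests a single flag.
    boolean getFlag(int mask) {
        return (flags & mask) != 0;
    }

    // The packed numeric value; 0 when no flag is set.
    short getFlagValue() {
        return flags;
    }

    // Array view of the first 'count' flags; this is where the conversion
    // overhead mentioned in the commit message comes from.
    boolean[] getFlags(int count) {
        boolean[] result = new boolean[count];
        for (int i = 0; i < count; i++) {
            result[i] = (flags & (1 << i)) != 0;
        }
        return result;
    }

    public static void main(String[] args) {
        BitFlagsDemo atom = new BitFlagsDemo();
        atom.setFlag(IS_AROMATIC, true);
        atom.setFlag(SINGLE_OR_DOUBLE, true);
        System.out.println(atom.getFlag(IS_IN_RING));
        System.out.println(atom.getFlagValue());
    }
}
```

The trade-off is the one described in the commit message: single-flag access stays cheap and the object gets smaller, while the array-based accessors pay for a conversion.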
While I was able to isolate the patches for the list of changes above, the two lists below are based on all patches since 1.5.0, and thus include statistics on patches from the CDK 1.4.x series.

The authors

211  Egon Willighagen
 34  Daniel Szisz
 32  John May
 32  Jonathan Alvarsson
 17  Rajarshi Guha
 13  Arvid Berg
 13  Yap Chun Wei
  6  Syed Asad Rahman
  4  Klas Jönsson
  4  Tomas Pluskal
  3  Nina Jeliazkova
  3  Ralf Stephan
  3  Gilleain Torrance
  2  Kevin Lawson
  1  Jonty Lawson
  1  Stefan Kuhn
  1  Stephan Beisken

Yes, that is a respectable list of authors indeed! It also covers nine different countries, if I am not mistaken.

The reviewers

57  Egon Willighagen 
45  Rajarshi Guha 
25  John May 
 8  Nina Jeliazkova
 2  Arvid Berg 
 2  Ralf Stephan 
 1  Jonathan Alvarsson 

From now on, I will try to increase the frequency of 1.5.x releases to once every three months, or better.