
Monday, December 21, 2009

BlueObelisk StackExchange: summary of the first month

The Blue Obelisk StackExchange (BOx) has seen a relatively good start, but the number of questions is dropping. The average number of unique visits is about 23-30 now:


The number of registered users is not insignificant but also has not been growing much lately:

At the same time, the quality of the questions is high, and they are real user questions:
The overall state is 37 questions with about 50 different tags:



To make progress with BOx, we primarily need to promote it more as a central point of entry for people who want to know what free tools they can use to meet their needs, and for people who want to contribute to ODOSOS cheminformatics, by pointing out the unsolved problems.

Saturday, December 19, 2009

December wrap up. X-mas holidays at last!

Wow, I just saw it has been 17 days since my last post already :( That's a new record, I think! A lot has happened actually, but I have not had time to write things up. Actually, I still have SWAT4LS coverage left to do :(

Latex
Anyway, one of the things our group has been up to in the last two weeks is writing a book in support of the free, online Pharmaceutical Bioinformatics course. The material includes a good deal of cheminformatics (molecular representation: chemical graph theory, 3D geometries, file formats, line notations, InChI), bioinformatics (sequence analysis), and statistics (PLS, PCA, proteochemometrics). All in light of drug discovery. Of course, we're using LaTeX, and I asked around here and there about related things. For example, on StackOverflow on educational book styles. But also on FriendFeed on tautomerism in relation to drug activity.

Bioclipse
I also hacked up a Bioclipse plugin that allows me to convert a Bioclipse matrix resource into LaTeX source code, but that will not be part of the Bioclipse 2.2 release, as it requires quite some updating of the statistics functionality. BTW, the LaTeX plugin is hosted at Gitorious, which is a GitHub alternative, but does not seem to have post-commit hooks :(

Also, the Bioclipse2 paper "Bioclipse 2: A scriptable integration platform for the life sciences" has been published now in BMC Bioinformatics (DOI:10.1186/1471-2105-10-397)!

New student
I am also happy to have a second student starting in January, who will work primarily on an RDF version of the ChEMBL data. Her work will extend on the excellent work being done right now by Samuel on comparing Prolog with DL reasoning.

CDK Licensing
Another thing that required my attention was the problem brought up by Andew on licensing. There were considerable problems with out-of-date statements the CDK makes on the license and copyright information of certain CDK modules, and with the implications that has for what the CDK project is required to do (e.g. link to source code of third party libraries) and for downstream CDK distributors, like the Debian and Ubuntu projects. For example, it became apparent that the Debian package cannot distribute the XML Schema of CML, which is CC-BY-ND and therefore not DFSG-compatible. A few bugs have been reported, and work is ongoing to fix the issues.

Wednesday, December 02, 2009

CDK 1.3.1: the changes

Two weeks ago, I released CDK 1.2.4. Anay reported failures when generating the JavaDoc from the packages, both of which I think I have fixed now; the uploaded 1.2.4.1 packages on SourceForge include these fixes.

The 1.2.4 release was soon followed by 1.3.1. Unfortunately, uploading the packages to SourceForge over 3G with Chrome did not work well, so I only finished that today. CDK 1.3.1 is the second release in the development branch, and brings in new functionality but also API changes. Here are the changes since the 1.3.0 release:
  • Bumped version for 1.3.1 release c341095
  • Added some extra lines, hopefully fixing the conflicts all the time 6dab943
  • Fixed param name 743bad3
  • Updated the makefp3d target to work with the current build system bbb78ee
  • Set up a branch for the 1.2.4 release 4801d79
  • Fixes bug 2898399. Updates to the SMARTS parser to handle proper matching for explicit hydrogens (including H, 1H, 2H and 3H). SMARTSQueryVisitor updated to take into account different isotopes of H. Also updated unit tests to take into account proper H matching. Added a unit test to further check H matching. b67d76a
  • Added tests to match hydrogens 45a7f54
  • Fixed junior issue 1816529: Missing Java5 generics for atomContainers() Iterator 484619e
  • Reworked the tests for bug 2898032. Updated Javadocs for smiles generator 7f68b07
  • Added unit test to confirm and check for bug 2898032 924b563
  • Fixed junior issue 1802586: Misuse of assertTrue for tested strings 12bec4f
  • Made the AtomContainerPermutors IAtomContainer implementation independent 4748098
  • Updated UIT to handle single atom queries and added a unit test for bug 2888845. Also updated Javadocs to specifically note behavior of single atom queries dfb2805
  • Fixed the dist-large target: removed to no longer existing .libdepends after the log4j module patch 9dc13e3
  • Implemented instantiating custom loggers; example in the unit test class 2771eb9
  • Added the use of the SystemOutLoggingTool as back up acf5953
  • Added a ILoggerTool implementation for STDOUT 921447a
  • Dig up and updated the copyright history a3cc876
  • Factored out initialization of the tool, to allow reusing the code for other logger class names 2af5f24
  • Moved the log4j.jar depending LoggingTool into a separate module 112f64d
  • Introduces the ILoggingTool interface and a factory so that CDK code no longer needs to depend on LoggingTool which depends on Apache's Log4j library. c6c8d38
  • Added generation of java source jars e33fba2
  • Fixed matchers to allow XML without new lines (closes #2832835) f9a0552
  • Added unit tests for detection of PubChem XML files. 571f434
  • Fixed matchers to allow XML without new lines (closes #2832835) a1f25d8
  • Added unit tests for detection of PubChem XML files. 1cec794
  • Added reading of E/Z stereochemistry from double bonds in MDL V2000 molfiles. cb824f1
  • A minor fix to clean up a PDMD warning 024499e
  • Overwrite unit tests, because there are no change events passed around at all for the NoNotification interface implementations 36f295b
  • Added missing unit tests for IChemModel event propagation for the ICrystal field 2993e0c
  • Fixed propagation of change events to IChemModel when modifications are made in child IChemObjects 0c8a88f
  • Fixed unit tests: the IChemModel.setFoo(null) should actually give a change event on the listener of the IChemModel, and not after unregistering of the Foo object. b833176
  • Synchronized with the Blue Obelisk version a91062b
  • Added unit test to the function of the new IO setting to force 2D coordinate output. 4e2b2bf
  • Added writer IO option to force writing of 2D coordinates if 3D coordinates are present too, which now are preferably outputted. 0e6aa2c
  • Added unit test to verify that if 2D and 3D coordinates are available, the 3D coordinates are outputted. 56852f8
  • Changed IBond.get/setStereo() to use a IBond.Stereo enumeration instead of an int (fixes #2855850): 46893ed
  • Fixed Taglets: only return HTML if the Tag is really given; the toString() method is given for all cases, not just when the tag is found 1107fb2
  • Added the Mannhold LogP descriptor 1e6b6cd
  • Added the Mannhold LogP descriptor to the ontology a7adc9f
  • Fixeda bug which was causing various parts of the DescriptorEngine to fail - it was trying to instantiate a non-descriptor class which happens to reside in the descriptor package directory. This fix is a bit kludgy - ideally only descriptors should be in that directory 0242d9a
  • Fixes ClassCastException when not IMolecule 6f3e848
  • Upgraded to PMD 2.4.5 with many bug fixes, giving more accurate error reports f29a66b
  • Added missing dependency on cdk-diff, being used in one of the unit tests 0e287dd
  • Fixed methods names to match those in the test class 789a314
  • Fixed test method name to match the expected patters, fixing a coverage test fail ac13619
  • Removed duplicate code: MolecularFormulaTest now extends AbstractMolecularFormulaTest b8651c7
  • Fixed test method annotation to point to the right method bb7d341
  • Added missing @TestMethod annotation f6f759b
  • Added modules that were missing from the PMD testing 073e5ec
  • Added modules that were missing from the doccheck testing 10dc19c
  • Added reference to IUPAC documentation about stereochemistry visualization. 56adf23
  • Patch for bug 2843445. Aims to fix generation of NaN coordinates by SDG d1397fe
  • Added missing dependency introduced by the use of AbstractFingerprinterTest in test-standard. b26eb93
  • Updated the unit test classes for all IFingerprinter implementations to use the new AbstractFingerprinter class; a few unit tests actually fail 1989fa5
  • Extracted an AbstractFingerprinterTest with unit tests that should really apply to all IFingerprinter implementations 8bc42dc
  • Clean up of layout. 5f7cb53
  • Fix the unit test to not give a 'input must support mark' exception on some platforms, by wrapping the InputStream in a BufferedInputStream. 6f6f41e
  • Added missing dependencies 8759481
  • Added ioformats to modules to test 56289e2
  • Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where multiline field data is found. df35f02
  • Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where very much field data, like the ChEBI_complete.sdf eac8266
  • Factored out steps in reading the SD file data block 678e7ca
  • Bumped version, to make it clear this is not the 1.2.3 release 8c8166a
  • Bumped version, to make it clear this is not the 1.3.0 release eeda652
  • Fixed registering on the cdk.threadnonsage tag (closes #2796362) d451576
  • Removed obsolete pattern from old svnrev tag c8f5a72
  • Fixed JavaDoc to remove traces of the old svnrev Tag 1a70488
  • Synchronized exception message with implementation (fixes #2844333) c70b79c
  • Made class private again, per authors request fa7ba02
  • Any class will do, not just public, final and abstract dc9e8c5
  • Added ant task to calculate JavaNCSS code statistics a8b313e
  • Added JavaNCSS 32.53 (LGPL 3.0) 6753a8c
  • The Pauling Electronegativity is copied in configure as well. I can't see why not copy everything we have. 3fd2b17
  • Revert "added a test for bug 2831420": 2c2add6
  • Patch for bug 2843445. Aims to fix generation of NaN coordinates by SDG 963b0a7
  • added a test for bug 2831420 5d15222
  • added a test for bug #2831420 93536f0
  • Made InChIGeneratorFactory a singleton. 242da91
  • Layout. af4fac7
  • Added bug annotation 38d0235
  • test case for bug #2846213 f84c53b
  • Fixed perception of N.planar3 where N.sp2 was detected, by now taking into account the given hydrogen count. 1714de2
  • Fixed perception of benzene with all single bond, but hydrogen count 1 and bonds flagged aromatic. In this case, the type is C.sp2 not C.sp3. 05e0be3
  • Added assertions to unit test for values being not null 863b0a5
  • Added two unit tests for the same problem: carbon atom types are not correctly perceived if bond order info is SINGLE only, and hydrogen count and aromaticity flag is set. f19a451
  • Moved class into a org.openscience.cdk package, which seems to work now. I'm puzzled why it did not before. Solved several unit test fails. b055c6b
  • Unsealed the XOM jar to allow having the CustomSerializer 3b82340
  • Fixed Javadocs error e0304bf
  • Fixed a wrong javadoc tag. Also removed svn tag in the SMARTS parser JJT file, replaced with git tag c888773
  • Added support for 'public enum's 4bf822d
  • corrected bug in bondtools.isStereo(IAtomContainer container, IAtom stereoAtom). A comparision of atom symbols in a nested loop was using the counter of the outer loop twice. Note it worked before, because there is a sort of fallback to Morgan numbers. fallback to morgan (fixes #2830287) 025fb47
  • added a new test for bondtools 13f72bd
  • Fixed inconsistency between accepts() and write: also support writing of IAtomContainerSet and IAtomContainer as accepts() indicates (fixes #2827745) 6380578
  • General test for testing consistency between write() and accepts(), testing that all accepted IChemObject's can also be written f0678eb
  • Added unit test for bug #2826961: inconsistent atom typing for two SMILES. Unit test does not show a fail, ruling out a CDK bug 42e45ef
  • Remove erroneous throws statement f8cfea8
  • Bug found calculating the exact mass given a molecular formula when it is negative charged. 3d1de45
  • Fixed reading of the cdk/dict/data/elements.owl database which is now in OWL 73225a0
  • Fixed issue 2458210: use assertNotNull(foo) etc instead of assertTrue(foo != null). 182afe6
  • Added minimum equivalents for BondManipulator.getMaximumBondOrder() methods 6e12696
  • Fixes asserts: after removal *no* change should be recorded 3b9fa30
  • Added IO option to disable generator of XML declaration statements in the output CML. 74451b8
  • Added generics, and consistified code by always returning a List of the same '?'. (And some 80 chars fixes in the JavaDocs.) d6337cd
  • Added unit tests to test that when a [Molecule|Reaction|Ring]Set has been removed from a ChemModel, the ChemModel should unregister as listener. 63e6c01
  • Added unit tests for event propagation from [Molecule|Reaction|Ring]Sets to ChemModel. e011035
  • More testing of flags. abb5384
  • Fix for junior job id: [ 1837692 ] Test methods should throw only one Exception. 8c38536
  • Fixed missing imports and wrapped to 80 chars fd2d2df
  • Better excpetion handling in builder3d: bc5837d

Wednesday, November 25, 2009

SWAT4LS: wrapping up #1

It's already been five days since the SWAT4LS meeting (matching blog), and I finally got around to writing up my personal summary. I very much enjoyed the Blue Obelisk dinner on Thursday evening with Nico, Duncan and Miguel (the CDK one).

The SWAT4LS was fun, interesting, perhaps too short, but very much appreciated! Thanx to all organizers! During the day various people tweeted the meeting, using the #swat4ls2009 hashtag (forwarded to a FriendFeed room), while Nico covered things in various blog posts which I'll link to below where appropriate. Summaries I have seen so far are from Nico and Duncan (again :), and the organizers.

The day kicked off with a presentation by Alan Ruttenberg (Nico's coverage). It nicely demonstrated where the semantic web for life sciences is heading. Particularly interesting was the integration of SPARQL with Jmol in ImmPort/JmolViz: it uses Jmol to visualize a PDB entry, while using SPARQL to retrieve atomic and residue annotation, using Jmol script (we have to thank another Miguel (the Jmol one) for taking the scripting and visualization capabilities to the next level in 2002). It always makes me proud to see one of the projects I have worked on reach a prominent place in keynote talks at conferences :)

Alan also clarified that CC0 is not a license, but a statement about the public domain nature of data; there is nothing to accept, nothing to live up to. The important thing, and I am sure most of my readers are well aware of that, is that it formalizes the public domain concept by wrapping it in a full CC0 statement. My recommendation to all who want to make (chemical) data available as public domain: use CC0, just because CC0 works in any country, and it will make a lot of your users very happy. If you cannot claim CC0 because you are not really the owner (as I have seen done), then do not claim the data to be public domain either (which was done)!

There was also mention of the Amino Acid Ontology, which comes closer to our group's proteochemometrics work, but I have yet to look at whether it can be used for or linked to protein descriptors. Also interesting is the idea behind RDFHerd, a project aiming to distribute RDF data sets as installable packages. If I understood correctly, only Virtuoso is supported so far, but this thing can fly, particularly if these packages are easily converted into Debian packages.

More wrapping up will follow, but I have other business to do first now.

Friday, November 20, 2009

Linking two Virtuoso instances to one Apache server

Virtuoso comes with its own web front end, but I did not want to make that public. Additionally, I actually have two instances running, one for the GNU FDL licensed NMRShiftDB data, and one for the CC0 ChemPedia and Solubility data sets.

So, I used Apache's proxy module to link to the two Virtuoso instances. These are set up by just duplicating a database folder and giving each copy its own virtuoso.ini config file. Modify one of the two config files so that the instances run on different ports in the Parameters section, for example 1198 and 1199:
[Parameters]
ServerPort                      = 1199
And assign a different server port in the HTTPServer section, such as 2290 and 2291:
[HTTPServer]
ServerPort                      = 2291
Then modify /etc/apache2/mods-enabled/proxy.conf (or the equivalent on your system) to add two sections, each proxying the requests for one public URL to one of the Virtuoso servers:
<Proxy /nmrshiftdb/sparql>
  RewriteEngine On
  Allow from all
  ProxyPass        http://localhost:2290/sparql
  ProxyPassReverse http://localhost:2290/sparql
</Proxy>

<Proxy /cc0/sparql>
  RewriteEngine On
  Allow from all
  ProxyPass        http://localhost:2291/sparql
  ProxyPassReverse http://localhost:2291/sparql
</Proxy>
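
With both proxies in place, a client picks a data set purely by its public path. A minimal Python sketch of building such request URLs follows; the host name example.org is a placeholder, and the `format` parameter is an assumption about what the end point accepts (`query` is the standard SPARQL protocol parameter):

```python
from urllib.parse import urlencode

# Public paths as configured in the two proxy sections above.
ENDPOINTS = {
    "nmrshiftdb": "/nmrshiftdb/sparql",
    "cc0": "/cc0/sparql",
}

def sparql_url(dataset, query, host="http://example.org"):
    """Build a GET URL for one of the two proxied SPARQL end points.

    `host` is a hypothetical placeholder for the public Apache server.
    """
    params = urlencode({
        "query": query,
        # Assumed result format parameter; adjust to what your server accepts.
        "format": "application/sparql-results+json",
    })
    return host + ENDPOINTS[dataset] + "?" + params

url = sparql_url("cc0", "select * where { ?s ?p ?o } limit 5")
print(url)
```

The Virtuoso ports (2290 and 2291) never appear on the client side, which is the point of hiding the instances behind Apache.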

Thursday, November 19, 2009

ChemPedia RDF #1: the SPARQL end point

Well, you might spot a pattern here; yes, another chemical SPARQL end point (actually, it shares the end point with the Solubility data). This time around Rich's ChemPedia. Taking advantage of the CC0-licensed downloads, I have created a small Groovy script (using this JSON library) to convert the ChemPedia JSON into Notation3:
import net.sf.json.groovy.JsonSlurper;

// Parse the ChemPedia download into a list of substance records.
input = new File("substances.json")
json = new JsonSlurper().parse(input);

// Prefix declarations; each directive must end with a dot.
println "@prefix dc: <http://purl.org/dc/elements/1.1/> .";
println "@prefix cp: <http://rdf.openmolecules.net/chempedia/onto#> .";
json.each { it ->
  // One resource per substance: identifier, InChI-based sameAs link, InChI.
  println "<" + it.uri + "> dc:identifier \"" + it.gsid + "\";";
  println " <http://www.w3.org/2002/07/owl#sameAs> <http://rdf.openmolecules.net/?" + it.inchi + ">;";
  println "  <http://www.iupac.org/inchi> \"" + it.inchi + "\".";
  if (it.namings.size() > 0) {
    // One cp:Naming resource per proposed name, numbered by position.
    for (int i = 0; i<it.namings.size(); i++) {
      naming = it.namings.get(i);
      namingURI = it.uri + "/naming" + i;
      println "<" + it.uri + "> cp:hasNaming " +
        "<" + namingURI + ">.";
      println "<" + namingURI + "> a cp:Naming;";
      println "  cp:hasName \"" + naming.name + "\";";
      println "  cp:hasStatus \"" + naming.status + "\";";
      println "  cp:hasScore \"" + naming.score + "\".";
    }
  }
}
After uploading it into Virtuoso (now using DB.DBA.TTLP instead of DB.DBA.RDF_LOAD_RDFXML_MT ), we can now have our regular SPARQL fun with the data from ChemPedia. For example, list the 10 names with the most votes:
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix cp: <http://rdf.openmolecules.net/chempedia/onto#>

select distinct ?name ?score where {
  ?s a cp:Naming ;
     cp:hasName ?name ;
     cp:hasScore ?score .
} ORDER BY DESC(?score) LIMIT 10 
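
One thing the conversion script above does not handle is names containing a double quote or a backslash, which would break the string literals in the output. A minimal Python sketch of the naming part of the conversion with literal escaping added; the field names (`uri`, `namings`, `name`, `status`, `score`) follow the Groovy script, while the sample record and the helper names are made up for illustration:

```python
def literal(value):
    """Quote a value as a Turtle string literal, escaping \\, \" and newlines."""
    text = str(value)
    text = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + text + '"'

def namings_to_turtle(substance):
    """Return Turtle lines for the namings of one substance record."""
    lines = []
    for i, naming in enumerate(substance.get("namings", [])):
        naming_uri = "%s/naming%d" % (substance["uri"], i)
        lines.append("<%s> cp:hasNaming <%s> ." % (substance["uri"], naming_uri))
        lines.append("<%s> a cp:Naming ;" % naming_uri)
        lines.append("  cp:hasName %s ;" % literal(naming["name"]))
        lines.append("  cp:hasStatus %s ;" % literal(naming["status"]))
        lines.append("  cp:hasScore %s ." % literal(naming["score"]))
    return lines

# Made-up record for illustration only; not actual ChemPedia data.
# json.load() on the download would give a list of dicts shaped like this.
sample = {
    "uri": "http://example.org/substance/1",
    "namings": [{"name": 'some "quoted" name', "status": "example", "score": 3}],
}
turtle = namings_to_turtle(sample)
print("\n".join(turtle))
```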

Open Notebook Science Solubility: the SPARQL end point

The Open Notebook Science Solubility challenge is a project crowdsourcing the solubility of organic compounds in non-aqueous solvents. I have been working on RDF-ing this data, and this resulted in a joint chapter in the nice Beautiful Data book.

What I had not done so far, is set up a SPARQL end point for this data, like I did for the NMRShiftDB data.

Now, however, a Virtuoso-powered SPARQL end point is available, and I hope this will soon get picked up by the other nodes on the ONS Solubility project. It is not an auto-synchronized link, though.

Possible advantages include that the client can perform any query and get these results in various formats, including JSON. For example, follow this link to get all solutes in JSON format.

The matching SPARQL looks like:
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix ons: <http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto#>

select distinct ?s ?title where {
  ?s a ons:Solute ;
     dc:title ?title .
}
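
On the client side, the JSON returned for such a query follows the W3C SPARQL JSON results layout: a `head` listing the variables and `results.bindings` holding one object per row. A minimal Python sketch of turning that into plain rows; the sample response is hand-written, and its URIs and titles are made up rather than taken from the ONS Solubility data:

```python
import json

# Hand-written sample in the SPARQL JSON results layout (made-up values).
response_text = '''
{"head": {"vars": ["s", "title"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/solute/1"},
    "title": {"type": "literal", "value": "benzoic acid"}},
   {"s": {"type": "uri", "value": "http://example.org/solute/2"},
    "title": {"type": "literal", "value": "vanillin"}}
 ]}}
'''

def rows(result_json):
    """Yield one dict per result row, mapping variable name to plain value."""
    data = json.loads(result_json)
    variables = data["head"]["vars"]
    for binding in data["results"]["bindings"]:
        # A variable can be unbound in a row, so fall back to None.
        yield {v: binding.get(v, {}).get("value") for v in variables}

solutes = list(rows(response_text))
for row in solutes:
    print(row["title"], row["s"])
```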

Wednesday, November 18, 2009

CDK 1.2.4: the authors

The CDK 1.2.4 changelog I posted earlier was created directly from git output. Git has many features which make such things simple. Here's a list of authors of the 1.2.4 change set:
56 Egon Willighagen
9 Rajarshi  Guha
5 Stefan Kuhn
2 mark_rynbeek
1 Uli Köhler
1 Rajarshi Guha
1 Peter Odéus
1 Paul Turner
1 Miguel Rojas Cherto
1 Arvid Berg
This is just the number of commits, and many of mine are logistic in nature. You can also notice that Rajarshi has changed his name (removed the extraneous space :). Thanx to all authors for contributing to this release! I am happy to see a few new names in this list, which seems to indicate that people are settling in after the whole move from Subversion to Git.

This list was created with this command adapted from this StackOverflow question:
git log --pretty=format:%an cdk-1.2.3..cdk-1.2.4 | awk -- '{ ++c[$0]; } END { for(cc in c) printf "%5d %s\n",c[cc],cc; }' | sort -n -r
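
The awk part of that pipeline is a plain tally of identical lines, which is also why "Rajarshi  Guha" and "Rajarshi Guha" show up as two authors. A minimal Python sketch of the same counting, with an optional whitespace normalization that is not part of the original command; the `log_lines` list stands in for the git output:

```python
from collections import Counter

def count_authors(names, normalize=False):
    """Count commits per author name, most commits first.

    With normalize=True, runs of whitespace are collapsed so that
    'Rajarshi  Guha' and 'Rajarshi Guha' are counted as one author.
    """
    if normalize:
        names = [" ".join(n.split()) for n in names]
    return Counter(names).most_common()

# Stand-in for the lines printed by `git log --pretty=format:%an A..B`.
log_lines = ["Egon Willighagen", "Rajarshi  Guha", "Rajarshi Guha",
             "Egon Willighagen", "Stefan Kuhn", "Rajarshi  Guha"]

for name, n in count_authors(log_lines, normalize=True):
    print("%5d %s" % (n, name))
```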

CDK 1.2.4: the changes

Here is the changelog of CDK 1.2.4 which I am about to upload to SourceForge:
  • Fixed param name 743bad3
  • Updated the makefp3d target to work with the current build system bbb78ee
  • Set up a branch for the 1.2.4 release 4801d79
  • Fixes bug 2898399. Updates to the SMARTS parser to handle proper matching for explicit hydrogens (including H, 1H, 2H and 3H). SMARTSQueryVisitor updated to take into account different isotopes of H. Also updated unit tests to take into account proper H matching. Added a unit test to further check H matching. b67d76a
  • Added tests to match hydrogens 45a7f54
  • Reworked the tests for bug 2898032. Updated Javadocs for smiles generator 7f68b07
  • Added unit test to confirm and check for bug 2898032 924b563
  • Updated UIT to handle single atom queries and added a unit test for bug 2888845. Also updated Javadocs to specifically note behavior of single atom queries dfb2805
  • Added generation of java source jars e33fba2
  • Fixed matchers to allow XML without new lines (closes #2832835) f9a0552
  • Added unit tests for detection of PubChem XML files. 571f434
  • Overwrite unit tests, because there are no change events passed around at all for the NoNotification interface implementations 36f295b
  • Added missing unit tests for IChemModel event propagation for the ICrystal field 2993e0c
  • Fixed propagation of change events to IChemModel when modifications are made in child IChemObjects 0c8a88f
  • Fixed unit tests: the IChemModel.setFoo(null) should actually give a change event on the listener of the IChemModel, and not after unregistering of the Foo object. b833176
  • Added unit test to the function of the new IO setting to force 2D coordinate output. 4e2b2bf
  • Added writer IO option to force writing of 2D coordinates if 3D coordinates are present too, which now are preferably outputted. 0e6aa2c
  • Added unit test to verify that if 2D and 3D coordinates are available, the 3D coordinates are outputted. 56852f8
  • Fixed Taglets: only return HTML if the Tag is really given; the toString() method is given for all cases, not just when the tag is found 1107fb2
  • Fixeda bug which was causing various parts of the DescriptorEngine to fail - it was trying to instantiate a non-descriptor class which happens to reside in the descriptor package directory. This fix is a bit kludgy - ideally only descriptors should be in that directory 0242d9a
  • Fixes ClassCastException when not IMolecule 6f3e848
  • Upgraded to PMD 2.4.5 with many bug fixes, giving more accurate error reports f29a66b
  • Added missing dependency on cdk-diff, being used in one of the unit tests 0e287dd
  • Fixed methods names to match those in the test class 789a314
  • Fixed test method name to match the expected patters, fixing a coverage test fail ac13619
  • Removed duplicate code: MolecularFormulaTest now extends AbstractMolecularFormulaTest b8651c7
  • Fixed test method annotation to point to the right method bb7d341
  • Added missing @TestMethod annotation f6f759b
  • Added modules that were missing from the PMD testing 073e5ec
  • Added modules that were missing from the doccheck testing 10dc19c
  • Patch for bug 2843445. Aims to fix generation of NaN coordinates by SDG d1397fe
  • Fix the unit test to not give a 'input must support mark' exception on some platforms, by wrapping the InputStream in a BufferedInputStream. 6f6f41e
  • Added missing dependencies 8759481
  • Added ioformats to modules to test 56289e2
  • Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where multiline field data is found. df35f02
  • Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where very much field data, like the ChEBI_complete.sdf eac8266
  • Factored out steps in reading the SD file data block 678e7ca
  • Bumped version, to make it clear this is not the 1.2.3 release 8c8166a
  • Fixed registering on the cdk.threadnonsage tag (closes #2796362) d451576
  • Removed obsolete pattern from old svnrev tag c8f5a72
  • Fixed JavaDoc to remove traces of the old svnrev Tag 1a70488
  • Synchronized exception message with implementation (fixes #2844333) c70b79c
  • The Pauling Electronegativity is copied in configure as well. I can't see why not copy everything we have. 3fd2b17
  • Added bug annotation 38d0235
  • test case for bug #2846213 f84c53b
  • Fixed perception of N.planar3 where N.sp2 was detected, by now taking into account the given hydrogen count. 1714de2
  • Fixed perception of benzene with all single bond, but hydrogen count 1 and bonds flagged aromatic. In this case, the type is C.sp2 not C.sp3. 05e0be3
  • Added assertions to unit test for values being not null 863b0a5
  • Added two unit tests for the same problem: carbon atom types are not correctly perceived if bond order info is SINGLE only, and hydrogen count and aromaticity flag is set. f19a451
  • Moved class into a org.openscience.cdk package, which seems to work now. I'm puzzled why it did not before. Solved several unit test fails. b055c6b
  • Merge branch 'cdk-1.2.x' of ssh://egonw@cdk.git.sourceforge.net/gitroot/cdk into cdk-1.2.x f77db9c
  • Unsealed the XOM jar to allow having the CustomSerializer 3b82340
  • Fixed Javadocs error e0304bf
  • Fixed a wrong javadoc tag. Also removed svn tag in the SMARTS parser JJT file, replaced with git tag c888773
  • Added support for 'public enum's 4bf822d
  • corrected bug in bondtools.isStereo(IAtomContainer container, IAtom stereoAtom). A comparision of atom symbols in a nested loop was using the counter of the outer loop twice. Note it worked before, because there is a sort of fallback to Morgan numbers. fallback to morgan (fixes #2830287) 025fb47
  • added a new test for bondtools 13f72bd
  • Fixed inconsistency between accepts() and write: also support writing of IAtomContainerSet and IAtomContainer as accepts() indicates (fixes #2827745) 6380578
  • General test for testing consistency between write() and accepts(), testing that all accepted IChemObject's can also be written f0678eb
  • Added unit test for bug #2826961: inconsistent atom typing for two SMILES. Unit test does not show a fail, ruling out a CDK bug 42e45ef
  • Remove erroneous throws statement f8cfea8
  • Bug found calculating the exact mass given a molecular formula when it is negative charged. 3d1de45
  • Fixed reading of the cdk/dict/data/elements.owl database which is now in OWL 73225a0
  • Fixed issue 2458210: use assertNotNull(foo) etc instead of assertTrue(foo != null). 182afe6
  • Added minimum equivalents for BondManipulator.getMaximumBondOrder() methods 6e12696
  • Fixes asserts: after removal *no* change should be recorded 3b9fa30
  • Added IO option to disable generator of XML declaration statements in the output CML. 74451b8
  • Added generics, and consistified code by always returning a List of the same '?'. (And some 80 chars fixes in the JavaDocs.) d6337cd
  • Added unit tests to test that when a [Molecule|Reaction|Ring]Set has been removed from a ChemModel, the ChemModel should unregister as listener. 63e6c01
  • Added unit tests for event propagation from [Molecule|Reaction|Ring]Sets to ChemModel. e011035
  • More testing of flags. abb5384
  • Fix for junior job id: [ 1837692 ] Test methods should throw only one Exception. 8c38536
  • Fixed missing imports and wrapped to 80 chars fd2d2df
  • Better excpetion handling in builder3d: bc5837d
  • Fixed serialization of IAtom's with null formal charge to not cause NullPointerExceptions acc8012
  • Added unit test for serialization of null formal charges into the MDL molfile format (which currently fails) df57aea
  • Updated Javadocs for SMARTS query tool to indicate unsupported features e1da4c0
  • Cleaned up source file to remove spurious line endings 3d7adae

This overview was created with this Linux one-liner:
git log --oneline cdk-1.2.3.. | sed 's/\([a-f0-9]*\)\s\(.*\).*/<li>\2 <a href="http:\/\/cdk.git.sourceforge.net\/git\/gitweb.cgi?p=cdk\/cdk;a=commit;h=\1">\1<\/a><\/li>/'
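
For readers less fond of sed, the same rewrite can be sketched in Python. The base URL is copied from the one-liner; the function name is made up, and the HTML escaping of the commit subject is an addition the sed version does not have:

```python
import html
import re

# Base URL copied from the sed one-liner above.
GITWEB = "http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h="

def oneline_to_li(line):
    """Turn '<abbrev-hash> <subject>' into an HTML list item linking the hash."""
    match = re.match(r"([0-9a-f]+)\s+(.*)", line)
    if match is None:
        raise ValueError("not a `git log --oneline` line: %r" % line)
    commit, subject = match.groups()
    # Unlike the sed version, escape the subject so '<' or '&' stay valid HTML.
    return '<li>%s <a href="%s%s">%s</a></li>' % (
        html.escape(subject), GITWEB, commit, commit)

print(oneline_to_li("e33fba2 Added generation of java source jars"))
```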

Wednesday, November 11, 2009

BlueObelisk StackExchange (.com)

Oh no, not another communication channel?! We already have Google Wave! (BTW, I have quite some new invites...)

Well, you are right. But I could not resist: blueobelisk.stackexchange.com...


No, it is not using an Open platform, but there are plenty of Windows and Mac users among us... and the data is CC0.

Update: any question about Open Data, Open Source, or Open Standards (ODOSOS) is welcome. As well as any question on if and how some chemical question could be answered with ODOSOS tools. It is not restricted to the Blue Obelisk or the projects under the wings of the Blue Obelisk. All Open Data, Open Source, and Open Standards in chemistry are worth asking about.

Saturday, November 07, 2009

Call for Collaboration: JavaDoc validation with OpenJavaDocCheck

I reported recently about my efforts to write an Open Source DocCheck replacement. I received the first patches (from Rajarshi), and brought it online in a CDK branch (see this Nightly page).



This list shows a mix of tests that are now implemented in OpenJavaDocCheck itself, but the third line is actually a test that is plugged in and specific to the CDK. This is an important feature, I think, and allows users of OpenJavaDocCheck to add functionality that is not interesting to the general public, but very interesting for the JavaDoc being analyzed. Well, at least, it is to our CDK project :)

The current list of tests is still quite small, and consists of these tests:
  • test if each class and method has JavaDoc
  • test for missing @return tags
  • test for missing @param tags
  • test for @returns instead of @return
  • test @param template code, such as added by IDEs like Eclipse
  • test @exception template code, such as added by IDEs like Eclipse
  • test for redundant @version tags
I am now seeking feedback on the current code base, and potentially collaboration on writing more JavaDoc validation tests. There is enough to do, and I have been thinking about tests for:
  • spell checking JavaDoc
  • checking for 404s of web pages linked with <a href> in the JavaDoc
  • well-formedness of the HTML in the webpages
And about:
  • a PMD-like system to allow people to choose which testing they want or not
  • an Eclipse plugin
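
To give an idea of how small such a test can be, here is a rough Python sketch of two of the checks listed above (`@returns` instead of `@return`, and missing `@param` tags), run on a JavaDoc comment given as plain text. This only illustrates the idea and is not how OpenJavaDocCheck implements its tests; the function name and the sample comment are made up:

```python
import re

def check_comment(comment, param_names=()):
    """Return a list of warnings for one JavaDoc comment given as plain text."""
    warnings = []
    # '@returns' is a common slip for the '@return' tag.
    if re.search(r"@returns\b", comment):
        warnings.append("use @return instead of @returns")
    # Each declared parameter should have a matching '@param <name>' tag.
    documented = set(re.findall(r"@param\s+(\w+)", comment))
    for name in param_names:
        if name not in documented:
            warnings.append("missing @param tag for '%s'" % name)
    return warnings

doc = """/**
 * Calculates the mass.
 * @param formula the molecular formula
 * @returns the mass
 */"""
found = check_comment(doc, param_names=["formula", "charge"])
print(found)
```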

Wednesday, November 04, 2009

New Bioclipse Features: Kabsch Alignment, RMSD Distance and Tanimoto Similarity Matrices

We recently submitted a second paper on Bioclipse, and have worked hard in the past two weeks on addressing the reviewers' questions (and we love these feature requests! See also these two blogs). One reviewer seemed very interested in seeing docking available in Bioclipse. While we do not have a full docking feature set up for Bioclipse, we do have functionality to deal with 3D structures, though our research urged us to focus on the 2D side of cheminformatics so far.

To strengthen our intentions towards the 3D cheminformatics world, we have implemented a few new features, using CDK functionality. For example, we added Kabsch aligment and the related RMSD between molecular structures implemented as both popup menus as well as manager methods. The manager method you can see in action in MyExperiment workflow 937, which you can download directly into Bioclipse with one simple command (see Bioclipse Manager for MyExperiment.org):
// SMILES strings of the molecules to align
var smileses = new Array("CC(C)C", "CCCN", "CCC=O");

// create 3D coordinates for each molecule
var unaligned = cdk.createMoleculeList();
for (var i=0; i<smileses.length; i++) {
  var mol = cdk.fromSMILES(smileses[i]);
  mol = cdk.generate3dCoordinates(mol);
  unaligned.add(mol);
}

var aligned = cdk.kabsch(unaligned);

// show the aligned structures together in Jmol
jmol.load(aligned.get(0));
for (var i=1; i<aligned.size(); i++) {
  jmol.append(aligned.get(i));
}
Now, we do have to update the use of Jmol in Bioclipse, and a big overhaul is scheduled for the 2.4 release in February next year. But you get the idea.

As said, there are two stories to adding this new functionality. The first is the manager method, which we need because we want all GUI interaction the user performs to be recordable (Scientist 1: What did you do to get those nice results? Scientist 2: I pushed that button in that long menu. Scientist 1: What button is that? Scientist 2: Wait, I'll send you the BSL script with a Google Wave.)

The managers that allow this recording are Bioclipse specific, and also the reason why it would not be trivial to make a general Bioclipse plugin for Eclipse... some Spring magic is used to inject the managers into the JavaScript language. Anyway, the second thing is to add a GUI element, like popup menus. This is a particular area where Eclipse excels. I did have to ask for the details, as I am not using this daily (I'm doing science, not IT), but Ola was kind enough to give me the pointers for it.

The below configuration snippet links the pop up action to Bioclipse Navigator content (you know, where your MDL SD, CML, script and other files show up in Bioclipse). But only if I have selected 3 or more files! And, only if those files are actually some molecular content with 3D coordinates! And Bioclipse inherits this functionality by using the Eclipse platform.
<menuContribution
  locationURI="popup:org.eclipse.ui.popup.any?after=additions">
  <command
    commandId="net.bioclipse.cdk.ui.handlers.kabschAlignment"
    label="Perform Kabsch Alignment"
    icon="icons/molecule2D.png">
    <visibleWhen>
      <with variable="selection">
        <count value="(2-"/>
        <iterate operator="and" ifEmpty="false">
          <adapt type="org.eclipse.core.resources.IResource">
            <or>
              <test property="org.eclipse.core.resources.contentTypeId"
                       value="net.bioclipse.contenttypes.cml.singleMolecule3d"/>
              <test property="org.eclipse.core.resources.contentTypeId"
                       value="net.bioclipse.contenttypes.cml.singleMolecule5d"/>
              <test property="org.eclipse.core.resources.contentTypeId"
                       value="net.bioclipse.contenttypes.mdlMolFile3D"/>
            </or>
          </adapt>
        </iterate>
      </with>
    </visibleWhen>
  </command>
</menuContribution>
When Bioclipse is run, this looks like:



And the alignment results will nicely show up in a Jmol viewer (while it is implemented as an Eclipse editor, it is not yet):


The first screenshot also shows the new pop-up menus for calculating two matrices for 3 or more molecules. One is based on the RMSD of the 3D coordinates of the atoms in the MCSS (BTW, Asad's SMSD work is making its way into the CDK library, and will be available in a later Bioclipse version too) and will create a distance matrix. The second new pop-up menu uses the Tanimoto similarity measure based on CDK fingerprints of the selected chemical graphs. If the Bioclipse Statistics feature is installed, the created CSV files will open up in a matrix editor:
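
For those who have not seen the Tanimoto measure before: it is the number of bits two fingerprints share, divided by the number of bits set in at least one of them. Below is a small sketch in plain JavaScript (written for something like Node.js, not for the Bioclipse console). It is not the Bioclipse or CDK code; the fingerprints are simplified to arrays with the positions of the set bits, and the function names and bit positions are made up for this example.

```javascript
// Tanimoto similarity of two fingerprints, each given as an array of the
// positions of the bits that are set (an assumed, simplified representation).
function tanimoto(bitsA, bitsB) {
  var setA = new Set(bitsA);
  var setB = new Set(bitsB);
  var shared = 0;
  setA.forEach(function (bit) {
    if (setB.has(bit)) shared++;
  });
  var union = setA.size + setB.size - shared;
  // two empty fingerprints are treated as identical
  return union === 0 ? 1.0 : shared / union;
}

// Symmetric matrix with all pairwise similarities.
function tanimotoMatrix(fingerprints) {
  return fingerprints.map(function (a) {
    return fingerprints.map(function (b) {
      return tanimoto(a, b);
    });
  });
}

// made-up bit positions, for illustration only
var matrix = tanimotoMatrix([[1, 4, 7, 9], [1, 4, 8], [2, 3]]);
```

The diagonal of such a matrix is always 1, and the matrix is symmetric, so only half of it really needs to be calculated.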


Kabsch alignment of protein backbones is planned for a later Bioclipse release, and is an important feature for our group's proteochemometrics work.

Milestones...

While I am still looking around for an assistant/associate professor position, there are two milestones around my scientific work I want to briefly mention here. This post is the 500th post on chem-bla-ics, and the two CDK papers combined have reached 100+ citations as counted by Web-of-Science, as can be seen on my ResearcherID profile.

Bioclipse Manager for MyExperiment.org

Some time ago I wrote about using Bioclipse to query the MyExperiment.org SPARQL end point. I think I had not mentioned that I have also written a manager to download MyExperiment Bioclipse Scripting Language (BSL) scripts (though there are no GUI elements yet):
> myexperiment.search("RDF")
[921, 928, 889]

> myexperiment.search("Kabsch")
[937]
The returned lists give the workflow numbers for matching BSL scripts, which you can then simply download with:
> var file = myexperiment.downloadWorkflow(937)
> ui.open(file)

Wednesday, October 21, 2009

Maintaining patches is fixing patches

Today I got the comment that having to fix patches against upstream changes, because those patches were not included upstream yet, is not very productive.

However, it is a prominent part of maintaining a code base. In the past 9 years, I and many others have been reworking a lot of CDK code because of API changes and bug fixes in deeper parts of the CDK library. At least half of the work I have done for the CDK is this kind of fixing of downstream code. This is never trivial, and it is never productive. Well, depends somewhat on your definition of productivity.

Whether productive or not, it is just something that needs to happen. Additionally, it is not something you can prevent. I guess one can call this a fact of life. Doesn't make it nice work. Not at all. And most of my frustration with the CDK library is the lack of documentation and unit testing, which makes such fixing of downstream code hard. This means that the person best suited to do this job, is the one who wrote the patch in the first place. The person who made the comment I mentioned earlier is seeing this from very up close now.

Code Quality
I very much understand his feeling of being unproductive when updating patches; been there, done that. He (that I can disclose) is absolutely right. With all the quality assurance functionality I have set up in the past for the CDK, nicely integrated in Rajarshi's Nightly script, I hope to make it easier for people to write proper maintainable patches. Often these reports are, however, again about doing tasks which make you feel unproductive. But I can assure you that writing such quality assurance tools, like the OpenJavaDocCheck I worked on this weekend, makes you feel even less productive.

Redesign
Sometimes making a library more maintainable includes reworking the design. Almost always this takes serious effort, and potentially introduces new bugs. At the same time, it always fixes a lot of older bugs and, if redesigned properly, makes it much easier to fix other bugs and allows more functionality to be implemented.

But again, this requires rewriting of downstream patches too. And the one doing the redesign will always get comments about this requiring unproductive code updates downstream. I have seen this on several occasions in the CDK, such as with my rewrite of the atom typing functionality. (And don't get any KDE4 developer started on that topic ;) Another fact of life, I guess.

Tuesday, October 20, 2009

CrossRef writes up RSS usage recommendations

CrossTech announced that a CrossRef working group has written a best practices document for the use of RSS feeds by publishers. Nice introduction for anyone who is creating RSS feeds. The only comment I could make is about the lack of other modules. For example, a Chemistry module was proposed by us 5 years ago already (DOI:10.1021/ci034244p), and I blogged about it on several occasions.

Below is the CMLRSS feed of Chemical blogspace.



Of course, publishers can take advantage of such modules, using the XML Namespaces technology. The best practices document uses that for a Dublin Core and a PRISM extension. The CML extension discussed here is another one, but the point is that you can basically plug in any module.

Saturday, October 17, 2009

Work in Progress: an Open DocCheck replacement

While it is still very much in progress, I have already made more progress than I had hoped for. The JavaDoc Doclet API is actually not too difficult to use, though my use will very likely improve more later. The CDK has been using Sun's DocCheck utility for testing the library's JavaDoc quality, but the reports never really satisfied me. Moreover, the most recent version is ancient and because it is closed source, no one can continue on those efforts. DocCheck is MIA.

Instead, PMD is giving nice overviews of what it believes to be wrong with the CDK, and also provides a decent XML format which allows extraction of information, which is used by, for example, SuperNightly as shown yesterday in PMD 2.4.5 installed in the CDK 1.2.x branch.

I have been pondering about it for a long time now, but writing a JavaDoc checking library is hardly core cheminformatics research; at least, you would not get funding for it, despite everyone always complaining about the lack of good documentation. Alas.

Last week, I was reviewing some more code, and again saw the very common error of the missing period at the end of the first sentence in JavaDoc. This one is sort of important for proper JavaDoc documentation generation, but because of the complexity of the current DocCheck reporting, people are not familiar enough with it. Being tired of having to repeat myself, I decided to address the problem, by creating better Nightly error reporting for the CDK JavaDoc.

So, I started OpenJavaDocCheck, or ojdcheck. As mentioned, I have made quite promising progress, and the current version provides the ability to write custom tests (which I plan to use for validating content of CDK taglet content), and create XML as well as XHTML which can be saved to any file. To give you a glimpse of where things are going, here's a screenshot of the current XHTML output:
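
The missing period matters because the javadoc tool takes the first sentence, which it ends at the first period followed by whitespace (or at the first block tag), as the summary shown in the overview tables. A test for it could look roughly like the plain JavaScript sketch below. It is an illustration only: it works on the raw comment text instead of the Doclet API, and the function name hasFirstSentencePeriod is made up for this example.

```javascript
// Hypothetical check: does the description of a JavaDoc comment contain a
// properly terminated first sentence?
function hasFirstSentencePeriod(comment) {
  // strip the comment delimiters and the leading asterisk of each line
  var text = comment
    .replace(/^\s*\/\*\*/, "")
    .replace(/\*\/\s*$/, "")
    .split("\n")
    .map(function (line) { return line.replace(/^\s*\*?\s?/, ""); })
    .join("\n");
  // the description is everything before the first block tag
  var description = text.split(/^\s*@\w+/m)[0].trim();
  if (description.length === 0) return false;
  // a sentence ends at a period followed by whitespace or the end of the text
  return /\.(\s|$)/.test(description);
}

var withoutPeriod = "/**\n * Returns the atom count\n *\n * @return the count\n */";
var withPeriod = "/**\n * Returns the atom count.\n *\n * @return the count\n */";
```

Note that, like the javadoc tool itself, this simple rule is fooled by abbreviations such as "e.g." in the first sentence.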



The current list of tests is really small, and consists of a single test:
  • test if each class and method has JavaDoc

Friday, October 16, 2009

PMD 2.4.5 installed in the CDK 1.2.x branch

Today I installed PMD 4.2.5 in the CDK 1.2.x branch, which contains mostly bug fixes compared to the 4.2.2 version we had earlier. Several of these fixes concern false positives: warnings which were not really problems, but tests going bad.

The number of these false positives seems to be significant, as the number of PMD violations for the CDK 1.2.x branch seems to have dropped by about 1500 warnings! :)

Thursday, October 15, 2009

SPARQL end points, Jena and bif:contains

I have been having fun with SPARQL in Bioclipse for a while now, and blogged about it on several occasions.
One thing I had not been able to work out is that Virtuoso uses a (rather nice) bif:contains extension that supports indexing. However, Jena would complain with:
com.hp.hpl.jena.query.QueryParseException: Line 1, column 31: Unresolved
prefixed name: bif:contains
Defining the prefix did not solve the problem either, but Ivan Mikhailov just replied to my post to the virtuoso-user mailing list providing the solution.

The solution is in the fact that bif: is in its own namespace, which makes it possible to replace bif:contains by its full reference <bif:contains>. I directly gave that a try in Bioclipse, and just successfully ran this Bioclipse script snippet:
rdf.sparqlRemote(
  "http://bio2rdf.org/sparql",
  "SELECT * WHERE {?s ?p ?o . ?o <bif:contains> \"aspirin\" .};"
);

Thanx, Ivan!

Friday, October 09, 2009

NMRShiftDB RDF #3: Bio2RDF

You might have seen my efforts to convert the NMRShiftDB data into RDF.
Peter Ansell has shortly after that copied the data into Bio2RDF, but I had not blogged about that yet. So, here goes. If you have not looked at Bio2RDF yet, this is a good time to do that. The structure of the exposed triples is not perfect, and I just realized I made a beginner's mistake: using a domain name I have no control over in a namespace (bad me). The Virtuoso6 faceted browser allows you to navigate the data in Bio2RDF by molecule (e.g. molecule 234):



And by spectrum too (e.g. spectrum 4735):


Thursday, October 08, 2009

Where are the CDK 1.3.1 and 1.2.4 releases ?!?

You might be wondering what is keeping the CDK 1.3.1 and 1.2.4 releases. And right you are. When we look at Supernightly, we get a clue (BTW, I hope the EBI nodes will join soon too):

Studying this table shows the reasons: there are too many regressions, too many failing unit tests. For example, 1.2.4 (while not yet released, called 1.2.3.git) has 50 new failing tests. Now, fair enough, this is mostly because ioformats was not tested in 1.2.3, and most of the fails are caused by a bug in the test, not in the code. But that still leaves 20 other failing tests. These are mostly related to known bugs, and for some problems patches are actually available.

These last 22 we also see in the differences between 1.3.0 and 1.3.1 (while not yet released, called 1.3.0.git). That's because the ioformats module is not tested in that branch either, pending a new merge with the cdk-1.2.x branch.

Wednesday, October 07, 2009

VR.se funded research to be OA as of 2010

Happy news from the Swedish Vetenskapsradet (via Coturnix): as of 2010 all peer reviewed journal papers must be Open Access. I am not yet VR funded, but involved in a few VR grant applications. Not that that really matters, as I am happily publishing OA already.

Keeping my Bioclipse repositories in sync with upstream

Bioclipse is now split up over several Git repositories (and some additional stuff in even more repositories). This has all to do with each repository now having one person acting as point-of-access. This means that I have several repositories checked out, which I need to keep synchronized. Now, I am pretty sure there are many solutions (and suggestions very welcome!), but this is the Bash script I have just written to give me an overview of the state of my repositories, hoping it may be useful to others too:
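#!/bin/bash

# Report the state of every Git checkout in the current directory.
for PLUGIN in */
do
        # skip anything that is not a Git repository
        [ -d "$PLUGIN/.git" ] || continue
        echo "***************************************************************** $PLUGIN"
        # the subshell leaves the working directory untouched, also if cd fails
        (cd "$PLUGIN" && git fetch origin && git status)
done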

Tuesday, October 06, 2009

CDK Molecules in RDF

Yesterday, I finally got around to starting a branch on adding RDF support to the CDK; in particular, write the CDK data model ontology in OWL and serialization to and from RDF using the ontology. The framework is now set up, but I have yet to formalize all bits and pieces of the CDK data model in classes and properties. Just as a preview, here is what a very basic bit of CDK model in RDF looks like (N3 format):
@prefix cdk:     <http://cdk.sourceforge.net/model.owl#> .

<http://cdk.sf.net/model/atom/1>
      a       cdk:Atom ;
      cdk:symbol "C" .

<http://cdk.sf.net/model/molecule/1>
      a       cdk:Molecule ;
      cdk:hasAtom <http://cdk.sf.net/model/atom/1> .
Still rather verbose, but very flexible. I have even been thinking of an XHTML+RDFa writer...

Thursday, October 01, 2009

Google Wave Invite: but you need to work on the CDK and the CDKitty robot

I just posted the below email to the cdk-user mailing list. Next Monday, I'll decide.
Hi all,

unless you have not read any news in the last two days, you will have
seen that Google is rolling out a second batch of Google Wave
accounts... I have one invite for someone who wants to co-develop the
CDKitty robot, which adds CDK-based functionality to Google Wave...

The code is at: http://github.com/egonw/cdkitty

If you are interested in the account, please email me offline with:

* how you think you can contribute to the robot
* why you want to do that
* how much time you will have for it

The position is open to anyway, and consider your email an application
to the position :) (and, if you are a student, we could even try to
arrange Uppsala University credit points, if you can work 20 weeks
full time on it).

Egon
BTW, existing Google Wave users can invite the robot by adding chemdevelkit@appspot.com.

Processing the ChEBI MDL SD file with the CDK

Bioclipse has a bug report about browsing the ChEBI SD file in its moltable editor. Some entries make Bioclipse crash (as reported), or just make it very sluggish, as with my Dell superlapcomputer :)

So, I processed the file with a pure CDK 1.2.3, using this small Groovy script:
import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.io.*;
import org.openscience.cdk.io.iterator.*;
import org.openscience.cdk.*;
import org.openscience.cdk.tools.manipulator.*;

iterator = new IteratingMDLReader(
  new File("ChEBI_complete.sdf").newReader(),
  DefaultChemObjectBuilder.getInstance()
)
int i = 0;
while (true) {
  long startTime = System.currentTimeMillis();
  // hasNext() is kept inside the timed part, as in the original script,
  // but is now checked before next() is called
  if (!iterator.hasNext()) break;
  IMolecule mol = iterator.next()
  long endTime = System.currentTimeMillis();
  i++;
  formula = MolecularFormulaManipulator.getMolecularFormula(mol)
  long time = endTime - startTime;
  // only report the entries that take 100 ms or more to read
  if (time > 99)
    println i + ": " + MolecularFormulaManipulator.getString(formula) +
            " (" + endTime + "-" + startTime + "=" + time + " ms)"
}

This script times the reading of all entries and reports those that take 100 ms or more to read (in the scripting environment). There are surprising results: H2O takes 50 seconds, phosphate 100 seconds. So, I am quite certain it must be the reading of the metadata, and not the connection table. But this I will explore in more detail now, hoping to come up with a patch for the CDK to speed up reading of such entries.

The full list of timings:
1: C10H2 (1254375053450-1254375052356=1094 ms)                                           
152: C20HN7O6 (1254375054779-1254375054125=654 ms)                                       
592: C3HO3 (1254375056604-1254375055499=1105 ms)                                         
832: C9NO5 (1254375057016-1254375056823=193 ms)                                          
879: C20N4 (1254375057381-1254375057039=342 ms)                                          
1125: R (1254375058293-1254375057528=765 ms)                                             
1197: C20N7O6 (1254375058612-1254375058372=240 ms)                                       
1198: C5NO3 (1254375058714-1254375058613=101 ms)                                         
1243: C5NO4 (1254375063698-1254375058800=4898 ms)                                        
1272: C21N7O16P3S (1254375067185-1254375063856=3329 ms)                                  
1277: C23N7O17P3S (1254375067625-1254375067239=386 ms)                                   
1282: C3NO2S (1254375070673-1254375067650=3023 ms)                                       
1285: C3O3 (1254375071600-1254375070675=925 ms)                                          
1290: C2O2 (1254375071802-1254375071608=194 ms)                                          
1299: H2O (1254375122202-1254375071808=50394 ms)                                         
1300: H (1254375136668-1254375122202=14466 ms)                                           
1301: O2 (1254375145270-1254375136670=8600 ms)                                           
1335: C15N6O5S (1254375150683-1254375145319=5364 ms)                                     
1343: C10N5O13P3 (1254375298927-1254375150686=148241 ms)                                 
1349: C2NO2 (1254375301391-1254375298953=2438 ms)                                        
1351: C34N4O4 (1254375301659-1254375301396=263 ms)                                       
1509: C6NO2 (1254375302753-1254375302011=742 ms)                                         
1541: C19N7O6 (1254375303296-1254375302778=518 ms)                                       
1543: C20HN7O7 (1254375303441-1254375303312=129 ms)                                      
1609: C9N2O15P3 (1254375303740-1254375303558=182 ms)                                     
1631: CHO2 (1254375303975-1254375303837=138 ms)                                          
1632: C4O4 (1254375304127-1254375303976=151 ms)                                          
1711: C21N7O14P2 (1254375310174-1254375304245=5929 ms)                                   
1788: C6H3O9P (1254375310555-1254375310387=168 ms)                                       
1798: C10H2N2O3S (1254375310705-1254375310588=117 ms)                                    
1808: C6N3O2 (1254375312665-1254375310727=1938 ms)                                       
1823: C10N5O14P3 (1254375318534-1254375312781=5753 ms)                                   
1839: C5NO4 (1254375325988-1254375318583=7405 ms)                                        
1840: C3O3 (1254375326249-1254375325989=260 ms)                                          
1848: C10N5O7P (1254375336661-1254375326273=10388 ms)                                    
1862: C5NOSR2 (1254375337336-1254375336699=637 ms)                                       
1882: C3N2 (1254375337489-1254375337351=138 ms)                                          
1893: C6O9P (1254375337626-1254375337501=125 ms)                                         
1910: C3O6P (1254375337846-1254375337639=207 ms)                                         
1934: H3N (1254375349713-1254375337921=11792 ms)                                         
1977: O4S (1254375350045-1254375349902=143 ms)                                           
1984: CN2O (1254375350174-1254375350050=124 ms)                                          
2007: C5N (1254375350324-1254375350183=141 ms)                                           
2015: C5N5O (1254375350493-1254375350329=164 ms)                                         
2016: C2O (1254375350683-1254375350494=189 ms)                                           
2018: C27N9O15P2 (1254375351927-1254375350684=1243 ms)                                   
2020: H2O2 (1254375352124-1254375351928=196 ms)                                          
2036: C17N3O17P2 (1254375352309-1254375352196=113 ms)                                    
2095: C10N5O4 (1254375352578-1254375352394=184 ms)                                       
2137: C14HO4R (1254375353331-1254375352646=685 ms)                                       
2180: C3NO2 (1254375354199-1254375353469=730 ms)                                         
2184: C9NO5 (1254375354480-1254375354270=210 ms)                                         
2194: C6N4O2 (1254375356738-1254375354485=2253 ms)                                       
2201: C21N7O17P3 (1254375359838-1254375356748=3090 ms)                                   
2228: C6O2 (1254375360480-1254375359912=568 ms)                                          
2240: CO2 (1254375363324-1254375360485=2839 ms)                                          
2327: C5NO2S (1254375370536-1254375363612=6924 ms)                                       
2348: C14N6O5S (1254375371522-1254375370558=964 ms)                                      
2359: C9N2O9P (1254375372236-1254375371544=692 ms)                                       
2367: C9N2O6 (1254375373614-1254375372265=1349 ms)                                       
2370: C5N5 (1254375373975-1254375373615=360 ms)                                          
2404: C10N5O5 (1254375374360-1254375374108=252 ms)                                       
2413: C10N5O10P2 (1254375401639-1254375374373=27266 ms)                                  
2454: C5O5 (1254375401831-1254375401688=143 ms)                                          
2455: C5NO2S (1254375407807-1254375401832=5975 ms)                                       
2470: C11N2O2 (1254375408251-1254375407815=436 ms)                                       
2494: C4NO3 (1254375409200-1254375408373=827 ms)                                         
2499: C5O10P2R (1254375412153-1254375409297=2856 ms)                                     
2525: C4HO7P (1254375412777-1254375412293=484 ms)                                        
2526: C4N2 (1254375414071-1254375412777=1294 ms)                                         
2534: C21N7O14P2 (1254375417657-1254375414091=3566 ms)                                   
2581: C3NO2 (1254375422072-1254375417745=4327 ms)                                        
2638: C4NO4 (1254375424772-1254375422244=2528 ms)                                        
2680: C5O14P3 (1254375426347-1254375424831=1516 ms)                                      
2683: C3NO3 (1254375433063-1254375426353=6710 ms)                                        
2702: C3HO6P (1254375433192-1254375433079=113 ms)                                        
2749: C4N2O3 (1254375434106-1254375433445=661 ms)                                        
2755: C10N4O8P (1254375434417-1254375434113=304 ms)                                      
2756: C5NO2 (1254375436750-1254375434418=2332 ms)                                        
2779: C4NO2S (1254375437847-1254375436759=1088 ms)                                       
2803: C5N4 (1254375438991-1254375437968=1023 ms)                                         
2832: C9NO2 (1254375439226-1254375439026=200 ms)                                         
2844: C8HNO3 (1254375440463-1254375439238=1225 ms)                                       
2856: C5O13P3R (1254375441336-1254375440497=839 ms)                                      
2863: C10O6 (1254375442424-1254375441348=1076 ms)                                        
2873: C10N5O8P (1254375442560-1254375442433=127 ms)                                      
2898: C3HO3 (1254375443712-1254375442655=1057 ms)                                        
2925: C8H4NO6 (1254375443886-1254375443729=157 ms)                                       
3025: CO3 (1254375444508-1254375444131=377 ms)                                           
3031: C10N5O11P2 (1254375444810-1254375444601=209 ms)                                    
3038: C3NO2S (1254375449012-1254375444836=4176 ms)                                       
3042: C4N2O2 (1254375449224-1254375449066=158 ms)                                        
3060: C6O2 (1254375449433-1254375449274=159 ms)                                          
3083: C17N4O9P (1254375450751-1254375449465=1286 ms)                                     
3088: C34FeN4O4 (1254375452873-1254375450848=2025 ms)                                    
3111: C9N2O12P2 (1254375454560-1254375452939=1621 ms)                                    
3119: CNO5P (1254375454774-1254375454563=211 ms)                                         
3122: C9N3O14P3 (1254375454972-1254375454778=194 ms)                                     
3184: C3O3 (1254375455362-1254375455053=309 ms)                                          
3213: CO (1254375455489-1254375455375=114 ms)                                            
3216: C3HO7P (1254375455662-1254375455490=172 ms)                                        
3223: C9N2O6 (1254375455850-1254375455737=113 ms)                                        
3239: C3NO3 (1254375458116-1254375455868=2248 ms)                                        
3296: C9NO3 (1254375459575-1254375458250=1325 ms)                                        
3306: S3R (1254375464014-1254375459596=4418 ms)                                          
3313: C6O6 (1254375464701-1254375464016=685 ms)                                          
3348: C14HO4 (1254375465102-1254375464766=336 ms)                                        
3360: C12O11 (1254375465830-1254375465193=637 ms)                                        
3364: N2 (1254375475917-1254375465872=10045 ms)                                          
3371: C21N7O17P3 (1254375479243-1254375475920=3323 ms)                                   
3377: C6N2O2 (1254375481175-1254375479306=1869 ms)                                       
3379: C3O6P (1254375482278-1254375481176=1102 ms)                                        
3390: O10P3 (1254375484356-1254375482286=2070 ms)                                        
3403: C5N2O3 (1254375486975-1254375484451=2524 ms)                                       
3499: C9NO3 (1254375487745-1254375487074=671 ms)                                         
3502: C5O8P (1254375489044-1254375487747=1297 ms)                                        
3532: C55MgN4O5 (1254375489206-1254375489097=109 ms)                                     
3537: C5NO4 (1254375494872-1254375489209=5663 ms)                                        
3546: Fe (1254375507646-1254375494892=12754 ms)                                          
3554: C5N2O2 (1254375507934-1254375507650=284 ms)                                        
3566: H2 (1254375508526-1254375508033=493 ms)                                            
3576: C12ClN4O7P2S (1254375508737-1254375508548=189 ms)                                  
3577: Mn (1254375511113-1254375508738=2375 ms)                                           
3582: C11NO6P (1254375511249-1254375511120=129 ms)                                       
3628: O7P2 (1254375554180-1254375511388=42792 ms)                                        
3633: O4P (1254375659461-1254375554183=105278 ms)                                        
3647: C12N4OS (1254375659706-1254375659481=225 ms)                                       
3664: C8HNO6P (1254375661230-1254375659713=1517 ms)                                      
3665: C9N4O8P (1254375661450-1254375661231=219 ms)                                       
3679: Mg (1254375679426-1254375661513=17913 ms)                                          
3859: C20N7O6 (1254375679768-1254375679522=246 ms)                                       
3860: C19N7O6 (1254375680069-1254375679769=300 ms)                                       
4026: Ca (1254375681849-1254375680582=1267 ms)                                           
4029: CNOR (1254375682983-1254375681850=1133 ms)                                         
4031: COR2 (1254375686384-1254375682984=3400 ms)                                         
4038: Cl (1254375686610-1254375686387=223 ms)                                            
4099: F (1254375687012-1254375686767=245 ms)                                             
4138: H (1254375722496-1254375687100=35396 ms)                                           
4163: C6NO2 (1254375722805-1254375722566=239 ms)                                         
4166: C6N2O2 (1254375724837-1254375722807=2030 ms)                                       
4167: Mg (1254375746423-1254375724838=21585 ms)                                          
4229: O (1254375754305-1254375746586=7719 ms)                                            
4254: H3O4P (1254375771602-1254375754367=17235 ms)                                       
4263: K (1254375771850-1254375771608=242 ms)                                             
4265: C5NO2 (1254375772195-1254375771852=343 ms)                                         
4297: Na (1254375772801-1254375772310=491 ms)                                            
4311: C4O3R (1254375773107-1254375772835=272 ms)                                         
4313: S (1254375795116-1254375773109=22007 ms)                                           
4356: Zn (1254375814849-1254375795263=19586 ms)                                          
4424: C5O5 (1254375818351-1254375814892=3459 ms)                                         
4453: C2 (1254375818489-1254375818369=120 ms)                                            
4482: C6N3O2 (1254375819699-1254375818525=1174 ms)                                       
4494: C (1254375821009-1254375819706=1303 ms)                                            
4519: Co (1254375821358-1254375821068=290 ms)                                            
4670: C11N2O2 (1254375821817-1254375821583=234 ms)                                       
4677: C4H2O4 (1254375822301-1254375821824=477 ms)                                        
4801: Ni (1254375822605-1254375822450=155 ms)                                            
4912: C5N2O3 (1254375823778-1254375822655=1123 ms)                                       
5060: C5O5 (1254375824119-1254375823908=211 ms)                                          
5111: C6O6 (1254375824420-1254375824212=208 ms)                                          
5143: Cu (1254375824613-1254375824502=111 ms)                                            
5357: C6N4O2 (1254375826277-1254375824919=1358 ms)                                       
5368: C9NO5 (1254375826504-1254375826289=215 ms)                                         
5369: Fe (1254375826620-1254375826505=115 ms)                                            
5380: C3HO6P (1254375827340-1254375826635=705 ms)                                        
5398: Na (1254375827949-1254375827359=590 ms)                                            
5400: K (1254375828174-1254375827951=223 ms)                                             
5402: Zn (1254375844116-1254375828175=15941 ms)                                          
5404: Ca (1254375845806-1254375844117=1689 ms)                                           
5438: HO (1254375846125-1254375845836=289 ms)                                            
5538: CH3 (1254375847891-1254375846233=1658 ms)                                          
5548: Ca (1254375861972-1254375847928=14044 ms)                                          
5560: H2N (1254375866398-1254375861980=4418 ms)                                          
5693: C10O6 (1254375867526-1254375866499=1027 ms)                                        
5814: O7P2 (1254375910579-1254375867608=42971 ms)                                        
5869: C2NOR2 (1254375914124-1254375910633=3491 ms)                                       
5871: C3NOSR2 (1254375914574-1254375914161=413 ms)                                       
5873: C6N4OR2 (1254375916764-1254375914575=2189 ms)                                      
5875: C11N2OR2 (1254375917143-1254375916766=377 ms)                                      
5877: C4NO3R2 (1254375917710-1254375917145=565 ms)                                       
5885: C6N2OR2 (1254375919573-1254375917716=1857 ms)                                      
5889: C5NO3R2 (1254375920901-1254375919576=1325 ms)                                      
5895: C6N3OR2 (1254375922306-1254375920904=1402 ms)                                      
5900: C5NO4 (1254375925689-1254375922310=3379 ms)                                        
5902: C5NO4 (1254375930626-1254375925693=4933 ms)                                        
5903: C5NO4 (1254375933920-1254375930626=3294 ms)                                        
5906: C4NO4 (1254375934593-1254375933924=669 ms)                                         
5907: C4NO4 (1254375935274-1254375934594=680 ms)                                         
5909: C4NO4 (1254375936451-1254375935274=1177 ms)                                        
5911: C9NOR2 (1254375936575-1254375936453=122 ms)                                        
5913: C3NO2R2 (1254375940197-1254375936577=3620 ms)                                      
5914: C3NOSeR2 (1254375940307-1254375940197=110 ms)                                      
5920: C6NOR2 (1254375940705-1254375940311=394 ms)                                        
5925: C5N2O2R2 (1254375942662-1254375940737=1925 ms)                                     
5926: C4NO2R2 (1254375943012-1254375942662=350 ms)                                       
5939: C4O4 (1254375943199-1254375943061=138 ms)                                          
5993: C2O2 (1254375943413-1254375943287=126 ms)                                          
6082: Zn (1254375958993-1254375943449=15544 ms)                                          
6453: C10N5O13P3 (1254376116554-1254375959193=157361 ms)                                 
6574: CHO2 (1254376116903-1254376116743=160 ms)                                          
6706: C5O5 (1254376117248-1254376117131=117 ms)                                          
7032: C3O4 (1254376117689-1254376117435=254 ms)                                          
7104: C5NO3R2 (1254376118481-1254376117843=638 ms)                                       
7252: C5NOSR2 (1254376118731-1254376118630=101 ms)                                       
7411: C3O3 (1254376120056-1254376119011=1045 ms)                                         
7465: CR (1254376121752-1254376120212=1540 ms)                                           
7627: CNO (1254376122089-1254376121887=202 ms)                                           
7741: C12ClN4OS (1254376122320-1254376122156=164 ms)                                     
7858: CO2R (1254376122547-1254376122436=111 ms)                                          
7891: Fe4S4 (1254376122844-1254376122585=259 ms)                                         
8178: Mn (1254376124399-1254376122960=1439 ms)                                           
8338: C4NO2 (1254376124643-1254376124480=163 ms)                                         
9219: C4NO6P (1254376127494-1254376124951=2543 ms)                                       
9234: C9N3O14P3 (1254376127605-1254376127498=107 ms)                                     
9235: C10N5O14P3 (1254376132596-1254376127605=4991 ms)                                   
9305: C3NO6P (1254376143823-1254376132629=11194 ms)                                      
9311: C6O6 (1254376144017-1254376143825=192 ms)                                          
9402: C5NOSR (1254376144332-1254376144214=118 ms)                                        
9427: C4O2R2 (1254376144645-1254376144419=226 ms)                                        
10281: O10P3 (1254376146646-1254376144942=1704 ms)                                       
10308: C2OR (1254376148204-1254376146659=1545 ms)                                        
10453: CHOR (1254376148682-1254376148321=361 ms)                                         
10506: HOR (1254376149933-1254376148793=1140 ms)                                         
10589: C21N7O17P3 (1254376150485-1254376150182=303 ms)                                   
10602: C5N2O2R (1254376150727-1254376150503=224 ms)                                      
10604: C5N2OR2 (1254376150946-1254376150728=218 ms)                                      
10614: C5N2O2 (1254376151170-1254376150949=221 ms)                                       
10617: C5N2OR (1254376151389-1254376151171=218 ms)                                       
10641: C3O6P (1254376152229-1254376151404=825 ms)                                        
10656: C16OR (1254376152505-1254376152235=270 ms)                                        
10688: C5O5 (1254376155565-1254376152582=2983 ms)                                        
10690: C3NO5PR2 (1254376166856-1254376155566=11290 ms)                                   
10729: C12N4O7P2S (1254376167090-1254376166889=201 ms)                                   
10748: C3NOR2 (1254376171309-1254376167097=4212 ms)                                      
10756: C34O4 (1254376172041-1254376171320=721 ms)                                        
10760: C9N2O15P3 (1254376172162-1254376172045=117 ms)                                    
10786: OR (1254376172419-1254376172169=250 ms)                                           
10828: C2NO2R (1254376173256-1254376172429=827 ms)                                       
10830: C2NOR (1254376174784-1254376173258=1526 ms)                                       
10883: C9NO2R2 (1254376175110-1254376174827=283 ms)                                      
10899: NR (1254376202420-1254376175114=27306 ms)                                         
10902: C2O (1254376203938-1254376202454=1484 ms)                                         
10914: C9NO5 (1254376204203-1254376203942=261 ms)                                        
11203: C6O6 (1254376204716-1254376204522=194 ms)                                         
11226: C4HO7P (1254376205190-1254376204732=458 ms)                                       
11680: C5NO2SR (1254376205649-1254376205392=257 ms)                                      
11681: C5NOSR (1254376205900-1254376205650=250 ms)                                       
11916: H (1254376206483-1254376206109=374 ms)                                            
12216: C5NOR2 (1254376208374-1254376206750=1624 ms)                                      
12217: C4N2O2R2 (1254376209040-1254376208375=665 ms)                                     
12680: COSR2 (1254376209601-1254376209314=287 ms)                                        
13478: C25HN2O19 (1254376210152-1254376209951=201 ms)                                    
13662: C5O8P (1254376210482-1254376210379=103 ms)
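
The timing lines above all seem to follow the pattern `number: formula (end-start=elapsed ms)`. As a minimal sketch, assuming that format holds for every line, the snippet below parses such lines, checks that the logged duration matches the two timestamps, and lists the slowest entries. The names `parse_timing_line` and `slowest` are made up for this illustration and are not part of the tool that produced the log.

```python
import re

# Assumed line format: "<number>: <formula> (<end>-<start>=<elapsed> ms)"
LINE_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+\((\d+)-(\d+)=(\d+) ms\)\s*$")


def parse_timing_line(line):
    """Return (entry_number, formula, elapsed_ms), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    entry_number, formula, end, start, elapsed = m.groups()
    # Sanity check: the logged duration should equal end - start.
    if int(end) - int(start) != int(elapsed):
        raise ValueError(f"inconsistent timing in line: {line!r}")
    return int(entry_number), formula, int(elapsed)


def slowest(lines, n=3):
    """Return the n parsed entries with the largest elapsed time, slowest first."""
    parsed = [p for p in map(parse_timing_line, lines) if p is not None]
    return sorted(parsed, key=lambda p: p[2], reverse=True)[:n]


# A few lines copied from the log above.
sample = [
    "4138: H (1254375722496-1254375687100=35396 ms)",
    "5814: O7P2 (1254375910579-1254375867608=42971 ms)",
    "6453: C10N5O13P3 (1254376116554-1254375959193=157361 ms)",
    "4453: C2 (1254375818489-1254375818369=120 ms)",
]

for entry_number, formula, elapsed in slowest(sample, n=2):
    print(entry_number, formula, elapsed)
```

Run on the full log instead of the four sample lines, the same call would point at the handful of entries that dominate the total run time.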