## Monday, January 31, 2011

### In Silico Toxicology

Today my copy of In Silico Toxicology: Principles and Applications (Issues in Toxicology) arrived at the local Library. Library with a capital L: we requested the book 1.5 weeks ago, got confirmation that it was ordered on Monday last week, and just picked it up. Well done!

The book covers in 24 chapters the prediction of toxicological properties of small molecules, and extensively discusses aspects of QSAR studies. As such, various Blue Obelisk tools are described, including the CDK (pages 184 and 413), Bioclipse (page 414), JOELib (page 185), OpenBabel (page 414), Oscar (page 415), as well as other Open Source tools, including AMBIT (page 315 ff), InChI (page 72), Toxtree (pages 313 and 417), and others. (Nina, should I classify AMBIT and Toxtree as Blue Obelisk projects, now that you are a Blue Obelisk Award winner?)

Many of these are actually listed in Chapter 17, Open Source Tools for Read-Across and Category Formation, by Nina et al., but also worth mentioning is the prominent placement of Open Source tools in Chapter 6 by Uko Maran et al., Molecular Descriptors from Two-Dimensional Chemical Structure, where the REACH documentation focuses on proprietary tools.

Obviously, these are not the bits I am eager to read. Instead, I'm very much looking forward to reading the chapters on data quality (Ch. 4), model validation (Ch. 11), application domain (Ch. 12), linking chemical structures to adverse reactions (Ch. 14), and toxicokinetics (Ch. 21). The book runs to some 669 pages, so it will keep me busy for the next few hours ;)

### Extracting 3D coordinates from PDB files with Groovy CDK

BioStar featured a question today about how to extract 3D coordinates from PDB files. Now, a simple script may actually do, and BioJava was mentioned too. Of course, it can also be done with the CDK. For example using this Groovy code:

```groovy
import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.io.*;
import org.openscience.cdk.tools.manipulator.*;
import org.openscience.cdk.io.IChemObjectReader.Mode;
import org.openscience.cdk.*;
import java.io.File;
import java.util.zip.GZIPInputStream;

// Stream the gzipped PDB entry for crambin (1CRN) straight into the CDK reader
reader = new PDBReader(
  new GZIPInputStream(
    new URL(
      "http://www.pdb.org/pdb/files/1CRN.pdb.gz"
    ).openStream()
  )
);
crambin = reader.read(new ChemFile());

// Print the 3D coordinates of every atom in every container of the file
for (container in
     ChemFileManipulator.getAllAtomContainers(
       crambin
     )) {
  for (atom in container.atoms()) {
    println atom.point3d;
  }
}
```


Wondering what the Bioclipse Scripting Language version would look like... and, it can likely be done with Jmol script too.
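
As for the "simple script" route: here is a minimal sketch with awk, not the CDK way and certainly not a full PDB parser. It assumes the fixed-column layout of ATOM and HETATM records (x in columns 31-38, y in 39-46, z in 47-54); the function name `pdb_coords` is just something I made up for illustration.

```shell
# Print "x y z" for every ATOM/HETATM record of a PDB file read from stdin.
# Assumes the fixed-column PDB layout: x = cols 31-38, y = 39-46, z = 47-54.
pdb_coords() {
  awk '/^(ATOM  |HETATM)/ {
    # substr() returns the padded text fields; printf converts them to numbers
    printf "%.3f %.3f %.3f\n", substr($0, 31, 8), substr($0, 39, 8), substr($0, 47, 8)
  }'
}
```

Assuming the URL used above still resolves, something like `curl -s http://www.pdb.org/pdb/files/1CRN.pdb.gz | gunzip -c | pdb_coords` should then list one coordinate triple per atom.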

## Sunday, January 30, 2011

### GitHub Tip: download commits as patches

Some time ago, the brilliant GitHub people gave me the following tip. Rajarshi is lazy, and might find it interesting. By appending .patch to the commit URL, a commit can easily be downloaded as a patch. That way, developers can easily download it with wget or curl and apply it locally with git am, without having to fetch the full repository.

For example, Dmitry made this commit in his branch, having the URL https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625. The patch for this commit can then be downloaded at this URL https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625.patch.
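
The tip can be wrapped up in two small shell functions. This is only a sketch: the names `patch_url` and `apply_github_commit` are mine, and it assumes curl and git are installed and that you run it inside a clone the patch applies to.

```shell
# Turn a GitHub commit URL into its patch URL by appending ".patch".
patch_url() {
  printf '%s.patch\n' "$1"
}

# Download the patch for a commit URL and apply it to the current branch.
# git am reads the mailbox-formatted patch from standard input.
apply_github_commit() {
  curl -fsSL "$(patch_url "$1")" | git am
}
```

For Dmitry's commit that would be `apply_github_commit https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625`, run from within a CDK checkout.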

## Tuesday, January 25, 2011

### My professional network (according to LinkedIn)

LinkedIn has a nice new visualization app, InMaps. I quite like LinkedIn, though not all aspects, but most anyway. I particularly like that it focuses on work relationships. This new app visualizes my professional network, and colors it by network organization:

This one is just a static image, but as creator, you get a zoomable and interactive version. In the image I have labeled the various colored groups that were mined in my network. It is nice to see my positions show up. The CUBIC is missing as a group, but that I can explain by the fact that it has a very strong overlap with my involvement in open source cheminformatics. However, I have no excuse for the lack of an Uppsala University subgraph. I am embarrassed to say that this likely reflects my inability to really settle in in that position, other than with my direct colleagues, who are now floating around. Another reason might simply be that Uppsala University is underrepresented on LinkedIn.

## Saturday, January 22, 2011

### "Atomic weights are not constant" !!!

In case you missed it, "standard atomic weights are not constants of nature" (doi:10.1351/PAC-REP-10-09-14)! Wow, chemistry upside down. This is bigger than the new arsenic life they found!

Calm down, calm down. Nothing to see here, move on.

It was actually news some weeks back, but a tweet by @MatToddChem and a question by Antony on the Blue Obelisk eXchange made me write up this post.

Facts: 1. atomic weights have never been constant; 2. isotopic weights are constants of nature. The difference is simple, but the public was amazed last month and reaffirmed that science is just another religion. (In fact, the Dutch political wizard Wilders calls religion just a political ideology, so, science is just politics; Q.E.D. :).

Atomic weights are used to calculate the weight of samples, or, the other way around, how many molecules we have in 1 mg of some organic sample. Now, at this macroscopic level, and looking at carbon, we are actually dealing with a mix of, mostly I guess, 12C and 13C, more or less 99% and 1% respectively. Now, these percentages reflect a mixture. Mixture compositions have never been natural constants, so the claim by the authors of the paper is weird, to say the least. In fact, it has been known for years that the isotope ratios vary around the world. Hence, the molecular weight of compound X is not the same here as somewhere else. That's all.
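
To put numbers on this, here is a minimal sketch in shell and awk of the abundance-weighted average for carbon. The function name `carbon_atomic_weight` is mine, the 13C mass is rounded to 13.003355 u, only the two stable isotopes are considered, and the 13C fractions used below are illustrative values, not measurements taken from the paper.

```shell
# Abundance-weighted mean mass of the two stable carbon isotopes.
# $1 = fraction of 13C (between 0 and 1); the remainder is taken to be 12C.
carbon_atomic_weight() {
  awk -v f13="$1" 'BEGIN {
    m12 = 12.0        # mass of 12C in u (exact by definition)
    m13 = 13.003355   # mass of 13C in u (rounded)
    printf "%.4f\n", (1 - f13) * m12 + f13 * m13
  }'
}
```

With 1.07% 13C, `carbon_atomic_weight 0.0107` prints 12.0107; with exactly 1%, `carbon_atomic_weight 0.0100` prints 12.0100. Same isotopes, same isotopic masses, different mixture, different atomic weight.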

Now, the Blue Obelisk has been making this information aggregated by IUPAC available under a permissive license. The most recent release still has an MIT license, but the next release will be even more permissive and have the CC0 waiver.

### GitToDo Install Guide #1: the command line utilities

Some years ago I was in need of a todo list tool. Of course, the requirements were simple: distributed, version controlled, command line support (I must be able to access it with minimal requirements). I also hooked it up with the Freemind mindmapping tool. I am in the process of installing it on my new laptop, so I thought a walk-through might be useful.

First thing to do is to get a copy of the source code (sorry, no binaries yet). Because we are compiling from source we have to install some utilities (using Debian/Ubuntu formalism; tune to your platform):

```
$ sudo aptitude install openjdk-6-jdk git ant
```

We also need to install Java libraries used by GitToDo:

```
$ sudo aptitude install libcommons-cli-java
```

Then, we are ready to download the source:

```
$ git clone git://github.com/egonw/gtd.git
```

The source for the command line utilities is found in the com.github.gittodo, which is in fact an Eclipse project too:

```
$ cd gtd
$ cd com.github.gittodo
```

The README file in this folder explains how to continue, which is first to compile the code with Ant. We first need to tell Ant where the dependencies are found, for which we use a .properties file:

```
$ cp ant.properties.template ant.properties
$ nano ant.properties
```

The template properties are written for Debian/Ubuntu systems, but you can tune it to your likes. The compiling itself is then as simple as any project using Ant:

```
$ ant clean make
$ sudo ant install
```

We now have our command line utilities installed. However, there is one last step left, or otherwise you get an error message like this:

```
$ gtd-list-items
java.io.FileNotFoundException: /home/egonw/.gtdrc (No such file or directory)
```

We now need to set up a git repository at a convenient place and let GitToDo know about it. We create the git repository with (I have it in $HOME/var/Projects/hg):

```
$ cd $HOME/var/Projects/hg
$ mkdir gtdrepos
$ cd gtdrepos
$ git init
$ git add .
```

And we edit the $HOME/.gtdrc file to point it to this new repository, so that:

```
$ cat ~/.gtdrc
Repository=/home/egonw/var/gtdrepos
```

Now we are ready to add a first todo item:

```
$ gtd-create-item Install GitTodo-Freemind and the GitToDo GUI.
$ gtd-list-items
```

## Thursday, January 20, 2011

### Is Nature really clueless about Blogs, Twitter, etc? WTF ?!

My apologies for this rant in the early morning, but WTF?? (what the fuzz??) I just got pointed to this Peer review: Trial by Twitter (doi:10.1038/469286a) by Mandavilli. Cool title, but before I even finished seventeen words of the intro... WTF?? Here it is:

Blogs and tweets are ripping papers apart within days of publication, leaving researchers unsure how to react.

What?? Is she mocking me? I know (I have been a reporter for a university newspaper) that intros must encourage the reader to read on... but What?? (And I read the intro a third time...)

I'll have to read the full thing later, if that makes more sense. But is she clueless? Are all people clueless about blogging, tweeting, etc?? Remember Royce Murray? Has she actually read the Trial by Twitter only so recently?

Dear Mandavilli, in case you do run into this blog post, here's my reply to your intro: "The researchers have no problems how to react, they just did."

Now, after I cooled down a bit, and anticipating I got it all wrong, she might refer to the researchers of the publication being ripped apart. In that case, I am tempted to believe that also in the English language one is expected to use (well, forgive me I do not know the exact term) "leaving the researchers ...", where 'the' links 'researchers' to something said earlier. Now, I read, probably wrong, researchers as any researcher interested in that publication. Mandavilli could even have written "leaving the authors...". But what do we have Nature editors for, right?

Anyways, I do believe this will be an interesting read once I manage to read past (for the fourth time) the intro of this article.

</ripping>

## Wednesday, January 19, 2011

### Re: How can cancer research be open-sourced?

Mark asked on Quora how cancer research can be open-sourced. So far, I have found Quora to be rather noisy, even after signing up only to science related groups, themes, whatever they are called. However, every now and then there is an interesting question like this one.

The question resonated with discussions I had earlier this week. During Peter's Symposium the discussion was restarted on why publishing data in databases is currently not rewarded. I think the answer is really simple: there is no independent organization counting citation statistics. What if Thomson did not calculate citation counts and impact factors? Would we be using them to judge the careers of fellow scientists? If FooBar would calculate H-indices based on data citations would we ignore that? I hardly think so. However, FooBar does not exist, and FooBar is not getting rich because of its citation counts.

From a scientist's perspective, we see people hold back data and source code, because releasing it reduces the time for the scientist to bring the idea to Nature and Science. Now, in cheminformatics this is hardly a problem, because Nature and Science generally do not recognize fundamental, methodological work from informatics and statistics, despite their now crucial role in many Nature and Science papers. However, for data this is different. By releasing your data Openly (think Panton Principles), you remove your intellectual property that gives you a nice list of co-author papers for your publication list long tail. Mind you, this is not an argument I make up here, but actual practice: "Sure you can use my data/method, but I like to be co-author on your paper then."

Why is this actual practice? Even a paper in the long tail is rewarding. "Wow, he has 250 papers!" As Rich nicely characterizes it: game theory.

So, what if we would replace the papers in that publication list long tail, by points for releasing Open Data and Open Source? I'm all in favor. And no worries about Handles and DOIs. Forget about them. We had Thomson calculate impact factors very long before we had DOIs.

My reply to Mark's question?

First thing that needs to be changed is the academic reward system. At this moment, it is rewarding to hold back information, source code, etc. Because if you do, you make yourself more competitive with respect to publishing in high-ranked journals. Now, if we would reward releasing data into public (Open) databases, that would change. Likewise for software. The new journal http://www.openresearchcomputation.com/ is an attempt at changing this situation (disclaimer: I'm on the editorial board). Of course, there are many kinds of rewards. BMC giving out awards for Open Data is another. Another important reward would be financial. If organizations, foundations, etc, would start giving out financial support for Open projects, that will be a great change too. We are starting to see this with a couple of national funding agencies in Europe that have dedicated funding for Open Access publishing.

## Monday, January 17, 2011

### The 9th International Conference on Chemical Structures (ICCS)

Later this year the ninth International Conference on Chemical Structures (ICCS) will be held in the Netherlands. I had the pleasure of joining this meeting, I think, eight years ago, when I was doing my PhD in Nijmegen. Mind you, I did not attend the conference; I helped with the organization ;) That was a good deal, particularly because I got to meet many cheminformaticians while working behind the registration desk ;)

Actually, my gravatar still reflects that meeting, as it is a picture taken on the boat trip on the Markermeer. That was one great boat trip: I steered a driemaster, and helped out on the boat on ropes outside the deck, meters above the water. Cheminformatics can be so nice! The photo was taken during a calmer part of that boat trip :)

Back to the ICCS. It's one of the bigger cheminformatics meetings, and likely the best after the yearly GCC meetings. Mind you, the term cheminformatics reflects more the methods than the domains. Indeed, the meeting's Call for Papers lists many topics highly relevant to my position here at KI, including chemogenomics, (Q)SAR, literature mining, "integration of medical and biological information" (including semantic web technologies), and in-silico analysis of toxicology, drug safety, and adverse events.

Depending on the schedule this year, I may actually submit an abstract based on what we will do in the next year, and see what happens. The CfP deadline is 31 January.

## Friday, January 14, 2011

### #pmrhack and #pmrsymp (or: what to do the next days?)

The next few days there will be a disturbance of the force: #pmrhack and #pmrsymp. Because I have an important meeting early next week in my new position, I am unable to attend these events. The first is a symposium organized in honor of Peter Murray-Rust, called Visions of a Semantic Molecular Future Symposium, which attracts over 100 people! Maybe Open Source cheminformatics has taken off ;) (Or, maybe it's just people who love to see how it has not.)

The second event is the hackfest/unconference held this weekend. That will be at least as much fun as the symposium. Now, I have plenty of house cleaning to do of our old house (we just moved), but will try to virtually attend that as much as possible too. Mind you, I expect to use many of the technologies the Blue Obelisk develops in my new project.

Anyways, I really regret not being able to attend, and am happy that Noel (thanx!) has taken over advertising some of the Blue Obelisk projects I worked on: Bioclipse and the CDK. (It seems that Christoph will not be attending the meeting either...)

This afternoon the people in Peter's group are working out the technical details for live streaming of the event(s), and I am really looking forward to that! Good luck to all, and hope to see you the next days ;)

## Wednesday, January 12, 2011

### Karolinska Institutet

Hi all, and a happy 2011! After a turbulent December (finishing up in the Oscar project), holidays, etc, I am starting to get some sense of organization in my new position at the Institutet för miljömedicin at the Karolinska Institutet in Stockholm. I will be using cheminformatics and chemometrics methods in toxicology studies, and work in the groups of Prof. Roland Grafström and Prof. Bengt Fadeel.

The actual work will crystallize in the next weeks, but here are two pointers. One is ToxBank, for which the website should go live this month, but Google will provide some ideas about this FP7 project. The other part is into nanotoxicology (see e.g. doi:10.1039/C0NR00535E).

So, if you are around in Solna, give me a ping!