Saturday, May 12, 2012

Maximum Common Substructures and fmcs

Here's the place to comment on my posts related to my maximum common substructure algorithm, the fmcs tool based on that algorithm, and MCS benchmarking. The relevant essays are:

6 comments:

Geoff said...

Hi Andrew-
You should check out Hussain & Rea, JCIM 2010, 50, 339-348 as it seems relevant & interesting (though they are more focused directly on applications in matched molecular pairs).
Geoff

Andrew Dalke said...

Hi Geoff,

Unfortunately, that's behind the great ACS firewall, and I don't feel like paying $35 to take a look at it. My wife's university is mostly undergrad/non-sciences, and they don't have access to the journal. The local chemistry library doesn't allow non-students/faculty to use their online ACS registration. So it will have to wait until I figure out a way to read the paper.

Unknown said...

Hi Andrew!

I can't believe you're going back to the MCS problem. I saw how stressful it was for you all those years ago. I remember that once you were given an optimization problem you quickly started dreaming about it.

Great code as always.

Does this approach support seeding of the initial match? We had good success back in the day when I took over your codebase.

-Brian

Andrew Dalke said...

Yeah, and developing this version also got to me. Just ask my wife.

As for seeding the match, it's not yet trivial. What you need to do is tag the atoms by isotope number: each pair of atoms that must match gets the same isotope number, and every other atom gets the isotope number assigned to its atom class.

So if you do the tagging yourself, either in the SMILES strings or with a user-defined tag in the SD file, then it will work. This is part of user-defined atom tagging which I would like to do in the near future.

That's not the most efficient solution. In theory it wouldn't be hard to change the initial seed for the MCS so that it's the core substructure. That's doable, but more work, and I'm not sure about the API.
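The isotope tagging described above can be sketched in a few lines of Python. This is only an illustration of the numbering scheme, not code from fmcs: the function name `assign_isotope_tags`, the use of element symbols as the atom class, and the `seed_pairs` argument are all assumptions made for the example.

```python
from itertools import count

def assign_isotope_tags(atoms_a, atoms_b, seed_pairs):
    """Assign isotope-style integer tags to the atoms of two molecules.

    atoms_a, atoms_b: lists of atom class labels (element symbols here,
        which is an assumption; any hashable class label works).
    seed_pairs: (index_in_a, index_in_b) pairs that must match each other.
    Returns two lists of integer tags, one per molecule.
    """
    next_tag = count(1)
    tags_a = [None] * len(atoms_a)
    tags_b = [None] * len(atoms_b)

    # Each seeded pair gets a tag that no other atom shares,
    # so those two atoms can only match each other.
    for i, j in seed_pairs:
        tag = next(next_tag)
        tags_a[i] = tag
        tags_b[j] = tag

    # Every remaining atom gets the tag reserved for its atom class.
    class_tag = {}
    for atoms, tags in ((atoms_a, tags_a), (atoms_b, tags_b)):
        for idx, atom_class in enumerate(atoms):
            if tags[idx] is None:
                if atom_class not in class_tag:
                    class_tag[atom_class] = next(next_tag)
                tags[idx] = class_tag[atom_class]
    return tags_a, tags_b

# Hypothetical example: force atom 0 of A onto atom 0 of B,
# and atom 2 of A onto atom 1 of B.
tags_a, tags_b = assign_isotope_tags(
    ["C", "C", "O", "N"], ["C", "O", "C", "C"], [(0, 0), (2, 1)]
)
print(tags_a, tags_b)
```

The resulting numbers would then be written as isotope labels on the atoms, and the MCS search run so that atoms are compared by isotope rather than by element.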

Hannes said...

Hi,

We are currently using fmcs to do pair-wise matching in alchemical free energy setups. fmcs has really been a great tool, but we have also run into certain structures that need runs of hours or more. That's probably not surprising given the nature of the problem, but I wonder if there is a way to rationalise which structures could end up running for too long.

Cheers,
Hannes.

Andrew Dalke said...

Hi Hannes,

Systems with many rings, and especially those with lots of local symmetry, are the most difficult. These are hard to prune quickly. The easiest way to characterize the problem is to use the "-v -v" option, which gives full verbosity. This will report each new best hit when it finds one. This might help you understand where it's getting stuck with your system.

I've noticed that it usually finds an MCS, or a near-MCS, rather quickly. If you use a --timeout of 60 seconds, you'll likely have a good hit without the multi-hour run time.
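For a batch of pairs, the two options mentioned above could be combined in a small wrapper like the one below. It is a sketch under assumptions: that the program is on the PATH as `fmcs`, that the structure file is given as a positional argument, and that `--timeout` takes a number of seconds. The helper name `build_fmcs_command` and the file name `ligands.sdf` are made up for the example.

```python
import shlex

def build_fmcs_command(input_file, timeout_seconds=60, verbosity=2):
    """Build an fmcs command line with full verbosity and a timeout.

    Assumes an executable named "fmcs" and a positional input file;
    only the "-v" and "--timeout" options come from the discussion above.
    """
    cmd = ["fmcs"]
    cmd += ["-v"] * verbosity                    # "-v -v" gives full verbosity
    cmd += ["--timeout", str(timeout_seconds)]   # stop searching after this many seconds
    cmd.append(input_file)
    return cmd

cmd = build_fmcs_command("ligands.sdf")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```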

There's also one optimization I've wanted to add, which should help some. If you're interested in funding the effort, let me know.