Exploring Graph Networks for IUPAC Nomenclature and SMILES.

I took a long break from machine learning and artificial intelligence to understand chemical data. The base of chemical data is IUPAC, a legacy language that chemists use to communicate with each other about different skeletal patterns of molecules. Like so:

For a computer to understand what we are talking about, we need the machine to understand the IUPAC language, because everything in the previous literature serves as a viable key to more information. Thankfully, we condensed this type of nomenclature into SMILES, a newer language with its own grammar for representing the 2D geometry of the molecule.
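To make the name-to-SMILES idea concrete, here is a minimal sketch in plain Python. The four name and SMILES pairs are standard textbook examples, but the `NAME_TO_SMILES` table and the `to_smiles` helper are hypothetical names made up for illustration and are not part of any package discussed here.

```python
# A tiny lookup from common chemical names to SMILES strings.
# The table and helper below are illustrative assumptions, not a package API.
NAME_TO_SMILES = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
    "piperazine": "C1CNCCN1",
}


def to_smiles(name: str) -> str:
    """Return the SMILES for a known name, ignoring case and outer whitespace."""
    key = name.strip().lower()
    if key not in NAME_TO_SMILES:
        raise KeyError(f"No SMILES stored for {name!r}")
    return NAME_TO_SMILES[key]


if __name__ == "__main__":
    for name in NAME_TO_SMILES:
        print(f"{name:12s} -> {to_smiles(name)}")
```

A real network would need far more than a lookup table, but the table shows the shape of the data: a human-readable name on one side and a machine-readable string on the other.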

I wanted to take a different approach to building neural network architectures, and I’m going to start with my own package.

To set up our purple nodes, we first have to decide which purple nodes to add into the mix. We want our computer to understand the IUPAC nomenclature, and randomly assigning it all of the data won’t be that constructive; it might even damage the machine’s learning. So let’s check out all the nodes first:

So let’s make a decision. IUPAC names for drug-like molecules tend to be very well thought out, and their SMILES are standardized and concrete enough to handle the organic elements (N, C, O, S, P), whereas molecules from other sub-fields (polymer chemistry, organometallics, and molecules from interstellar space) are less so. For example, have a look at an entry from RingsInDrugs:

A lot of the compounds within this list tend to have shorter, less complex IUPAC names. Their SMILES are also pretty well defined, as piperazine is a common fragment in drug design. If we have a look at some of the interstellar space IUPAC nomenclature:

It can be really weird, and it’s tied to other versions of the language that may or may not be useful. When just setting up the network, I don’t want the machine to see this data yet or learn from it. Even if the SMILES is not as complex, the nomenclature is not as common relative to everything else stored within this package. We want data that relates to each other in gradual steps. That being said, we pick staple nodes as our first nodes. So for me, I would pick:

Schedule One Narcotics is the list reported by the U.S.; it contains the exact nomenclature we use in our everyday politics of drug law. Again, something modern.

Pihkal is an older book from the early 1990s, but it still represents a lot of the modern functional groups persistent in our society today. The list is rich, with 179 compounds to evaluate, and the author, Alexander Shulgin, also came up with some unique coding styles.

The next set of nodes should be a gradual step from narcotics into science while still relating to the previous 3 nodes.

Privileged Scaffolds are functional group scaffolds that have been elected by biology as the most useful. We often use these scaffolds in biomimetic synthesis of natural products.

Emerging Perfluoroalkyls are common herbicides that are toxic to us. These are long halogenated alkanes that shouldn’t be too complex in naming but might throw the machine for a loop with the SMILES input:
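As a rough illustration of how repetitive these SMILES get, the sketch below builds the string for a linear perfluorinated carboxylic acid from its carbon count. Perfluorooctanoic acid (PFOA, 8 carbons) is used as an assumed example; it may or may not appear in the list itself, and `perfluoro_acid_smiles` is a hypothetical helper written only for this post.

```python
def perfluoro_acid_smiles(n_carbons: int) -> str:
    """Build the SMILES of a linear perfluorinated carboxylic acid.

    The chain is: one carboxyl carbon, (n_carbons - 2) CF2 units,
    and a terminal CF3 group. Hypothetical helper for illustration.
    """
    if n_carbons < 2:
        raise ValueError("need at least a carboxyl carbon and a CF3 carbon")
    cf2_units = n_carbons - 2
    return "OC(=O)" + "C(F)(F)" * cf2_units + "C(F)(F)F"


if __name__ == "__main__":
    # PFOA: 8 carbons in total, 15 fluorines.
    pfoa = perfluoro_acid_smiles(8)
    print(pfoa)
    print("fluorine atoms:", pfoa.count("F"))
```

The name stays short while the SMILES becomes a long run of near-identical tokens, which is the kind of input that could confuse a model.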

The last is common organic solvents, which are solvents commonly used in the synthesis of drugs. The solvents’ names are fairly simple but also relate to each of the previous nodes. So the overall code would be:
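As a package-independent sketch of that layout (plain Python, not the package’s own API), the nodes named above could be wired up as follows. The grouping into three layers, the snake_case node keys, and the `build_network` helper are all assumptions made for illustration.

```python
from collections import defaultdict

# Node names come from the lists discussed above; the layer grouping
# is an assumption about how the "gradual steps" might be arranged.
LAYERS = [
    ["schedule_one", "pihkal"],
    ["privileged_scaffolds", "emerging_perfluoroalkyls"],
    ["common_organic_solvents"],
]


def build_network(layers):
    """Connect every node in one layer to every node in the next layer."""
    edges = defaultdict(list)
    for current, following in zip(layers, layers[1:]):
        for parent in current:
            edges[parent].extend(following)
    return dict(edges)


if __name__ == "__main__":
    network = build_network(LAYERS)
    for parent, children in network.items():
        print(parent, "->", ", ".join(children))
```

Keeping the layers as an ordered list makes it easy to add a new step later without rewriting the existing connections.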

And if we install the extensions package, we can visualize it a little more cleanly:

Great, we have set up our node architecture. Tune in for the next part on running our network (I have to code it first).
