Today I’ve released BioCyc, a Python module that provides an interface to the BioCyc Web API. Acting as a wrapper, it queries the database and presents the returned XML through a Pythonic, object-based interface. Support for IPython views is included, offering summary tables of object attributes.

BioCyc

BioCyc are approaching the renewal period for their NIH grant. If you find the tools useful, please consider writing a letter of support. If you use EcoCyc, there is a separate call. It’s incredibly important to keep public databases like these available, for both their research and educational value. I know they’ve been indispensable in my PhD.

Interface

The BioCyc interface provides access to most attributes, with inter-object links presented as lazy-loading lists. These links are followed and auto-queried on access, allowing navigation through the entire database tree simply by accessing object attributes and slices.
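
To illustrate the idea of a lazy-loading list, here is a minimal sketch in plain Python. It is not the module's actual implementation: the LazyList class and the fake_loader function are hypothetical names invented for this example, and the loader stands in for whatever performs the real database query.

```python
class LazyList:
    """Hold identifiers and resolve each one to an object only on access."""

    def __init__(self, ids, loader):
        self._ids = list(ids)      # identifiers are known up front
        self._loader = loader      # callable that fetches one object by id
        self._loaded = {}          # objects resolved so far, keyed by id

    def __len__(self):
        return len(self._ids)

    def __getitem__(self, index):
        # Slices resolve every identifier in the slice; ints resolve just one.
        if isinstance(index, slice):
            return [self._resolve(i) for i in self._ids[index]]
        return self._resolve(self._ids[index])

    def _resolve(self, identifier):
        # Only call the loader the first time an identifier is requested.
        if identifier not in self._loaded:
            self._loaded[identifier] = self._loader(identifier)
        return self._loaded[identifier]


calls = []

def fake_loader(identifier):
    # Hypothetical stand-in for a real query; records each call it receives.
    calls.append(identifier)
    return {"id": identifier}

reactions = LazyList(["TRANS-RXN-104", "RXN-12165", "RXN-12096"], fake_loader)
first = reactions[0]   # triggers exactly one load
```

Nothing is loaded when the list is created; only the entries you actually index or slice are fetched, and each is fetched at most once.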

The interface is throttled to one request per second (at the request of BioCyc). However, the module comes with a built-in cache (stored by default under ~/.biocyc) that keeps retrieved objects for future use, so subsequent requests for the same object are much quicker. Multiple configurable caches may be used, and it’s possible to share caches across multiple machines.
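
The combination of throttling and caching can be sketched roughly as follows. This is an illustration of the general pattern, not the module's own code: ThrottledCache, fake_fetch, the JSON file layout and the short demo interval are all assumptions made for the example (the real interval described above is one second).

```python
import json
import os
import tempfile
import time


class ThrottledCache:
    """Serve objects from a disk cache, throttling only real fetches."""

    def __init__(self, fetch, cache_dir, min_interval=1.0):
        self._fetch = fetch                # callable doing the slow remote request
        self._cache_dir = cache_dir
        self._min_interval = min_interval  # minimum seconds between fetches
        self._last_request = None
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        # Assumes keys are safe to use as file names.
        return os.path.join(self._cache_dir, key + ".json")

    def get(self, key):
        path = self._path(key)
        if os.path.exists(path):
            # Cache hit: no request is made, so no throttling is needed.
            with open(path) as f:
                return json.load(f)
        if self._last_request is not None:
            wait = self._min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        value = self._fetch(key)
        self._last_request = time.monotonic()
        with open(path, "w") as f:
            json.dump(value, f)
        return value


requests_made = []

def fake_fetch(key):
    # Hypothetical stand-in for a request to the remote API.
    requests_made.append(key)
    return {"id": key}

# A short interval keeps the demo fast.
cache = ThrottledCache(fake_fetch, tempfile.mkdtemp(), min_interval=0.05)
a = cache.get("L-LACTATE")   # fetched, then written to disk
b = cache.get("L-LACTATE")   # read back from disk, no second fetch
```

Because cached objects live in ordinary files, pointing several machines at the same shared folder is enough to share a cache in a design like this.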

To install, get on the command line and type:

pip install BioCyc

or download from PyPI or GitHub.

A demo IPython notebook (available here) is walked through below.

Basic initialisation

Import the biocyc object from the biocyc module. This object provides the base access to the database for the initial get. You can set the organism using set_organism and one of the standard BioCyc database identifiers. Note that this only affects the organism-database used for direct requests on the biocyc object. Sub-requests on existing objects will use the same database as that object (otherwise things would be very confusing indeed).

import os
from biocyc import biocyc
os.environ['http_proxy'] = ''  # Set your proxy if necessary
biocyc.set_organism('meta')

Making a request

To get a database object (of any type), simply use its unique BioCyc identifier. Here we request L-Lactate. Note that if you do this from within an IPython notebook you get a nice table of all the attributes associated with the object. This includes direct links to the BioCyc database and other database annotations.

o = biocyc.get('L-LACTATE')
o
Name: (S)-lactate
BioCyc ID: L-LACTATE
Org ID: META
Synonyms: L-lactate, L(+)-lactate
INCHI: InChI=1S/C3H6O3/c1-2(4)3(5)6/h2,4H,1H3,(H,5,6)/p-1/t2-/m0/s1
Molecular weight: 89.071
Gibbs 0: -72.55646
Parents: L-2-hydroxyacids, Lactate
Reactions: TRANS-RXN-104, RXN-12165, RXN-12096, LACTALDDEHYDROG-RXN, RXN0-5269, D-LACTATE-2-SULFATASE-RXN, TRANS-RXN-104, L-LACTDEHYDROGFMN-RXN, LACTATE-MALATE-TRANSHYDROGENASE-RXN, LACTATE-2-MONOOXYGENASE-RXN, L-LACTATE-DEHYDROGENASE-CYTOCHROME-RXN, L-LACTATE-DEHYDROGENASE-RXN, RXN-9067, RXN-8076, PROPIONLACT-RXN, LACTATE-RACEMASE-RXN, LACTATE-ALDOLASE-RXN
Database links: CAS: 79-33-4, PUBCHEM: 5460161, LIGAND-CPD: C00186, CHEMSPIDER: 4573803, CHEBI: 16651, BIGG: 34179

Exploring further

Now that we have an object, we can perform sub-queries by accessing its fields. If you access the o.reactions field you will trigger a dynamic request for all entities in that list. Connections to the BioCyc server are throttled at one per second, so this may take a little while on long lists. However, retrieved data is cached under ~/.biocyc, so subsequent requests will be much quicker. By default the cache is set to expire objects after ~6 months, and the cache folder can be shared between multiple machines.
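
One simple way to implement this kind of expiry is to compare a cached file's modification time against a maximum age. The sketch below is an assumption about how such a check could look, not the module's actual logic: is_expired is a hypothetical helper, and 180 days is used as a stand-in for "~6 months".

```python
import os
import tempfile
import time

# Assumed stand-in for "~6 months"; the module's real value may differ.
SIX_MONTHS_SECONDS = 180 * 24 * 60 * 60


def is_expired(path, max_age=SIX_MONTHS_SECONDS, now=None):
    """Return True if the file at `path` was last modified more than `max_age` seconds ago."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age


# Create an empty file to play the role of a cached object.
fd, cached_file = tempfile.mkstemp()
os.close(fd)
fresh = is_expired(cached_file)    # just written, so not expired

# Backdate the file by 200 days to simulate an old cache entry.
old = time.time() - 200 * 24 * 60 * 60
os.utime(cached_file, (old, old))
stale = is_expired(cached_file)    # now older than the limit
```

An expired entry would then simply be fetched again and overwritten on the next request.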

Note: If you just want access to the identifiers, you can use the o._reactions field to access these without triggering a request.

r = o.reactions
r[0]
BioCyc ID: TRANS-RXN-104
Org ID: META
Parents: Small-Molecule-Reactions, TR-12
r[1]
Name: NADP+ L-lactaldehyde dehydrogenase
BioCyc ID: RXN-12165
Org ID: META
Parents: Chemical-Reactions, Small-Molecule-Reactions
Pathways: PWY-6713

You can access sub-entities and manipulate objects using standard Python list processing.

ps = [r.pathways for r in o.reactions]
p = [p for sl in ps for p in sl]
p
[L-rhamnose degradation II,
 L-rhamnose degradation III,
 L-rhamnose degradation II,
 methylglyoxal degradation V,
 lactate biosynthesis (archaea),
 L-lactaldehyde degradation (aerobic),
 L-lactaldehyde degradation (aerobic),
 methylglyoxal degradation V,
 pyruvate fermentation to lactate,
 glucose and xylose degradation,
 Bifidobacterium shunt,
 heterolactic fermentation,
 factor 420 biosynthesis]
p[0]
Name: L-rhamnose degradation II
BioCyc ID: PWY-6713
Org ID: META
Synonyms: aldolase pathway
Parents: L-rhamnose-Degradation
Species: TAX-5580, ORG-6176, TAX-95486, TAX-284592, TAX-322104
Taxonomic range: TAX-2, TAX-4751
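
The flattened pathway list above contains repeats, because several reactions take part in the same pathway. If you only want each pathway once, one option is to flatten and de-duplicate while keeping first-seen order. The sketch below uses plain strings as stand-ins, since whether the module's objects can be used as dictionary keys is not something shown here; with real objects you might key on their identifiers instead. The grouping of names per reaction is made up for illustration.

```python
from itertools import chain

# Illustrative stand-in for [r.pathways for r in o.reactions]:
# one inner list per reaction, some empty, some overlapping.
pathways_per_reaction = [
    ["L-rhamnose degradation II"],
    ["L-rhamnose degradation III", "L-rhamnose degradation II"],
    [],
    ["methylglyoxal degradation V", "L-rhamnose degradation II"],
]

# Flatten the nested lists into a single list.
flat = list(chain.from_iterable(pathways_per_reaction))

# dict.fromkeys keeps the first occurrence of each key in insertion order
# (guaranteed from Python 3.7 onwards).
unique = list(dict.fromkeys(flat))
```

Here flat has five entries while unique keeps the three distinct names in the order they first appeared.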

Finally

That’s all for now! Hopefully this shows how Python (and IPython notebook) access to the BioCyc Web API may be useful. Support for additional attributes, API calls etc. is planned for the future. If you have specific requests, get in touch!