Today I’ve released a Python module, BioCyc, that provides an interface to the BioCyc Web API. Acting as a wrapper, it queries the database and presents the returned XML through a Pythonic, object-based interface. Support for IPython views is included, offering summary tables of object attributes.
BioCyc are approaching the renewal period for their NIH grant. If you find the tools useful, please consider writing a letter of support. If you use EcoCyc there is a separate call. It’s incredibly important to keep public databases like these available for both research and educational value. I know they’ve been indispensable in my PhD.
The BioCyc interface provides access to most attributes, with inter-object links presented as lazy-loading lists. These links are followed and auto-queried on access, allowing navigation through the entire database tree by simply accessing object attributes and slices.
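To make the lazy-loading idea concrete, here is a minimal sketch of the general pattern. It is not the module's actual implementation: the `LazyList` class and the `fake_fetch` function are hypothetical names invented for illustration, and the fetch is a stand-in for a real network request.

```python
class LazyList:
    """List-like wrapper that fetches each item the first time it is accessed."""

    def __init__(self, identifiers, fetch):
        self._identifiers = list(identifiers)  # cheap: only the identifiers
        self._fetch = fetch                    # callable: identifier -> object
        self._loaded = {}                      # index -> already-fetched object

    def __len__(self):
        return len(self._identifiers)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice loads only the items in the requested range.
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        if index not in self._loaded:
            self._loaded[index] = self._fetch(self._identifiers[index])
        return self._loaded[index]


calls = []


def fake_fetch(identifier):
    # Hypothetical stand-in for a web request; records what was asked for.
    calls.append(identifier)
    return {"id": identifier}


reactions = LazyList(["RXN-9067", "RXN-8076", "PROPIONLACT-RXN"], fake_fetch)
first = reactions[0]   # triggers one fetch
again = reactions[0]   # served from memory, no second fetch
tail = reactions[1:]   # fetches the remaining two items
```

Nothing is requested until an element is actually read, which is why holding a long list of links costs nothing up front.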
The interface is throttled to one request per second (by request of BioCyc). However, the module comes with a built-in cache (stored by default under ~/.biocyc) that stores retrieved objects for future use, so subsequent requests for the same object are much quicker. Multiple, configurable caches may be used, and it’s possible to share caches across multiple machines.
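As a rough illustration of how throttling and caching fit together, the sketch below waits between real requests but answers repeat lookups immediately. It is an assumption-laden toy rather than the module's code: `ThrottledCache` is a hypothetical class, the cache is an in-memory dict instead of files on disk, and the clock and sleep functions are injectable only so the example runs without real delays.

```python
import time


class ThrottledCache:
    """Hypothetical helper: rate-limit real fetches, serve repeats from a cache."""

    def __init__(self, fetch, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval  # seconds between real requests
        self._clock = clock
        self._sleep = sleep
        self._last_request = None
        self._cache = {}                   # in-memory here; the module uses disk

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: no waiting, no request
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)          # respect the rate limit
        self._last_request = self._clock()
        value = self._fetch(key)
        self._cache[key] = value
        return value


# Fake clock so the demonstration does not actually pause.
now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    now[0] += seconds


client = ThrottledCache(lambda key: key.lower(),
                        clock=lambda: now[0], sleep=fake_sleep)
a = client.get("RXN-9067")        # first request goes straight out
b = client.get("RXN-8076")        # second request waits one second
a_again = client.get("RXN-9067")  # cached, so no wait at all
```

Only cache misses pay the one-second cost, which is why a warm cache makes long lists practical.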
To install, get on the command line and type:
pip install BioCyc
A demo IPython notebook (available here) is walked through below.
Import the biocyc object from the biocyc module. This object provides the base access to the database for the initial get. You can set the organism using set_organism and one of the standard BioCyc database identifiers. Note that this only affects the organism-database used for direct requests on the biocyc object. Sub-requests on existing objects will use the same database as that object (otherwise things would be very confusing indeed).
import os
from biocyc import biocyc

os.environ['http_proxy'] = ''  # Set your proxy if necessary
biocyc.set_organism('meta')
Making a request
To get a database object (of any type), simply use its unique BioCyc identifier. Here we request L-Lactate. Note that if you do this from within an IP[y] Notebook you get a nice table output of all associated attributes for the object. This includes direct links to the BioCyc database and other database annotations.
|Reactions||TRANS-RXN-104, RXN-12165, RXN-12096, LACTALDDEHYDROG-RXN, RXN0-5269, D-LACTATE-2-SULFATASE-RXN, TRANS-RXN-104, L-LACTDEHYDROGFMN-RXN, LACTATE-MALATE-TRANSHYDROGENASE-RXN, LACTATE-2-MONOOXYGENASE-RXN, L-LACTATE-DEHYDROGENASE-CYTOCHROME-RXN, L-LACTATE-DEHYDROGENASE-RXN, RXN-9067, RXN-8076, PROPIONLACT-RXN, LACTATE-RACEMASE-RXN, LACTATE-ALDOLASE-RXN|
|Database links||CAS: 79-33-4, PUBCHEM: 5460161, LIGAND-CPD: C00186, CHEMSPIDER: 4573803, CHEBI: 16651, BIGG: 34179|
Now that we have an object, we can perform sub-queries by accessing fields. If you access the o.reactions field you will trigger a dynamic request for all entities in that list. Connections to the BioCyc server are throttled at 1/second, so this may take a little while on long lists. However, retrieved data is cached under ~/.biocyc so subsequent requests will be much quicker. By default the cache is set to expire objects after ~6 months, and the cache folder can be shared between multiple machines.
Note: If you just want access to the identifiers, you can use the o._reactions field to access these without triggering a request.
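Cache expiry of this kind can be approximated by comparing a file's modification time with a maximum age. The sketch below is one possible way to do it, not a description of how the module stores or expires its objects: `read_cached`, the JSON file layout, and the 180-day constant are all assumptions made for the example.

```python
import json
import os
import tempfile
import time

# Roughly six months, chosen for illustration only.
MAX_AGE_SECONDS = 180 * 24 * 60 * 60


def read_cached(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return the cached JSON at `path`, or None if it is missing or too old."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return None  # no cache entry on disk
    if age > max_age:
        return None  # expired: the caller should fetch again
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Write one entry into a temporary directory standing in for the cache folder.
cache_dir = tempfile.mkdtemp()
entry = os.path.join(cache_dir, "RXN-9067.json")
with open(entry, "w", encoding="utf-8") as handle:
    json.dump({"id": "RXN-9067"}, handle)

fresh = read_cached(entry)                                           # still valid
stale = read_cached(entry, now=time.time() + MAX_AGE_SECONDS + 60)   # too old
missing = read_cached(os.path.join(cache_dir, "absent.json"))        # never cached
```

Because the check relies only on files and timestamps, a folder like this can be copied or shared between machines without any extra bookkeeping.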
r = o.reactions
r
|Name||NADP+ L-lactaldehyde dehydrogenase|
You can access sub-entities and manipulate objects using standard Python list processing.
ps = [r.pathways for r in o.reactions]
p = [p for sl in ps for p in sl]
p
[L-rhamnose degradation II, L-rhamnose degradation III, L-rhamnose degradation II, methylglyoxal degradation V, lactate biosynthesis (archaea), L-lactaldehyde degradation (aerobic), L-lactaldehyde degradation (aerobic), methylglyoxal degradation V, pyruvate fermentation to lactate, glucose and xylose degradation, Bifidobacterium shunt, heterolactic fermentation, factor 420 biosynthesis]
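The flattened list above contains repeats, because several reactions belong to the same pathway. If you only want each pathway once, something like the following sketch may help. It works on plain strings taken from the output above (grouped arbitrarily for illustration) and assumes the items are hashable; whether the module's own objects can be used as dictionary keys is not something I am asserting here.

```python
from itertools import chain

# Pathway names per reaction; the grouping is illustrative only.
nested = [
    ["L-rhamnose degradation II", "L-rhamnose degradation III"],
    ["L-rhamnose degradation II", "methylglyoxal degradation V"],
    [],  # a reaction with no pathways
    ["pyruvate fermentation to lactate"],
]

# Equivalent to the nested list comprehension, written with itertools.
flat = list(chain.from_iterable(nested))

# dict.fromkeys keeps the first occurrence of each item and preserves order.
unique = list(dict.fromkeys(flat))
```

Using dict.fromkeys rather than set keeps the pathways in the order they were first encountered.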
|Name||L-rhamnose degradation II|
|Species||TAX-5580, ORG-6176, TAX-95486, TAX-284592, TAX-322104|
|Taxonomic range||TAX-2, TAX-4751|
That’s all for now! Hopefully this shows how Python (and IPython notebook) access to the BioCyc Web API may be useful. Support for additional attributes, API calls etc. is planned for the future. If you have specific requests, get in touch!