BioCyc (API)

Python interface to the BioCyc Web API

BioCyc is a Python interface to the BioCyc Web API. Acting as a wrapper, it queries the database and presents the returned XML through a Pythonic, object-based interface. Support for IPython views is included, offering nice summary tables of object attributes.

BioCyc are approaching the renewal period for their NIH grant. If you find the tools useful, please consider writing a letter of support. If you use EcoCyc there is a separate call. It's incredibly important to keep public databases like these available for both research and educational value. I know they've been indispensable in my PhD.

Interface

The BioCyc interface provides access to most attributes, with inter-object links presented as lazy-loading lists. These links are followed and auto-queried on access, allowing navigation through the entire database tree by simply accessing object attributes and slices.
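To make the idea of a lazy-loading list concrete, here is a minimal sketch of the general pattern. It is not the module's actual implementation: the `LazyList` class and the `fake_fetch` function are hypothetical names invented for illustration, and the fetch function simply stands in for a database request.

```python
from collections.abc import Sequence


class LazyList(Sequence):
    """Hold identifiers and resolve each one to an object on first access."""

    def __init__(self, ids, fetch):
        self._ids = list(ids)   # identifiers are known up front
        self._fetch = fetch     # hypothetical callable: identifier -> object
        self._resolved = {}     # index -> object, filled in on demand

    def __len__(self):
        return len(self._ids)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Resolve only the items the slice covers.
            return [self[i] for i in range(*index.indices(len(self._ids)))]
        if index < 0:
            index += len(self._ids)
        if not 0 <= index < len(self._ids):
            raise IndexError(index)
        if index not in self._resolved:
            self._resolved[index] = self._fetch(self._ids[index])
        return self._resolved[index]


# Stand-in for a database request; it records which identifiers were fetched.
fetched = []


def fake_fetch(identifier):
    fetched.append(identifier)
    return {"id": identifier}


reactions = LazyList(["TRANS-RXN-104", "RXN-12165", "RXN-12096"], fake_fetch)
first = reactions[0]        # triggers exactly one fetch
first_again = reactions[0]  # served from the already-resolved item
```

Nothing is requested until an item is indexed, and slices resolve only the items they cover, which is the behaviour described above.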

The interface is throttled to one request per second (at the request of BioCyc). However, the module comes with a built-in cache (stored by default under ~/.biocyc) that keeps retrieved objects for future use, so subsequent requests for the same object are much quicker. Multiple configurable caches may be used, and it's possible to share caches across multiple machines.
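If you're curious how throttling and an on-disk cache can work together, the sketch below shows one way to do it. It is an illustration under my own assumptions, not the module's code: `ThrottledCache`, `fake_fetch`, the JSON file layout and the parameter names are all hypothetical, and identifiers are assumed to be safe to use as file names.

```python
import json
import tempfile
import time
from pathlib import Path


class ThrottledCache:
    """Fetch through a rate limit, keeping results on disk for later calls."""

    def __init__(self, fetch, cache_dir, min_interval=1.0, max_age=180 * 24 * 3600):
        self._fetch = fetch                # hypothetical callable: identifier -> JSON-serialisable data
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._min_interval = min_interval  # seconds between real requests
        self._max_age = max_age            # seconds before a cached entry expires
        self._last_request = None

    def get(self, identifier):
        # Assumes the identifier is safe to use as a file name.
        path = self._dir / f"{identifier}.json"
        if path.exists() and time.time() - path.stat().st_mtime < self._max_age:
            return json.loads(path.read_text())
        # Cache miss: wait until the minimum interval has passed.
        if self._last_request is not None:
            wait = self._min_interval - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        data = self._fetch(identifier)
        self._last_request = time.monotonic()
        path.write_text(json.dumps(data))
        return data


# Stand-in for a web request; it records which identifiers were fetched.
calls = []


def fake_fetch(identifier):
    calls.append(identifier)
    return {"id": identifier}


# A short interval keeps the demo quick; the real limit is one per second.
cache = ThrottledCache(fake_fetch, tempfile.mkdtemp(), min_interval=0.05)
start = time.monotonic()
cache.get("L-LACTATE")  # miss: fetched immediately
cache.get("PWY-6713")   # miss: waits for the interval first
cache.get("L-LACTATE")  # hit: read from disk, no fetch
elapsed = time.monotonic() - start
```

Because the cache is just a folder of files in this sketch, pointing two machines at the same shared folder is enough to share it.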

To install, run the following on the command line:

bash

pip install BioCyc

or download from PyPI or GitHub.

A demo IPython notebook (available here) is walked through below.

Basic initialisation

Import the biocyc object from the biocyc module. This object provides the base access to the database for the initial get. You can set the organism using set_organism and one of the standard BioCyc database identifiers. Note that this only affects the organism-database used for direct requests on the biocyc object. Sub-requests on existing objects will use the same database as that object (otherwise things would be very confusing indeed).

python

import os
from biocyc import biocyc
os.environ['http_proxy'] = ''  # Set your proxy if necessary
biocyc.set_organism('meta')

Making a request

To get a database object (of any type), simply request it using its unique BioCyc identifier. Here we request L-Lactate. Note that if you do this from within an IPython notebook you get a nice table output of all associated attributes for the object. This includes direct links to the BioCyc database and other database annotations.

python

o = biocyc.get('L-LACTATE')
o

Name	(S)-lactate
BioCyc ID	L-LACTATE
Org ID	META
Synonyms	L-lactate, L(+)-lactate
INCHI	InChI=1S/C3H6O3/c1-2(4)3(5)6/h2,4H,1H3,(H,5,6)/p-1/t2-/m0/s1
Molecular weight	89.071
Gibbs 0	-72.55646
Parents	L-2-hydroxyacids, Lactate
Reactions	TRANS-RXN-104, RXN-12165, RXN-12096, LACTALDDEHYDROG-RXN, RXN0-5269, D-LACTATE-2-SULFATASE-RXN, TRANS-RXN-104, L-LACTDEHYDROGFMN-RXN, LACTATE-MALATE-TRANSHYDROGENASE-RXN, LACTATE-2-MONOOXYGENASE-RXN, L-LACTATE-DEHYDROGENASE-CYTOCHROME-RXN, L-LACTATE-DEHYDROGENASE-RXN, RXN-9067, RXN-8076, PROPIONLACT-RXN, LACTATE-RACEMASE-RXN, LACTATE-ALDOLASE-RXN
Database links	CAS: 79-33-4, PUBCHEM: 5460161, LIGAND-CPD: C00186, CHEMSPIDER: 4573803, CHEBI: 16651, BIGG: 34179

Exploring further

Now that we have an object, we can perform sub-queries by accessing its fields. If you access the o.reactions field you will trigger a dynamic request for all entities in that list. Connections to the BioCyc server are throttled at one per second, so this may take a little while on long lists. However, retrieved data is cached under ~/.biocyc so subsequent requests will be much quicker. By default the cache is set to expire objects after ~6 months, and the cache folder can be shared between multiple machines.

Note: If you just want access to the identifiers, you can use the o._reactions field to access these without triggering a request.

python

r = o.reactions
r[0]

BioCyc ID	TRANS-RXN-104
Org ID	META
Parents	Small-Molecule-Reactions, TR-12

python

r[1]

Name	NADP⁺ L-lactaldehyde dehydrogenase
BioCyc ID	RXN-12165
Org ID	META
Parents	Chemical-Reactions, Small-Molecule-Reactions
Pathways	PWY-6713

You can access sub-entities and manipulate objects using standard Python list processing.

python

ps = [r.pathways for r in o.reactions]
p = [p for sl in ps for p in sl]
p

[L-rhamnose degradation II,
L-rhamnose degradation III,
L-rhamnose degradation II,
methylglyoxal degradation V,
lactate biosynthesis (archaea),
L-lactaldehyde degradation (aerobic),
L-lactaldehyde degradation (aerobic),
methylglyoxal degradation V,
pyruvate fermentation to lactate,
glucose and xylose degradation,
Bifidobacterium shunt,
heterolactic fermentation,
factor 420 biosynthesis]
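As the output shows, a pathway appears once for every reaction that links to it, so the flattened list contains repeats. If you only want each pathway once, standard Python handles that too. The sketch below uses plain strings as a stand-in for the pathway objects; with real objects you would key on whichever identifier they expose, which I've not assumed anything about here.

```python
# Stand-in for the flattened list: plain strings instead of pathway objects.
names = [
    "L-rhamnose degradation II",
    "L-rhamnose degradation III",
    "L-rhamnose degradation II",
    "methylglyoxal degradation V",
    "methylglyoxal degradation V",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes repeats without reordering the list.
unique_names = list(dict.fromkeys(names))
```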

python

p[0]

Name	L-rhamnose degradation II
BioCyc ID	PWY-6713
Org ID	META
Synonyms	aldolase pathway
Parents	L-rhamnose-Degradation
Species	TAX-5580, ORG-6176, TAX-95486, TAX-284592, TAX-322104
Taxonomic range	TAX-2, TAX-4751

Finally

That's all for now! Hopefully this shows how Python (and IPython notebook) access to the BioCyc Web API may be useful. Support for additional attributes, API calls etc. is planned for the future. If you have specific requests, get in touch!
