Introduction to the BioMart API
BioMart APIs
● Biomart_plib - Object-Oriented Perl interface
Biomart_plib
Architecture
● Object-Oriented Perl-based API to BioMart datasets
● Uses XML configuration shared by all BioMart software
Query logic
Configuration logic
Initializing API script
use BioMart::Initializer;
my $confFile = "/home/user/martRegistryFile";
my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
my $registry = $initializer->getRegistry;
Optional Initializer parameters:
• 'action' => 'clean' - replace the dataset configurations stored on the local file-system with those from the database and build a new, clean registry object
• 'action' => 'update' - replace any file-system dataset configurations modified since the last retrieval with the database copies and build a new registry object
• Default behaviour with no action specified is to generate the registry object using the cached file-system configurations if they exist, otherwise retrieve them from the database.
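The three behaviours above amount to a small decision rule per dataset configuration. The Java sketch below only restates that rule; it is not BioMart code, every name in it is hypothetical, and it assumes that "modified since the last retrieval" means the database copy is newer than the cached file and that a missing cache file always triggers a database read.

```java
import java.time.Instant;

/** Sketch of the caching rule described above; all names are hypothetical. */
final class RegistryCachePolicy {

    /** Mirrors the 'action' parameter; NONE stands for "no action specified". */
    enum Action { NONE, CLEAN, UPDATE }

    /**
     * Decide whether one dataset configuration should be re-read from the database.
     *
     * @param action        the requested action
     * @param cachedOnDisk  true if a file-system copy of the configuration exists
     * @param dbModified    when the database copy was last modified
     * @param lastRetrieved when the file-system copy was last retrieved
     */
    static boolean fetchFromDatabase(Action action, boolean cachedOnDisk,
                                     Instant dbModified, Instant lastRetrieved) {
        if (!cachedOnDisk) {
            return true; // nothing cached (assumption: applies to every action)
        }
        switch (action) {
            case CLEAN:
                return true; // replace every cached configuration
            case UPDATE:
                return dbModified.isAfter(lastRetrieved); // only stale copies
            default:
                return false; // no action: use the cached copy
        }
    }

    public static void main(String[] args) {
        Instant retrieved = Instant.parse("2006-01-01T00:00:00Z");
        Instant changed = Instant.parse("2006-02-01T00:00:00Z");
        System.out.println(fetchFromDatabase(Action.UPDATE, true, changed, retrieved)); // true
        System.out.println(fetchFromDatabase(Action.NONE, true, changed, retrieved));   // false
    }
}
```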
Initializing API script
Optional Initializer parameters (cont)
• 'mode' => 'lazyload' - only keep a certain number of dataset configurations in memory at once, for low-memory machines and future scalability
• Default behaviour with no mode specified is to keep all configurations in memory.
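The slides do not say how lazyload decides which configurations stay in memory. As an illustration of the general idea only, the Java sketch below keeps a fixed number of configurations and evicts the least recently used one; the eviction policy, the class name and the loader callback are all assumptions, not the BioMart implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache: at most 'capacity' configurations are held at once. */
final class LazyConfigCache {
    private final int capacity;
    private final Function<String, String> loader;
    private final LinkedHashMap<String, String> cache;

    LazyConfigCache(int capacity, Function<String, String> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > LazyConfigCache.this.capacity; // evict once over capacity
            }
        };
    }

    /** Return the configuration for a dataset, loading it only when it is not in memory. */
    String get(String dataset) {
        String config = cache.get(dataset); // a hit also marks the entry as recently used
        if (config == null) {
            config = loader.apply(dataset);
            cache.put(dataset, config);
        }
        return config;
    }

    /** True if the configuration is currently held in memory (does not count as a use). */
    boolean isLoaded(String dataset) {
        return cache.containsKey(dataset);
    }

    public static void main(String[] args) {
        LazyConfigCache configs = new LazyConfigCache(2, name -> "<config for " + name + ">");
        configs.get("hsapiens_gene_ensembl");
        configs.get("mmusculus_gene_ensembl");
        configs.get("hsapiens_gene_ensembl");
        System.out.println(configs.isLoaded("mmusculus_gene_ensembl"));
    }
}
```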
Building Query
my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'default');
$query->addAttribute('hsapiens_gene_ensembl','ensembl_gene_id');
or with optional virtualSchema and interface settings:
$query->addAttribute('hsapiens_gene_ensembl','ensembl_gene_id','default','default');
$query->addFilter('hsapiens_gene_ensembl','chromosome_name',['1']);
$query->addFilter('hsapiens_gene_ensembl','hgnc_symbol',['FGFR1','IL2','DERL3']);
Executing query and printing results
my $query_runner = BioMart::QueryRunner->new();
$query_runner->execute($query);
$query_runner->printResults;
Executing query and printing results
Print formatted header:
$query_runner->printHeader;
Print just first 20 results:
$query_runner->printResults(20);
Change the formatter from the tab-separated default before executing the query:
$query->formatter('FASTA');
The formatter must have a corresponding module in lib/BioMart/Formatter implementing the FormatterI.pm interface, e.g. CSV, TXT, GTF, XLS.
Multi dataset queries
my $query = BioMart::Query->new('registry'=>$registry, 'virtualSchemaName'=>'default');
$query->addAttribute('hsapiens_gene_ensembl','ensembl_gene_id');
$query->addAttribute('hsapiens_gene_ensembl','ensembl_transcript_id');
$query->addAttribute('mmusculus_gene_ensembl','ensembl_gene_id');
$query->addAttribute('mmusculus_gene_ensembl','ensembl_transcript_id');
This is the equivalent of picking human as the main dataset in the web interface and mouse as the optional second dataset, i.e. the human attributes appear first in the result table, followed by the mouse attributes.
Note that BioMart queries are currently restricted to a maximum of two datasets, for performance reasons and because of technical difficulties with query planning.
Web services type access
● Supports GRID projects such as Taverna, and other third-party users who want to federate mart data without leaving a port to the database server openly accessible.
Web services type access
http://test.biomart.org/cgi-bin/martservice?query=
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "defaultSchema">
<Dataset name = "hsapiens_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<ValueFilter name = "chromosome_name" value = "1"/>
</Dataset>
</Query>
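The XML above contains spaces, quotes and angle brackets, so it has to be URL-encoded before it can be sent as the value of the query parameter. A minimal Java sketch, using only the standard library (Java 10 or later); the class and method names are made up for illustration, and the service URL is the one on the slide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that turns a query document into a martservice request URL. */
final class MartServiceRequest {

    /** Append the URL-encoded query XML to the service address. */
    static String queryUrl(String serviceUrl, String queryXml) {
        return serviceUrl + "?query=" + URLEncoder.encode(queryXml, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml =
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE Query>"
            + "<Query virtualSchemaName = \"defaultSchema\">"
            + "<Dataset name = \"hsapiens_gene_ensembl\">"
            + "<Attribute name = \"ensembl_gene_id\" />"
            + "<Attribute name = \"chromosome_name\" />"
            + "<ValueFilter name = \"chromosome_name\" value = \"1\"/>"
            + "</Dataset>"
            + "</Query>";
        System.out.println(queryUrl("http://test.biomart.org/cgi-bin/martservice", xml));
    }
}
```

The resulting string can be fetched with any HTTP client; whether very long queries need to be sent by POST instead is not covered by the slides.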
Web services type access
Change format from default tab-separated format:
<Query virtualSchemaName = "defaultSchema" formatter = "CSV">
<Dataset name = "hsapiens_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<ValueFilter name = "chromosome_name" value = "1"/>
</Dataset>
</Query>
Web services type access
Get count instead:
<Query virtualSchemaName = "defaultSchema" count="1">
<Dataset name = "hsapiens_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<ValueFilter name = "chromosome_name" value = "1"/>
</Dataset>
</Query>
Web services type access
Multi-dataset query:
<Query virtualSchemaName = "defaultSchema">
<Dataset name = "mmusculus_gene_ensembl">
<ValueFilter name = "chromosome_name" value = "1"/>
</Dataset>
<Dataset name = "hsapiens_gene_ensembl">
<Attribute name = "ensembl_gene_id" />
<Attribute name = "chromosome_name" />
<ValueFilter name = "chromosome_name" value = "1"/>
</Dataset>
</Query>
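The query documents on these slides differ only in the attributes of the Query element (formatter, count) and in the Dataset elements they contain, so they can be generated from one small builder. The Java sketch below is an illustration, not part of BioMart: the class and method names are hypothetical, the two-dataset limit is taken from the earlier slide, and whether formatter and count may be combined in one query is not stated in the source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder for the query XML shown on the slides. */
final class MartQueryXml {

    /** One Dataset element: its name, attribute names and name/value filters. */
    static final class Dataset {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final Map<String, String> valueFilters = new LinkedHashMap<>();

        Dataset(String name) { this.name = name; }
        Dataset attribute(String attr) { attributes.add(attr); return this; }
        Dataset valueFilter(String filter, String value) { valueFilters.put(filter, value); return this; }
    }

    /** Escape characters that may not appear raw in a double-quoted XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * @param formatter output format such as "CSV", or null for the tab-separated default
     * @param countOnly true to add count="1" and request a count instead of rows
     */
    static String build(String virtualSchemaName, String formatter,
                        boolean countOnly, List<Dataset> datasets) {
        if (datasets.isEmpty() || datasets.size() > 2) {
            throw new IllegalArgumentException("a query takes one or two datasets");
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE Query>\n");
        xml.append("<Query virtualSchemaName=\"").append(escape(virtualSchemaName)).append('"');
        if (formatter != null) {
            xml.append(" formatter=\"").append(escape(formatter)).append('"');
        }
        if (countOnly) {
            xml.append(" count=\"1\"");
        }
        xml.append(">\n");
        for (Dataset d : datasets) {
            xml.append("  <Dataset name=\"").append(escape(d.name)).append("\">\n");
            for (String a : d.attributes) {
                xml.append("    <Attribute name=\"").append(escape(a)).append("\"/>\n");
            }
            for (Map.Entry<String, String> f : d.valueFilters.entrySet()) {
                xml.append("    <ValueFilter name=\"").append(escape(f.getKey()))
                   .append("\" value=\"").append(escape(f.getValue())).append("\"/>\n");
            }
            xml.append("  </Dataset>\n");
        }
        return xml.append("</Query>\n").toString();
    }

    public static void main(String[] args) {
        // Rebuilds the multi-dataset query above: mouse filter only, then human attributes
        Dataset mouse = new Dataset("mmusculus_gene_ensembl").valueFilter("chromosome_name", "1");
        Dataset human = new Dataset("hsapiens_gene_ensembl")
                .attribute("ensembl_gene_id").attribute("chromosome_name")
                .valueFilter("chromosome_name", "1");
        System.out.print(build("defaultSchema", null, false, List.of(mouse, human)));
    }
}
```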
Web services type access
(1) Recover the registry file:
http://test.biomart.org/cgi-bin/martservice?type=registry
(2) Recover the datasets available for a mart:
http://test.biomart.org/cgi-bin/martservice?type=datasets&virtualSchema=default&mart=ensembl
(3) Recover the filters available for a dataset:
http://test.biomart.org/cgi-bin/martservice?type=filters&virtualSchema=default&dataset=hsapiens_gene_ensembl
(4) Recover the attributes available for a dataset:
http://test.biomart.org/cgi-bin/martservice?type=attributes&virtualSchema=default&dataset=hsapiens_gene_ensembl
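The four metadata requests above share one shape: a type parameter followed by zero or more name=value pairs. A short Java sketch that assembles them with the standard library; the helper name is hypothetical, and the parameter names are exactly those in the URLs above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the martservice metadata requests listed above. */
final class MartMetadataUrl {

    /** Build serviceUrl?type=...&name=value..., keeping the parameters in insertion order. */
    static String build(String serviceUrl, String type, Map<String, String> params) {
        StringBuilder url = new StringBuilder(serviceUrl).append("?type=").append(encode(type));
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append('&').append(encode(p.getKey())).append('=').append(encode(p.getValue()));
        }
        return url.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String service = "http://test.biomart.org/cgi-bin/martservice";

        // (1) registry: no extra parameters
        System.out.println(build(service, "registry", new LinkedHashMap<>()));

        // (3) filters available for a dataset
        Map<String, String> params = new LinkedHashMap<>();
        params.put("virtualSchema", "default");
        params.put("dataset", "hsapiens_gene_ensembl");
        System.out.println(build(service, "filters", params));
    }
}
```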
MartJ
● Java interface to BioMart datasets
● Uses XML configuration shared by all BioMart software
RegistryDSConfigAdaptor
import java.net.MalformedURLException;
import java.net.URL;
import org.ensembl.mart.lib.config.RegistryDSConfigAdaptor;
//InputSourceUtil and ConfigurationException are MartJ classes and need importing too
URL confURL = null;
try {
    confURL =
        InputSourceUtil.getURLForString("data/defaultMartRegistry.xml");
} catch (MalformedURLException e) {
    throw new ConfigurationException("Warning, could not load "
        + "data/defaultMartRegistry.xml"
        + " file\n");
}
RegistryDSConfigAdaptor adaptor =
    new RegistryDSConfigAdaptor(confURL, false, false, false);
DatasetConfig
import org.ensembl.mart.lib.config.DatasetConfig;
DatasetConfig config =
adaptor.getDatasetConfigByDatasetInternalName(
"hsapiens_gene_ensembl",
"default"
);
Query
import org.ensembl.mart.lib.Query;
Query query = new Query();
//query needs some information from the DatasetConfig
query.setDataSource(config.getAdaptor().getDataSource());
query.setMainTables(config.getStarBases());
query.setPrimaryKeys(config.getPrimaryKeys());
FieldAttribute/AttributeDescription
import org.ensembl.mart.lib.config.AttributeDescription;
import org.ensembl.mart.lib.FieldAttribute;
AttributeDescription adesc =
config.getAttributeDescriptionByInternalName("gene_stable_id");
query.addAttribute(new FieldAttribute( adesc.getField(),
adesc.getTableConstraint(),
adesc.getKey()
)
);
Filter/FilterDescription
There are three types of Filter that can be added to the query; all are
created using the attributes of a FilterDescription
A. BasicFilter
B. BooleanFilter (but watch for the two boolean 'flavors')
C. IDListFilter
FilterDescription
import org.ensembl.mart.lib.config.FilterDescription;
FilterDescription fdesc =
config.getFilterDescriptionByInternalName("chr_name");
BasicFilter
import org.ensembl.mart.lib.BasicFilter;
//The config system actually masks a lot of complexity
//with regard to filters by requiring the internalName
//again when calling the getXXX methods
String name = "chr_name"; //assumed: the internalName used to look up fdesc
query.addFilter(new BasicFilter( fdesc.getField(name),
fdesc.getTableConstraint(name),
fdesc.getKey(name),
"=",
"22"
)
);
BooleanFilter
import org.ensembl.mart.lib.BooleanFilter;
//note there are different types of BooleanFilter
//"boolean" and "boolean_num"
if (fdesc.getType(name).equals("boolean"))
query.addFilter(new BooleanFilter( fdesc.getField(name),
fdesc.getTableConstraint(name),
fdesc.getKey(name),
BooleanFilter.isNULL
)
);
else //"boolean_num"
query.addFilter(new BooleanFilter( fdesc.getField(name),
fdesc.getTableConstraint(name),
fdesc.getKey(name),
BooleanFilter.isNotNULL_NUM
)
);
IDListFilter
import org.ensembl.mart.lib.IDListFilter;
String[] ids = new String[] { "ENSG00000146556.4",
"ENSG00000197194.1",
"ENSG00000197490.1",
"ENSG00000177693.1"
};
query.addFilter(new IDListFilter( fdesc.getField(name),
fdesc.getTableConstraint(name),
fdesc.getKey(name),
ids
)
);
Engine
import org.ensembl.mart.lib.Engine;
import org.ensembl.mart.lib.FormatSpec;
Engine engine = new Engine();
engine.execute(
query,
new FormatSpec(FormatSpec.TABULATED, "\t"),
System.out
);
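The call above writes tab-separated rows to System.out. Assuming the third argument accepts any java.io.OutputStream (not confirmed by the slides), the same output could be captured in a ByteArrayOutputStream and split back into fields. The Java sketch below shows only the splitting step, on made-up sample text; the class name is hypothetical and it assumes one record per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that splits tabulated output back into rows and fields. */
final class TabulatedOutput {

    /** Split text into rows (one per line) and fields (split on the separator). */
    static List<String[]> parse(String text, String separator) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {  // \R matches any line break
            if (line.isEmpty()) {
                continue;                          // skip blank lines
            }
            // Pattern.quote treats the separator literally; -1 keeps trailing empty fields
            rows.add(line.split(Pattern.quote(separator), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample input, not real query output
        String captured = "gene_a\t22\ngene_b\t22\n";
        for (String[] row : parse(captured, "\t")) {
            System.out.println(row[0] + " is on chromosome " + row[1]);
        }
    }
}
```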
Future of MartJ
In the future, MartJ will be refactored to use the more flexible
architecture that we developed for the Perl-based software.