Page 1: PSI meeting 2013 - psidev.infopsidev.info/sites/default/files/2018-03/psi_april_2013.pdf · PSI meeting 2013 IntAct team intact-help@ebi.ac.uk. Outline Summary of 2012/2013 activities

Molecular interactions

PSI meeting 2013

IntAct team intact-help@ebi.ac.uk

Outline

● Summary of 2012/2013 activities and achievements

● MIRIAM and identifiers.org

● MITAB 2.7 and MIQL 2.7

● Clustering

● New PSICQUIC reference implementation

● PSICQUIC view update

● Data Distribution Best Practices

Summary of 2012/2013 activities and achievements

PSICQUIC Hackathon

● 28th May – 1st June 2012

● 10 developers from 7 different partners
  ● BioJS, Cytoscape, DIP, InnateDB, IntAct, MatrixDB, MINT, MPIDB
  ● http://code.google.com/p/psicquic/wiki/PSICQUICHackathon2012

● 2 working groups
  ● SOLR team:
    ● reference implementation
    ● indexing MITAB 2.5, 2.6 and 2.7 using SOLR
    ● MIQL 2.7
    ● XML indexing and PSICQUIC webservices improvements
  ● Client team:
    ● PSICQUIC view visualization: table, network and search
    ● Cytoscape plugin

2012/2013 releases

● MITAB 2.7
  http://code.google.com/p/psimi/wiki/PsimiTab27Format

● MIQL 2.7
  http://code.google.com/p/psicquic/wiki/MiqlReference27

● PSICQUIC reference implementation
  http://code.google.com/p/psicquic/wiki/PsicquicSpec_1_3_Rest
  ● LUCENE 1.2.3
  ● SOLR 1.3.9

● PSI-MI java libraries
  http://code.google.com/p/psimi/downloads/list
  ● psi25-xml parser 1.8.3
  ● psimitab parser 1.8.3
  ● psi25-xml to RDF/Biopax converter 1.8.3
  ● Calimocho 2.5.0
  ● Calimocho to XGMML converter 2.5.0.3

● PSICQUIC-view
  http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml

PSICQUIC growth

+ 25 million binary interactions since 2012

+ 2 services since 2012 => total of 28 services and one more in progress (FlyBase)

Work in progress...

● PSICQUIC/MITAB 2.7 publication submitted and in review

● PSICQUIC view and download all button

● BioJS : new javascript components for molecular interaction visualization

● Clustering improvements (new web interface, …)

● JAMI (Java framework for molecular interactions)
  ● XML/MITAB validator prototype
  ● Enricher

MIRIAM and Identifiers.org

Introduction: http://identifiers.org/about

MIRIAM/identifiers.org benefits

● PSICQUIC links to data entries (pubmed, uniprot, ensembl...)

➢ Automatic remapping when services down → more reliable links

● Up to date resource with database accession regular expressions

➢ Do not duplicate work in psi-mi ontology

More reliable PSICQUIC links (1)

• Several locations/resources for accessing uniprot P00533

3 existing resources for accessing P00533

Identifiers.org/uniprot/P00533

More reliable PSICQUIC links (2)

• Use the most reliable location/resource for uniprot P00533

Identifiers.org/uniprot/P00533?profile=most_reliable

More reliable PSICQUIC links (3)

• Use the uniprotkb location/resource for uniprot P00533

Identifiers.org/uniprot/P00533?resource=MIR:00100134
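
The three link forms shown on these slides can be generated by a small helper. This is only a sketch: the function name is hypothetical, the `http://` scheme is an assumption, and the `profile` and `resource` query parameters are copied from the slides as they stood in 2013, so they may not be supported by the current identifiers.org resolver.

```python
from urllib.parse import urlencode

BASE = "http://identifiers.org"  # host as written on the slides; scheme assumed


def identifiers_url(namespace, accession, profile=None, resource=None):
    """Build an identifiers.org link in one of the three forms from the slides."""
    url = f"{BASE}/{namespace}/{accession}"
    params = {}
    if profile:
        params["profile"] = profile      # e.g. "most_reliable"
    if resource:
        params["resource"] = resource    # e.g. a MIRIAM resource id
    if params:
        # keep ':' readable so MIR:00100134 is not percent-encoded
        url += "?" + urlencode(params, safe=":")
    return url


print(identifiers_url("uniprot", "P00533"))
print(identifiers_url("uniprot", "P00533", profile="most_reliable"))
print(identifiers_url("uniprot", "P00533", resource="MIR:00100134"))
```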

Up to date database links and regular expressions (1)

Up to date database links and regular expressions (2)

What next?

● The CV MI database terms should have xrefs to MIRIAM namespace

● The regular expressions in the database MI terms could be obsoleted to rely on MIRIAM

- Hierarchy information
- No data/formats update
- Relies on MIRIAM for the regular expressions and links

- More work for the MI CV maintainers
- MIRIAM namespaces not visible in MITAB/XML
- Need to update PSI-XML validator

Maybe XML 3.0?

MITAB 2.7 and MIQL 2.7

MITAB 2.7: introduction

● Format description at http://code.google.com/p/psicquic/wiki/MITAB27Format

● Extension of MITAB 2.6 and 2.5

● Total of 42 columns

Can contain minimum information recommended by MIMIx

MITAB 2.7: Complex expansion

● Distinguish true binary interactions from binary interactions expanded from n-ary interactions

● Know the method used to expand

  ● Spoke
  ● Matrix
  ● Bipartite

  ● psi-mi:"MI:1060" (spoke expansion)
  ● psi-mi:"MI:1061" (matrix expansion)
  ● psi-mi:"MI:1062" (bipartite expansion)

Recognized for backward compatibility
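
As an illustration of the three expansion methods, here is a minimal sketch applied to the two example complexes used in this deck (A, B, C, D and E, F, G). The function names are hypothetical, and treating the first participant as the bait is an assumption made for the example.

```python
from itertools import combinations


def spoke_expansion(participants):
    """Pair the bait (assumed to be the first participant) with every prey."""
    bait, *preys = participants
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Pair every participant with every other participant."""
    return list(combinations(participants, 2))


def bipartite_expansion(interaction_id, participants):
    """Pair the interaction itself with each of its participants."""
    return [(interaction_id, p) for p in participants]


complexes = {"I1": ["A", "B", "C", "D"], "I2": ["E", "F", "G"]}

spoke = [p for members in complexes.values() for p in spoke_expansion(members)]
matrix = [p for members in complexes.values() for p in matrix_expansion(members)]
bipartite = [p for iid, members in complexes.items()
             for p in bipartite_expansion(iid, members)]

print(len(spoke), len(matrix), len(bipartite))  # 5 9 7
```

The counts agree with the 5, 9 and 7 binary interactions quoted on the spoke, matrix and bipartite slides.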

MITAB 2.7: re-build n-ary from spoke expansion?

[Figure: two n-ary interactions, interaction id 1 (A, B, C, D) and interaction id 2 (E, F, G), spoke-expanded into binary bait-prey pairs]

5 binary interactions, 2 n-ary interactions

● A (bait) - B (prey): Interaction id 1, Spoke
● A (bait) - C (prey): Interaction id 1, Spoke
● A (bait) - D (prey): Interaction id 1, Spoke
● E (bait) - F (prey): Interaction id 2, Spoke
● E (bait) - G (prey): Interaction id 2, Spoke

Need
● interactor id
● expansion method
● interaction id

Not enough
● Publication
● Detection method
● Host organism
● Interaction type

MITAB 2.7: re-build n-ary from bipartite expansion?

[Figure: two n-ary interactions, interaction id 1 (A, B, C, D) and interaction id 2 (E, F, G), bipartite-expanded into interaction-interactor pairs]

7 binary interactions, 2 n-ary interactions

● interaction I1 - interactor A: Bipartite
● interaction I1 - interactor B: Bipartite
● interaction I1 - interactor C: Bipartite
● interaction I1 - interactor D: Bipartite
● interaction I2 - interactor E: Bipartite
● interaction I2 - interactor F: Bipartite
● interaction I2 - interactor G: Bipartite

Need
● interactor id
● expansion method
● interaction id

MITAB 2.7: re-build n-ary from matrix expansion?

[Figure: two n-ary interactions, interaction id 1 (A, B, C, D) and interaction id 2 (E, F, G), matrix-expanded into all pairwise binary interactions]

9 binary interactions, 2 n-ary interactions

● C-A: Interaction id 1, Matrix
● D-A: Interaction id 1, Matrix
● A-B: Interaction id 1, Matrix
● C-B: Interaction id 1, Matrix
● D-B: Interaction id 1, Matrix
● C-D: Interaction id 1, Matrix
● G-E: Interaction id 2, Matrix
● E-F: Interaction id 2, Matrix
● G-F: Interaction id 2, Matrix

Need
● interactor id
● expansion method
● interaction id

Not enough
● Publication
● Detection method
● Host organism
● Interaction type
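
Rebuilding the n-ary interactions from expanded rows then comes down to grouping on the fields listed under "Need". The sketch below assumes each row has already been reduced to a hypothetical tuple `(id_a, id_b, interaction_id, expansion)`, with `None` as the expansion of a true binary interaction; it does not parse real MITAB.

```python
from collections import defaultdict


def rebuild_nary(rows):
    """Group expanded binary rows back into n-ary participant lists.

    rows: iterable of (id_a, id_b, interaction_id, expansion) tuples.
    Rows with no expansion method are true binary interactions and are skipped.
    """
    groups = defaultdict(set)
    for id_a, id_b, interaction_id, expansion in rows:
        if expansion is None:
            continue
        groups[(interaction_id, expansion)].update((id_a, id_b))
    return {key: sorted(members) for key, members in groups.items()}


spoke_rows = [
    ("A", "B", "1", "spoke"), ("A", "C", "1", "spoke"), ("A", "D", "1", "spoke"),
    ("E", "F", "2", "spoke"), ("E", "G", "2", "spoke"),
    ("X", "Y", "3", None),  # a true binary interaction, left untouched
]
print(rebuild_nary(spoke_rows))
```

Grouping on publication, detection method, host organism or interaction type instead would merge unrelated complexes that happen to share those values, which is why the slides list them as "Not enough".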

MITAB 2.7: MIMIx columns

● Participant's biological roles (col 17 and 18)
➢ Ex: psi-mi:"MI:0684" (ancillary)

● Participant's experimental roles (col 19 and 20)
➢ Ex: psi-mi:"MI:0496" (bait)

● Participant identification methods (col 41 and 42)
➢ Ex: psi-mi:"MI:0113" (western blot)

● Host organism for the experiment (col 29)
➢ Ex: taxid:-1 (in vitro)

MITAB 2.7: new types of interactions accepted

● Negative interactions (col 36)

● Self interactions:
– homodimers, homotrimers, …
– auto-catalysis, …

Inter-molecular (P P)

Unique id A (col 1) | Unique id B (col 2) | … | Stoichiometry A (col 39) | Stoichiometry B (col 40)
P | P | ... | x | 0

Intra-molecular (P)

Unique id A (col 1) | Unique id B (col 2) | … | Stoichiometry A (col 39) | Stoichiometry B (col 40)
P | - | ... | 1 | -

MITAB 2.7: interactor types

● Columns 21 and 22
➢ Ex: psi-mi:"MI:0327" (peptide)

● Solve some ambiguity with interactor identifiers

● More precise than registry tags

MITAB 2.7: interactor and interaction xrefs

● Interactor xrefs (col 23 and 24)

● Interaction xrefs (col 25)
➢ Ex: go:"GO:0005057"(receptor signaling protein activity)
➢ Ex: intact:EBI-626658(see-also)

• Give more information about the interactor or interaction
• Not an identifier
• Lighten the first 6 columns
• Not used for clustering
• Use cross reference type

MITAB 2.7: interactor and interaction annotations

● Interactor annotations (col 26 and 27)

● Interaction annotations (col 28)
➢ Ex: dataset:Cancer - Interactions investigated in the context of cancer
➢ Ex: imex-curation

PSICQUIC Registry tags

MITAB 2.7: participant's features

● Users want to ask: “show me all evidence where molecule X has binding domains”

Binding sites AND other features (e.g. tags, PTMs, ...)

yes

binding site:51-124(IPR003651)
binding site:45..53-119..129
binding site:n-51,99-123
gst tag:c-c
his tag:?-?

no

51-124(IPR003651)
45..53-119..129
n-51,99-123
c-c
?-?
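
A minimal parser shows why the typed form in the "yes" list is preferable: the feature type can be recovered and queried, whereas a bare range cannot. The regular expression is an assumption inferred from the five examples on this slide, not the official MITAB 2.7 grammar, and `parse_feature` is a hypothetical name.

```python
import re

# type ":" ranges, optionally followed by "(text)"; inferred from the slide examples
FEATURE_RE = re.compile(r"^(?P<type>[^:]+):(?P<ranges>[^()]+)(?:\((?P<text>.*)\))?$")


def parse_feature(field):
    """Split one feature value into its type, ranges and optional text."""
    m = FEATURE_RE.match(field)
    if m is None:
        return None  # no feature type, as in the "no" list
    return {
        "type": m["type"],
        "ranges": m["ranges"].split(","),
        "text": m["text"],
    }


print(parse_feature("binding site:51-124(IPR003651)"))
print(parse_feature("binding site:n-51,99-123"))
print(parse_feature("51-124(IPR003651)"))  # None: the type is missing
```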

MITAB 2.7: more...

● Interaction parameters (col 30)
➢ Ex: kd:9.0x10^-7 (molar)

● Creation date (col 31)
➢ Ex: 2011/03/15

● Last update date (col 32)
➢ Ex: 2011/04/05

● Interactor checksum (col 33 and 34)
➢ Ex: rogid:bjwQTTv7ws6z/T+fM8bNGnEsEXk6239

● Interaction checksum (col 35)
➢ Ex: rigid:G6RtLd3+FtR/ZtRciwH2vj9R0Tc

MITAB 2.7: limitation and issues

● 42 columns!

● Feature, checksum, confidence and parameter types can only be names

● Cannot represent linked features and inferred interactions

● Cannot export feature xrefs and annotations

● Not all the columns have the same syntax

● Same syntax does not mean same content

● Cell types, tissues and compartments cannot be specified in host organism column.

• Issue when converting to XML where Xref is mandatory

• Cannot recognize MI from MOD terms

• Names can be ambiguous

MITAB: what next?

● Only column names

● A syntax per column

● Customize....

– Number of columns

– Order of columns

MIQL 2.7: introduction

● Fields description at http://code.google.com/p/psicquic/wiki/MiqlReference27

● Extension of MIQL 2.5

● Total of 35 fields

MIQL 2.7: new fields

MIQL 2.7: examples

➢ I want to filter out expanded binary interactions
  ➢ complex:"-"

➢ I want to include negative interactions
  ➢ negative:(true OR false)

➢ I want all interactions having parameters
  ➢ param:true

➢ I want all interactions having stoichiometry
  ➢ stc:true

➢ I want all interactions having binding sites
  ➢ ftypeA:"binding site" AND ftypeB:"binding site"

➢ I want all intra-molecular interactions
  ➢ idA:\- OR idB:\-

➢ I want all internally-curated interactions
  ➢ annot:"internally-curated"
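
To send one of these queries to a service, the MIQL string has to be percent-encoded into the request URL. The sketch below assumes the `<base>/query/<MIQL>?format=...` layout of the PSICQUIC REST interface; check it against the REST specification linked earlier in the deck. The base URL is a placeholder and `psicquic_query_url` is a hypothetical helper, while `tab27` is the output format name listed for the SOLR reference implementation.

```python
from urllib.parse import quote


def psicquic_query_url(base_url, miql, fmt="tab27"):
    """Build a REST query URL for a MIQL string (URL layout is an assumption)."""
    # safe="" percent-encodes ':', '(', ')', '"', '\\' and spaces in the query
    return f"{base_url}/query/{quote(miql, safe='')}?format={fmt}"


BASE = "http://example.org/psicquic/webservices/current/search"  # placeholder

print(psicquic_query_url(BASE, "negative:(true OR false)"))
print(psicquic_query_url(BASE, 'annot:"internally-curated"'))
```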

What should we do?

● Export and index MITAB 2.7

➢ Complex expansion

➢ MIMIx information

➢ Registry tags and tagging interaction

● Use PSICQUIC registry tags that are important at the interaction level

● Move

➢ Gene names and other names to alias columns (col 5 and 6)

➢ Extra unique identifiers to alternative identifiers (col 3 and 4)

➢ Rogid, Inchi key and rigid to checksum columns (col 33, 34 and 35)

➢ GO and non identifiers to xref columns (col 23, 24 and 25)

PSICQUIC clustering

Clustering binary interactions

• Clustering = regrouping multiple interaction evidences of a unique pair of interactors in a single MITAB line.

• It boils down to grouping molecule pairs, hence the importance of describing your molecules properly

• Necessary for a user doing data analysis and interaction networking

• http://code.google.com/p/micluster/

A-B : Y2H
A-B : CIP
A-C : Y2H
A-B : pull down
A-D : pull down

=>

A-B : Y2H | CIP | pull down
A-C : Y2H
A-D : pull down
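
The example on this slide can be reproduced with a few lines of grouping code. This is a sketch of the idea only, not the micluster implementation; the `cluster` function and the tuple layout of the input are assumptions.

```python
def cluster(evidences):
    """Merge evidences that share the same unordered pair of interactors.

    evidences: iterable of (id_a, id_b, detection_method) tuples.
    Returns {(id_1, id_2): [methods...]} with ids sorted inside each key.
    """
    clusters = {}
    for id_a, id_b, method in evidences:
        key = tuple(sorted((id_a, id_b)))  # A-B and B-A are the same pair
        methods = clusters.setdefault(key, [])
        if method not in methods:
            methods.append(method)
    return clusters


evidences = [
    ("A", "B", "Y2H"),
    ("A", "B", "CIP"),
    ("A", "C", "Y2H"),
    ("A", "B", "pull down"),
    ("A", "D", "pull down"),
]
for (a, b), methods in cluster(evidences).items():
    print(f"{a}-{b} : {' | '.join(methods)}")
```

The result depends entirely on both services using the same identifier for the same molecule, which is the point made on the slide about describing molecules properly.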

How to deal with ambiguous identifiers?

• Depends on the list of identifiers provided by each PSICQUIC service

1. A1-B : A1 → uniprotkb:Q5R7D3|uniprotkb:P08107
+
2. A2-B : A2 → uniprotkb:Q5R7D3
3. A2-B : A2 → uniprotkb:P08107

= 1 interaction but should it be 2?

- Use one identifier per species
- Ambiguous identifiers (uniprot gene and organism demerge) can be moved to xrefs

Should we cluster MITAB 2.7?

● Lose experiment/interaction hierarchy: some information is specific to the experiment!
– Experimental roles
– Interaction parameters
– Features and tags
– Host organism

● Some fields are confusing when clustered
– Complex expansion
– Interactor types
– Negative
– Stoichiometry

● Some fields make sense only when associated with the source
– Creation date
– Update date

Clustering improvements

● Relying on aliases for identifying molecules? => names are not identifiers

● Proposing other clustering options? (sequence+organism, checksum)

● Respecting Data Distribution Best Practices avoids inconsistent results => better data integration and analysis for the user

Clustering alternatives

● Clustering unique binary pairs during indexing?
Ex: a new field 'binary': identifier1-identifier2

● Getting the unique binary pairs is instantaneous

● Can have statistics related to a binary pair

● Identifiers always sorted so always same order

● Possibility to keep relationships of original MITAB

● Needs to agree on common identifiers

● Needs regular protein updates

● Not flexible if several identifiers
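
A sketch of how such a 'binary' field could be computed at indexing time, with the identifiers sorted so that A-B and B-A give the same value. The function name and the sample identifiers are for illustration only; note that joining with '-' becomes ambiguous when an identifier itself contains a hyphen (for example an isoform accession), so a real field would probably need a different separator.

```python
from collections import Counter


def binary_key(id_a, id_b):
    """Order-independent key for a pair of interactor identifiers."""
    return "-".join(sorted((id_a, id_b)))


rows = [
    ("uniprotkb:P08107", "uniprotkb:Q5R7D3"),
    ("uniprotkb:Q5R7D3", "uniprotkb:P08107"),  # same pair, reversed
    ("uniprotkb:P08107", "uniprotkb:P00533"),
]

# Per-pair statistics come for free once the key is indexed
evidence_counts = Counter(binary_key(a, b) for a, b in rows)
print(evidence_counts)
```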

New PSICQUIC reference implementation

LUCENE reference implementation 1.2.3

MITAB 2.5

Lucene indexing (3.0), Calimocho 2.5.0, Psimitab parser 1.8.3

PSICQUIC 1.2

MIQL 2.5 (14 fields)

tab25 (default), tab25-bin, xgmml, Biopax, RDF

● Fix some memory issues (pagination, threads, …)

● Use psimitab parser and XML converter 1.8.3 with bug fixes

● Improved XGMML export performance (no limit of 5000 interactions)

SOLR reference implementation 1.3.9

MITAB 2.5, MITAB 2.6, MITAB 2.7

SOLR indexing (3.6.0), Calimocho (2.5.0), Spring batch

PSICQUIC 1.3

MIQL 2.7 (35 fields)

tab25 (default), tab26, tab27, xgmml, Biopax, RDF

● Use psimitab parser and XML converter 1.8.3 with bug fixes (can convert MITAB 2.7 to PSI-XML 2.5)

● Improved XGMML export performance (no limit of 5000 interactions)

● Common SOLR schema

What is SOLR?

● Web application and web server

● Based on LUCENE => compatible with MIQL

● SolrJ: java API to index/search

● HTTP requests to SOLR

● Caching results

● Provides admin interface

– Browse indexed data

– Access schema and configuration

– Server, cache and index statistics

SOLR admin interface

Help/documentation Query

Schema, config, statistics

SOLR results interface

Query parameters

Number of results

Document and 'stored' fields

What is faceting?

● Breaks up search results into multiple categories

● Show counts for each category (facet field)

● Allows user to restrict/filter search based on those facets

Provides statistics about the content of the results for a given query

Example of faceting

Facet results

facet=true

facet.field=species_s
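
The two parameters above are ordinary Solr request parameters, so a facet request can be assembled as a plain URL. In this sketch the select URL is a placeholder for a local Solr core and `facet_query_url` is a hypothetical helper; `rows=0` is added on the assumption that only the facet counts are wanted, not the documents.

```python
from urllib.parse import urlencode


def facet_query_url(solr_select_url, query, facet_field, rows=0):
    """Build a Solr select URL that returns facet counts for one field."""
    params = [
        ("q", query),
        ("rows", rows),              # 0 = counts only, no documents
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    return solr_select_url + "?" + urlencode(params)


# Placeholder URL: adjust host, port and core name to the local installation
print(facet_query_url("http://localhost:8983/solr/select", "brca2", "species_s"))
```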

Search: how is data indexed? (1)

● MIQL 2.7 fields indexed but not stored

● Bug fix: split by ':' and duplicated terms!
➢ Ex: MI:0356 => MI, 0356
➢ Ex: taxid:9606(human)|taxid:9606(homo sapiens) => taxid, 9606, human, taxid, 9606, homo, sapiens

● Default fields (free text search)
➢ identifier, pubauth, pubid, interaction_id, detmethod, type, species

Search: how is data indexed? (2)

● Database, value and text for general xrefs
➢ Ex: uniprotkb:P12345 => uniprotkb, P12345 and uniprotkb:P12345
➢ Ex: taxid:9606(human) => taxid, 9606, human and taxid:9606
➢ Ex: uniprotkb:brca2(gene name) => uniprotkb, brca2, “gene name” and uniprotkb:brca2

● Features, annotations
➢ Ex: figure legend:Fig 3. => “figure legend”, “Fig 3.”
➢ Ex: binding site:12-12(text) => “binding site”

● Negative (always excluded by default!)
➢ Ex: '-' or false => false

● Parameters and stoichiometry
➢ Ex: '1' or 'kd:9.0x10^-7 (molar)' => true
➢ Ex: '-' => false

● Publication first author
➢ Ex: 'author (date)' => “author”, “date”
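
The xref rule at the top of this slide (index the database, the value, the text, and the database:value pair) can be mimicked in a few lines. This is a toy model of the behaviour described, written from the three xref examples above; it is not the analyzer configuration of the reference implementation, and `xref_tokens` is a hypothetical name.

```python
import re

# db ":" value, optionally followed by "(text)"; inferred from the slide examples
XREF_RE = re.compile(r"^(?P<db>[^:]+):(?P<value>[^()]+?)(?:\((?P<text>.*)\))?$")


def xref_tokens(xref):
    """Return the searchable terms for one MITAB cross reference."""
    m = XREF_RE.match(xref)
    if m is None:
        return [xref]  # not in db:value form, index as-is
    db, value, text = m["db"], m["value"].strip('"'), m["text"]
    tokens = [db, value]
    if text:
        tokens.append(text)
    tokens.append(f"{db}:{value}")
    return tokens


print(xref_tokens("uniprotkb:P12345"))
print(xref_tokens("taxid:9606(human)"))
print(xref_tokens("uniprotkb:brca2(gene name)"))
```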

Search: how is data indexed? (3)

● Ignore parentheses

● Case insensitive

● Discard common English words (a, with, …)

● Discard empty space before and after a word

● White space tokenizer => search for exact words
  ● Ex: BRCA2 will not match BRCA2b
  ● Ex: P12345 will not match P12345-1 => use P12345*
  ● Ex: experimental will match both 'experimental method' and 'experimental feature'
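
The matching rules on this slide can be checked with a toy model: lowercase, drop parentheses and stop words, split on white space, then compare whole tokens, with `*` as the only way to match a prefix. This only imitates the behaviour described here using Python's `fnmatch`; it is not Solr's analyzer chain, and the stop-word list contains only the two words the slide names.

```python
from fnmatch import fnmatchcase

STOP_WORDS = {"a", "with"}  # only the examples given on the slide


def tokenize(text):
    """Lowercase, ignore parentheses, split on white space, drop stop words."""
    cleaned = text.replace("(", " ").replace(")", " ").lower()
    return [t for t in cleaned.split() if t not in STOP_WORDS]


def matches(term, text):
    """True if the term equals a whole token, or matches it via a '*' wildcard."""
    pattern = term.lower()
    return any(fnmatchcase(token, pattern) for token in tokenize(text))


print(matches("BRCA2", "BRCA2b"))                    # False: whole tokens only
print(matches("P12345", "P12345-1"))                 # False
print(matches("P12345*", "P12345-1"))                # True: wildcard needed
print(matches("experimental", "experimental method"))  # True
```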

What is stored and returned?

● MIQL fields + non searchable fields ending with '_o'
➢ Ex: taxidA_o, pbioroleA_o, checksumA_o

● Excludes copy fields
  ● id, alias, identifier, ptype, pbiorole, ftype, species, pmethod

● Stores the original MITAB column

● Missing fields are automatically replaced by '-'

PSICQUIC facet fields

● MIQL fields ending with '_s'
➢ Ex: species_s, pbiorole_s

● Stores the original MITAB cross reference
➢ Ex: taxid:9606(human) => taxid:9606

● Exact match

● Excludes text

Current indexing issues and possible improvements

● More default fields?

● Alias names: fuzzy search allowed?

● Annotation description: fuzzy search should be allowed

● Sort fields cannot be multivalued!

➢ Unique identifier?
➢ MITAB not clustered => controlled vocabulary terms
➢ Current issue with publication (pubmed, imex)
➢ Cannot sort by annotations and xrefs!

SOLR and PSICQUIC installation

PSICQUIC webservice extensions

● Add a sort parameter

● Allowing faceting

➢ Define method name (not getByQuery for backward compatibility)

➢ Use SOLR XML to return facets or facets embedded in the response?

Current PSICQUIC specifications issues

● SOAP and REST discrepancies

➢ Do we maintain both?
➢ Should we update SOAP with new REST methods?

● Update and improve documentation, bug tracker, FAQ

PSICQUIC view update

Data Distribution Best Practices
