Stargeo API¶

Stargeo provides an API to access our data, make annotations, run analyses and more. It is presented here as a set of copy-pastable examples. Continue reading or jump to series and samples, platforms and probes, tags, analyses or annotations.

We will start by importing requests and pandas.

In [3]:

import requests
import pandas as pd

Series and their samples¶

In [9]:

# Fetch the first 10 series (limit defaults to 100)
r = requests.get('http://stargeo.org/api/v2/series/?limit=10')
assert r.ok
data = r.json()

data['count'], len(data['results'])

Out[9]:

(33785, 10)

In [13]:

data['results'][0]

Out[13]:

{u'attrs': '...',
 u'gse_name': u'GSE1',
 u'platforms': [u'GPL7'],
 u'specie': u'human'}

In [ ]:

# Fetch next 10 series
r = requests.get(data['next'])
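The next link can be followed in a loop to walk through every page. A minimal sketch, assuming the last page reports next as null (a common convention for paginated REST responses, not confirmed here). fetch_json and iter_results are hypothetical helper names, not part of the Stargeo API.

```python
def fetch_json(url):
    """GET a URL and return the decoded JSON body."""
    import requests  # imported lazily so iter_results can be tried offline
    r = requests.get(url)
    r.raise_for_status()
    return r.json()


def iter_results(url, get_json=fetch_json):
    """Yield items from every page, following the 'next' links.

    Assumption: the last page has 'next' set to None (or missing).
    """
    while url:
        page = get_json(url)
        for item in page['results']:
            yield item
        url = page.get('next')
```

For example, list(iter_results('http://stargeo.org/api/v2/series/?limit=100')) would collect all series, one request per page. With the count shown above that is several hundred requests, so it may be worth keeping the result around.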

You can also fetch a single series and its samples by GSE name.

In [16]:

# Fetch GSE1 series data
requests.get('http://stargeo.org/api/v2/series/GSE1/').text

Out[16]:

u'{"platforms":["GPL7"],"attrs":{"status":"Public on Jan 22 2001","contact_address":" ","relation":"BioProject: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA84463","sample_id":"GSM11 GSM12 GSM13 GSM14 GSM15 GSM16 GSM17 GSM18 GSM19 GSM20 GSM21 GSM22 GSM23 GSM24 GSM25 GSM26 GSM27 GSM28 GSM29 GSM30 GSM31 GSM32 GSM33 GSM34 GSM35 GSM36 GSM37 GSM38 GSM39 GSM40 GSM41 GSM42 GSM43 GSM44 GSM45 GSM46 GSM47 GSM48 ","contact_name":"Michael,,Bittner","contact_country":"USA","title":"NHGRI_Melanoma_class","contact_institute":"NHGRI, NIH","sample_taxid":"9606","pubmed_id":"10952317","type":"Expression profiling by array","submission_date":"Jan 22 2001","contact_state":"MD","contact_zip_postal_code":"20892","geo_accession":"GSE1","contact_email":"[email protected]","last_update_date":"Jul 18 2016","contact_web_link":"http://www.nhgri.nih.gov/Intramural_research/People/bittnerm.html","contact_city":"Bethesda","contact_phone":"301-496-7980","summary":"This series represents a group of cutaneous malignant melanomas and unrelated controls which were clustered based on correlation coefficients calculated through a comparison of gene expression|\\n|profiles.|\\n|Keywords: other","platform_id":"GPL7","contact_department":"Cancer Genetics Branch","contact_fax":"301-402-3241","platform_taxid":"9606"},"specie":"human","gse_name":"GSE1"}'
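In the response above the sample names of a series sit in attrs as one space-separated string under sample_id. A small sketch of turning that into a list; the record below is a trimmed stand-in for the real response, and series_sample_names is a hypothetical helper.

```python
def series_sample_names(series):
    """Return the GSM names listed in a series record's attrs."""
    # 'sample_id' is a single space-separated string (with a trailing space
    # in the GSE1 response), so split() without arguments is used.
    return series['attrs']['sample_id'].split()


# Trimmed stand-in for the GSE1 record
series = {'gse_name': 'GSE1', 'attrs': {'sample_id': 'GSM11 GSM12 GSM13 '}}
print(series_sample_names(series))  # prints ['GSM11', 'GSM12', 'GSM13']
```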

In [19]:

# Fetch GSE1 samples
samples_json = requests.get('http://stargeo.org/api/v2/series/GSE1/samples/').json()
# or 
samples = pd.read_json('http://stargeo.org/api/v2/series/GSE1/samples/')
samples.head()

Out[19]:

	attrs	gpl_name	gse_name	gsm_name
0	{u'submission_date': u'Jan 08 2001', u'contact...	GPL7	GSE1	GSM20
1	{u'submission_date': u'Jan 08 2001', u'contact...	GPL7	GSE1	GSM15
2	{u'submission_date': u'Jan 08 2001', u'contact...	GPL7	GSE1	GSM12
3	{u'submission_date': u'Jan 08 2001', u'contact...	GPL7	GSE1	GSM18
4	{u'submission_date': u'Jan 08 2001', u'contact...	GPL7	GSE1	GSM19
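The attrs column holds one dictionary per sample, which is awkward to filter on. A minimal sketch of flattening it, assuming the samples endpoint returns a list of records shaped like the rows above. flatten_sample and the attr_ prefix are choices made for this example only.

```python
def flatten_sample(sample):
    """Merge a sample's attrs dict into its top-level fields."""
    flat = {key: value for key, value in sample.items() if key != 'attrs'}
    # Prefix attribute names so they cannot clash with gse_name, gpl_name, gsm_name
    for key, value in sample['attrs'].items():
        flat['attr_' + key] = value
    return flat


# Abbreviated stand-in for one record from the samples endpoint
sample = {
    'gsm_name': 'GSM20',
    'gse_name': 'GSE1',
    'gpl_name': 'GPL7',
    'attrs': {'submission_date': 'Jan 08 2001'},
}
flat = flatten_sample(sample)
```

Passing [flatten_sample(s) for s in samples_json] to pd.DataFrame should then give one column per attribute.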

Platforms¶

In [24]:

# Fetch the first 100 platforms; fetch the rest the same way as with series above
r = requests.get('http://stargeo.org/api/v2/platforms/').json()
platforms = r['results']
platforms[:2]

Out[24]:

[{u'gpl_name': u'GPL7',
  u'probes_matched': 6334,
  u'probes_total': 8192,
  u'specie': u'human'},
 {u'gpl_name': u'GPL96',
  u'probes_matched': 20883,
  u'probes_total': 22283,
  u'specie': u'human'}]

In [26]:

# Fetch single platform by gpl name
requests.get('http://stargeo.org/api/v2/platforms/GPL7/').json()

Out[26]:

{u'gpl_name': u'GPL7',
 u'probes_matched': 6334,
 u'probes_total': 8192,
 u'specie': u'human'}
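The two probe counts can be combined into a coverage figure per platform. A small sketch using the GPL7 numbers from the output above; matched_fraction is a hypothetical helper.

```python
def matched_fraction(platform):
    """Fraction of a platform's probes that are counted as matched."""
    total = platform['probes_total']
    # float() keeps the division exact under Python 2 as well
    return platform['probes_matched'] / float(total) if total else 0.0


gpl7 = {'gpl_name': 'GPL7', 'probes_matched': 6334, 'probes_total': 8192}
print(round(matched_fraction(gpl7), 3))  # prints 0.773
```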

Platform probes¶

In [30]:

probes = pd.read_json('http://stargeo.org/api/v2/platforms/GPL7/probes/', orient='split')
len(probes)

Out[30]:

In [31]:

probes.head()

Out[31]:

	probe	mygene_sym	mygene_entrez
0	5988	ANKRD55	79722
1	5989	DOLPP1	57171
2	5980	SNCA	6622
3	5981	VWA8	23078
4	5986	SCN8A	6334
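orient='split' tells pandas that the payload is a JSON object with columns, index and data keys rather than a list of records. A standard-library sketch of the same decoding, in case pandas is not at hand. The payload is a hand-made stand-in built from the rows above (the value types are a guess), and split_to_records is a hypothetical helper.

```python
import json


def split_to_records(payload):
    """Turn a pandas 'split'-oriented payload into a list of row dicts."""
    columns = payload['columns']
    return [dict(zip(columns, row)) for row in payload['data']]


# Stand-in for the body returned by the probes endpoint
raw = json.dumps({
    'columns': ['probe', 'mygene_sym', 'mygene_entrez'],
    'index': [0, 1],
    'data': [['5988', 'ANKRD55', 79722], ['5989', 'DOLPP1', 57171]],
})
records = split_to_records(json.loads(raw))

# Lookup table from probe id to gene symbol
probe_to_symbol = {r['probe']: r['mygene_sym'] for r in records}
```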

Tags¶

In [7]:

# Fetch all tags
tags = requests.get('http://stargeo.org/api/v2/tags/').json()
# or
tags = pd.read_json('http://stargeo.org/api/v2/tags/')
tags[tags.concept_name != ''].head()

Out[7]:

	concept_full_id	concept_name	description	id	ontology_id	tag_name
27	http://purl.obolibrary.org/obo/DOID_12206	dengue hemorrhagic fever	Dengue hemorrhagic fever (DOID:12206)	7	DOID	DHF
28	http://purl.obolibrary.org/obo/DOID_9119	acute myeloid leukemia	acute myeloid leukemia (DOID:9119)	117	DOID	AML_Tissue
29	http://purl.obolibrary.org/obo/DOID_9206	Barrett's esophagus	Barrett's esophagus (DOID:9206)	77	DOID	BE_Tissue
30	http://purl.obolibrary.org/obo/DOID_10608	celiac disease	celiac disease (DOID:10608) control	180	DOID	celiac_control
31	http://purl.obolibrary.org/obo/DOID_12140	Chagas disease	Control for Chagas	89	DOID	Chagas_control

Fetch a single tag by its id:

In [8]:

# Fetch data for the tag with id 7
requests.get('http://stargeo.org/api/v2/tags/7/').json()

Out[8]:

{u'concept_full_id': u'http://purl.obolibrary.org/obo/DOID_12206',
 u'concept_name': u'dengue hemorrhagic fever',
 u'description': u'Dengue hemorrhagic fever (DOID:12206)',
 u'id': 7,
 u'ontology_id': u'DOID',
 u'tag_name': u'DHF'}
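Analysis queries appear to refer to tags by tag_name, while annotation records carry a numeric tag_id, so a lookup table in both directions can be handy. A minimal sketch, assuming the tags endpoint returns a plain list of records. The two records are abbreviated from the output above, and index_tags is a hypothetical helper.

```python
def index_tags(tags, key='tag_name'):
    """Map the chosen field (tag_name or id) to the full tag record."""
    return {tag[key]: tag for tag in tags}


tags = [
    {'id': 7, 'tag_name': 'DHF', 'concept_name': 'dengue hemorrhagic fever'},
    {'id': 117, 'tag_name': 'AML_Tissue', 'concept_name': 'acute myeloid leukemia'},
]
by_name = index_tags(tags)
by_id = index_tags(tags, key='id')
print(by_name['DHF']['id'])      # prints 7
print(by_id[117]['tag_name'])    # prints AML_Tissue
```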

Analyses¶

The Stargeo API lets you list existing analyses and load their results, source data and fold changes. Additionally, an authorized user can run new analyses.

In [14]:

data = requests.get('http://stargeo.org/api/v2/analysis/').json()
data['count'], len(data['results'])

Out[14]:

(136, 100)

In [15]:

analysis = data['results'][0]
# or
analysis = requests.get('http://stargeo.org/api/v2/analysis/243/').json()
analysis

Out[15]:

{u'analysis_name': u'hypertension',
 u'case_query': u"PHT == 'PHT' or hypertension == 'hypertension'",
 u'control_query': u"PHT_Control == 'PHT_Control' or hypertension_control == 'hypertension_control'",
 u'description': u'hypertension (DOID:10763)',
 u'df': u'http://analysis-df.stargeo.io.s3.amazonaws.com/243-hypertension',
 u'fold_changes': u'http://fold-changes.stargeo.io.s3.amazonaws.com/243-hypertension',
 u'id': 243,
 u'min_samples': 3,
 u'modifier_query': u'',
 u'platform_count': 6,
 u'sample_count': 309,
 u'series_count': 7,
 u'specie': u'',
 u'success': True}

The source and fold-change dataframes are accessible via the links in the df and fold_changes fields of an analysis.

In [17]:

# Fetch source dataframe
analysis_df = pd.read_json(analysis['df'], orient='split')
analysis_df.head()

Out[17]:

	series_id	platform_id	sample_id	gsm_name	gse_name	gpl_name	pht	pht_control	sample_class
0	202	4	8089	GSM271847	GSE10767	GPL570		pht_control	0
1	202	4	8090	GSM271848	GSE10767	GPL570		pht_control	0
2	202	4	8091	GSM271849	GSE10767	GPL570		pht_control	0
3	202	4	8092	GSM271865	GSE10767	GPL570	pht		1
4	202	4	8093	GSM271866	GSE10767	GPL570	pht		1
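In the rows above sample_class is 1 for case samples and 0 for controls, so a quick sanity check is to count both per series. A standard-library sketch over row dictionaries; the rows repeat the five shown above with only the relevant columns, and class_counts is a hypothetical helper.

```python
from collections import Counter


def class_counts(rows):
    """Count samples per (gse_name, sample_class) pair."""
    return Counter((row['gse_name'], row['sample_class']) for row in rows)


rows = [
    {'gsm_name': 'GSM271847', 'gse_name': 'GSE10767', 'sample_class': 0},
    {'gsm_name': 'GSM271848', 'gse_name': 'GSE10767', 'sample_class': 0},
    {'gsm_name': 'GSM271849', 'gse_name': 'GSE10767', 'sample_class': 0},
    {'gsm_name': 'GSM271865', 'gse_name': 'GSE10767', 'sample_class': 1},
    {'gsm_name': 'GSM271866', 'gse_name': 'GSE10767', 'sample_class': 1},
]
counts = class_counts(rows)
```

On the full dataframe the pandas equivalent would be analysis_df.groupby(['gse_name', 'sample_class']).size().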

In [35]:

# Fetch fold changes. WARNING: this could be big
r = requests.get(analysis['fold_changes'])

# It is also compressed with zlib
import zlib
fold_changes = pd.read_json(zlib.decompress(r.content), orient='split')
fold_changes.head()

Out[35]:

	probe	dataMu	dataSigma	dataCount	caseDataMu	caseDataSigma	caseDataCount	controlDataMu	controlDataSigma	controlDataCount	...	log2foldChange	effect_size	ttest	p	direction	subset	gpl	gse	mygene_entrez	mygene_sym
0	1007_s_at	7.121507	0.542821	7	7.463592	0.464840	4	6.665394	0.117251	3	...	0.798197	1.470462	2.842841	0.036126	up	NA	GPL570	GSE10767	780	DDR1
1	1053_at	9.099162	0.378195	7	9.161432	0.426845	4	9.016136	0.371085	3	...	0.145296	0.384184	0.469187	0.658683	up	NA	GPL570	GSE10767	5982	RFC2
2	117_at	4.888623	0.503787	7	4.862397	0.707167	4	4.923591	0.089813	3	...	-0.061194	-0.121468	-0.145489	0.890008	down	NA	GPL570	GSE10767	3310	HSPA6
3	121_at	7.083546	0.281271	7	7.243619	0.269283	4	6.870115	0.094835	3	...	0.373504	1.327916	2.253205	0.073979	up	NA	GPL570	GSE10767	7849	PAX8
4	1255_g_at	2.841085	0.956595	7	2.736909	0.767581	4	2.979985	1.345662	3	...	-0.243075	-0.254105	-0.306554	0.771538	down	NA	GPL570	GSE10767	2978	GUCA1A

5 rows × 21 columns
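The decompression step can be tried offline. The sketch below round-trips a tiny stand-in payload through zlib and decodes it with the standard library only; it assumes the decompressed body is UTF-8 encoded JSON, and decode_fold_changes is a hypothetical helper.

```python
import json
import zlib


def decode_fold_changes(raw_bytes):
    """Decompress a zlib-compressed JSON payload and parse it."""
    return json.loads(zlib.decompress(raw_bytes).decode('utf-8'))


# Offline round trip with a tiny stand-in for the real payload
payload = {
    'columns': ['probe', 'log2foldChange'],
    'index': [0],
    'data': [['1007_s_at', 0.798197]],
}
raw = zlib.compress(json.dumps(payload).encode('utf-8'))
decoded = decode_fold_changes(raw)
```

Recent pandas versions deprecate passing a literal JSON string to read_json. If that warning shows up, wrapping the decoded text in io.StringIO before handing it to pd.read_json(..., orient='split') is one way around it.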

In [36]:

# Fetch the results of analysis 243
results = pd.read_json('http://stargeo.org/api/v2/analysis/243/results/', orient='split')
results.head()

Out[36]:

	mygene_entrez	direction	k	casedatacount	controldatacount	random_pval	random_te	random_se	random_lower	random_upper	...	tau2_se	c	h	h_lower	h_upper	i2	i2_lower	i2_upper	q	q_df
BLM	641	up	7	203	106	0.004175	0.009608	0.003354	0.003034	0.016181	...	NaN	18535.273509	1.000000	1.000000	1.000000	0.000000	0.000000	0.000000	5.801143	6
A1BG	1	up	5	127	78	0.745816	0.007206	0.022229	-0.036361	0.050773	...	NaN	5559.082818	1.588522	1.000000	2.595664	0.603710	0.000000	0.851576	10.093607	4
A1BG-AS1	503538	up	2	30	16	0.092131	0.031295	0.018581	-0.005123	0.067713	...	NaN	2.505217	NaN	NaN	NaN	NaN	NaN	NaN	0.562633	1
A1CF	29974	up	5	172	88	0.104480	0.002854	0.001758	-0.000591	0.006299	...	NaN	31493.161994	1.000000	1.000000	1.000000	0.000000	0.000000	0.000000	1.374130	4
A2M	2	down	7	203	106	0.689928	-0.003276	0.008211	-0.019368	0.012817	...	NaN	101435.271415	1.986816	1.362318	2.897590	0.746671	0.461181	0.880896	23.684635	6

5 rows × 34 columns
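The h, i2, q and q_df columns look like the usual meta-analysis heterogeneity statistics. As a hedged check, the standard Higgins-Thompson definitions (an assumption here, the API does not document them) reproduce the A1BG, A2M and BLM rows above from q and q_df alone. heterogeneity is a hypothetical helper.

```python
import math


def heterogeneity(q, df):
    """Return (H, I^2) from Cochran's Q and its degrees of freedom."""
    h = max(1.0, math.sqrt(q / df))   # H = sqrt(Q / df), floored at 1
    i2 = max(0.0, (q - df) / q)       # I^2 = (Q - df) / Q, floored at 0
    return h, i2


# (gene, q, q_df, reported h, reported i2) copied from the table above
reported = [
    ('A1BG', 10.093607, 4, 1.588522, 0.603710),
    ('A2M', 23.684635, 6, 1.986816, 0.746671),
    ('BLM', 5.801143, 6, 1.000000, 0.000000),
]
for gene, q, df, h_ref, i2_ref in reported:
    h, i2 = heterogeneity(q, df)
    print(gene, abs(h - h_ref) < 1e-4, abs(i2 - i2_ref) < 1e-4)
```

The A1BG-AS1 row (k = 2) reports NaN for these columns, which this sketch does not reproduce, so the service presumably applies an extra rule for very small k.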

To create and start an analysis you need to provide an auth token. You can see yours in the example below once you are logged in.

In [52]:

# This is your auth token, don't share it with anybody
headers = {'Authorization': 'Token your-token-here'}
# Create new analysis
r = requests.post('http://stargeo.org/api/v2/analysis/', headers=headers, data={
    'analysis_name': 'Young Severe Dengue',
    'description': 'Dengue cases in patients under 9',
    'specie': 'human',
    'case_query': "DHF=='DHF' or DSS=='DSS'",
    'control_query': "DF=='DF'",
    'modifier_query': "Age < 9",
})
r.json()

Out[52]:

{u'created': 385}
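Since the token should stay private, it helps to keep it out of notebooks that get shared. A minimal sketch that reads it from an environment variable instead; STARGEO_TOKEN is a name made up for this example, and auth_headers is a hypothetical helper.

```python
import os


def auth_headers(env=None):
    """Build the Authorization header from an environment variable.

    STARGEO_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get('STARGEO_TOKEN')
    if not token:
        raise RuntimeError('Set STARGEO_TOKEN to your auth token first')
    return {'Authorization': 'Token ' + token}


# Offline demonstration with an explicit mapping instead of os.environ
headers = auth_headers({'STARGEO_TOKEN': 'your-token-here'})
```

The resulting dictionary can be passed as headers= to requests.post exactly as in the cell above.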

Annotations¶

The API allows listing, fetching and adding annotations. At any point in time you see the best available annotations, along with their reliability characteristics. Pay attention to the best_cohens_kappa attribute: we consider an annotation validated when it equals 1, meaning that two authors blindly annotated it the same way.

In [37]:

annotations = requests.get('http://stargeo.org/api/v2/annotations/').json()
annotations['count'], len(annotations['results'])

Out[37]:

(16791, 100)

In [45]:

annotations['results'][0]

Out[45]:

{u'annotations': 2,
 u'authors': 2,
 u'best_cohens_kappa': 1.0,
 u'captive': False,
 u'column': u'sample_source_name_ch1',
 u'fleiss_kappa': 1.0,
 u'gpl_name': u'GPL7',
 u'gse_name': u'GSE1',
 u'id': 1607,
 u'regex': u'melanoma',
 u'samples': 38,
 u'tag_id': 123}
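Following the validation rule stated above, validated annotations can be picked out by checking best_cohens_kappa. A small sketch; the first record is abbreviated from the output above, the second is a made-up unvalidated one for contrast, and validated_only is a hypothetical helper.

```python
def validated_only(annotations):
    """Keep annotations whose best_cohens_kappa equals 1."""
    # A missing or null kappa compares unequal to 1 and is filtered out
    return [a for a in annotations if a.get('best_cohens_kappa') == 1]


records = [
    {'id': 1607, 'gse_name': 'GSE1', 'tag_id': 123, 'best_cohens_kappa': 1.0},
    # Made-up record, only here to show the filter dropping something
    {'id': -1, 'gse_name': 'GSE1', 'tag_id': 123, 'best_cohens_kappa': None},
]
print([a['id'] for a in validated_only(records)])  # prints [1607]
```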

In [49]:

print(requests.get('http://stargeo.org/api/v2/annotations/1607/samples/').text)

{"GSM48":"melanoma","GSM46":"melanoma","GSM47":"melanoma","GSM44":"melanoma","GSM45":"melanoma","GSM42":"melanoma","GSM43":"melanoma","GSM40":"","GSM41":"melanoma","GSM11":"melanoma","GSM13":"melanoma","GSM12":"","GSM15":"melanoma","GSM14":"melanoma","GSM39":"melanoma","GSM38":"melanoma","GSM37":"melanoma","GSM36":"melanoma","GSM35":"melanoma","GSM34":"melanoma","GSM33":"melanoma","GSM32":"melanoma","GSM31":"melanoma","GSM30":"melanoma","GSM17":"melanoma","GSM16":"melanoma","GSM19":"","GSM18":"","GSM28":"melanoma","GSM29":"melanoma","GSM20":"","GSM21":"","GSM22":"","GSM23":"melanoma","GSM24":"melanoma","GSM25":"melanoma","GSM26":"melanoma","GSM27":"melanoma"}

To post annotations you need to be authorized as a competent user. To authorize, send the same Authorization token header as when creating an analysis.

In [61]:

# This is your auth token, don't share it with anybody
headers = {'Authorization': 'Token your-token-here'}
# Create new annotation
r = requests.post('http://stargeo.org/api/v2/annotations/', headers=headers, json={
    'tag': 'melanoma',
    'series': 'GSE1',
    'platform': 'GPL7',
    # Need to provide full set of samples
    'annotations': {'GSM11': 'melanoma', 'GSM12': '', ...},
    # Optional text note
    'note': '...',  
})
assert r.ok
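Because the full set of samples has to be provided, it can be easier to generate the annotations dictionary than to type it. A rough sketch that mimics the column and regex fields seen in annotation records, assuming each sample's attrs contain the chosen column. The attribute values below are made up for illustration, matching here is case-sensitive (how Stargeo itself matches is not stated), and build_annotations is a hypothetical helper.

```python
import re


def build_annotations(samples, column, regex, value):
    """Return {gsm_name: value or ''} covering every sample given."""
    pattern = re.compile(regex)
    return {
        s['gsm_name']: value if pattern.search(s['attrs'].get(column, '')) else ''
        for s in samples
    }


# Made-up attribute values; real ones come from the series samples endpoint
samples = [
    {'gsm_name': 'GSM11', 'attrs': {'sample_source_name_ch1': 'melanoma sample'}},
    {'gsm_name': 'GSM12', 'attrs': {'sample_source_name_ch1': 'control sample'}},
]
annotations = build_annotations(
    samples, column='sample_source_name_ch1', regex='melanoma', value='melanoma')
```

Samples that do not match get an empty string, so every sample still appears in the posted dictionary.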