
Help: GEOmetadb Application

  1. What's new?
  2. Search FAQ
  3. Diagram of the GEOmetadb implementation workflow
  4. Diagram of the GEOmetadb table relationships
  5. Description of table fields in GEOmetadb
 
 

What's New

  • GEOmetadb paper: Bioinformatics 2008 24(23):2798-2800; doi:10.1093/bioinformatics/btn520
  • GEOmetadb was upgraded to version 2.0 in July 2008
  • The database tables and search interface have been modified significantly
  • Search performance has been improved
  • Several user-friendly functions have been added or improved, e.g. drill-down search, downloading search results, viewing details of multiple records, and creating or appending to user lists from a selection
  • Fulltext search on multiple search fields has been added
  • More context-specific help has been added
  • A search status indicator has been added
  • Field validations have been added
  • Several bugs have been fixed
  • Various other improvements and fixes have been made

Search FAQ

Q: What is the GEOmetadb application?
A: The NCBI Gene Expression Omnibus (GEO) is the largest public repository of microarray data. However, finding data of interest can be challenging with current tools. GEOmetadb is an attempt to make the metadata associated with GEO samples, platforms, and datasets much easier to access. This is accomplished by parsing all the NCBI GEO metadata into a MySQL database. A SQLite version of the database can be stored and queried locally. The GEOmetadb Bioconductor package is simply a thin wrapper around the SQLite database, along with associated documentation. Finally, the SQLite database is updated regularly as new data are added to GEO and can be downloaded at will for the most up-to-date metadata.
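
As a rough illustration of querying a local SQLite copy, the sketch below uses Python's built-in sqlite3 module. The file name GEOmetadb.sqlite, the helper names list_tables, find_series and demo_connection, and the made-up accession numbers are assumptions for illustration only; the gse table and its gse and title columns follow the field descriptions on this page.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def find_series(conn, keyword, limit=10):
    """Return (gse, title) pairs whose title contains the keyword."""
    return conn.execute(
        "SELECT gse, title FROM gse WHERE title LIKE ? ORDER BY gse LIMIT ?",
        (f"%{keyword}%", limit),
    ).fetchall()


def demo_connection():
    """Build a tiny in-memory stand-in so the sketch runs without a download."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gse (ID REAL, title TEXT, gse TEXT)")
    conn.executemany(
        "INSERT INTO gse VALUES (?, ?, ?)",
        [
            # Made-up records, not real GEO accessions.
            (1, "Breast cancer cell line time course", "GSE0001"),
            (2, "Yeast stress response", "GSE0002"),
        ],
    )
    return conn


if __name__ == "__main__":
    # With a downloaded database, replace demo_connection() with
    # sqlite3.connect("GEOmetadb.sqlite") (file name assumed).
    conn = demo_connection()
    print(list_tables(conn))
    print(find_series(conn, "breast"))
```

SQLite's LIKE is case-insensitive for ASCII characters by default, so 'breast' also matches 'Breast' in this sketch.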

Q: What is included in the GEOmetadb application?
A:

Q: How do I do a general search?
A: Brief steps:

  • Click 'Hide/Show SearchBox' if the search box is not visible
  • Click the 'Condition Number' pulldown menu and select the number of conditions you want to use in your search
  • Check the 'Search within selected records?' checkbox if you want to search within the previous search results. Note that this checkbox appears only after an initial search
  • Select the 'And' or 'Or' radio button on the right to choose the boolean operator that combines your search conditions
  • Click 'Search' to perform the search

Q: Which fields in the joint GEO search support fulltext search?
A:
  • The fulltext search fields in the joint search page are 'Study Keywords', 'gsm_title', 'gsm_source_name', 'gsm_characteristics', 'gsm_description', 'gse_title', 'gse_summary', 'gse_overall_design', 'gpl_title' and 'gpl_description'.
  • 'Study Keywords' performs a fulltext search on 'gse_title', 'gse_summary', 'gse_type' and 'gse_overall_design'.
  • In every fulltext field except 'Study Keywords', a normal (non-fulltext) search is performed if the search terms are quoted as a whole.
  • The fulltext search performs MySQL boolean full-text searches using the IN BOOLEAN MODE modifier (see MySQL Boolean Full-Text Searches)
  • A word about fulltext search: the rows returned are automatically sorted with the highest relevance first.

Q: What kind of fulltext search is implemented?
A:

  • MySQL fulltext search with the IN BOOLEAN MODE modifier is implemented. For details, please see MySQL Boolean Full-Text Searches.
  • Usage: the IN BOOLEAN MODE operators, e.g. + (AND), - (NOT), * (wildcard) and " " (defines a literal phrase), can be combined with search terms. This gives advanced users a flexible and powerful way to build complex searches.
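
To make the operators concrete, here is a minimal sketch of a helper that assembles a boolean-mode search string. The function name boolean_query and its parameters are hypothetical, and EXAMPLE_SQL only illustrates how such a string is typically bound in a MATCH ... AGAINST clause; which columns the server actually indexes is an assumption, and the %s placeholder style is the one used by common Python MySQL drivers.

```python
def boolean_query(required=(), excluded=(), prefixes=(), phrases=()):
    """Compose a search string using MySQL boolean-mode operators."""
    parts = []
    parts += [f"+{term}" for term in required]      # + : term must be present
    parts += [f"-{term}" for term in excluded]      # - : term must not be present
    parts += [f"{stem}*" for stem in prefixes]      # * : wildcard after a word stem
    parts += [f'"{phrase}"' for phrase in phrases]  # quotes: literal phrase
    return " ".join(parts)


# Illustrative only: the indexed columns (title, summary) are assumed,
# not taken from the GEOmetadb server schema.
EXAMPLE_SQL = (
    "SELECT gse, title FROM gse "
    "WHERE MATCH (title, summary) AGAINST (%s IN BOOLEAN MODE)"
)

if __name__ == "__main__":
    print(boolean_query(required=["breast"], excluded=["mouse"],
                        prefixes=["carcino"], phrases=["time course"]))
```

Typing the resulting string directly into a fulltext search field should have the same effect as building it with a helper like this.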

Q: How do I change the display options for my search?
A: Brief steps:

  • Click 'Display Options'
  • Add fields to the display: select the fields you want to see in 'Available Fields' and click '>>' to add them to 'Selected Fields'
  • Remove fields from the display: select the fields you don't want to see in 'Selected Fields' and click '<<' to move them to 'Available Fields'
  • Order fields in the display: select the fields you want to reorder in 'Selected Fields' and click the 'up' or 'down' arrow on the right to move them up or down
  • Change 'Field Sorting Options': select a sorting field and choose 'Ascending' or 'Descending' for this field
  • Change 'Records Per Page': select a number in the pulldown menu next to 'Records Per Page'
  • Change 'Maximum Field length': enter a number for the maximum field length in the input textbox

Q: How do I download records?
A: Brief steps:

  • Perform searches and narrow down to the records of interest by checking 'Search within selected records?'
  • Choose the fields to display - only displayed fields will be downloaded
  • Click 'Download' to download the records in the current view

Q: How do I see the details of record(s)?
A:

  • Details of a single record: click the 'ID' link to see the detail view (all fields in a form) of the record
  • Details of multiple records:
    • Perform a search and narrow down to the records of interest by checking 'Search within selected records?'
    • Select records by checking the checkbox of each record whose details you want to see
    • Select 'View Detail' in the 'Selected Records' pull-down menu

Q: How do I create or append to a custom record list?
A:

  • Perform a search
  • Select the records you want to add to a list
  • In the 'Selected Records' pull-down menu, click 'Create a List' to create a new list from the selected records
  • Click 'Append to List' to append the selected records to an existing list

Q: What kinds of searches can be conducted with this online search tool?
A:

Q: Can I search a data type with multiple fields at the same time?
A: Yes. Click the pulldown menu next to 'Condition Number' to select the number of conditions (fields) you want to use to search the table.

Q: Can I search within previous search results?
A: Yes. Check the checkbox next to 'Search within selected records?' to enable this function. If you don't see the checkbox, please click 'Hide/Show SearchBox' first.

Q: Do you have example searches?
A: Example search 1: search for GSE records with 'time course' in the GSE 'Type' field and 'breast cancer' in the GSE 'Summary' field.

  • Click the 'GSE' page
  • Select '2' in the 'Condition Number' pulldown menu
  • Choose the 'Summary' field in the pulldown menu of the first search box
  • Choose 'contain' next to it
  • Enter 'breast cancer' in the first textbox
  • Choose 'And' between the two conditions
  • Choose the 'Type' field in the pulldown menu of the second search box
  • Choose 'contain' next to it
  • Enter 'time course' in the second textbox
  • Click 'Search' to start searching
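
For readers working with a local SQLite copy, example search 1 can be approximated in SQL. The sketch below assumes that 'contain' corresponds to a LIKE '%term%' match and that 'And' joins the two conditions; the helper names and the demo records are hypothetical, while the gse table and its summary and type columns come from the field descriptions on this page.

```python
import sqlite3


def search_gse(conn, summary_term, type_term):
    """Return GSE accessions whose summary and type both contain the terms."""
    sql = (
        "SELECT gse FROM gse "
        "WHERE summary LIKE ? AND type LIKE ? "
        "ORDER BY gse"
    )
    rows = conn.execute(sql, (f"%{summary_term}%", f"%{type_term}%")).fetchall()
    return [gse for (gse,) in rows]


def demo_connection():
    """In-memory stand-in with made-up records (not real GEO accessions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gse (gse TEXT, summary TEXT, type TEXT)")
    conn.executemany(
        "INSERT INTO gse VALUES (?, ?, ?)",
        [
            ("GSE0001", "Estrogen response in breast cancer cells", "time course"),
            ("GSE0002", "Subtypes of breast cancer", "disease state analysis"),
            ("GSE0003", "Yeast heat shock", "time course; stress response"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(search_gse(demo_connection(), "breast cancer", "time course"))
```

Only the first demo record satisfies both conditions; replacing AND with OR in the query would mirror the 'Or' radio button.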
 
 

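Implementation Workflow

[Figure: GEOmetadb_workflow, diagram of the GEOmetadb implementation workflow]

Table Relationship

[Figure: GEOmetadb_diagram, diagram of the GEOmetadb table relationships]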

Description of table fields

Tables: gse, gpl, gsm, gse_gpl, gse_gsm, gds, gds_subset, sMatrix
 
GSE Table
Web field name | SQLite DB field name | SQL Type | Description
ID | ID | real | internal ID automatically assigned by the database
Title | title | text | unique name describing the overall study
GSE Acc (Link) | gse* | text | unique accession number approved and issued by GEO, NCBI
Data Status | status | text | date released to public
Submission Date | submission_date | text | date submitted
Update Date | last_update_date | text | date last updated
Pubmed ID | pubmed_id | integer | NCBI PubMed identifier (PMID)
Summary | summary | text | a description of the goals and objectives of this study
GSE Type | type | text | keyword(s) generally describing the type of study, e.g., time course, dose response, comparative genomic hybridization, ChIP-chip, cell type comparison, disease state analysis, stress response, genetic modification, etc.
Contributor | contributor | text | people who contributed to this study
Contact | contact | text | contact information for this study
Web link | web_link | text | a Web link to the study and/or supplementary information about the study
Overall Design | overall_design | text | overall design, a description of the experimental design, including information about how many samples are in the study, any control and/or reference samples, dye-swaps, etc.
Repeats | repeats | text | repeat type, which can be biological replicate, technical replicate - extract, or technical replicate - labeled-extract
Repeat Samples | repeats_sample_list | text | sample list in a repeat
Variable | variable | text | variable type, e.g. dose, time, tissue, strain, gender, cell line, development stage, age, agent, cell type, infection, isolate, metabolism, shock, stress, temperature, specimen, disease state, protocol, growth protocol, genotype/variation, species, individual, or other
Variable Description | variable_description | text | description of a variable type
Supplementary | supplementary_file | text | ftp link to NCBI GEO supplementary file(s) of this GSE
SOFT FTP† |  |  | ftp link to NCBI GEO SOFT format of this GSE
SeriesMatrix FTP† |  |  | ftp link to the NCBI GEO Series Matrix file(s) of this GSE
GPL Acc |  |  | GPLs separated by comma
GPL Count |  |  | number of GPLs
GSM Acc |  |  | GSMs separated by comma
GSM Count |  |  | number of GSMs
   

*key links to following tables: gse_gpl.gse, gse_gsm.gse, gds.gse, sMatrix.gse

 
GPL Table
Web field name | SQLite DB field name | SQL Type | Description
ID | ID | real | internal ID automatically assigned by the database
Title | title | text | unique name describing the Platform (GPL)
GPL Acc (Link) | gpl* | text | unique GEO Platform accession number approved and issued by GEO, NCBI
Data Status | status | text | date released to public
Submission Date | submission_date | text | date submitted
Update Date | last_update_date | text | date last updated
Technology | technology | text | the category describing the Platform technology: spotted DNA/cDNA, spotted oligonucleotide, in situ oligonucleotide, antibody, tissue, SARST, RT-PCR, MS, or MPSS
Distribution | distribution | text | microarrays are 'commercial', 'non-commercial', or 'custom-commercial' in accordance with how the array was manufactured
Organism | organism | text | organism
Manufacturer | manufacturer | text | name of the company, facility or laboratory where the array was manufactured or produced
Manufacture Protocol | manufacture_protocol | text | array manufacture protocol, including information such as clone/primer set identification and preparation, strandedness/length, arrayer hardware/software, spotting protocols
Coating | coating | text | coating of the array, e.g., aminosilane, quartz, polysine, unknown
Catalog Number | catalog_number | text | manufacturer catalog number for commercially-available arrays
Support | support | text | surface type of the array, e.g., glass, nitrocellulose, nylon, silicon, unknown
Description | description | text | additional descriptive information not captured in another field, e.g., array and/or feature physical dimensions, element grid system
Web Link | web_link | text | a Web link that directs users to supplementary information about the array
Contact | contact | text | contact information, including name, e-mail, phone, fax, department, institute, country
Row Count | data_row_count | real | number of data rows in the GPL
Supplementary | supplementary_file | text | ftp link to NCBI GEO supplementary file(s) of this GPL
BioC Package | bioc_package | text | matched Bioconductor annotation package of this GPL
SOFT FTP† |  |  | ftp link to NCBI GEO SOFT format of this GPL
GSE Acc |  |  | GSEs separated by comma
GSE Count |  |  | number of GSEs
GSM Acc |  |  | GSMs separated by comma
GSM Count |  |  | number of GSMs
 

*key links to following tables: gds.gpl, gse_gpl.gpl, sMatrix.gpl, gsm.gpl

 
GSM Table
Web field name | SQLite DB field name | SQL Type | Description
ID | ID | real | internal ID automatically assigned by the database
Title | title | text | unique name describing this Sample
GSM Acc (Link) | gsm* | text | unique GEO Sample (GSM) accession number approved and issued by GEO, NCBI
GSE Acc | series_id | text | unique GEO Series (GSE) accession number associated with this GSM
GPL Acc (Link) | gpl** | text | unique GEO Platform (GPL) accession number associated with this GSM
Data Status | status | text | date released to public
Submission Date | submission_date | text | date submitted
Last Update | last_update_date | text | date last updated
GSM Type | type | text | type of samples, values in current database are genomic, mixed, MPSS, protein, RNA, SAGE, SARST, other
Channels | channel_count | real | number of labeling channels, could be 1 or 2
Source Name Ch1 | source_name_ch1 | text | name to identify the biological material and the experimental variable(s), e.g., vastus lateralis muscle, exercised, 60 min
Organism Ch1 | organism_ch1 | text | organism(s) from which the biological material was derived
Characteristics Ch1 | characteristics_ch1 | text | list of characteristics of the biological source, including factors not necessarily under investigation, e.g., Strain: C57BL/6, Gender: female, Age: 45 days, Tissue: bladder tumor, Tumor stage: Ta. Multiple characteristics columns can be included
Molecule Ch1 | molecule_ch1 | text | type of molecule that was extracted from the biological material; one of the following: total RNA, polyA RNA, cytoplasmic RNA, nuclear RNA, genomic DNA, protein, or other
Label Ch1 | label_ch1 | text | compound used to label the extract, e.g., biotin, Cy3, Cy5, 33P
Treatment Protocol Ch1 | treatment_protocol_ch1 | text | protocol of any treatments applied to the biological material prior to extract preparation
Extract Protocol Ch1 | extract_protocol_ch1 | text | protocol used to isolate the extract material
Label Protocol Ch1 | label_protocol_ch1 | text | protocol used to label the extract
Source Name Ch2 | source_name_ch2 | text | same contents as ch1
Organism Ch2 | organism_ch2 | text | same contents as ch1
Characteristics Ch2 | characteristics_ch2 | text | same contents as ch1
Molecule Ch2 | molecule_ch2 | text | same contents as ch1
Label Ch2 | label_ch2 | text | same contents as ch1
Treatment Protocol Ch2 | treatment_protocol_ch2 | text | same contents as ch1
Extract Protocol Ch2 | extract_protocol_ch2 | text | same contents as ch1
Label Protocol Ch2 | label_protocol_ch2 | text | same contents as ch1
Hyb Protocol | hyb_protocol | text | protocols used for hybridization, blocking and washing, and any post-processing steps such as staining
Description | description | text | additional information not provided in the other fields
Data Processing | data_processing | text | details of how data in the VALUE column of your table were generated and calculated, i.e., normalization method, data selection procedures and parameters, transformation algorithm (e.g., MAS5.0), and scaling parameters
Contact | contact | text | contact information for this study
Supplementary File | supplementary_file | text | ftp link to NCBI GEO supplementary file(s) of this GSM
Row Count | data_row_count | real | number of data rows
SOFT FTP† |  |  | ftp link to NCBI GEO SOFT format of this GSM
 
*key links to tables: gds_gse_gsm.gsm
**key links to tables: gds.gpl, gse_gpl.gpl, sMatrix.gpl, gpl.gpl
 
GDS Table
Web field name | SQLite DB field name | SQL Type | Description
ID | ID | real | database automatically assigned internal ID
GDS Acc (Link) | gds* | text | GEO Dataset (GDS) accession number associated with this series (GSE)
Title | title | text | title of this GDS
Description | description | text | description of this GDS
GDS Type | type | text | platform type of this GDS, current values: array CGH, ChIP-chip, gene expression array-based, gene expression MPSS-based, gene expression RT-PCR-based, protein expression array-based
Pubmed ID | pubmed_id | text | NCBI PubMed identifier (PMID)
GPL Acc (Link) | platform | text | GEO Platform (GPL) accession number associated with this GDS
Organism | platform_organism | text | organism of the platform
Technology Type | platform_technology_type | text | technology type of the platform
Feature Count | feature_count | integer | number of features in the platform
 | sample_organism | text | organism of the samples
Sample Type | sample_type | text | type of samples, values in current database are genomic, mixed, MPSS, protein, RNA, SAGE, etc.
Channel Count | channel_count | text | number of labeling channels
Sample Count | sample_count | integer | number of samples
Value Type | value_type | text | type of data values - values in current database are count, log ratio, log10 ratio, log2 ratio, transformed count, Z-score, etc.
GSE Acc (Link) | reference_series | text | GEO Series (GSE) accession number associated with this GDS
Order | order | text |
Update Date | update_date | text | date updated
 
*key links to tables: gds_subset.gds
 
GDS Subset Table
Web field name | SQLite DB field name | SQL Type | Description
ID | ID | real | database automatically assigned internal ID
GDS Name | Name | text | name of the subset - GDS + number
GDS Acc (Link) | gds* | text | GEO Dataset (GDS) accession number
GDS Type | type | text | subset type
Description | description | text | subset description
GSM Acc | sample_id | text | GEO Sample (GSM) included in this GDS, separated by comma
 
*key links to tables: gds.gds
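
Because sample_id holds a comma-separated list of GSM accessions, a query against a local SQLite copy usually has to split that value. The sketch below is one hedged way to do it; the helper names and the demo rows are made up for illustration, and only the column names come from the GDS Subset table above.

```python
import sqlite3


def subset_samples(conn, gds_acc):
    """Map each subset description of one GDS to its list of GSM accessions."""
    rows = conn.execute(
        "SELECT description, sample_id FROM gds_subset "
        "WHERE gds = ? ORDER BY Name",
        (gds_acc,),
    ).fetchall()
    return {
        description: [s.strip() for s in sample_id.split(",") if s.strip()]
        for description, sample_id in rows
    }


def demo_connection():
    """In-memory stand-in with made-up subsets (not real GEO accessions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gds_subset "
        "(ID REAL, Name TEXT, gds TEXT, type TEXT, description TEXT, sample_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO gds_subset VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "GDS0001_1", "GDS0001", "agent", "control", "GSM0001,GSM0002"),
            (2, "GDS0001_2", "GDS0001", "agent", "treated", "GSM0003, GSM0004"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(subset_samples(demo_connection(), "GDS0001"))
```

Stripping whitespace around each accession guards against lists stored with or without a space after the comma.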
 
sMatrix Table
Web field name | SQLite DB field name | SQL Type | Description
ID | ID | integer | database automatically assigned internal ID
sMatrix File Name | Name | text | name of the SeriesMatrix file in NCBI GEO FTP site
GSE File (Link) | gse* | text | GEO Series name
GPL File (Link) | gpl** | text | GEO Platform name
GSM_Count | GSM_Count | integer | number of GSM(s) associated
GSM File | gsm |  | GEO Sample names, separated by comma
Last_Update_Date | Last_Update_Date | text | last update date of this GEO Series (GSE)
 
*key links to tables: gse_gpl.gse, gse_gsm.gse, gds.gse, gse.gse
**key links to tables: gds.gpl, gse_gpl.gpl, gpl.gpl, gsm.gpl
 
GSE_GPL Table
SQLite DB field name | SQL Type | Description
gse* | text | GEO Series name
gpl** | text | GEO Platform name
 

*key links to tables: gse_gpl.gse, gse_gsm.gse, gds.gse, sMatrix.gse
**key links to tables: gds.gpl, gse_gpl.gpl, gpl.gpl, sMatrix.gpl

 
GSE_GSM Table
SQLite DB field name | SQL Type | Description
gse* | text | GEO Series name
gsm** | text | GEO Sample name
 
*key links to tables: gse_gpl.gse, gse.gse, gds.gse, sMatrix.gse
**key links to tables: gsm.gsm
†Not viewable in the search interface or on the record detail page
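
The key links listed under each table can be followed with ordinary SQL joins in a local SQLite copy. As a sketch, the query below walks from a series to its samples through the gse_gsm linking table (gse_gsm.gse for the series, gse_gsm.gsm to gsm.gsm). The helper names and the demo records are hypothetical; the table and column names are taken from the field descriptions above.

```python
import sqlite3


def samples_for_series(conn, gse_acc):
    """Return (gsm, title, gpl) for every sample linked to one GSE."""
    sql = (
        "SELECT gsm.gsm, gsm.title, gsm.gpl "
        "FROM gse_gsm "
        "JOIN gsm ON gsm.gsm = gse_gsm.gsm "
        "WHERE gse_gsm.gse = ? "
        "ORDER BY gsm.gsm"
    )
    return conn.execute(sql, (gse_acc,)).fetchall()


def demo_connection():
    """In-memory stand-in with made-up records (not real GEO accessions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gsm (gsm TEXT, title TEXT, gpl TEXT)")
    conn.execute("CREATE TABLE gse_gsm (gse TEXT, gsm TEXT)")
    conn.executemany(
        "INSERT INTO gsm VALUES (?, ?, ?)",
        [
            ("GSM0001", "tumor, 0 h", "GPL0001"),
            ("GSM0002", "tumor, 6 h", "GPL0001"),
            ("GSM0003", "yeast, heat shock", "GPL0002"),
        ],
    )
    conn.executemany(
        "INSERT INTO gse_gsm VALUES (?, ?)",
        [
            ("GSE0001", "GSM0001"),
            ("GSE0001", "GSM0002"),
            ("GSE0002", "GSM0003"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in samples_for_series(demo_connection(), "GSE0001"):
        print(row)
```

The same pattern should apply to gse_gpl when going from a series to its platforms.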
 

 

 
Meltzerlab/GB/CCR/NCI/NIH @2008 Contact: e-mail