UCSC Gene Sorter User's Guide

Contents

Introduction
Getting started
Understanding the Gene Sorter display
Configuring the Gene Sorter display
Filtering the gene display
Displaying sequence and text-based output

Questions and feedback are welcome.

Introduction

Genes function and evolve together. To understand a gene, you often need to understand an entire gene family. Many such families are already known and described, such as the HOX family that mediates many aspects of limb and brain development and the Cytochrome P450 family that is central to the metabolism of many medications.

One easy way to identify well-known relatives of a gene is by looking for genes with name similarity, because biologists tend to use similar names for similar genes. However, scientists only partly understand the function of perhaps one-third of the genes of the genome; therefore, other techniques for grouping genes into families are necessary.

The UCSC Gene Sorter is an excellent resource for exploring gene families and the relationships among genes. This tool displays a table of genes within a selected genome that are related to one another. Several different relationships may be explored: protein-level homology, similarity of gene expression profiles, or genomic proximity. The Gene Sorter supports searches on a variety of terms and phrases, including the gene name, the UniProtKB protein name, a GenBank accession, or a word or phrase present in a gene's description. The gene family display is highly configurable, allowing the user to control the order and number of columns, the number of rows, and the genes displayed. The tool provides several output formats, including a simple tab-delimited format that may be imported into a spreadsheet or a relational database.

An important use of the Gene Sorter is to gather together a collection of genes that share similar properties for statistical analysis. For instance, one might want to examine promoter regions of genes that share a similar expression pattern or look for protein sequence motifs in genes that share similar GO annotations.

Filtering is one of the Gene Sorter's most powerful features. The filter enables the user to quickly select an interesting subset of the 25,000 genes in the genome based on a variety of detailed and flexible selection criteria. For example, the filter may be used to select all human genes over-expressed in the cerebellum that have GO-annotated G-protein coupled receptor activity.

The Gene Sorter was designed and implemented by Jim Kent, Fan Hsu, David Haussler, and the UCSC Genome Bioinformatics Group. This work is supported by a grant from the National Human Genome Research Institute and by the Howard Hughes Medical Institute.

Getting started

To begin using the Gene Sorter, you will first have to select a genomic region and the type of gene relationship you wish to display. You may also want to change some of the Gene Sorter's configuration settings to tailor the display to your research needs. These configuration options are described in Configuring the Gene Sorter display.

Starting the Gene Sorter

  1. Open the Gene Sorter home page.
  2. Specify the genome and assembly you wish to view by selecting the appropriate options from the genome and assembly pull-down menus.
  3. Type a term or phrase into the search text box to determine which genes will be displayed in the browser. Valid search terms (with human genome examples) include:

    • a gene name (HOXA9)
    • a UniProtKB protein name (HXA9)
    • a word or phrase that occurs in the description of a gene (MAP kinase)
    • a GenBank mRNA accession (U14680)
  4. Choose the gene relationship that you would like to examine by selecting an option from the sort by pull-down menu. Genes will be sorted in order of proximity to the chosen gene, based on one of the following criteria:
    • Expression (GNF Atlas1) -- similarity in gene expression, based on GNF Atlas 1 data
    • Protein Homology - BLASTP -- similarity in protein homology, based on the BLASTP E-value
    • Protein Homology - Rankprop -- similarity in protein homology, based on the Rankprop algorithm
    • Protein Homology - PSI-BLAST -- similarity in protein homology, based on the PSI-BLAST E-value
    • Pfam Similarity -- similarity based on number of shared domains
    • Gene Distance -- absolute distance (left or right) on the chromosome from the selected gene
    • Chromosome -- list sorted by chromosomal location
    • Name Similarity -- similarity to the name of selected gene, based on the first several characters of the name
    • Alphabetical -- list sorted by gene name
    • GO Similarity -- number of Gene Ontology (GO) terms shared with selected gene
  5. Choose the number of items to display from the display pull-down menu (the default is 50).
  6. Press the Go! button to display your search results.
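
The sort criteria above all rank genes by closeness to the selected gene. The following is a minimal Python sketch of two of them (Gene Distance and GO Similarity), not the Gene Sorter's actual implementation. The Gene record, its fields, and the function names are hypothetical, and the distance rule (gap between gene ends, zero for overlapping genes, infinite across chromosomes) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical record; the field names are not taken from the Gene Sorter.
    name: str
    chrom: str
    start: int
    end: int
    go_terms: frozenset = frozenset()


def gene_distance(a, b):
    """Absolute gap between two genes (assumed rule, for illustration only)."""
    if a.chrom != b.chrom:
        return float("inf")  # different chromosomes: treated as infinitely far
    if a.end < b.start:
        return b.start - a.end
    if b.end < a.start:
        return a.start - b.end
    return 0  # overlapping genes


def sort_by_distance(selected, genes, limit=50):
    """Nearest genes first; limit mirrors the display menu default of 50."""
    return sorted(genes, key=lambda g: gene_distance(selected, g))[:limit]


def sort_by_go_similarity(selected, genes, limit=50):
    """Genes sharing the most GO terms with the selected gene come first."""
    shared = lambda g: len(selected.go_terms & g.go_terms)
    return sorted(genes, key=shared, reverse=True)[:limit]
```

Because the selected gene has distance 0 to itself, it naturally lands at the top of a distance-sorted list in this sketch.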

Understanding the Gene Sorter display

The main page of the Gene Sorter displays a table containing rows of genes and associated attributes. In most cases, the currently-selected gene is shown at the top of the list, highlighted in light green. The remaining genes are ordered relative to the selected gene based on the sort criteria specified in the sort by menu. For example, in a table sorted by gene distance, the genes are listed in order of greater to lesser chromosomal proximity to the selected gene.

The initial Gene Sorter display shows only a default subset of the columns available. The set of columns may be expanded, reduced, and rearranged by using the Gene Sorter's configuration utility. To view information about the data shown in a column, click on the column's label.

To select a different gene in the table, click on the name of the gene. The Gene Sorter will move the gene entry to the top of the list and highlight it. The remaining genes will be reordered relative to the new selection.

Column descriptions (listed in alphabetical order)

Configuring the Gene Sorter display

The Gene Sorter is highly configurable, allowing you to fine-tune the display to show just the genes and data columns in which you're interested in an order that best suits your research needs. Most of the configuration is controlled through settings on the Configuration page, accessed via the configure button at the top of the Gene Sorter page.

Changing the number of rows displayed
To increase or decrease the number of rows shown in the table, pick a new value from the display pull-down menu, then click the Go! button.

Changing the number of columns displayed
By default, the Gene Sorter shows only a small subset of the table columns available for the genome. You can view the full list of columns, or add or remove columns from your display, on the Configuration page.

The configuration table shows all the columns available for the currently-selected genome, listed in left-to-right display order. To add or remove a column from the Gene Sorter display, click the On checkbox to toggle the setting (a check indicates that the column is displayed). To quickly change the On settings of all columns, click the Hide All or Show All button at the top of the page. Click the Submit button to display the changes in the Gene Sorter.

Changing the column positions
In addition to adding or removing columns, it is also possible to move the columns to the left or right within the Gene Sorter table. The order of the column names in the configuration table indicates the current relative position of the columns in the Gene Sorter display from left to right. To shift a column one position to the left, click the up arrow in the item's Position column. Similarly, click the down arrow to shift a column to the right. When you have finished making changes, click the Submit button.

Changing the expression colors
By default, the gene expression ratios are shown using a red/green color scheme, where red indicates a gene that is more highly expressed and green corresponds to less expression. Color-blind users may find it helpful to switch the coloring from red/green to yellow/blue. To do so, select the "yellow high/blue low" option from the Expression ratio colors pull-down menu on the Configuration page, then click the Submit button.

Changing the brightness of expression colors
To increase or decrease the brightness of the colors in an expression column, edit the brightness value for the corresponding entry in the configuration table. Values greater than 1.0 increase the brightness, while those less than 1.0 dim the color. Click the Submit button to display the new values.

Changing the type of tissue data shown in expression columns
By default, the expression columns show the median ratio of expression of a gene in a small selected set of tissues. Use the tissues pull-down menu to configure the tissue display for the column. The "all replicas" option will show the value of each individual experimental replica of each tissue. The "median of replicas" option displays a single value for each tissue that represents the median of all replicas for that tissue.

Toggling between ratio and absolute expression values
By default, expression columns show the ratio of expression of a gene relative to expression of the gene overall. To view absolute expression values instead, select the "absolute" option from the values pull-down menu.

Displaying splicing variants
By default, the Gene Sorter shows only one splicing variant: the one that produces the largest protein. To show all splicing variants, click the Show all splicing variants checkbox. Note that in most cases, the column values (and sometimes the names) will be identical across variants.
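
As an illustration of the default rule described above (keep the variant that produces the largest protein), here is a minimal Python sketch. The input layout of (gene name, variant ID, protein length) tuples and the function name are hypothetical, and the tie-breaking behavior (first variant seen wins) is an assumption.

```python
def largest_protein_variants(variants):
    """Pick, for each gene, the variant with the longest protein.

    variants: iterable of (gene_name, variant_id, protein_length) tuples.
    Returns a dict mapping gene_name to the chosen variant_id.
    """
    best = {}
    for gene, variant_id, length in variants:
        # Strict '>' keeps the first variant seen when lengths tie (assumed).
        if gene not in best or length > best[gene][1]:
            best[gene] = (variant_id, length)
    return {gene: chosen[0] for gene, chosen in best.items()}
```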

Restoring the default settings
At any time during your Gene Sorter session, you can restore the Gene Sorter table to its default layout by clicking the Default button on the Configuration page, then clicking Submit.

Saving a configuration for future use
The Gene Sorter configuration utility allows you to store multiple configurations for use in future sessions. This feature is particularly useful if you require different layouts for different research uses. To save the current configuration of the Gene Sorter layout, click the Save button on the Configuration page. Type in a name for the configuration in the text box at the top of the page, then click Save.

Loading a previously-saved configuration
Once you have saved a configuration, you can load it back into your Gene Sorter in a future session. To load a configuration, click the Load button on the Configuration page. The Gene Sorter will display a list of the names of your saved configurations. Click on a name to highlight it, then click Load to reconfigure your Gene Sorter based on the saved settings.

Viewing a list of saved configurations
To display a list of configurations that you have saved, click the Save button on the Configuration page. If you have any saved configurations, the Gene Sorter will display an Existing Setups list that shows the configuration names. To permanently remove a configuration from the list, click on the name to highlight it, then click the Delete Existing Setup button.

Filtering the gene display

The Gene Sorter's gene filtering capabilities provide a versatile way to fine-tune the display to show just the genes in which you are interested. Filters are applied to individual gene fields, and may be combined to increase the specificity of the search. To access the Filter page, click the filter button at the top of the Gene Sorter page.

At any time during the filter setup process, you can click the List Names button on the Filter page to view a list of genes that will be returned when the current filter settings are applied to the genome. You may find this list helpful in fine-tuning the filter.

Filtering based on matching one or more terms
Filters based on names, IDs, or other words restrict the display to only those genes that match one or more terms typed into the search text box. Examples of values that can be filtered on this basis include the gene name, RefSeq accession number, gene description, coding SNPs, and GO terms.

This search supports wildcard matching on "*" and "?". Multiple terms must be separated by a space or tab. For example, the search criteria "HOXA9 FOX*" on the gene name field returns the gene named HOXA9 and any gene whose name begins with the letters "FOX". When searching on fields that consist of values containing more than one word (GO terms, coding SNPs, Pfam domains, and gene descriptions), the multi-word elements must be enclosed in single quotes. For instance, a search on the description phrase "forkhead box protein" should be entered as 'forkhead box protein'. Use the "any" and "all" options to determine whether the search should return any gene that matches any term ("any") or only those genes that match all terms ("all").
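
The matching rules above can be approximated outside the Gene Sorter with Python's standard fnmatch and shlex modules. This is a sketch of the behavior as described, not the tool's own code. The function names are hypothetical, case-insensitive matching is an assumption, and fnmatch additionally understands "[...]" character sets, which the Gene Sorter may not.

```python
import fnmatch
import shlex


def parse_terms(text):
    """Split a filter string on whitespace, keeping quoted phrases together."""
    # shlex.split treats 'forkhead box protein' as a single term.
    return shlex.split(text)


def matches(value, terms, mode="any"):
    """Return True if value matches any (or all) of the wildcard terms."""
    # Upper-casing both sides makes the comparison case-insensitive (assumed).
    hits = [fnmatch.fnmatchcase(value.upper(), term.upper()) for term in terms]
    return any(hits) if mode == "any" else all(hits)


if __name__ == "__main__":
    names = ["HOXA9", "FOXA1", "FOXP2", "HOXB1"]
    terms = parse_terms("HOXA9 FOX*")
    print([n for n in names if matches(n, terms)])
```

Note that shlex.split raises an error on an unbalanced quote, so a term containing a lone apostrophe would need separate handling.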

To facilitate searching on multiple terms, the Gene Sorter provides the option to paste in or upload a list of search terms. To paste in a list of terms, click the filter's Paste List button, then paste or type the terms into the text box. Terms must be separated by a space or a tab, or entered on separate lines, and may not include wildcards. When you have completed the list, click the Submit button to return to the main Filter page. The file upload utility, accessed via the Upload List button, works in a similar way.

Filtering based on numerical ranges
Several of the gene fields can be filtered by specifying a numerical range within which the value must fall. Examples of fields in this category include expression ratios, Blastp data, and genome position. To use this type of filter, enter the minimum and maximum values delimiting the range in which you are interested. In some cases, the range of valid values is indicated in the filter box.

The genome position filter requires the name of a chromosome (in the format chrN) in addition to the chromosomal start and end positions. To list all genes on a chromosome, enter only the chromosome name.

Expression filters include "any" and "all" options to determine whether the search should return a gene if any of the tissue expression values meet the minimum and maximum criteria ("any") or only if all tissue expression values meet the search criteria ("all").
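
The range filters described in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Gene Sorter's implementation: the function names are hypothetical, range bounds are treated as inclusive, a missing bound is treated as unbounded, and the position filter is assumed to keep genes that overlap the requested range.

```python
def in_range(value, minimum=None, maximum=None):
    """True if value lies within [minimum, maximum]; None means unbounded."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def expression_passes(values, minimum=None, maximum=None, mode="any"):
    """Apply the range to per-tissue values using the 'any' or 'all' option."""
    checks = [in_range(v, minimum, maximum) for v in values]
    return any(checks) if mode == "any" else all(checks)


def position_passes(gene_chrom, gene_start, gene_end, chrom, start=None, end=None):
    """Chromosome must match; with no start/end, every gene on it passes."""
    if gene_chrom != chrom:
        return False
    if start is not None and gene_end < start:
        return False  # gene lies entirely before the range
    if end is not None and gene_start > end:
        return False  # gene lies entirely after the range
    return True
```

For example, expression_passes([0.2, 1.5, 3.0], minimum=1.0, mode="any") accepts a gene that is above the minimum in at least one tissue, while mode="all" rejects it because one tissue falls below the minimum.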

Saving filter settings
The Gene Sorter provides a mechanism for saving filter settings for use in future sessions. To preserve the current filter configuration, click the Save Filter button on the Filter page. Type in a name for the filter, then click Save to save the filter and return to the Filter page.

Loading a saved filter
Once a filter configuration has been saved, you can retrieve it in later sessions by loading it back into your Gene Sorter. To load the saved filter settings, click the Load Filter button on the Filter page. Click on the name of the filter you wish to load, then click the Load button. Click the Submit button on the Filter page to apply the filter settings to the Gene Sorter.

Viewing a list of saved filters
To display a list of filter settings that you have saved, click the Save Filter button on the Filter page. If you have any saved filters, the Gene Sorter will display an Existing Setups list that shows the filter names. To permanently remove a filter from the list, click on the name to highlight it, then click the Delete Existing Setup button.

Displaying sequence and text-based output

The Gene Sorter's graphical presentation of data facilitates the visual observation of relationships and patterns among the genes in the display. However, it is often useful to convert the data to a text-based format that can be easily saved to a file or loaded into another program, database, or spreadsheet for further analysis. The Gene Sorter provides a mechanism for saving the current display in a tab-delimited text file or showing a text-based view of the sequence underlying the current display.

Creating text-based output
To output the current Gene Sorter table as text, click the text button at the top of the page. The Gene Sorter will display each row of table data on a separate tab-delimited line.
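
Once saved to a file, the tab-delimited output can be read by most tools. The Python sketch below uses the standard csv module. It assumes that the first line holds the column labels, possibly prefixed with "#"; check your own output before relying on this. The function name and the column names in the example are hypothetical.

```python
import csv
import io


def read_gene_table(text):
    """Parse tab-delimited text into a list of dicts keyed by column label."""
    # QUOTE_NONE keeps quote characters in descriptions as literal text.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return []
    # Assumption: the first non-blank line is a header, optionally '#'-prefixed.
    header = [label.lstrip("#") for label in rows[0]]
    return [dict(zip(header, row)) for row in rows[1:]]


if __name__ == "__main__":
    sample = "#name\tdescription\nHOXA9\thomeobox A9\n"
    for record in read_gene_table(sample):
        print(record["name"], record["description"])
```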

Viewing the underlying sequence
To display the protein, mRNA, or genomic sequence underlying the current Gene Sorter table, click the sequence button at the top of the page. On the Get Sequence page, select the sequence configuration settings you want, then click the Get Sequence button. The Gene Sorter will display a text-based list of FASTA-format records for the genes displayed in the table. The FASTA records may be copied and pasted into Blat for further study.
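
If you save the FASTA output to a file, it can also be processed with a short script. The following Python sketch splits FASTA text into (header, sequence) pairs. It relies only on the general FASTA convention that each record starts with a ">" line; it makes no assumption about what the Gene Sorter puts in the header, and the function name is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">gene1 example\nMTEYK\nLVVVG\n>gene2\nACGT\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq))
```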