
Web-based Phylogenomic analysis

Mats Töpel edited this page Nov 19, 2019 · 13 revisions

Introduction

In this exercise you will practice your newly acquired skills in phylogenetic inference and "tree thinking" by analysing the evolutionary history of a gene family. The exercise also includes collecting the data needed for the analysis (homologous protein sequences from different species) and interpreting the result, for which you first have to draw a species tree for the taxa included in the analysis.

Suggestions for gene families to analyse in this exercise

  • Toc75
  • Toc34
  • Alb3

Instructions

  1. Select one of the gene families from the list above for your analysis.
  2. Download an amino acid reference sequence from the species Arabidopsis thaliana at the NCBI site and save it in a text file using your favorite text editor (Hint: use, for example, the search string "Arabidopsis thaliana[orgn] Alb3").
  3. Download the file Viridiplantae.pdf and look up the phylogenetic position of the species in the list below. Draw a "species tree" for these species on a piece of paper and save it for later. The first five species will represent your in-group, and the last one will represent your out-group. You will compare the topology of the gene tree generated in this exercise to the topology of this species tree in order to identify speciation events and gene duplications.
  • Arabidopsis thaliana
  • Medicago truncatula
  • Zea mays
  • Oryza sativa
  • Physcomitrella patens
  • Volvox carteri
  4. Navigate your web browser to www.phytozome.net. Select the species in which you want to search for homologous sequences by clicking on its name in the phylogenetic tree. Select "Target type: Proteome" and then run a BLASTp search for homologous sequences using your Arabidopsis thaliana reference sequence. Click the "G" (Go to Gene View) next to the gene name of the BLAST result you want to analyse further. On the next page select the "Sequences" tab and then "Peptide sequence" in order to see the amino acid sequence. Save the sequences you want to include in the analysis in the same file as the reference sequence (Hint: at this stage it is better to save too many sequences than too few). However, be aware that the same gene may have several slightly different gene models, and you only need to save one of them (e.g. the two sequence names AT2G28800.1 and AT2G28800.2 are two different gene models for the gene AT2G28800).

  5. Upload your sequences to the MAFFT alignment server and align them (alternatively, you can use a local installation of your favorite alignment program). The alignment is easier to read if you click "View" and then "Start MSAViewer in this window". Look for sequences in the alignment that are poorly aligned to the rest, and exclude them if you suspect that they are not homologous to your reference sequence. Keep aligning, analysing and excluding sequences until you are happy with the alignment. Save the resulting sequences to your computer.

  6. Once again, point your web browser to a new website, this time www.phylogeny.fr/. Select the "One Click" function, upload your data, and run the analysis using the default settings. After the analysis has finished you will be presented with a phylogenetic tree. Manipulate your tree by changing the rooting (try "Mid-point rooting" and "Reroot (outgroup)"). Also try the "Flip" and "Swap" options in order to facilitate the comparison to the species tree you drew earlier. Does your result make sense? Can you distinguish the nodes representing gene duplications from the ones indicating speciation events? Also play around with the other options to make the tree look its best.

  7. Go back and add more sequences to your analysis if you think you have missed some homologues. Then redo the steps outlined above.
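
If your sequence file grows large, removing redundant gene models by hand becomes tedious. The following is a minimal Python sketch of one way to do it, not part of the original exercise. It assumes that gene models are marked by a trailing ".N" suffix on the first word of the FASTA header, as in the AT2G28800.1 / AT2G28800.2 example above; other species may use different naming conventions, so check the headers first. Keeping the longest model per gene is also an assumption made here for illustration, since any single model is sufficient for the exercise. The function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gene_id(header):
    """Return the gene name without a trailing '.N' gene-model suffix.

    Assumption: the first word of the header is the sequence name and
    gene models are numbered with a '.N' suffix (e.g. AT2G28800.1).
    """
    name = header.split()[0]
    base, dot, suffix = name.rpartition(".")
    return base if dot and suffix.isdigit() else name


def one_model_per_gene(records):
    """Keep one record per gene (here: the longest), in first-seen order."""
    best = {}
    for header, seq in records:
        gid = gene_id(header)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (header, seq)
    return list(best.values())


def write_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

To use it, read your sequence file into a string, pass it through `read_fasta` and `one_model_per_gene`, and write the output of `write_fasta` to a new file before uploading it to the alignment server.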

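As a cross-check on your by-eye comparison of the gene tree and the species tree, the sketch below labels the internal nodes of a rooted gene tree using a simple species-overlap heuristic: a node is called a duplication if the same species occurs on both sides of the split, and a speciation otherwise. This is a simplification and not the method prescribed by the exercise. It ignores the species tree topology, depends on the tree being rooted sensibly (for example with the out-group), and can miss duplications that are hidden by gene loss. The nested-tuple tree format, the "species|gene" leaf labels and the abbreviations in the example are all hypothetical.

```python
def label_nodes(tree, species_of):
    """Label internal nodes of a rooted binary gene tree.

    tree       : a leaf name (str) or a pair (left_subtree, right_subtree)
    species_of : function mapping a leaf name to its species
    Returns (species_set, events), where events lists one
    (event, left_species, right_species) tuple per internal node,
    in post-order (children before their parent).
    """
    if isinstance(tree, str):
        return {species_of(tree)}, []
    left, right = tree
    left_species, left_events = label_nodes(left, species_of)
    right_species, right_events = label_nodes(right, species_of)
    # Species-overlap rule: shared species on both sides => duplication.
    event = "duplication" if left_species & right_species else "speciation"
    node = (event, sorted(left_species), sorted(right_species))
    return left_species | right_species, left_events + right_events + [node]


def species_from_label(leaf):
    """Assumed leaf naming scheme: 'species|gene'."""
    return leaf.split("|")[0]


# Hypothetical gene tree: two paralogous clades, each containing
# Arabidopsis (Ath) and Medicago (Mtr), rooted with Volvox (Vca).
example_tree = (
    (("Ath|g1", "Mtr|g1"), ("Ath|g2", "Mtr|g2")),
    "Vca|g1",
)

all_species, events = label_nodes(example_tree, species_from_label)
for event, left_sp, right_sp in events:
    print(event, left_sp, right_sp)
```

In the example, the node joining the two (Ath, Mtr) clades is reported as a duplication because both species appear on each side, while the remaining nodes are reported as speciations.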