CSC334 Lab10

From dftwiki3
Revision as of 16:58, 13 August 2008 by Thiebaut (talk | contribs)


Back to lab page


Finding the Secondary Structure of a Protein

The Wikipedia page on secondary structure defines the term well:

In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids (DNA/RNA). It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure.
Secondary structure is formally defined by the hydrogen bonds of the biopolymer, as observed in an atomic-resolution structure. In proteins, the secondary structure is defined by patterns of hydrogen bonds between backbone amide and carboxyl groups (sidechain-mainchain and sidechain-sidechain hydrogen bonds are irrelevant), where the DSSP definition of a hydrogen bond is used. In nucleic acids, the secondary structure is defined by the hydrogen bonding between the nitrogenous bases.

Goal

In this lab we use the PSIPRED system to predict the secondary structure of a protein from its amino acid sequence.

Steps

  • Go to the PSIPRED request form and paste in the following protein sequence:
MESTVDTELLKTFLEVSRTRHFGRAAEALYLTQSAVSFRIRQLENQLGVNLFTRHRNNIRLTTAGEKLLP
YAETLMNTWQAARKEVAHTSRHNEFSIGASASLWECMLNAWLGRLYQLQEPQSGLQFEARIAQRQSLVKQ
LHERQLDLLITTEAPKMDEFSSQLLGHFTLALYCSSPARKKSELNYLRLEWGPDFQQHETGLIAADEVPV
LTTSSAELARQQLSALNGCSWLPVNWANEKGGLHTVADSATLSRPLYAIWLQNSDKYSLICDLLKTDVLD
EQ
  • Accept the defaults, enter your email address, then click on Predict
[Figure: the PSIPRED request form (PsipredRequestFormEColi.png)]
  • Wait a while
  • Wait some more
  • Depending on the number of job requests at PSIPRED, you may have to wait quite a while. If the wait is too long, you can continue with a result for one of the sequences that I had processed ahead of time and received back by email: follow this link.
  • Once you have received the result back from PSIPRED, you will see that it contains several pieces of information, including text, PDF, and JPEG outputs.
  • Observe the text output. It should look something like this:
PSIPRED PREDICTION RESULTS

Key

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

    
# PSIPRED HFORMAT (PSIPRED V2.6 by David Jones)

Conf: 999888999999999998399899999967888489999999999829843897699436
Pred: CCCCCCHHHHHHHHHHHHCCCHHHHHHHHCCCCCHHHHHHHHHHHHHCCEEEEECCCCEE
  AA: MESTVDTELLKTFLEVSRTRHFGRAAEALYLTQSAVSFRIRQLENQLGVNLFTRHRNNIR
              10        20        30        40        50        60

. . .

Conf: 988943789986148999995899707999999998764059
Pred: CCEEECCCCCCCEEEEEEEEECCCCCHHHHHHHHHHHHHHCC
  AA: GGLHTVADSATLSRPLYAIWLQNSDKYSLICDLLKTDVLDEQ
             250       260       270       280

  • Question 1: Print your query result on a piece of paper and color the helix, strand, and coil regions of the protein with different colors.
  • Question 2: You will have noticed that the first line of each group is a confidence value.
For the secondary structure that you got back for your query, what is the overall average confidence of the prediction, taken over all helix, coil, and strand regions? Be precise, and figure out how to use the computer to give you the answer!
  • Question 3: Compute the average confidence for the helix regions only, then for the coil regions only, then for the strand regions only. Which part or parts of the protein are predicted with high confidence? With low confidence?
  • Question 4: Display or print the PDF or JPEG version you received from PSIPRED. Note that the confidence level is represented as a short rectangle. Do you get more information from the visual display than from your own computation of the prediction confidence?
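
For Questions 2 and 3, one possible way to let the computer do the arithmetic is sketched below in Python. It is only a sketch under stated assumptions: it assumes the text output follows the Conf/Pred/AA layout shown above, and the names parse_hformat, average_confidence, segment_confidence, and the tiny SAMPLE report are made up for illustration and are not part of PSIPRED. Replace SAMPLE with the text of your own result.

```python
from collections import defaultdict
from itertools import groupby

# Toy report in the Conf/Pred/AA layout (made-up values, NOT a real prediction).
SAMPLE = """Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 9988
Pred: CCHH
  AA: MEST
        10

Conf: 42
Pred: EE
  AA: VD
"""


def parse_hformat(text):
    """Join the Conf, Pred and AA lines of every group into three sequences."""
    parts = {"Conf": [], "Pred": [], "AA": []}
    for line in text.splitlines():
        label, sep, value = line.strip().partition(":")
        value = value.strip()
        # Legend lines ("Conf: Confidence (0=low, 9=high)") contain spaces;
        # data lines are a single unbroken token, so keep only those.
        if sep and label in parts and value and " " not in value:
            parts[label].append(value)
    conf = [int(digit) for digit in "".join(parts["Conf"])]
    pred = "".join(parts["Pred"])
    aa = "".join(parts["AA"])
    if not (len(conf) == len(pred) == len(aa)):
        raise ValueError("Conf, Pred and AA lines have different lengths")
    return conf, pred, aa


def average_confidence(conf, pred):
    """Return the overall mean confidence and the mean for each class (H, E, C)."""
    by_class = defaultdict(list)
    for score, cls in zip(conf, pred):
        by_class[cls].append(score)
    overall = sum(conf) / len(conf)
    per_class = {cls: sum(v) / len(v) for cls, v in by_class.items()}
    return overall, per_class


def segment_confidence(conf, pred):
    """List (start, end, class, mean confidence) per run of one class (1-based)."""
    segments = []
    pos = 0
    for cls, run in groupby(pred):
        length = len(list(run))
        scores = conf[pos:pos + length]
        segments.append((pos + 1, pos + length, cls, sum(scores) / length))
        pos += length
    return segments


conf, pred, aa = parse_hformat(SAMPLE)
overall, per_class = average_confidence(conf, pred)
print("overall:", round(overall, 2))
print("per class:", per_class)
for start, end, cls, mean in segment_confidence(conf, pred):
    print(f"{start:4d}-{end:<4d} {cls} {mean:.2f}")
```

The per-segment listing is one way to approach the second half of Question 3: segments with a low mean stand out without having to scan the digits by eye. Whether a plain mean of the 0 to 9 digits is the best summary is a judgment call, not something the PSIPRED output prescribes.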


