Finding the Secondary Structure of a Protein

The Wikipedia page on secondary structure defines it well:

In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids (DNA/RNA). It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure.

Secondary structure is formally defined by the hydrogen bonds of the biopolymer, as observed in an atomic-resolution structure. In proteins, the secondary structure is defined by patterns of hydrogen bonds between backbone amide and carboxyl groups (sidechain-mainchain and sidechain-sidechain hydrogen bonds are irrelevant), where the DSSP definition of a hydrogen bond is used. In nucleic acids, the secondary structure is defined by the hydrogen bonding between the nitrogenous bases.

Goal

In this lab we use the PSIPRED system to predict the secondary structure of a protein.

Steps

Connect to the PSIPRED server at bioinf.cs.ucl.ac.uk/psipred/
Click on CLICK HERE TO ENTER THE SERVER
Copy and paste a protein sequence in FASTA format, leaving out the first line (the header line that starts with ">"). Here we are using a sequence we obtained in Lab #9 for the E. coli sequence reference AAF33488:

MESTVDTELLKTFLEVSRTRHFGRAAEALYLTQSAVSFRIRQLENQLGVNLFTRHRNNIRLTTAGEKLLP
YAETLMNTWQAARKEVAHTSRHNEFSIGASASLWECMLNAWLGRLYQLQEPQSGLQFEARIAQRQSLVKQ
LHERQLDLLITTEAPKMDEFSSQLLGHFTLALYCSSPARKKSELNYLRLEWGPDFQQHETGLIAADEVPV
LTTSSAELARQQLSALNGCSWLPVNWANEKGGLHTVADSATLSRPLYAIWLQNSDKYSLICDLLKTDVLD
EQ

Accept the defaults, enter your email address, then click on Predict

Wait a while
Wait some more
Depending on the number of job requests at PSIPRED, you may have to wait quite a while. If the wait is too long, you can continue with one of the sequences I had processed ahead of time and received back by email: follow this link.
Once you have received the result back from PSIPRED, you will see that it contains several outputs: text, PDF, and JPEG.
Observe the text output. It should look something like this:

PSIPRED PREDICTION RESULTS

Key

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

    
# PSIPRED HFORMAT (PSIPRED V2.6 by David Jones)

Conf: 999888999999999998399899999967888489999999999829843897699436
Pred: CCCCCCHHHHHHHHHHHHCCCHHHHHHHHCCCCCHHHHHHHHHHHHHCCEEEEECCCCEE
  AA: MESTVDTELLKTFLEVSRTRHFGRAAEALYLTQSAVSFRIRQLENQLGVNLFTRHRNNIR
              10        20        30        40        50        60

. . .

Conf: 988943789986148999995899707999999998764059
Pred: CCEEECCCCCCCEEEEEEEEECCCCCHHHHHHHHHHHHHHCC
  AA: GGLHTVADSATLSRPLYAIWLQNSDKYSLICDLLKTDVLDEQ
             250       260       270       280
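Before computing anything from the text output, it helps to join the Conf, Pred, and AA rows of every group into three long strings. The following is a minimal Python sketch, assuming the report has been saved as plain text in the layout shown above. The function name parse_psipred_hformat is made up for this illustration and is not part of PSIPRED.

```python
def parse_psipred_hformat(text):
    """Join the Conf, Pred and AA rows of a PSIPRED text report.

    Returns a tuple (conf, pred, aa) of three equal-length strings.
    Assumes the layout shown in the lab: rows labelled "Conf:", "Pred:", "AA:".
    """
    # Each label has a check that accepts data rows and rejects the
    # "Key" lines such as "Conf: Confidence (0=low, 9=high)".
    checks = {
        "Conf": lambda s: s.isdigit(),
        "Pred": lambda s: set(s) <= set("HEC"),
        "AA": lambda s: s.isalpha() and s.isupper(),
    }
    rows = {label: [] for label in checks}

    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        value = value.strip()
        if sep and label in checks and value and checks[label](value):
            rows[label].append(value)

    conf = "".join(rows["Conf"])
    pred = "".join(rows["Pred"])
    aa = "".join(rows["AA"])
    if not (len(conf) == len(pred) == len(aa)):
        raise ValueError("Conf, Pred and AA rows have different lengths")
    return conf, pred, aa
```

Called on the full report, this should give one confidence digit and one predicted state per residue of the query sequence.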

Question 1: Print your query result on a piece of paper and color the helix, strand, and coil regions of the protein with different colors.

Question 2: You will have noticed that the first line of each group is a confidence value.

For the secondary structure that you got back for your query, what is the overall average confidence value for the prediction of regions as helix, coil, or strand? Be precise, and figure out how to use the computer to give you the answer!
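One possible way to let the computer do this is sketched below in Python. It assumes the text output has been saved to a string, and that every data row starts with "Conf:" followed only by digits, as in the sample above. The function name average_confidence is hypothetical, and this is only one of several reasonable approaches.

```python
import re


def average_confidence(report_text):
    """Return the mean of all confidence digits in a PSIPRED text report."""
    # Match only rows made of digits, so the key line
    # "Conf: Confidence (0=low, 9=high)" is ignored.
    conf_rows = re.findall(r"^Conf:[ \t]*(\d+)[ \t]*$", report_text,
                           flags=re.MULTILINE)
    digits = [int(ch) for row in conf_rows for ch in row]
    if not digits:
        raise ValueError("no Conf rows found in the report")
    return sum(digits) / len(digits)
```

The result is a number between 0 and 9, on the same scale as the digits in the report.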

Question 3: Compute the average confidence level for the helix structures only, then for the coil structures only, then for the strand structures only. What part or parts of the protein are predicted with high confidence? Low confidence?
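A per-structure breakdown can be sketched the same way. The Python below assumes you already have the full Conf and Pred rows as two strings of equal length (all groups joined together). The function names and the cutoff of 5 for "low confidence" are assumptions chosen for illustration; neither comes from PSIPRED.

```python
from collections import defaultdict


def confidence_by_state(conf, pred):
    """Return the average confidence for each predicted state (H, E, C)."""
    if len(conf) != len(pred):
        raise ValueError("conf and pred must have the same length")
    totals = defaultdict(lambda: [0, 0])  # state -> [sum of digits, count]
    for digit, state in zip(conf, pred):
        totals[state][0] += int(digit)
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


def low_confidence_positions(conf, threshold=5):
    """Return 1-based residue positions whose confidence is below threshold."""
    return [pos for pos, digit in enumerate(conf, start=1)
            if int(digit) < threshold]
```

Comparing the positions returned by low_confidence_positions with the Pred row may show whether the weak spots cluster at the boundaries between structures.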

Question 4: Display/print the pdf or jpeg versions you will have received from Psipred. Note that the confidence level is represented as a short rectangle. Do you get more information from the visual display than from your own computation for the confidence of the prediction?

CSC334 Lab10
