CSC111 Lab 8 2014

Revision as of 15:14, 24 March 2014 by Thiebaut (talk | contribs) (Processing DNA Strings)

--D. Thiebaut (talk) 14:01, 24 March 2014 (EDT)


This lab deals with string and list operations, and with transforming strings into lists and lists back into strings.


Splitting Strings


Work in the console, and try these different commands. Observe what the different operations do.

>>> line = "The quick, red fox jumped.  It jumped over the lazy, sleepy, brown dog."
>>> line

>>> line.split()
>>> words = line.split()
>>> words

>>> words[0]

>>> words[1]

>>> words[-1]

>>> words[-2]

>>> chunks = line.split( ',' )      # split on commas
>>> chunks

>>> chunks = line.split( '.' )      # split on periods
>>> chunks

>>> words

>>> separator = "+"
>>> newLine = separator.join( words )    # join the words into a new string and use separator as the glue
>>> newLine

>>> separator = "$$$"
>>> newLine = separator.join( words )    # same but use $$$ as the glue
>>> newLine

>>> words       # verify that you still have individual words in this list

>>> newWords = [ words[0], words[3], words[4], words[7], words[8], words[12] ] # create a new list 
>>> newWords

>>> " ".join( newWords )      # join strings in newWords list with a space
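
The same operations can also be run as a short program. The sketch below only restates the console commands above (the names chunks and rebuilt are illustrative): split() with no argument breaks on any run of whitespace, split(",") breaks on commas only, and join() glues the pieces back together.

```python
line = "The quick, red fox jumped.  It jumped over the lazy, sleepy, brown dog."

words = line.split()        # no argument: split on any run of whitespace
chunks = line.split(",")    # split on commas only; the spaces stay in the chunks

print(len(words))           # how many words were found
print(chunks[0])            # the text before the first comma

# join() is the reverse operation: the separator string glues the list items
rebuilt = " ".join(words)
print(rebuilt)
```

Note that rebuilt is not identical to line: the double space after "jumped." was treated as a single separator by split(), so joining with one space does not bring it back.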

Mini Assignments


The solution program for the exercises we saw in class on Monday and Wednesday contains good models of code that you can adapt to answer most of the challenges in this lab.


Challenge 1

  • Use a judicious mix of split() and join() operations to convert the string
"1	China	1,339,190,000	9,596,960.00	139.54	3,705,405.45	361.42"
into a new string:
"China 1339190000"
Note 1: The commas have been removed from the number. (Hint: string objects have a replace() method that could prove useful here.)
Note 2: This line is taken from a table at this URL, where the numbers after the country name indicate the population, the area and population density expressed in square kilometers, and the area and population density expressed in square miles.
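
One possible approach, shown here as a minimal sketch rather than the official solution: it assumes the fields are separated by tab characters (written "\t" in the code), as they appear in the table. The names fields, country, population, and result are illustrative.

```python
line = "1\tChina\t1,339,190,000\t9,596,960.00\t139.54\t3,705,405.45\t361.42"

fields = line.split("\t")                  # assumption: fields are tab-separated
country = fields[1]                        # fields[0] is the rank
population = fields[2].replace(",", "")    # remove the thousands separators

result = " ".join([country, population])   # glue the two pieces with a space
print(result)
```

Because none of the fields contains a space, calling split() with no argument should give the same list of fields.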




Challenge 2

  • Given the following list, store it in a multi-line string variable called text, split it into individual lines, and apply your transformation to each line so that your program outputs only the country names and their populations.
Bangladesh	164,425,000	144,000.00	1,141.84	55,598.69	2,957.35
Brazil	193,364,000	8,511,965.00	22.72	3,286,486.71	58.84
China	1,339,190,000	9,596,960.00	139.54	3,705,405.45	361.42
Egypt	78,848,000	1,001,450.00	78.73	386,661.85	203.92
Ethiopia	79,221,000	1,127,127.00	70.29	435,185.99	182.04
Germany	81,757,600	357,021.00	229.00	137,846.52	593.11
India	1,184,639,000	3,287,590.00	360.34	1,269,345.07	933.27
Indonesia	234,181,400	1,919,440.00	122.01	741,099.62	315.99
Iran	75,078,000	1,648,000.00	45.56	636,296.10	117.99
Japan	127,380,000	377,835.00	337.13	145,882.85	873.17
Mexico	108,396,211	1,972,550.00	54.95	761,605.50	142.33
Nigeria	170,123,000	923,768.00	171.32	356,668.67	443.71
Pakistan	170,260,000	803,940.00	211.78	310,402.84	548.51
Philippines	94,013,200	300,000.00	313.38	115,830.60	811.64
Russia	141,927,297	17,075,200.00	8.31	6,592,768.87	21.53
United-States	309,975,000	9,629,091.00	32.19	3,717,811.29	83.38
Vietnam	85,789,573	329,560.00	260.32	127,243.78	674.21
Your first variable should be text, defined as follows:


text = """ Bangladesh	164,425,000	144,000.00	1,141.84	55,598.69	2,957.35
 Brazil	193,364,000	8,511,965.00	22.72	3,286,486.71	58.84
 China	1,339,190,000	9,596,960.00	139.54	3,705,405.45	361.42
 Egypt	78,848,000	1,001,450.00	78.73	386,661.85	203.92
 Ethiopia	79,221,000	1,127,127.00	70.29	435,185.99	182.04
 Germany	81,757,600	357,021.00	229.00	137,846.52	593.11
 India	1,184,639,000	3,287,590.00	360.34	1,269,345.07	933.27
 Indonesia	234,181,400	1,919,440.00	122.01	741,099.62	315.99
 Iran	75,078,000	1,648,000.00	45.56	636,296.10	117.99
 Japan	127,380,000	377,835.00	337.13	145,882.85	873.17
 Mexico	108,396,211	1,972,550.00	54.95	761,605.50	142.33
 Nigeria	170,123,000	923,768.00	171.32	356,668.67	443.71
 Pakistan	170,260,000	803,940.00	211.78	310,402.84	548.51
 Philippines	94,013,200	300,000.00	313.38	115,830.60	811.64
 Russia	141,927,297	17,075,200.00	8.31	6,592,768.87	21.53
 United-States	309,975,000	9,629,091.00	32.19	3,717,811.29	83.38
 Vietnam	85,789,573	329,560.00	260.32	127,243.78	674.21"""
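
A sketch of the line-by-line loop, using a three-line excerpt of text to keep it short. The helper name country_and_population is hypothetical, and "\t" stands for the tab characters of the table. Splitting on whitespace also takes care of a leading space at the start of a line.

```python
# Three-line excerpt of the lab's text variable
text = """Bangladesh\t164,425,000\t144,000.00\t1,141.84\t55,598.69\t2,957.35
Brazil\t193,364,000\t8,511,965.00\t22.72\t3,286,486.71\t58.84
China\t1,339,190,000\t9,596,960.00\t139.54\t3,705,405.45\t361.42"""

def country_and_population(line):
    """Return 'Country population' for one line of the table (hypothetical helper)."""
    fields = line.split()                   # split on tabs and spaces alike
    return fields[0] + " " + fields[1].replace(",", "")

output = []
for line in text.split("\n"):               # one item per line of text
    output.append(country_and_population(line))

print("\n".join(output))
```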




Challenge 3

  • Take your solution for Challenge 2 and make it output the country with the largest population.
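
One way to approach this, sketched on a short excerpt of text: convert each population to an int so that the numbers compare numerically, and remember the largest one seen so far. The variable names largest_name and largest_pop are illustrative.

```python
# Excerpt of the lab's text variable ("\t" stands for a tab)
text = """Brazil\t193,364,000\t8,511,965.00\t22.72\t3,286,486.71\t58.84
China\t1,339,190,000\t9,596,960.00\t139.54\t3,705,405.45\t361.42
Egypt\t78,848,000\t1,001,450.00\t78.73\t386,661.85\t203.92"""

largest_name = ""
largest_pop = -1
for line in text.split("\n"):
    fields = line.split()
    population = int(fields[1].replace(",", ""))   # "193,364,000" -> 193364000
    if population > largest_pop:                   # keep the largest seen so far
        largest_pop = population
        largest_name = fields[0]

print(largest_name, largest_pop)
```

Comparing the populations as strings would not work reliably, since strings are compared character by character rather than by numeric value.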








Challenge 4

  • Same as Challenge 3, but this time make your program output the country with the largest population density.
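
A sketch for the density version, assuming the density per square kilometer is the fourth field of each line (index 3), as described in Note 2 of Challenge 1. Densities have a decimal point, so float() is used instead of int(). The variable names are illustrative.

```python
# Excerpt of the lab's table: name, population, area (km2), density (per km2), ...
text = """Bangladesh\t164,425,000\t144,000.00\t1,141.84\t55,598.69\t2,957.35
China\t1,339,190,000\t9,596,960.00\t139.54\t3,705,405.45\t361.42
India\t1,184,639,000\t3,287,590.00\t360.34\t1,269,345.07\t933.27"""

densest_name = ""
densest_value = -1.0
for line in text.split("\n"):
    fields = line.split()
    density = float(fields[3].replace(",", ""))   # "1,141.84" -> 1141.84
    if density > densest_value:
        densest_value = density
        densest_name = fields[0]

print(densest_name, densest_value)
```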










Sorting Lists, Reversing Lists, Finding the Min or Max of a List


Enter the different commands below in the console, and observe how Python executes each line.

>>> seven = [ "Sleepy", "Sneezy", "Bashful", "Happy", "Grumpy", "Dopey", "Doc" ]
>>> seven.sort()
>>> seven

>>> seven.reverse()
>>> seven

>>> nums = [0, 10, -200, 3, 4, 100]
>>> nums.sort()
>>> nums

>>> nums.reverse()
>>> nums
 
>>> min( nums )

>>> max( nums )


>>> dwarvesHeight = [('Doc', 2), ('Dopey', 6), ('Grumpy', 4.5), ('Happy', 7),('Bashful', 3)]
>>> dwarvesHeight.sort()
>>> dwarvesHeight

>>> heightDwarves = []
>>> for pair in dwarvesHeight:
...     name = pair[0]
...     height = pair[1]
...     heightDwarves.append( (height, name) )
...
>>> heightDwarves

>>> heightDwarves.sort()
>>> heightDwarves

>>> heightDwarves.reverse()
>>> heightDwarves

>>> min( heightDwarves )

>>> max( heightDwarves )

>>> 
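
The console session above relies on the way Python compares tuples: element by element, starting with the first. A list of (name, height) pairs therefore sorts by name, while a list of (height, name) pairs sorts by height. The sketch below restates this as a program; the names shortest and tallest are illustrative.

```python
dwarvesHeight = [('Doc', 2), ('Dopey', 6), ('Grumpy', 4.5), ('Happy', 7), ('Bashful', 3)]

# Tuples are compared on their first element first: this sorts by name
dwarvesHeight.sort()

# Swap each pair so that the height comes first
heightDwarves = []
for name, height in dwarvesHeight:
    heightDwarves.append((height, name))

heightDwarves.sort()             # now sorted by height, shortest first
shortest = min(heightDwarves)    # same tuple as heightDwarves[0] after sorting
tallest = max(heightDwarves)     # same tuple as heightDwarves[-1] after sorting

print(shortest[1], "is the shortest with height", shortest[0])
print(tallest[1], "is the tallest with height", tallest[0])
```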


Challenge 5

  • Make your program use the original text variable and store the pairs (population, country name) into a list.
  • Make your program output the country with the smallest population, nicely formatted (i.e., with no parentheses or commas printed).
  • Make your program output the country with the largest population.
  • Make your program output the list of countries and population sorted from largest population to smallest population. The information should show the country first on each line, followed by its population.
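
These steps could be sketched as follows, on a three-line excerpt of text. The population goes first in each pair so that min(), max(), and sort() work on the population, as with the dwarves' heights. The names pairs, smallest, and largest are illustrative.

```python
# Excerpt of the lab's text variable ("\t" stands for a tab)
text = """Egypt\t78,848,000\t1,001,450.00\t78.73\t386,661.85\t203.92
Germany\t81,757,600\t357,021.00\t229.00\t137,846.52\t593.11
India\t1,184,639,000\t3,287,590.00\t360.34\t1,269,345.07\t933.27"""

pairs = []
for line in text.split("\n"):
    fields = line.split()
    population = int(fields[1].replace(",", ""))
    pairs.append((population, fields[0]))      # population first, then the name

smallest = min(pairs)
largest = max(pairs)
print("Smallest:", smallest[1], smallest[0])   # prints the items, not the tuple
print("Largest:", largest[1], largest[0])

# Largest population first, country name printed first on each line
pairs.sort()
pairs.reverse()
for population, country in pairs:
    print(country, population)
```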








Challenge 6

  • Make your program output the list of countries and population sorted from largest population to smallest population. The information should show the country first on each line, followed by its population.









Processing DNA Strings


A DNA string is a sequence of four nucleobases (guanine, adenine, thymine, and cytosine) represented by the letters G, A, T, and C. Assume that we have a DNA string defined as follows:

AGCCTTCTAAGGTTAATTAACTCGAGAGAGGGTTGGCGCAGTTAAAGGCCTTAATCGGTTCTGT

Figure out a way in Python to extract the string that is between the two AAGG markers. In other words, create a variable called DNA equal to the string above, then use the string methods we've seen so far to isolate the string between the markers and print it.
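
One possible approach, given as a sketch: use the marker itself as the separator for split(). With two markers in the string, the result is a list of three pieces, and the middle one is the string we want. The names marker, pieces, and between are illustrative.

```python
DNA = "AGCCTTCTAAGGTTAATTAACTCGAGAGAGGGTTGGCGCAGTTAAAGGCCTTAATCGGTTCTGT"
marker = "AAGG"

pieces = DNA.split(marker)   # [before first marker, between markers, after second marker]
between = pieces[1]

print(between)
```

This assumes the marker appears exactly twice. With more occurrences, pieces would contain more items and pieces[1] would only be the text between the first two markers.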



Challenge 7

  • Assume that DNA now is a multi-line string defined as follows:


DNA = """AGCCTTCTAGCGTTAATTAACTCGAGAGAGGGTTGGCGCAGTTACCTTAATCGGTTCTGT
     TCCTGAGCGAAAGGGCTCAAGCACCTGTTACCTCTGTGATAACGCCAGAGTAACTCGAGC
     AAAGACAAGGGAAGCTCTAACCATGTCCGAGACAAGTTGTCTAGCAGTCCCAGTTCACACTTG      ACAATCTACAAATTAGAGCACGGATCATTTACAGGCCAATCTGGCGCGTTAATCGA
     TTTCCGCAAACCGCCATGCTGCATCATTACGGGAACCACACGCCGGAAGCAGGAACAGCA"""
where the markers are on separate lines. Modify your previous solution so that it works on this new string.
  • Make your program display the string between the markers on one line only.
  • Make your program output the length of the string between the markers.
  • Make your program display how many adenine (A) nucleobases the string between markers contains.
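
A sketch of one way to handle the multi-line case: first remove all whitespace (newlines and spaces) by splitting on whitespace and joining with an empty string, then proceed as with a single-line string. The names clean and between are illustrative.

```python
DNA = """AGCCTTCTAGCGTTAATTAACTCGAGAGAGGGTTGGCGCAGTTACCTTAATCGGTTCTGT
     TCCTGAGCGAAAGGGCTCAAGCACCTGTTACCTCTGTGATAACGCCAGAGTAACTCGAGC
     AAAGACAAGGGAAGCTCTAACCATGTCCGAGACAAGTTGTCTAGCAGTCCCAGTTCACACTTG      ACAATCTACAAATTAGAGCACGGATCATTTACAGGCCAATCTGGCGCGTTAATCGA
     TTTCCGCAAACCGCCATGCTGCATCATTACGGGAACCACACGCCGGAAGCAGGAACAGCA"""

# split() removes every newline and space; join("") glues the pieces back
clean = "".join(DNA.split())

between = clean.split("AAGG")[1]   # text between the first two markers

print(between)                     # displayed on one line only
print(len(between))                # length of the string between the markers
print(between.count("A"))          # number of adenine (A) nucleobases
```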







Coldest year in Oxford?


The page at http://www.metoffice.gov.uk/climate/uk/stationdata/ contains historical temperature data for different cities in the United Kingdom.
Click on Oxford to get a page of temperatures recorded since 1853.

Oxford
Location: 4509E 2072N, 63 metres amsl
Estimated data is marked with a * after the value.
Missing data (more than 2 days missing in month) is marked by  ---.
Sunshine data taken from an automatic Kipp & Zonen sensor marked with a #, otherwise sunshine data taken from a Campbell Stokes recorder.
   yyyy  mm   tmax    tmin      af    rain     sun
              degC    degC    days      mm   hours
   1853   1    8.4     2.7       4    62.8     ---
   1853   2    3.2    -1.8      19    29.3     ---
   1853   3    7.7    -0.6      20    25.9     ---
   1853   4   12.6     4.5       0    60.1     ---
   1853   5   16.8     6.1       0    59.5     ---
   1853   6   20.1    10.7       0    82.0     ---
   1853   7   21.2    12.2       0    86.2     ---


Challenge x





  • Figure out a way to transform a string of the form "Pakistan 108 166 226", where the first word is a country name and the following three numbers are estimated populations of this country in 1900, 2008, and 2025, into a new string with only the first and last words, i.e. "Pakistan 226".
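
A minimal sketch of one way to do this: split the string into words, keep the first and the last, and join them with a space. The index -1 refers to the last item of a list, however long the list is. The names words and result are illustrative.

```python
line = "Pakistan 108 166 226"

words = line.split()                        # ['Pakistan', '108', '166', '226']
result = " ".join([words[0], words[-1]])    # first word and last word only

print(result)
```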


China 1,458 India 1,398 United-States 352 Indonesia 273 Brazil 223 Pakistan 226 Bangladesh 198 Nigeria 208 Russia 137 Japan 126