CSC111 Homework 11 2018
Revision as of 20:36, 19 April 2018
D. Thiebaut (talk) 20:58, 19 April 2018 (EDT)
Make-Up Homework 11
This homework is due on Thursday, May 3rd, at 11:55 p.m. and is a make-up homework. You can use it to replace the lowest grade you have received on Homework Assignments 1 to 10.
Below is the algorithm explaining how it will count toward your homework average grade.
hws = [ HW1, HW2, HW3, HW4, HW5, HW6, HW7, HW8, HW9, HW10 ]

if HW11 != None:   # if you have submitted HW11
    minHW = min( hws )
    if HW11 > minHW:
        hws.remove( minHW )
        hws.append( HW11 )

# drop lowest, as specified in syllabus
minHW = min( hws )
hws.remove( minHW )

# compute average homework grade
avgHW = sum( hws )/len( hws )
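The algorithm above can be tried out on concrete numbers. The sketch below wraps it in a function; the name `homework_average` and the sample grades are hypothetical and only serve as an illustration.

```python
def homework_average(grades, makeup=None):
    """Return the homework average after the make-up and drop-lowest rules.

    grades: the ten grades for Homework 1 to 10
    makeup: the Homework 11 grade, or None if it was not submitted
    """
    hws = list(grades)              # work on a copy of the caller's list
    if makeup is not None:
        lowest = min(hws)
        if makeup > lowest:         # the make-up replaces the lowest grade
            hws.remove(lowest)
            hws.append(makeup)
    hws.remove(min(hws))            # drop lowest, as specified in syllabus
    return sum(hws) / len(hws)


# Hypothetical grades: eight 90s, one 0 and one 45
grades = [90, 90, 90, 90, 90, 90, 90, 90, 0, 45]
print(homework_average(grades))        # 85.0 (the 0 is dropped)
print(homework_average(grades, 90))    # 90.0 (90 replaces the 0, then the 45 is dropped)
```

In this example, a make-up grade effectively removes the two lowest grades from the average instead of one.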
Assignment
- Write a program that processes a CSV file that has been downloaded from the U.S. Department of Education. It contains the scorecards of 7,594 colleges and universities in the United States. Each line of the CSV file represents one college or university, and is called a scorecard. Each scorecard contains 123 fields. The city where the college or university is located is the field at Index 4. The state is the field at Index 5.
- The original file was downloaded from https://catalog.data.gov/dataset?res_format=CSV and is mirrored here: http://cs.smith.edu/~dthiebaut/111/collegeScorecard.csv
- You should use the Smith College URL in your program.
- Your program should output the 10 cities that contain the largest number of colleges and universities.
- The output should be formatted as follows:
91 New York, NY
76 Chicago, IL
...
...
...
40 Dallas, TX
- The first number is the number of universities or colleges, followed by the name of the city, followed by a comma, followed by the state, as two uppercase letters.
- Your program will not prompt for any input, and will only output 10 lines, as shown above.
- To help you test your program, you are given the true first, second, and tenth lines of the output. Your program will come up with the others.
- Your program should be well documented and make good use of functions.
- Submit your program as hw11.py in the Homework 11 section on Moodle.
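Two of the requirements above, reading the file from the URL and formatting each output line, can be sketched as follows. This is a minimal illustration, not a solution: the function names `fetch_lines` and `format_line` are hypothetical, and the UTF-8 decoding is an assumption about the file's encoding.

```python
from urllib.request import urlopen

# URL given in the assignment
URL = "http://cs.smith.edu/~dthiebaut/111/collegeScorecard.csv"


def fetch_lines(url=URL):
    """Download the file at url and return its contents as a list of lines."""
    with urlopen(url) as response:
        # Assumption: the file is UTF-8; undecodable bytes are replaced
        text = response.read().decode("utf-8", errors="replace")
    return text.splitlines()


def format_line(count, city, state):
    """Return one output line: count, city, a comma, then the state."""
    return "{} {}, {}".format(count, city, state.upper())


print(format_line(91, "New York", "NY"))   # 91 New York, NY
```

`fetch_lines` is defined but not called here, so the example runs without a network connection.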
<showafterdate after="20180504 12:00" before="20180601 00:00">
Solution Program
Source
# collegeScorecard.py
# https://ed-public-download.app.cloud.gov/downloads/Most-Recent-Cohorts-Scorecard-Elements.csv

def getLines( fileName ):
    # read the whole file and return it as a list of lines
    file = open( fileName, 'r' )
    lines = file.read()
    file.close()
    lines = lines.split( "\n" )
    return lines

def main():
    lines = getLines( "collegeScorecard.csv" )
    header = lines[0].split( ',' )
    noFields = len( header )
    cityIndex = 4
    stateIndex = 5

    # count the number of scorecards for each city/state pair
    cityStateDico = {}
    for line in lines[1: ]:
        try:
            city = line.split(',')[cityIndex].strip()
            state = line.split(',')[stateIndex].strip()
        except IndexError:
            # skip empty or incomplete lines
            continue
        cityState = city + "_" + state
        if cityState not in cityStateDico:
            cityStateDico[ cityState ] = 1
        else:
            cityStateDico[ cityState ] += 1

    # create a list of (count, cityState) tuples, largest count first
    listCities = []
    for cityState in cityStateDico.keys():
        listCities.append( (cityStateDico[cityState], cityState ) )
    listCities.sort()
    listCities.reverse()

    # display the top 10 as "count city, state"
    for count, cityState in listCities[:10]:
        city, state = cityState.rsplit( "_", 1 )
        print( count, city + ",", state )

main()
Output
91 New York, NY
76 Chicago, IL
74 Houston, TX
59 Los Angeles, CA
52 San Antonio, TX
50 Miami, FL
48 Brooklyn, NY
46 Philadelphia, PA
42 Atlanta, GA
40 Dallas, TX
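The solution program splits each line on commas, which puts the city and state at the wrong index for any line where an earlier field contains a quoted comma. A possible alternative, sketched below, uses the standard `csv` module and `collections.Counter`. The function name `top_cities` and the sample lines are hypothetical; the sample has only six fields and is not taken from the real file.

```python
import csv
from collections import Counter

CITY_INDEX = 4    # field indexes given in the assignment
STATE_INDEX = 5


def top_cities(lines, n=10):
    """Return the n most frequent (city, state) pairs with their counts."""
    rows = csv.reader(lines)      # handles quoted fields that contain commas
    next(rows, None)              # skip the header line
    counts = Counter()
    for row in rows:
        if len(row) > STATE_INDEX:            # ignore empty or short lines
            key = (row[CITY_INDEX].strip(), row[STATE_INDEX].strip())
            counts[key] += 1
    return counts.most_common(n)


# Hypothetical sample data, for illustration only
sample = [
    "id,a,b,name,city,state",
    '1,x,y,"College, The",Boston,MA',
    "2,x,y,Other College,Boston,MA",
    "3,x,y,Third College,Austin,TX",
]
for (city, state), count in top_cities(sample):
    print("{} {}, {}".format(count, city, state))
```

On the sample, this prints `2 Boston, MA` followed by `1 Austin, TX`. Cities with equal counts come out of `most_common` in first-seen order, which differs from the reverse-sorted order of the solution program.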
</showafterdate>