core/survey.md
+88 -79
@@ -1,13 +1,11 @@
1
1
---
2
2
3
3
layout: ots
4
-
title: Working with Strings
4
+
title: A typical problem -- Analyzing a survey
5
5
6
6
---
7
7
8
-
# A problem
9
-
10
-
Now we know how to work with text files, we'll use that knowledge to solve a problem:
8
+
# Our very, very important problem
11
9
12
10
Suppose you're a greengrocer, and you run a survey to see what radish varieties your customers prefer the most. You have your assistant type up the survey results into a text file on your computer, so you have 300 lines of survey data in the file [radishsurvey.txt](../files/radishsurvey.txt). Each line consists of a name, a hyphen, then a radish variety:
13
11
@@ -26,6 +24,8 @@ Suppose you're a greengrocer, and you run a survey to see what radish varieties
26
24
27
25
<a href="http://www.flickr.com/photos/brixton/2045816352/" title="Radishes radishes radishes by brixton, on Flickr"><img src="http://farm3.staticflickr.com/2298/2045816352_25cba9e434_m.jpg" width="240" height="180" alt="Radishes radishes radishes"></a>
28
26
27
+
(You may have noticed that this is a very simple file: unlike a document or web page, it has no formatting whatsoever. It doesn't look pretty, but it has one big advantage: plain text is the simplest format to work with on a computer, which also makes it the easiest to process and analyze.)
28
+
29
29
You want to know:
30
30
31
31
* What's the most popular radish variety?
@@ -41,24 +41,26 @@ You want to know:
41
41
42
42
Save the file [radishsurvey.txt](../files/radishsurvey.txt) to your computer. How do we write a program to find out which person voted for each radish preference?
43
43
44
-
From the previous chapter, we know that we can easily go through the file line by line, and each line will have a value like `"Jin Li - White Icicle\n"`. We also know that we can strip off the trailing newline with the `strip()` method:
44
+
We can easily open the file with Python and go through it line by line. Each line will have a value like `"Jin Li - White Icicle\n"`. Then we can strip off the trailing newline with the `strip()` method. (If you are curious, you can look at the documentation for [open](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) and [strip](https://docs.python.org/3/library/stdtypes.html?highlight=strip#str.strip).)
45
45
46
-
for line in open("radishsurvey.txt"):
47
-
line = line.strip()
48
-
# Do something with each line
46
+
with open("radishsurvey.txt") as file:
47
+
for line in file:
48
+
line = line.strip()
49
+
# Do something with each line
49
50
50
51
We need a way to split each line into the name and the vote. Thankfully, Python comes with dozens of string methods including one called `split()`. [Have a look at the documentation for split()](http://docs.python.org/3.3/library/stdtypes.html#str.split) and see if you can figure out how to split each line into the name and the vote.
51
52
52
53
(Don't worry if you can't write a program that does this just yet, but at least have a think about it before you skip to the solution.)
53
54
54
55
### Solution
55
56
56
-
for line in open("radishsurvey.txt"):
57
-
line = line.strip()
58
-
parts = line.split(" - ")
59
-
name = parts[0]
60
-
vote = parts[1]
61
-
print(name + " voted for " + vote)
57
+
with open("radishsurvey.txt") as file:
58
+
for line in file:
59
+
line = line.strip()
60
+
parts = line.split(" - ")
61
+
name = parts[0]
62
+
vote = parts[1]
63
+
print(name + " voted for " + vote)
62
64
63
65
There are a few things going on here, so let's go through it line by line. *Walking through a program in your head and thinking about what each line does by itself is a good way to start to understand it.*
64
66
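As a rough illustration of the two string methods used in the solution, the sketch below runs them on a single hard-coded sample line instead of reading the file. The sample value is the example line quoted earlier; everything else mirrors the solution.

```python
# A single sample line, as it would come out of the file
line = "Jin Li - White Icicle\n"

# strip() removes leading and trailing whitespace, including the newline
line = line.strip()

# split(" - ") cuts the string at each " - " and returns a list of the pieces
parts = line.split(" - ")

name = parts[0]  # the text before the hyphen
vote = parts[1]  # the text after the hyphen
print(name + " voted for " + vote)
```

Trying the methods on one known string like this is a quick way to check what they return before looping over all 300 lines.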
@@ -105,12 +107,13 @@ Use the previous example as a base. You'll need to compare the vote with the str
105
107
106
108
### Solution
107
109
108
-
for line in open("radishsurvey.txt"):
109
-
line = line.strip()
110
-
parts = line.split(" - ")
111
-
name, vote = parts
112
-
if vote == "White Icicle":
113
-
print(name + " likes White Icicle!")
110
+
with open("radishsurvey.txt") as file:
111
+
for line in file:
112
+
line = line.strip()
113
+
parts = line.split(" - ")
114
+
name, vote = parts
115
+
if vote == "White Icicle":
116
+
print(name + " likes White Icicle!")
114
117
115
118
You might notice that the code splitting the line has become even shorter here. Instead of assigning each element of parts separately, we can assign them together using a technique called "multiple assignment". The line `name, vote = parts` means to assign each variable to the corresponding item in the list.
116
119
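Here is a minimal sketch of multiple assignment on its own, using a hard-coded list in place of the result of `split()`. The second half shows what happens when the number of variables does not match the number of items; the variable names `first` and `second` are only for illustration.

```python
# A list with exactly two items, like the result of line.split(" - ")
parts = ["Jin Li", "White Icicle"]

# Multiple assignment: name gets parts[0], vote gets parts[1]
name, vote = parts
print(name)
print(vote)

# The number of variables has to match the number of items.
# Three items cannot be unpacked into two variables:
try:
    first, second = ["Jin Li", "White", "Icicle"]
except ValueError as error:
    print("Could not unpack:", error)
```

This is worth keeping in mind for the survey file: a line that does not contain exactly one `" - "` separator would make `name, vote = line.split(" - ")` fail in the same way.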
@@ -159,11 +162,12 @@ Use your previous solution as a base. You'll need a variable to hold the number
159
162
160
163
print("Counting votes for White Icicle...")
161
164
count = 0
162
-
for line in open("radishsurvey.txt"):
163
-
line = line.strip()
164
-
name, vote = line.split(" - ")
165
-
if vote == "White Icicle":
166
-
count = count + 1
165
+
with open("radishsurvey.txt") as file:
166
+
for line in file:
167
+
line = line.strip()
168
+
name, vote = line.split(" - ")
169
+
if vote == "White Icicle":
170
+
count = count + 1
167
171
print(count)
168
172
169
173
@@ -178,11 +182,12 @@ Using your function, can you write a program which counts votes for White Icicle
178
182
def count_votes(radish):
179
183
print("Counting votes for " + radish + "...")
180
184
count = 0
181
-
for line in open("radishsurvey.txt"):
182
-
line = line.strip()
183
-
name, vote = line.split(" - ")
184
-
if vote == radish:
185
-
count = count + 1
185
+
with open("radishsurvey.txt") as file:
186
+
for line in file:
187
+
line = line.strip()
188
+
name, vote = line.split(" - ")
189
+
if vote == radish:
190
+
count = count + 1
186
191
return count
187
192
188
193
print(count_votes("White Icicle"))
@@ -241,15 +246,16 @@ Remember that for dictionaries `counts[vote]` means "the value in `counts` which
241
246
# with vote counts
242
247
counts = {}
243
248
244
-
for line in open("radishsurvey.txt"):
245
-
line = line.strip()
246
-
name, vote = line.split(" - ")
247
-
if vote not in counts:
248
-
# First vote for this variety
249
-
counts[vote] = 1
250
-
else:
251
-
# Increment the vote count
252
-
counts[vote] = counts[vote] + 1
249
+
with open("radishsurvey.txt") as file:
250
+
for line in file:
251
+
line = line.strip()
252
+
name, vote = line.split(" - ")
253
+
if vote not in counts:
254
+
# First vote for this variety
255
+
counts[vote] = 1
256
+
else:
257
+
# Increment the vote count
258
+
counts[vote] = counts[vote] + 1
253
259
print(counts)
254
260
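As an aside, the `if`/`else` counting pattern can be written more compactly with the dictionary method `get()`, which returns a default value when the key is missing. This is only an alternative sketch, not the approach the chapter uses; the `votes` list is made-up sample data standing in for the votes read from the file.

```python
# Made-up sample votes standing in for the lines of radishsurvey.txt
votes = ["White Icicle", "Plum Purple", "White Icicle"]

counts = {}
for vote in votes:
    # counts.get(vote, 0) is the current count, or 0 for a first vote
    counts[vote] = counts.get(vote, 0) + 1

print(counts)
```

Both versions build the same dictionary. The explicit `if`/`else` spells out the two cases, which can be easier to follow when you are starting out.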
255
261
### Pretty printing
@@ -319,17 +325,18 @@ There are lots of functions which could remove the case distinction. `str.lower(
319
325
# with vote counts
320
326
counts = {}
321
327
322
-
for line in open("radishsurvey.txt"):
323
-
line = line.strip()
324
-
name, vote = line.split(" - ")
325
-
# munge the vote string to clean it up
326
-
vote = vote.strip().capitalize()
327
-
if not vote in counts:
328
-
# First vote for this variety
329
-
counts[vote] = 1
330
-
else:
331
-
# Increment the vote count
332
-
counts[vote] = counts[vote] + 1
328
+
with open("radishsurvey.txt") as file:
329
+
for line in file:
330
+
line = line.strip()
331
+
name, vote = line.split(" - ")
332
+
# munge the vote string to clean it up
333
+
vote = vote.strip().capitalize()
334
+
if vote not in counts:
335
+
# First vote for this variety
336
+
counts[vote] = 1
337
+
else:
338
+
# Increment the vote count
339
+
counts[vote] = counts[vote] + 1
333
340
print(counts)
334
341
335
342
If you're having trouble spotting the difference here, it's
@@ -386,24 +393,25 @@ This is just one of many ways to do this:
386
393
# Create an empty list with the names of everyone who voted
387
394
voted = []
388
395
389
-
for line in open("radishsurvey.txt"):
390
-
line = line.strip()
391
-
name, vote = line.split(" - ")
392
-
# clean up the person's name
393
-
name = name.strip().capitalize().replace("  "," ")
394
-
# check if this person already voted
395
-
if name in voted:
396
-
print(name + " has already voted! Fraud!")
397
-
continue
398
-
voted.append(name)
399
-
# munge the vote string to clean it up
400
-
vote = vote.strip().capitalize().replace("  "," ")
401
-
if not vote in counts:
402
-
# First vote for this variety
403
-
counts[vote] = 1
404
-
else:
405
-
# Increment the vote count
406
-
counts[vote] += 1
396
+
with open("radishsurvey.txt") as file:
397
+
for line in file:
398
+
line = line.strip()
399
+
name, vote = line.split(" - ")
400
+
# clean up the person's name
401
+
name = name.strip().capitalize().replace("  "," ")
402
+
# check if this person already voted
403
+
if name in voted:
404
+
print(name + " has already voted! Fraud!")
405
+
continue
406
+
voted.append(name)
407
+
# munge the vote string to clean it up
408
+
vote = vote.strip().capitalize().replace("  "," ")
409
+
if vote not in counts:
410
+
# First vote for this variety
411
+
counts[vote] = 1
412
+
else:
413
+
# Increment the vote count
414
+
counts[vote] += 1
407
415
408
416
print("Results:")
409
417
print()
@@ -473,15 +481,16 @@ This is just one possible way to break it down:
473
481
counts[radish] = counts[radish] + 1
474
482
475
483
476
-
for line in open("radishsurvey.txt"):
477
-
line = line.strip()
478
-
name, vote = line.split(" - ")
479
-
name = clean_string(name)
480
-
vote = clean_string(vote)
481
-
482
-
if not has_already_voted(name):
483
-
count_vote(vote)
484
-
voted.append(name)
484
+
with open("radishsurvey.txt") as file:
485
+
for line in file:
486
+
line = line.strip()
487
+
name, vote = line.split(" - ")
488
+
name = clean_string(name)
489
+
vote = clean_string(vote)
490
+
491
+
if not has_already_voted(name):
492
+
count_vote(vote)
493
+
voted.append(name)
485
494
486
495
print("Results:")
487
496
print()
@@ -524,7 +533,7 @@ The loop shown above keeps track of one name, `winner_name`, and the number of v
524
533
525
534
## Challenge
526
535
527
-
Can you refactor the part of the program that finds the winner into a function?
536
+
Can you extract the part of the program that finds the winner into a function?
528
537
529
538
## Bigger Challenge
530
539
@@ -534,4 +543,4 @@ Can you write a winner function that could deal with a tie?
534
543
535
544
## Next Chapter
536
545
537
-
When you're done counting radish votes, the next chapter is [Creating Charts](charts.html)
546
+
That became complicated pretty quickly, didn't it? In the next chapter, we will try an easier way to [analyze the survey using pandas](pandas.html), a Python library designed for data analysis.