Commit cca60e3

Combined files and strings section into a tutorial on survey analysis using basic tools
1 parent a80b051 commit cca60e3

4 files changed

+90
-376
lines changed

_config.yml

+2-5
Original file line number | Diff line number | Diff line change
@@ -18,11 +18,8 @@ map:
1818
- title: Introducing IPython Notebook
1919
path: /core/notebook.html
2020
caption: A whole new way to work with Python!
21-
- title: Working With Text Files
22-
path: /core/text-files.html
23-
caption: What is a text file? How do we get them in and out of Python?
24-
- title: Working With Strings
25-
path: /core/strings.html
21+
- title: A typical problem -- Analyzing a survey
22+
path: /core/survey.html
2623
caption: Once we have our text in Python, what can we do with it?
2724
- title: Creating Charts
2825
path: /core/charts.html

core/strings.md renamed to core/survey.md

+88-79
@@ -1,13 +1,11 @@
11
---
22

33
layout: ots
4-
title: Working with Strings
4+
title: A typical problem -- Analyzing a survey
55

66
---
77

8-
# A problem
9-
10-
Now we know how to work with text files, we'll use that knowledge to solve a problem:
8+
# Our very, very important problem
119

1210
Suppose you're a greengrocer, and you run a survey to see what radish varieties your customers prefer the most. You have your assistant type up the survey results into a text file on your computer, so you have 300 lines of survey data in the file [radishsurvey.txt](../files/radishsurvey.txt). Each line consists of a name, a hyphen, then a radish variety:
1311

@@ -26,6 +24,8 @@ Suppose you're a greengrocer, and you run a survey to see what radish varieties
2624

2725
<a href="http://www.flickr.com/photos/brixton/2045816352/" title="Radishes radishes radishes by brixton, on Flickr"><img src="http://farm3.staticflickr.com/2298/2045816352_25cba9e434_m.jpg" width="240" height="180" alt="Radishes radishes radishes"></a>
2826

27+
(You may have noticed that this is a very simple file: unlike a document or web page, it has no formatting whatsoever. It doesn't look pretty, but it has one big advantage: plain text is the simplest format to work with on a computer, so it is also the easiest to process and analyze.)
28+
2929
You want to know:
3030

3131
* What's the most popular radish variety?
@@ -41,24 +41,26 @@ You want to know:
4141

4242
Save the file [radishsurvey.txt](../files/radishsurvey.txt) to your computer. How do we write a program to find out which person voted for each radish preference?
4343

44-
From the previous chapter, we know that we can easily go through the file line by line, and each line will have a value like `"Jin Li - White Icicle\n"`. We also know that we can strip off the trailing newline with the `strip()` method:
44+
We can easily open the file with Python and go through it line by line. Each line will have a value like `"Jin Li - White Icicle\n"`. Then we can strip off the trailing newline with the `strip()` method. (If you are curious, you can look at the documentation for [open](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) and [strip](https://docs.python.org/3/library/stdtypes.html?highlight=strip#str.strip).)
4545

46-
    for line in open("radishsurvey.txt"):
47-
        line = line.strip()
48-
        # Do something with each line
46+
    with open("radishsurvey.txt") as file:
47+
        for line in file:
48+
            line = line.strip()
49+
            # Do something with each line
4950

5051
We need a way to split each line into the name and the vote. Thankfully, Python comes with dozens of string methods including one called `split()`. [Have a look at the documentation for split()](http://docs.python.org/3.3/library/stdtypes.html#str.split) and see if you can figure out how to split each line into the name and the vote.
5152

5253
(Don't worry if you can't write a program that does this just yet, but at least have a think about it before you skip to the solution.)
5354

5455
### Solution
5556

56-
    for line in open("radishsurvey.txt"):
57-
        line = line.strip()
58-
        parts = line.split(" - ")
59-
        name = parts[0]
60-
        vote = parts[1]
61-
        print(name + " voted for " + vote)
57+
    with open("radishsurvey.txt") as file:
58+
        for line in file:
59+
            line = line.strip()
60+
            parts = line.split(" - ")
61+
            name = parts[0]
62+
            vote = parts[1]
63+
            print(name + " voted for " + vote)
6264

6365
There are a few things going on here, so let's go through it line by line. *Walking through a program in your head and thinking about what each line does by itself is a good way to start to understand it.*
6466

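The split step in the solution above can be tried on a single sample line without opening the file. This is only a minimal sketch: the sample string is the one quoted earlier in the text, and the variable name `sample` is an assumption made for illustration.

```python
# One survey line, as it would be read from the file (value quoted in the text)
sample = "Jin Li - White Icicle\n"

# strip() removes the trailing newline and any other surrounding whitespace
line = sample.strip()

# split(" - ") cuts the string at each " - " separator and returns a list
parts = line.split(" - ")

name = parts[0]
vote = parts[1]
print(name + " voted for " + vote)
```

Trying each step on one line like this makes it easier to see what `parts` contains before the loop is run over all 300 lines.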
@@ -105,12 +107,13 @@ Use the previous example as a base. You'll need to compare the vote with the str
105107

106108
### Solution
107109

108-
    for line in open("radishsurvey.txt"):
109-
        line = line.strip()
110-
        parts = line.split(" - ")
111-
        name, vote = parts
112-
        if vote == "White Icicle":
113-
            print(name + " likes White Icicle!")
110+
    with open("radishsurvey.txt") as file:
111+
        for line in file:
112+
            line = line.strip()
113+
            parts = line.split(" - ")
114+
            name, vote = parts
115+
            if vote == "White Icicle":
116+
                print(name + " likes White Icicle!")
114117

115118
You might notice that the code splitting the line has become even shorter here. Instead of assigning each element of parts separately, we can assign them together using a technique called "multiple assignment". The line `name, vote = parts` means to assign each variable to the corresponding item in the list.
116119

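Multiple assignment can be sketched on its own as follows. The second half is an assumption about input the survey file might contain: it shows what happens when a line has no `" - "` separator, so the list has one item but two variables are expected.

```python
# split() returns a list with two items, one for each variable
parts = "Jin Li - White Icicle".split(" - ")
name, vote = parts
print(name)
print(vote)

# If the number of items doesn't match the number of variables,
# Python raises a ValueError instead of assigning anything
try:
    first, second = "no separator here".split(" - ")
except ValueError as error:
    print("Could not unpack:", error)
```

This is why the shorter form is only safe when every line really does contain exactly one separator.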
@@ -159,11 +162,12 @@ Use your previous solution as a base. You'll need a variable to hold the number
159162

160163
    print("Counting votes for White Icicle...")
161164
    count = 0
162-
    for line in open("radishsurvey.txt"):
163-
        line = line.strip()
164-
        name, vote = line.split(" - ")
165-
        if vote == "White Icicle":
166-
            count = count + 1
165+
    with open("radishsurvey.txt") as file:
166+
        for line in file:
167+
            line = line.strip()
168+
            name, vote = line.split(" - ")
169+
            if vote == "White Icicle":
170+
                count = count + 1
167171
    print(count)
168172

169173

@@ -178,11 +182,12 @@ Using your function, can you write a program which counts votes for White Icicle
178182
    def count_votes(radish):
179183
        print("Counting votes for " + radish + "...")
180184
        count = 0
181-
        for line in open("radishsurvey.txt"):
182-
            line = line.strip()
183-
            name, vote = line.split(" - ")
184-
            if vote == radish:
185-
                count = count + 1
185+
        with open("radishsurvey.txt") as file:
186+
            for line in file:
187+
                line = line.strip()
188+
                name, vote = line.split(" - ")
189+
                if vote == radish:
190+
                    count = count + 1
186191
        return count
187192

188193
    print(count_votes("White Icicle"))
@@ -241,15 +246,16 @@ Remember that for dictionaries `counts[vote]` means "the value in `counts` which
241246
    # with vote counts
242247
    counts = {}
243248

244-
    for line in open("radishsurvey.txt"):
245-
        line = line.strip()
246-
        name, vote = line.split(" - ")
247-
        if vote not in counts:
248-
            # First vote for this variety
249-
            counts[vote] = 1
250-
        else:
251-
            # Increment the vote count
252-
            counts[vote] = counts[vote] + 1
249+
    with open("radishsurvey.txt") as file:
250+
        for line in file:
251+
            line = line.strip()
252+
            name, vote = line.split(" - ")
253+
            if vote not in counts:
254+
                # First vote for this variety
255+
                counts[vote] = 1
256+
            else:
257+
                # Increment the vote count
258+
                counts[vote] = counts[vote] + 1
253259
    print(counts)
254260

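The dictionary counting logic above can also be tried without the survey file. This is a minimal sketch under stated assumptions: the list `lines` is a hypothetical stand-in for `radishsurvey.txt`, every name and variety in it except `Jin Li - White Icicle` is invented, and `dict.get` is swapped in as a shorter alternative to the `if`/`else` shown above.

```python
# Hypothetical sample lines standing in for radishsurvey.txt
lines = [
    "Jin Li - White Icicle\n",
    "Voter Two - Example Red\n",      # invented for illustration
    "Voter Three - White Icicle\n",   # invented for illustration
]

# Empty dictionary mapping each radish variety to its vote count
counts = {}

for line in lines:
    name, vote = line.strip().split(" - ")
    # get() returns the current count, or 0 if this variety hasn't been seen yet
    counts[vote] = counts.get(vote, 0) + 1

print(counts)
```

The `if vote not in counts` version spells out both cases explicitly, which may be easier to follow when dictionaries are new; `get` with a default folds the two cases into one line.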
255261
### Pretty printing
@@ -319,17 +325,18 @@ There are lots of functions which could remove the case distinction. `str.lower(
319325
    # with vote counts
320326
    counts = {}
321327

322-
    for line in open("radishsurvey.txt"):
323-
        line = line.strip()
324-
        name, vote = line.split(" - ")
325-
        # munge the vote string to clean it up
326-
        vote = vote.strip().capitalize()
327-
        if not vote in counts:
328-
            # First vote for this variety
329-
            counts[vote] = 1
330-
        else:
331-
            # Increment the vote count
332-
            counts[vote] = counts[vote] + 1
328+
    with open("radishsurvey.txt") as file:
329+
        for line in file:
330+
            line = line.strip()
331+
            name, vote = line.split(" - ")
332+
            # munge the vote string to clean it up
333+
            vote = vote.strip().capitalize()
334+
            if not vote in counts:
335+
                # First vote for this variety
336+
                counts[vote] = 1
337+
            else:
338+
                # Increment the vote count
339+
                counts[vote] = counts[vote] + 1
333340
    print(counts)
334341

335342
If you're having trouble spotting the difference here, it's
@@ -386,24 +393,25 @@ This is just one of many ways to do this:
386393
    # Create an empty list with the names of everyone who voted
387394
    voted = []
388395

389-
    for line in open("radishsurvey.txt"):
390-
        line = line.strip()
391-
        name, vote = line.split(" - ")
392-
        # clean up the person's name
393-
        name = name.strip().capitalize().replace("  "," ")
394-
        # check if this person already voted
395-
        if name in voted:
396-
            print(name + " has already voted! Fraud!")
397-
            continue
398-
        voted.append(name)
399-
        # munge the vote string to clean it up
400-
        vote = vote.strip().capitalize().replace("  "," ")
401-
        if not vote in counts:
402-
            # First vote for this variety
403-
            counts[vote] = 1
404-
        else:
405-
            # Increment the vote count
406-
            counts[vote] += 1
396+
    with open("radishsurvey.txt") as file:
397+
        for line in file:
398+
            line = line.strip()
399+
            name, vote = line.split(" - ")
400+
            # clean up the person's name
401+
            name = name.strip().capitalize().replace("  "," ")
402+
            # check if this person already voted
403+
            if name in voted:
404+
                print(name + " has already voted! Fraud!")
405+
                continue
406+
            voted.append(name)
407+
            # munge the vote string to clean it up
408+
            vote = vote.strip().capitalize().replace("  "," ")
409+
            if not vote in counts:
410+
                # First vote for this variety
411+
                counts[vote] = 1
412+
            else:
413+
                # Increment the vote count
414+
                counts[vote] += 1
407415

408416
    print("Results:")
409417
    print()
@@ -473,15 +481,16 @@ This is just one possible way to break it down:
473481
counts[radish] = counts[radish] + 1
474482

475483

476-
    for line in open("radishsurvey.txt"):
477-
        line = line.strip()
478-
        name, vote = line.split(" - ")
479-
        name = clean_string(name)
480-
        vote = clean_string(vote)
481-

482-
        if not has_already_voted(name):
483-
            count_vote(vote)
484-
        voted.append(name)
484+
    with open("radishsurvey.txt") as file:
485+
        for line in file:
486+
            line = line.strip()
487+
            name, vote = line.split(" - ")
488+
            name = clean_string(name)
489+
            vote = clean_string(vote)
490+

491+
            if not has_already_voted(name):
492+
                count_vote(vote)
493+
            voted.append(name)
485494

486495
    print("Results:")
487496
    print()
@@ -524,7 +533,7 @@ The loop shown above keeps track of one name, `winner_name`, and the number of v
524533

525534
## Challenge
526535

527-
Can you refactor the part of the program that finds the winner into a function?
536+
Can you extract the part of the program that finds the winner into a function?
528537

529538
## Bigger Challenge
530539

@@ -534,4 +543,4 @@ Can you write a winner function that could deal with a tie?
534543

535544
## Next Chapter
536545

537-
When you're done counting radish votes, the next chapter is [Creating Charts](charts.html)
546+
That became complicated pretty quickly, didn't it? In the next chapter, we will try an easier way to [analyze the survey using pandas](pandas.html), a Python library designed for data analysis.
