Google's Python Course wordcount.py
I am taking Google's Python course, which uses Python 2.7. I am running 3.5.2.
The script works. This is one of the exercises.
#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""Wordcount exercise
Google's Python class

The main() below is already defined and complete. It calls print_words()
and print_top() functions which you write.

1. For the --count flag, implement a print_words(filename) function that counts
how often each word appears in the text and prints:
word1 count1
word2 count2
...

Print the above list in order sorted by word (python will sort punctuation to
come before letters -- that's fine). Store all the words as lowercase,
so 'The' and 'the' count as the same word.

2. For the --topcount flag, implement a print_top(filename) which is similar
to print_words() but which prints just the top 20 most common words sorted
so the most common word is first, then the next most common, and so on.

Use str.split() (no arguments) to split on all whitespace.

Workflow: don't build the whole program at once. Get it to an intermediate
milestone and print your data structure and sys.exit(0).
When that's working, try for the next milestone.

Optional: define a helper function to avoid code duplication inside
print_words() and print_top().
"""

import sys

# +++your code here+++
# Define print_words(filename) and print_top(filename) functions.
# You could write a helper utility function that reads a file
# and builds and returns a word/count dict for it.
# Then print_words() and print_top() can call the utility function.

###

def word_count_dict(filename):
    """Returns a word/count dict for this filename."""
    # Utility used by count() and topcount().
    word_count = {}  # Map each word to its count
    input_file = open(filename, 'r')
    for line in input_file:
        words = line.split()
        for word in words:
            word = word.lower()
            # Special case if we're seeing this word for the first time.
            if not word in word_count:
                word_count[word] = 1
            else:
                word_count[word] = word_count[word] + 1
    input_file.close()  # Not strictly required, but good form.
    return word_count


def print_words(filename):
    """Prints one per line '<word> <count>' sorted by word for the given file."""
    word_count = word_count_dict(filename)
    words = sorted(word_count.keys())
    for word in words:
        print(word, word_count[word])


def get_count(word_count_tuple):
    """Returns the count from a dict word/count tuple -- used for custom sort."""
    return word_count_tuple[1]


def print_top(filename):
    """Prints the top count listing for the given file."""
    word_count = word_count_dict(filename)

    # Each item is a (word, count) tuple.
    # Sort them so the big counts are first using key=get_count() to extract count.
    items = sorted(word_count.items(), key=get_count, reverse=True)

    # Print the first 20
    for item in items[:20]:
        print(item[0], item[1])


# This basic command line argument parsing code is provided and
# calls the print_words() and print_top() functions which you must define.
def main():
    if len(sys.argv) != 3:
        print('usage: ./wordcount.py {--count | --topcount} file')
        sys.exit(1)

    option = sys.argv[1]
    filename = sys.argv[2]
    if option == '--count':
        print_words(filename)
    elif option == '--topcount':
        print_top(filename)
    else:
        print('unknown option: ' + option)
        sys.exit(1)


if __name__ == '__main__':
    main()
Here are my questions that the course is not answering:

1. Where it says the following, I am unsure what the 1 and the + 1 mean. Does it mean that if the word is not in the list, add it to the list (word_count[word]=1)? Also, I don't understand what each part of word_count[word]=word_count[word] + 1 means.

    if not word in word_count:
        word_count[word]=1
    else:
        word_count[word]=word_count[word] + 1

2. When it says word_count.keys(), I am not sure what that does, other than that it calls the keys in the dictionary that we defined and loaded keys and values into. I want to understand why word_count.keys() is there.

    words=sorted(word_count.keys())

3. word_count is redefined in a couple of locations, and I would like to know why that is done instead of creating a new variable name such as word_count1.

    word_count={}
    word_count=word_count_dict(filename)

   ...and in the places outlined in my 1st question.

4. Does if len(sys.argv) != 3: mean that the number of arguments is not 3, or that the number of characters is not 3 (e.g. sys.argv[1], sys.argv[2], sys.argv[3])?

Thank you for your help!
1. If word is not in the dictionary, we create a new entry in the dictionary for it and set its value to the number 1, since we've so far found 1 occurrence of the word. Otherwise, we retrieve the old value from the dictionary, use + 1 to add 1 to that value, and put the result back in the dictionary entry by assigning to word_count[word]. This can also be written as: word_count[word] += 1
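The counting step described above can be tried on its own, without reading a file. This is a minimal sketch: the helper name count_words and the sample word list are made up for illustration and are not part of the exercise.

```python
def count_words(words):
    """Return a dict mapping each word to how many times it appears."""
    word_count = {}
    for word in words:
        if word not in word_count:
            # First sighting: create the entry with a count of 1.
            word_count[word] = 1
        else:
            # Seen before: read the old count, add 1, store it back.
            word_count[word] += 1
    return word_count


sample = ["the", "cat", "saw", "the", "other", "cat", "the"]  # made-up input
counts = count_words(sample)
print(counts["the"], counts["cat"], counts["saw"])  # prints: 3 2 1
```

Here word_count[word] += 1 is used in place of word_count[word] = word_count[word] + 1; the two forms do the same thing for integer counts.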
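2. word_count.keys() returns the keys of the word_count dictionary (a list in Python 2, a view object in Python 3). It is being used so that the contents of the dictionary can be printed in alphabetical order, using sorted(). If you printed the dictionary the way it is, the words would be in an unpredictable order on Python 3.5 (from Python 3.7 on they come out in insertion order, which is still not alphabetical).

3. The variable is not being redefined. Variables are local to each function, so each word_count is a different variable. They just happen to use the same name in each function, because it's a good name for what the variable contains.

4. List indexes start at 0, so if len(sys.argv) != 3 checks that you have argv[0], argv[1], and argv[2]. argv[0] contains the script name, so this is checking that you gave 2 arguments to the script. The first argument must be either --count or --topcount, and the second argument must be the filename to count words in.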
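The points above about keys(), local variables, and sys.argv can each be checked with a few lines. This is only a sketch: the sample counts, the function names, and the fake_argv list are all invented for illustration. A hand-built list stands in for sys.argv so the snippet runs without command line arguments.

```python
# keys(): sorted() over the keys gives alphabetical order.
sample_counts = {"the": 3, "cat": 1, "sat": 2}  # made-up counts
sorted_words = sorted(sample_counts.keys())
print(sorted_words)  # prints: ['cat', 'sat', 'the']


# Local variables: the same name in two functions is two separate variables.
def count_letters():
    word_count = {"a": 1}  # exists only inside count_letters()
    return word_count


def count_digits():
    word_count = {"1": 7}  # a different variable that shares the name
    return word_count


letters = count_letters()
digits = count_digits()
print(letters, digits)  # prints: {'a': 1} {'1': 7}


# sys.argv: len() counts list items, and item 0 is the script name.
def has_two_arguments(argv):
    """Return True when argv holds the script name plus exactly 2 arguments."""
    return len(argv) == 3


fake_argv = ["./wordcount.py", "--count", "alice.txt"]  # hypothetical command line
print(has_two_arguments(fake_argv))  # prints: True
```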