Shawndee Boyd
6,002 Points
Create a function named find_words that takes a count and a string
I have been stuck on this for a while and maybe I'm not understanding what they are asking. Please help!
import re
# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']
def find_words(count, string):
    count = str(6)
    return re.findall(r'/w{'+ count + ', }', string)
11 Answers
Chris Freeman
Treehouse Moderator 68,457 Points
Hey Shawndee, you are very close. The challenge asks:
Create a function named find_words that takes a count and a string. Return a list of all of the words in the string that are count word characters long or longer.
Your regular expression needed two fixes: change /w to \w, and remove the space in the last part of the pattern, changing ', }' to ',}'. The hard-coded count = str(6) should also go, so that the count argument itself is converted to a string.
The updated line is now:
return re.findall(r'\w{'+ str(count) + ',}', string) #<-- cast count as string
As an alternative to the range specifier "{}", you can repeat the \w token count times, then match any optional additional characters with \w*, as shown below:
return re.findall(r'\w' * count + r'\w*', string)
Finally you could also use the string format method:
def find_words(cnt, s):
    return re.findall(r'\w{{{:d},}}'.format(cnt), s)
Note the need to escape the braces to differentiate between re braces and format braces.
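A quick way to check that the three approaches in this answer agree is to run them side by side. This is only a sketch: the three helper names are hypothetical, chosen for the comparison, and the input is the example string from the challenge.

```python
import re

def find_words_braces(count, string):
    # Range quantifier: \w{count,} means "count or more word characters".
    return re.findall(r'\w{' + str(count) + r',}', string)

def find_words_repeat(count, string):
    # count copies of \w, followed by any further word characters.
    return re.findall(r'\w' * count + r'\w*', string)

def find_words_format(count, string):
    # Doubled braces are literal braces as far as str.format is concerned.
    return re.findall(r'\w{{{:d},}}'.format(count), string)

sample = "dog, cat, baby, balloon, me"
for fn in (find_words_braces, find_words_repeat, find_words_format):
    print(fn.__name__, fn(4, sample))
```

Each call should print the same list as the example in the challenge, ['baby', 'balloon'].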
Grzegorz Gancarczyk
10,064 Points
This is what I came up with and it passed:
def find_words(count, string):
    re_string = re.findall(r'\w+' * count, string)
    return re_string
However, I have a question here. We've been told that count is a number, which means it has to be an int. Adding this:
count = int(count)
won't break anything. What is wrong with this though?
def(int(count), string)
I guess it is a very basic or kind of stupid question. I know it won't be useful to add this, or anything, but I'm just curious.
[edited format --cf]
Chris Freeman
Treehouse Moderator 68,457 Points
It is allowable to pass a function as an argument. In fact, some functions, such as map, require that the first argument is a function.
In your example, it is simply illegal syntax: each parameter in a def must be a plain name, not a function call. If it were legal, the issue with using int(count) as a parameter is that it would be evaluated at compile time, and the definition would then try to use the immutable return value (an integer) as the parameter name:
def find_words(int(count), string):  #<-- illegal syntax
    pass
# given the argument of count=5, turns into
def find_words(5, string):
    pass
You could, of course, process the arguments on the call:
find_words(int(count), string)
# which is the same as:
find_words(5, string)
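If the conversion really should live with the function rather than at the call site, one option is to convert inside the body. The sketch below assumes the caller might pass the count as a string such as "4", which is not something the challenge itself does.

```python
import re

def find_words(count, string):
    # Parameters must be plain names, so the conversion happens in the body.
    count = int(count)
    return re.findall(r'\w{' + str(count) + r',}', string)

from_str = find_words("4", "dog, cat, baby, balloon, me")
from_int = find_words(4, "dog, cat, baby, balloon, me")
print(from_str)
print(from_int)
```

Both calls return the same list, because int() leaves an integer unchanged and parses a numeric string.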
Grzegorz Gancarczyk
10,064 Points
Thank you, sir. It makes sense now.
hector villasano
12,937 Points
Very interesting and straightforward approach, compared to everyone else in the blog!
Philippe Mitton
iOS Development Techdegree Student 22,877 Points
This worked for me:
def find_words(cnt, string):
    return re.findall(r'\w+' * cnt, string)
[edit formatting -cf]
Chris Freeman
Treehouse Moderator 68,457 Points
Interesting solution. It works because the first \w+ catches all the characters, then the remaining "cnt - 1" \w+ tokens ensure the minimum cnt is reached, each absorbing a single character.
This solution is slightly less efficient because the regex engine has to backtrack. But that's technical minutiae.
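To see how the characters end up divided between the repeated tokens, each \w+ can be wrapped in a capturing group. This is an illustrative sketch only; the grouping is added for inspection and is not part of the submitted solution.

```python
import re

cnt = 4
# Same pattern as r'\w+' * cnt, but with each token captured separately.
pattern = r'(\w+)' * cnt

shares = re.match(pattern, 'balloon').groups()
print(shares)

# A word shorter than cnt cannot give every token a character.
too_short = re.match(pattern, 'cat')
print(too_short)
```

For 'balloon' the first group keeps 'ball' and each later group gets one character, which is the backtracking described above. For 'cat' there is no match at all.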
Katherina Kallis
3,385 Points
What is the difference with the following? In my eyes, we do the same but mine doesn't work.
re.findall(r'\w{count,}\s', string)
Chris Freeman
Treehouse Moderator 68,457 Points
Katherina Kallis, the characters “count” are seen as the literal individual ASCII letters “c”, “o”, “u”, “n”, “t” in the pattern and not as the variable name count.
Robert Erick
24,909 Points
This also works:
def find_words(cnt, s):
    p = r'\w{%i,}' % cnt
    return re.findall(p, s)
[edited format --cf]
Chris Freeman
Treehouse Moderator 68,457 Points
Correct, but I would use the string format method:
def find_words(cnt, s):
    return re.findall(r'\w{{{:d},}}'.format(cnt), s)
Note the need to escape the braces to differentiate between re braces and format braces.
Mason Preusser
2,279 Points
Can you explain why it works when you use the percent sign method of adding the count to p, but it gives an error when you just put count in by itself? I've got mine working but I'm puzzled as to why it works like that.
Chris Freeman
Treehouse Moderator 68,457 Points
Mason Preusser: The percent method reformats the integer as a string. Using count directly could give you an error trying to concatenate ints and strings, depending on your code. Can you post your failing version?
Mason Preusser
2,279 Points
Sure thing Chris,
import re
def find_words(count, stringy):
    p = r'\w{count,}'
    s = stringy
    return re.findall(p, s)
Now that I look at it and remember the way to modify strings it makes more sense.
[edit formatting -cf]
Chris Freeman
Treehouse Moderator 68,457 Points
Mason, in your case, the characters "count" are being interpreted as the 5 literal string characters and not as the variable name. In this context a "c" following a left brace is not a valid repetition count, so Python's re module treats the braces as literal text rather than as a quantifier.
To include the variable count in the regular expression in an in-line fashion, look at my first solution in the accepted answer above.
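The effect of leaving the name count inside the pattern can be checked directly. A small sketch, assuming Python 3's re module and the example string from the challenge:

```python
import re

text = "dog, cat, baby, balloon, me"

# "count" here is just five letters inside the pattern, not the variable.
broken = r'\w{count,}'
no_hits = re.findall(broken, text)
literal_hit = re.findall(broken, "x{count,}")
print(no_hits)
print(literal_hit)

# Building the pattern from the variable gives the intended quantifier.
count = 4
fixed = r'\w{' + str(count) + r',}'
hits = re.findall(fixed, text)
print(hits)
```

The broken pattern finds nothing in the challenge string and only matches text that literally contains "{count,}", while the fixed pattern returns the expected words.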
Abdillah Hasny
Courses Plus Student 3,886 Points
What does %i mean?
Chris Freeman
Treehouse Moderator 68,457 Points
In the string r'\w{%i,}' % cnt, the %i is part of the older string formatting syntax. %i is a field placeholder for an integer.
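For comparison, here is a sketch of the same pattern built four ways. The variable names are made up for the illustration, and the f-string form assumes Python 3.6 or later.

```python
cnt = 4

p_percent_i = r'\w{%i,}' % cnt           # older %-style, integer placeholder
p_percent_d = r'\w{%d,}' % cnt           # %d behaves the same as %i here
p_format = r'\w{{{:d},}}'.format(cnt)    # str.format, literal braces doubled
p_fstring = rf'\w{{{cnt},}}'             # raw f-string, same doubling rule

for p in (p_percent_i, p_percent_d, p_format, p_fstring):
    print(p)
```

All four lines print \w{4,}, so the choice between them is mostly a matter of style.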
Mario Ibanez
10,644 Points
Why doesn't this work?
def find_words(count, input_str):
    return re.findall(r'\w{count,}', input_str)
[edit formatting --cf]
Chris Freeman
Treehouse Moderator 68,457 Points
"count" in your code is interpreted as literal characters instead of the variable count. See my response to Mason above.
luke hammer
25,513 Points
I'm attempting to solve this also and I am confused by some behavior:
def find_words(count, arg):
    regex = "r{}{}{}".format("'\w{",count,"}\w*'")
    return re.findall(regex,arg)
However, when I run this line (note I set count = 3 for testing):
regex = "r{}{}{}".format("'\w{",count,"}\w*'")
the result is:
regex = "r'\\w{3}\\w*'"
Why the extra '\' before the 'w'?
I'm attempting to get regex to equal r'\w{3}\w*'.
Can anyone help?
[edit: added backticks to show \\ -cf]
Chris Freeman
Treehouse Moderator 68,457 Points
The "r" is used to declare the string to be a raw string. It's not part of the regex and should be outside the quotes.
Moving the "r" outside, then removing the unnecessary single quotes in the format arguments, gives the following passing code:
def find_words(count, arg):
    regex = r"{}{}{}".format("\w{",count,"}\w*")
    return re.findall(regex,arg)
Note, you will always see the double \\ when inspecting variables. It's the Python shell's way of showing it's a literal backslash. The second slash is not really there:
In [74]: count = 3
In [75]: regex = r"{}{}{}".format("\w{",count,"}\w*")
In [76]: regex
Out[76]: '\\w{3}\\w*'
In [77]: print regex
\w{3}\w*
In [78]: len(regex)
Out[78]: 8
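The same check can be written as a short Python 3 script, where print is a function and repr() stands in for what the interactive shell echoes. A minimal sketch:

```python
regex = r"\w{3}\w*"

shown_by_shell = repr(regex)   # the shell echoes the repr, with \\ escapes
print(shown_by_shell)
print(regex)                   # print shows the actual characters
print(len(regex))              # length counts each backslash once
print(regex.count("\\"))       # number of real backslashes in the string
```

The repr line shows doubled backslashes, while the printed string, its length of 8, and the backslash count of 2 confirm that only single backslashes are stored.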
Bikram Mann
Courses Plus Student 12,647 Points
This worked for me:
def find_words(count, string):
    pattern = r'\w{%d,}' % count
    return re.findall(pattern, string)
Dana Kennedy
6,711 Points
I know I'm late to the party, but can someone explain one thing? I'm asking about the following answer because it seems to be the one the instructor is looking for. Before the string variable you have this: ',}'. I do not understand what the comma before the closing brace does. If you took out the count variable, which would include + str(count) + ', and replaced it with a whole number like 6, you'd have {6,}. Here is the full line I'm referring to. Thank you in advance for any help.
return re.findall(r'\w{'+ str(count) + ',}', string)
Chris Freeman
Treehouse Moderator 68,457 Points
When using curly braces "{ }" to specify a count range for the preceding regex token, the first number is the minimum and the second is the maximum allowed to match. If the second number is omitted it means "or more". So {6,} means 6 or more.
Dana Kennedy
6,711 Points
That was driving me crazy, Chris. Thank you for the response!
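To make the minimum and maximum rule from Chris's reply concrete, the sketch below runs three quantifier forms over the challenge's example string. The \b word boundaries are an addition for this illustration, so that a bounded count is compared against whole words rather than pieces of longer ones.

```python
import re

text = "dog, cat, baby, balloon, me"

at_least_six = re.findall(r'\w{6,}', text)       # minimum only: 6 or more
two_to_four = re.findall(r'\b\w{2,4}\b', text)   # minimum 2, maximum 4
exactly_three = re.findall(r'\b\w{3}\b', text)   # single number: exactly 3

print(at_least_six)
print(two_to_four)
print(exactly_three)
```

The first list holds only 'balloon', the second holds every word except 'balloon', and the third holds 'dog' and 'cat'.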
Yuda Leh
7,618 Points
This is what I came up with. It passes, but I am not sure if this is the best way after reading some of the other posts. Code:
def find_words(count, string):
    return re.findall(r"\w+" * count, string)
Also, am I getting this right? \w finds any Unicode word character in a string such as "dog, cat, baby, balloon, me", and \w+ says there are one or more such characters. We then multiply \w+ by, for example, 4, which returns baby (\w+\w+\w+\w+) and balloon (\w+\w+\w+\w+, with the extra characters absorbed).
Chris Freeman
Treehouse Moderator 68,457 Points
Your regex works, but it loses performance because each unnecessary extra "+" causes additional backtracking during matching.
Andrew Krause
10,679 Points
What if the count test went in as 7? Then it would be incorrect. You shouldn't have to explicitly set the count number.
Zachary Vacek
11,133 Points
Hey Chris,
when I tried your suggestion:
return re.findall(r'\w{'+ count + ',}', string)
I got an error: TypeError: Can't convert 'int' object to str implicitly.
After searching around, I landed on this solution:
return re.findall(r'\w{'+ str(count) + ',}', string)
It works, but I am wondering if there is a better way to do it?
Thanks!
Chris Freeman
Treehouse Moderator 68,457 PointsChris Freeman
Treehouse Moderator 68,457 PointsThanks for the correction. I've updated my first solution to cast count as string. The second answer works correctly as is.
The funny part, is when I tested the my original first solution, I had included your code
count = str(6)
, which evidently is the correct answer.Not sure what you mean by is there a better way? Do you mean an alternative regular expression, or some other string parsing approach?
I am curious what you had in mind.
There are certainly other ways such as splitting the input string on spaces, then looping over each word and checking it's length. Using regular expressions may seem complex but
re
is some of the most optimized library code available built explicitly for parsing strings.Julie Pham
12,290 PointsJulie Pham
12,290 Pointsreturn re.findall(r'\w{{{:d},}}'.format(cnt), s)
Why there are 2 pair of {} cover {:d},? Why one pair of {} is not right?
Chris Freeman
Treehouse Moderator 68,457 Points
A single-{} is interpreted by the format method as a field marker to be replaced. The double-{} are used to "escape" this substitution. At parsing, format replaces a double-{} with a single-{} without doing a field substitution. The regex can then interpret the formatted string containing the remaining single-{}. Hope this helps.
Abdillah Hasny
Courses Plus Student 3,886 Points
What is :d for?
Chris Freeman
Treehouse Moderator 68,457 Points
The :d tells the format() method to format the value as a "decimal" integer (not as a float).
Timothy Tseng
3,292 Points
Hi Chris,
If we didn't need to add the comma after count, would it just be '\w{{:d}}'?
Chris Freeman
Treehouse Moderator 68,457 Points
If the comma isn’t needed to set a range of values, it would be triple curly brackets:
return re.findall(r'\w{{{:d}}}'.format(cnt), s)
The outer two curly brackets result in one printed curly bracket. The inner curly brackets mark the substitution field that is consumed in the variable substitution for the field.
Post back if you need more help. Good luck!!!
Timothy Tseng
3,292 Points
So format always goes from outside to inside when escaping {}'s, correct?
So format sees the outside {{}}'s and replaces them with just {}, so we're left with {{:d},}
and then format replaces {:d} with count, so we're left with {count,}, and then the regex can evaluate {count,}
I guess what's still really hindering my understanding of this is why format doesn't escape again when it sees {{:d},}. Shouldn't this trigger format to "simplify" the brackets again and only be left with {:d},? Or does format only escape once, meaning if I have {{{{}}}}, format will remove only the outermost brackets and keep the rest?
I'm sorry if my rambling is hard to understand. My mind is jumbled from spending way too much time trying to understand this.
Chris Freeman
Treehouse Moderator 68,457 Points
It’s not so much outside-in as it is that {{ becomes a { ignored by format. The same goes for }}, which is seen as an ignored }.
When parsing, format does a single pass to look for fields to replace. If it sees an opening { it thinks, “Ooh, here’s a field”. But if it immediately sees another { it says “dang, the pair is just the printable char {, keep looking for start of field”. Once an actual start of field is detected, it begins looking for the end of field, ignoring double }} in a similar manner.
{ and } tagged as part of a field will be replaced with the field contents.
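The single left-to-right pass can be checked with a few small format calls. This is only a sketch; the variable names are made up for the illustration.

```python
# Doubled braces become literal braces; {:d} is the one real field.
with_comma = r'\w{{{:d},}}'.format(4)
without_comma = r'\w{{{:d}}}'.format(4)

# With only two braces on each side there is no field left to fill,
# so the argument is ignored and ":d" stays in the output as text.
no_field = r'\w{{:d}}'.format(4)

# Four braces on each side are two escaped pairs, not nested escapes.
only_escapes = '{{{{}}}}'.format()

for result in (with_comma, without_comma, no_field, only_escapes):
    print(result)
```

The outputs are \w{4,}, \w{4}, \w{:d} and {{}}, which matches the description of format consuming each doubled brace once as it scans.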