
Shawndee Boyd
6,002 Points

Create a function named find_words that takes a count and a string

I have been stuck on this for a while and maybe I'm not understanding what they are asking. Please help!

word_length.py
import re

# EXAMPLE:
# >>> find_words(4, "dog, cat, baby, balloon, me")
# ['baby', 'balloon']

def find_words(count, string):
    count = str(6)
    return re.findall(r'/w{'+ count + ', }', string)

11 Answers

Chris Freeman
Treehouse Moderator 68,457 Points

Hey Shawndee, you are very close. The challenge asks:

Create a function named find_words that takes a count and a string. Return a list of all of the words in the string that are count word characters long or longer.

Your regular expression needs two fixes: change /w to \w, and remove the space in the later part of the string, changing ', }' to ',}'.

The updated line is now:

    return re.findall(r'\w{'+ str(count) + ',}', string) #<-- cast count as string

As an alternative to the range specifier "{}", you can multiply the character search \w by count, then match optional additional characters with \w*, as shown below:

    return re.findall(r'\w' * count + r'\w*', string)

Finally you could also use the string format method:

def find_words(cnt,s):
    return re.findall(r'\w{{{:d},}}'.format(cnt), s)

Note the need to escape the braces to differentiate between re braces and format braces.
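
For comparison, here is a minimal sketch of the three variants above side by side, run against the example from the challenge. The function names find_words_concat, find_words_repeat, and find_words_format are made up for this illustration and are not part of the challenge.

```python
import re

def find_words_concat(count, string):
    # Build \w{count,} by concatenation; count must be converted to str first.
    return re.findall(r'\w{' + str(count) + ',}', string)

def find_words_repeat(count, string):
    # Repeat \w count times, then allow any number of extra word characters.
    return re.findall(r'\w' * count + r'\w*', string)

def find_words_format(count, string):
    # Doubled braces become literal braces; {:d} is the substitution field.
    return re.findall(r'\w{{{:d},}}'.format(count), string)

sample = "dog, cat, baby, balloon, me"
for func in (find_words_concat, find_words_repeat, find_words_format):
    print(func.__name__, func(4, sample))  # each prints ['baby', 'balloon']
```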

Zachary Vacek
11,133 Points

Hey Chris,

when I tried your suggestion:

return re.findall(r'\w{'+ count + ',}', string)

I got an error: TypeError: Can't convert 'int' object to str implicitly.

After searching around, I landed on this solution:

return re.findall(r'\w{'+ str(count) + ',}', string)

It works, but I am wondering if there is a better way to do it?

Thanks!

Chris Freeman
Treehouse Moderator 68,457 Points

Thanks for the correction. I've updated my first solution to cast count as string. The second answer works correctly as is.

The funny part is that when I tested my original first solution, I had included the line count = str(6) from the question, which evidently is the correct answer.

Not sure what you mean by "is there a better way?" Do you mean an alternative regular expression, or some other string parsing approach?

I am curious what you had in mind.

There are certainly other ways, such as splitting the input string on spaces, then looping over each word and checking its length. Using regular expressions may seem complex, but re is heavily optimized library code built explicitly for parsing strings.
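
As a rough sketch of that split-and-loop alternative (the name find_words_split and the punctuation stripping are assumptions added for this illustration, not something the challenge asks for):

```python
import string

def find_words_split(count, text):
    """Return the whitespace-separated words that are at least count characters long."""
    words = []
    for chunk in text.split():
        # Strip surrounding punctuation so "baby," is measured as "baby".
        word = chunk.strip(string.punctuation)
        if len(word) >= count:
            words.append(word)
    return words

print(find_words_split(4, "dog, cat, baby, balloon, me"))  # ['baby', 'balloon']
```

This is not an exact replacement for \w{count,}: a word such as "don't" counts as one five-character word here, while \w stops at the apostrophe.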

Julie Pham
12,290 Points

return re.findall(r'\w{{{:d},}}'.format(cnt), s)

Why are there two pairs of {} around {:d},? Why isn't one pair of {} right?

Chris Freeman
Treehouse Moderator 68,457 Points

A single-{} is interpreted by the format method as a field marker to be replaced. The double-{} are used to "escape" this substitution. At parsing, format replaces a double-{} with a single-{} without doing a field substitution. The regex can then interpret the formatted string containing the remaining single-{}. Hope this helps.
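
A small sketch of what that looks like in practice. The variable names pattern and error_message are only for this illustration:

```python
import re

# Doubled braces survive as single literal braces; {:d} is the only field.
pattern = r'\w{{{:d},}}'.format(4)
print(pattern)  # \w{4,}
print(re.findall(pattern, "dog, cat, baby, balloon, me"))  # ['baby', 'balloon']

# With only one pair around {:d}, the opening {{ is read as an escaped brace,
# so the later single } has no field to close and format() raises ValueError.
error_message = None
try:
    r'\w{{:d},}'.format(4)
except ValueError as err:
    error_message = str(err)
print(error_message)
```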

What is :d for?

Chris Freeman
Treehouse Moderator 68,457 Points

The :d tells the format() method to format the value as a "decimal" (base-10) integer, not as a float.
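
A quick sketch of why the :d can be worth keeping (float_error and loose are names made up for this example): with :d, a float is rejected instead of silently ending up inside the pattern.

```python
print('{:d}'.format(4))  # 4

# :d refuses a float rather than formatting it.
float_error = None
try:
    '{:d}'.format(4.0)
except ValueError as err:
    float_error = str(err)
print(float_error)

# Without :d the float is formatted as-is, which yields a useless quantifier.
loose = r'\w{{{},}}'.format(4.0)
print(loose)  # \w{4.0,}
```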

Hi Chris,

If we didn't need to add the comma after count, would it just be '\w{{:d}}'?

Chris Freeman
Treehouse Moderator 68,457 Points

If the comma isn’t needed to set a range of values, it would be triple curly brackets:

    return re.findall(r'\w{{{:d}}}'.format(cnt), s)

The outer two curly brackets on each side result in one printed curly bracket. The inner curly brackets mark the substitution field and are consumed when the value is substituted for the field.
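
To make the difference concrete, here is a short sketch comparing the triple-brace form with the double-brace form from the question (exact and literal are illustrative names):

```python
import re

exact = r'\w{{{:d}}}'.format(3)   # escaped {{, field {:d}, escaped }}
literal = r'\w{{:d}}'.format(3)   # only escapes: no field, so 3 is never inserted

print(exact)    # \w{3}
print(literal)  # \w{:d}
print(re.findall(exact, "dog, cat, baby"))  # ['dog', 'cat', 'bab']
```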

Post back if you need more help. Good luck!!!

So format always goes from outside to inside when escaping {}'s, correct?

So format sees the outside {{}}'s and replaces them with just {}, so we're left with {{:d},}

and then format replaces {:d} with count, so we're left with {count,}, and then the regex can evaluate {count,}

I guess what's still really hindering my understanding of this is: why doesn't format escape again when it sees {{:d},}? Shouldn't this trigger format to "simplify" the brackets again and only be left with {:d},? Or does format only escape once, meaning if I have {{{{}}}}, format will remove only the outermost brackets and keep the rest?

I'm sorry if my rambling is hard to understand.. my mind is jumbled from spending way too much time trying to understand this.

Chris Freeman
Treehouse Moderator 68,457 Points

It’s not so much outside-in as it is that {{ becomes a literal { that format otherwise ignores. The same goes for }}, which is seen as an ignored }.

When parsing, format does a single pass to look for fields to replace. If it sees an opening { it thinks, “Ooh, here’s a field”. But if it immediately sees another { it says “dang, the pair is just the printable char {, keep looking for the start of a field”. Once an actual start of field is detected, it begins looking for the end of the field, ignoring double }} in a similar manner.

{ and } tagged as part of a field will be replaced with the field contents.
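
The single left-to-right pass can be checked with a few throwaway strings. This is only a sketch; the value 5 is arbitrary:

```python
# Four braces on each side: two escaped pairs per side, no field at all.
print('{{{{}}}}'.format(5))    # {{}}

# Five braces on each side: two escaped pairs per side plus one real field.
print('{{{{{}}}}}'.format(5))  # {{5}}

# The pattern from this thread: escaped {, field {:d}, a comma, escaped }.
print('{{{:d},}}'.format(5))   # {5,}
```

Each {{ or }} is consumed as a pair from left to right, so format never revisits its own output to "simplify" it a second time.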

Grzegorz Gancarczyk
10,064 Points

This is what I came up with and it passed:

def find_words(count, string):
  re_string = re.findall(r'\w+' * count, string)
  return re_string

However, I have a question here. We've been told that count is a number, which means it has to be an int. Adding this:

count = int(count)

won't break anything. What is wrong with this though?

def(int(count), string)

I guess it is a very basic or kind of stupid question. I know it won't be useful to add this, or anything, but I'm just curious.

[edited format --cf]

Chris Freeman
Treehouse Moderator 68,457 Points

It is allowable to pass a function as an argument. In fact, some functions, such as map, require that the first argument is a function.

In your example, it is simply illegal syntax: a parameter in a def statement must be a plain name, not an expression. If it were legal, the issue with using int(count) as a parameter is that it would be evaluated at compile time, and Python would then try to use the immutable return value (an integer) as the parameter name:

def find_words(int(count), string): #<-- illegal syntax
    pass

# given the argument of count=5, turns into 

def find_words(5, string):
    pass 

You could, of course, process the arguments on the call:

find_words(int(count), string)

# which is the same as:
find_words(5, string)
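
If the goal is to make sure count is an int, a minimal sketch is to convert it in the function body or at the call site. The string "4" below is just a stand-in for input that arrives as text:

```python
import re

def find_words(count, string):
    # Convert inside the body; the parameter itself has to stay a plain name.
    count = int(count)
    return re.findall(r'\w{' + str(count) + ',}', string)

sample = "dog, cat, baby, balloon, me"
print(find_words("4", sample))       # conversion happens inside the function
print(find_words(int("4"), sample))  # conversion happens at the call site
```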
Grzegorz Gancarczyk
10,064 Points

Thank you, sir. It makes sense now.

hector villasano
12,937 Points

Very interesting and straightforward approach, compared to everyone else in the blog!!

Chris Freeman
Treehouse Moderator 68,457 Points

Interesting solution. It works because the first \w+ initially catches all the characters, then gives characters back so that each of the remaining "cnt - 1" \w+ can absorb a single character, which ensures the minimum cnt is reached.

This solution is slightly less efficient because the regex parser has to backtrack. But that's technical minutiae.
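
A short sketch showing what the multiplied pattern expands to, and that it returns the same words as the range form on the challenge example (repeated and sample are illustrative names):

```python
import re

count = 4
repeated = r'\w+' * count
print(repeated)  # \w+\w+\w+\w+

sample = "dog, cat, baby, balloon, me"
print(re.findall(repeated, sample))   # ['baby', 'balloon']
print(re.findall(r'\w{4,}', sample))  # ['baby', 'balloon']
```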

What is the difference with the following? In my eyes, we do the same but mine doesn't work.

re.findall(r'\w{count,}\s', string)

Chris Freeman
Treehouse Moderator 68,457 Points

Katherina Kallis, the characters “count” are seen as the literal individual ASCII letters “c”, “o”, “u”, “n”, “t” in the pattern and not as the variable name count.

this also works

def find_words(cnt,s):
    p=r'\w{%i,}'%cnt
    return re.findall(p,s)

[edited format --cf]

Chris Freeman
Treehouse Moderator 68,457 Points

Correct, but I would use the string format method:

def find_words(cnt,s):
    return re.findall(r'\w{{{:d},}}'.format(cnt), s)

Note the need to escape the braces to differentiate between re braces and format braces.

Can you explain why when you use the percent sign method of adding the count to p it works but when you just put count in itself it gives an error? I've got mine working but I'm puzzled as to why it works like that.

Chris Freeman
Treehouse Moderator 68,457 Points

Mason Preusser: The percent method formats the integer as a string. Using count directly could give you an error trying to concatenate ints and strings, depending on your code. Can you post your failing version?

Sure thing Chris,

import re
def find_words(count, stringy):
    p = r'\w{count,}'
    s = stringy
    return re.findall(p, s)

Now that I look at it and remember the way to modify strings it makes more sense.

[edit formatting -cf]

Chris Freeman
Treehouse Moderator 68,457 Points

Mason, in your case, the characters "count" are being interpreted as five literal characters and not as the variable name. In this context, a "c" following a left brace is not valid repetition syntax, so the pattern never means "count or more word characters".

To include the variable count in the regular expression in an inline fashion, look at my first solution in the accepted answer above.
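
Here is a small sketch of the difference. In CPython's re module, a brace group that is not a valid repetition is matched as literal text, so the pattern quietly finds nothing in the sample instead of raising. The test string "x{count,}" is contrived for this illustration:

```python
import re

sample = "dog, cat, baby, balloon, me"
literal_pattern = r'\w{count,}'  # the name count is never looked up here

print(re.findall(literal_pattern, sample))       # []
print(re.findall(literal_pattern, "x{count,}"))  # ['x{count,}']

# Building the pattern from the variable gives the intended quantifier.
count = 4
built_pattern = r'\w{' + str(count) + ',}'
print(built_pattern)                      # \w{4,}
print(re.findall(built_pattern, sample))  # ['baby', 'balloon']
```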

What does %i mean?

Chris Freeman
Treehouse Moderator 68,457 Points

In the string "r'\w{%i,}' % cnt", the %i is part of the older string formatting syntax.

%i is a field placeholder for an integer.
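
A tiny sketch of the older %-style formatting applied to this pattern. %i and %d behave the same way for integers; pattern_i and pattern_d are illustrative names:

```python
import re

pattern_i = r'\w{%i,}' % 4
pattern_d = r'\w{%d,}' % 4
print(pattern_i)               # \w{4,}
print(pattern_i == pattern_d)  # True

# No brace doubling is needed here, because % formatting only reacts to %.
print(re.findall(pattern_i, "dog, cat, baby, balloon, me"))  # ['baby', 'balloon']
```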

Mario Ibanez
10,644 Points

Why doesn't this work? :

def find_words(count, input_str):
  return re.findall(r'\w{count,}', input_str)

[edit formatting --cf]

Chris Freeman
Treehouse Moderator 68,457 Points

"count" in your code is interpreted as literal characters instead of as the variable count. See my response to Mason above.

luke hammer
25,513 Points

I'm attempting to solve this also, and I am confused by some behavior:

def find_words(count, arg):
  regex = "r{}{}{}".format("'\w{",count,"}\w*'")
  return re.findall(regex,arg)

however when i put this

regex = "r{}{}{}".format("'\w{",count,"}\w*'")

note i set count = 3 for testing

the result is regex = "r'\\w{3}\\w*'"

why the extra '\' before the 'w' ?

I'm attempting to get

regex to equal r'\w{3}\w*'

can anyone help??

[edit: added backticks to show \\ -cf]

Chris Freeman
Treehouse Moderator 68,457 Points

The "r" is used to declare the string to be a raw string. It's not part of the regex and should be outside the quotes.

Moving the "r" outside, then removing the unnecessary single quotes in the format arguments, gives the following passing code:

def find_words(count, arg):
  # the argument strings are raw too, so \w is not read as a string escape
  regex = r"{}{}{}".format(r"\w{", count, r"}\w*")
  return re.findall(regex,arg)

Note, you will always see the double \\ when inspecting variables. It's the Python shell's way of showing it's a literal backslash. The second slash is not really there:

In [74]: count = 3

In [75]: regex = r"{}{}{}".format("\w{",count,"}\w*")

In [76]: regex

Out[76]: '\\w{3}\\w*'

In [77]: print regex
\w{3}\w*

In [78]: len(regex)

Out[78]: 8
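
The same check can be sketched as a plain Python 3 script, where repr() gives the doubled-backslash view and print() gives the actual characters. Raw strings are used for the pieces here, which is an adjustment made for this sketch:

```python
count = 3
regex = r"{}{}{}".format(r"\w{", count, r"}\w*")

print(repr(regex))  # '\\w{3}\\w*'  (escaped view, as the shell shows it)
print(regex)        # \w{3}\w*      (the actual characters)
print(len(regex))   # 8
print(len('\\'))    # 1, because '\\' in source code is one backslash
```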

This worked for me:

def find_words(count,string):
  pattern = (r'\w{%d,}') % (count)
  return re.findall(pattern,string)
Dana Kennedy
6,711 Points

I know I'm late to the party, but can someone explain one thing? In the following answer, and I'm asking about this one because it seems to be the answer the instructor is looking for, before the string variable you have this: ',}'. I do not understand what the comma before the closing curly brace does. If you took out the count variable, which would include + str(count) + ', and replaced it with a whole number like 6, you'd have {6,}. Here is the full string I'm referring to. Thank you in advance for any help.

return re.findall(r'\w{'+ str(count) + ',}', string)
Chris Freeman
Treehouse Moderator 68,457 Points

When using curly braces "{ }" to specify a count range for the preceding regex token, the first number is the minimum and the second is the maximum allowed to match. If the second number is omitted it means "or more". So {6,} means 6 or more.
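
A brief sketch of the three forms of the range specifier on the challenge sentence. The \b word boundaries in the second pattern are an addition for this example, so that a long word is not matched in part:

```python
import re

sample = "dog, cat, baby, balloon, me"

print(re.findall(r'\w{6,}', sample))       # 6 or more: ['balloon']
print(re.findall(r'\b\w{3,4}\b', sample))  # 3 to 4:    ['dog', 'cat', 'baby']
print(re.findall(r'\w{4}', sample))        # exactly 4: ['baby', 'ball']
```

The last line shows why the comma matters for this challenge: without it, balloon is cut down to its first four characters.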

Dana Kennedy
6,711 Points

That was driving me crazy, Chris. Thank you for the response!

Yuda Leh
7,618 Points

This is what I came up with. It passes, but I am not sure if this is the best way after reading some of the other posts. Code:

def find_words(count, string):
    return re.findall(r"\w+" * count, string)

Also, am I getting this right? \w finds any Unicode word character in a string such as "dog, cat, baby, balloon, me". \w+ says there is at least one such character. We then multiply \w+ by, for example, 4, which returns baby (\w+\w+\w+\w+) and balloon (\w+\w+\w+\w+, with the extra characters absorbed as well).

Chris Freeman
Treehouse Moderator 68,457 Points

Your regex works, but it loses performance because of the unnecessary extra "+" quantifiers, each of which causes extra backtracking during matching.

Andrew Krause
10,679 Points

What if the count test went in as 7? Then it would be incorrect. You shouldn't have to explicitly set the count number.