Don't understand word boundaries

Question

Hello,

Could someone explain the concept of a word boundary?

print(re.findall(r'@[-\w\d.]*[^gov\t]', data))
print(re.findall(r'\b@[-\w\d.]*[^gov\t]\b', data))

These are two different results I got.

>> ['@teamtreehouse.com', '@kennethlove\n', '@teamtreehouse.com', '@camelot.co.uk', '@norrbotten.co.se', '@sverik\n', '@killerrabbit.com', '@teamtreehouse.com', '@ryancarson\n', '@tardis.co.uk', '@example.com', '@example\n', '@us.', '@potus44\n', '@teamtreehouse.com', '@chalkers\n', '@empire.', '@darthvader\n', '@spain.']
>> ['@teamtreehouse.com', '@teamtreehouse.com', '@camelot.co.uk', '@norrbotten.co.se', '@killerrabbit.com', '@teamtreehouse.com', '@tardis.co.uk', '@example.com', '@us.', '@teamtreehouse.com', '@empire.', '@spain.']

Could someone explain why we have to use \b in the front and the back?

Answer 1 · 2017-04-14T19:16:04Z

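A word boundary \b matches a position rather than a character: the spot where a word character (\w) sits on one side and a non-word character, or the start or end of the string, sits on the other. In your second example, "@" is not a word character, so the leading \b means a word character is expected immediately before the "@". The trailing \b means the last matched character and the character after it must be of opposite kinds: if the match ends in a word character, a non-word character or the end of the string must follow; if it ends in a non-word character such as ".", a word character must follow.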

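The matches in the first group that end in a newline character "\n" (the Twitter handles) are rejected by the second pattern. Judging by your output, that is because of the leading \b: those handles are evidently not preceded by a word character (most likely whitespace comes before the "@"), so there is no word boundary in front of the "@". The trailing newline alone would not reject them, since the regex could backtrack one character and return, for example, "@kennethlove" without the "\n".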

The boundary is a zero-width anchor: it says a word character must be here, on exactly one side of the position, but it doesn't "consume" any character into the results. You may think of it as a check on the neighboring characters that doesn't hold on to what it finds.
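A minimal sketch of this behavior, using a made-up sample string rather than the course's data file. It assumes the email address has a word character before its "@" while the Twitter handle is preceded by a tab:

```python
import re

# Hypothetical sample, not the course data: an email address,
# a tab, then a Twitter handle and a newline.
sample = "kenneth@teamtreehouse.com\t@kennethlove\n"

# No \b: both "@..." chunks match, and the handle picks up "\n"
# because [^gov\t] accepts a newline.
without_boundary = re.findall(r'@[-\w\d.]*[^gov\t]', sample)

# Trailing \b only: the handle still matches, minus the newline,
# because the regex backtracks to end on the word character "e".
trailing_only = re.findall(r'@[-\w\d.]*[^gov\t]\b', sample)

# Leading and trailing \b: the handle is rejected, since a tab (not a
# word character) comes right before its "@".
with_boundary = re.findall(r'\b@[-\w\d.]*[^gov\t]\b', sample)

print(without_boundary)  # ['@teamtreehouse.com', '@kennethlove\n']
print(trailing_only)     # ['@teamtreehouse.com', '@kennethlove']
print(with_boundary)     # ['@teamtreehouse.com']
```

In none of the three results does \b contribute a character of its own; it only decides whether a match is allowed at that position.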

Answer 2 · 2019-03-10T01:34:19Z

For anyone else reading this question, I can understand why the code shown in this exercise appears confusing. When Kenneth defined a word boundary in the Escape Hatches video, he specifically said, quote, "It's the edges of a word, defined by white space or the edges of a screen."

This definition may be misleading because it suggests that a word boundary cannot exist between two non-whitespace characters in a string. However, a word boundary can in fact exist under such circumstances; as one source notes, a word boundary can occur "between two characters in the string, where one is a word character and the other is not a word character."

So in the case of an email address such as "sender@address.com", all of the characters up until the "@" symbol are word characters, while the @ symbol itself is not a word character. Thus, the "gap" between "sender" and "@" constitutes a word boundary.
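As a quick check of that "gap", the sketch below lists every position in the example address where \b matches. The " @handle" input is a hypothetical string added only for contrast:

```python
import re

address = "sender@address.com"

# Every index where \b matches: a word character on exactly one side.
boundaries = [m.start() for m in re.finditer(r'\b', address)]
print(boundaries)  # [0, 6, 7, 14, 15, 18]

# Index 6 is the gap between "sender" and "@", so \b@ matches here.
email_has_boundary = re.search(r'\b@', address) is not None
print(email_has_boundary)  # True

# With whitespace before the "@" (hypothetical input), neither side of
# that position is a word character, so \b@ does not match.
handle_has_boundary = re.search(r'\b@', " @handle") is not None
print(handle_has_boundary)  # False
```

Note that none of the boundaries in the address involve whitespace; each one sits between a word character and either a punctuation character or an end of the string.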

Answer 3 · 2017-07-08T13:24:46Z

Ohhhhh, ok- I was wrong... thanks for steering me to the correct answer, Chris!!