You have completed Scraping Data From the Web!
Forms are a big part of many websites. Scrapy provides a FormRequest class for handling them.
[MUSIC]
0:00
We've managed to make a couple
of spiders that were great for
0:04
sites that don't require interaction.
0:07
But many sites do indeed require
some sort of interaction.
0:10
For example,
logging in to a site with a username and
0:13
password requires a form submission.
0:17
There are many different reasons for
0:20
needing to work with forms when
getting and scraping data.
0:21
Let's head back into our code to
take a look at some techniques.
0:25
Our Horse Land site is
hosted on GitHub Pages,
0:30
which doesn't support
backend technologies.
0:34
So we'll be using a bit of a workaround
from Formspree to handle the form posts.
0:37
Check the teacher's notes for additional
information about formspree.io and
0:43
how to get started with that.
0:47
If we look at our form page,
we see that it's a pretty simple form
0:48
with just a first name,
last name, and a job title.
0:53
Scrapy has a class called FormRequest,
which allows for form processing.
0:57
And, hold your horses, it's easy to use.
1:02
Let's mosey on over to our code and
create a new spider.
1:05
So I'll create a new file, gonna be a
Python file, and we'll call it formSpider.
1:10
First, FormRequest
will need to be imported.
1:18
So from scrapy.http import FormRequest.
1:21
And we need to import Spider.
1:28
So from scrapy.spiders
1:29
import Spider.
1:32
We need to create a new class that
inherits from Spider as our next step.
1:37
Call it FormSpider and, as we've seen,
we need to give our Spider a name.
1:43
We'll just call it horseForm.
1:51
And we define our start_urls.
1:55
Which, again, is a list.
2:00
What's the URL for our form?
2:02
We'll just cut and paste that in.
2:06
This looks pretty familiar so far,
I think.
2:11
Next we define our parse method and we'll
define the formdata we want to pass in.
2:14
So define parse and formdata.
2:21
Let's go use the developer tools in
the browser to see what the form
2:25
fields are called.
2:28
Come over here, Developer Tools.
2:30
So they're down in here in this form.
2:36
So we have firstname,
lastname, and jobtitle.
2:46
All lower case and no spaces.
2:54
So we want firstname.
2:58
My first name is Ken.
3:00
lastname, Alger.
3:04
And jobtitle is Teacher.
3:09
Now we need to return a FormRequest
built from the response object.
3:15
So return FormRequest.from_response.
3:19
We'll pass in the response, the number of
the form on the page we're processing,
3:26
which is zero-based, as formnumber,
and then the form data we want.
3:33
So formdata = formdata.
3:40
And then a callback for what to do next.
3:45
So callback.
3:48
We'll make a method
here called after_post.
3:51
This passes the data we
defined into the form and,
3:55
by default, utilizes the submit
button to submit our data.
3:59
Then it will do whatever we
define in the after_post method.
4:04
Here we could do data saving or
data processing or further scraping tasks.
4:08
For now, let's just print out
that the form was processed and
4:14
the response object itself.
4:19
So we'll define after_post, self,
and again, that takes a response.
4:21
We'll print and we'll do
4:26
a little formatting, just so
4:31
we can see it in the terminal.
4:36
And we'll print the response.
4:41
Let's just copy this line here.
4:45
There we go.
4:50
All right,
let's open a terminal window,
4:51
go to our spiders folder, and
have Scrapy run our crawler.
5:00
We look up here.
5:13
Great, we see that the spider found and
submitted our form.
5:16
In our case here, it was posted
to formspree.io for processing.
5:19
Here's our printed information and
our 200 response code.
5:24
Great, I've included links
in the teacher's notes
5:27
about FormRequest as well.
5:30
I'd encourage you to look at it
as it is a powerful tool for
5:32
processing forms and
can even be used to handle login forms.
5:35