Skip to content

raazgupta/Beautiful-Soup-Example

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

This file explains how to run Beautiful Soup and extract content from a website.

Follow these steps until things go horribly wrong or you reach climax and a satisfactory 
conclusion:

1. First make sure that you have Python installed on your machine. 
a. If you are using a Mac, you should already have it by default and all you need to do 
is pull up the command prompt and type 'python'
b. If you are using Windows, go here - http://www.python.org/download/ and follow 
the instructions for setup. Once setup is complete, you will need to pull up a 
command prompt from Start -> Run -> cmd. Then CD to the directory where Phython 
is installed and type 'python'
c. If you are using Linux, you should not be reading this file!

2. After you type 'python' and you don't see some horrible error but something like:

>>>
Then type;

>>> print "Hello World!"

If the output is 'Hello World!' Then you are good to go to the next step. If not, you need
to Google the shit out of any errors that are displayed. 

3. Installing Beautiful Soup. Now we are getting close to having fun. You need to go to
this website - http://www.crummy.com/software/BeautifulSoup/download/3.x/ and download
BeautifulSoup-3.2.0.tar.gz. Unpack the file. In it you will find a file named 
BeautifulSoup.py. Copy and paste it in your 'Python/Lib' folder. In my case, I copied and 
pasted it in C:\Pythong27\Lib\. This will differ based on where your Python folder is located.

4. Almost there! Now bring up that command prompt again and type 'python' then the following:
>>> from BeautifulSoup import BeautifulSoup
>>> import re
>>> doc = ['<html><head><title>Page Title</title></head></html>']
>>> soup = BeautifulSoup(''.join(doc))
>>> print soup.prettify()

If you get the output, then you are good to go on to the next step:
<html>
	<head>
		<title>
			Page Title
		</title>
	</head>
</html>

5. Now to parse an actual website. This is where the fun begins. Go to the file - 
https://github.com/raazgupta/Beautiful-Soup-Example/blob/master/google-news-query.py, 
save it on your computer, read the comments carefully and then
run it using python on your machine. Enjoy the results. And then realize that this is just the
tip of the iceberg!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages