This project is a web scraper built with Python and Flask that extracts product data from Amazon, stores it in a MySQL database, and lets you search for products and view their details. The scraper parses HTML with BeautifulSoup, retries failed requests, and rotates user agents and proxies to reduce the chance of being blocked by Amazon.
- Scrape product details including name, price, description, and customer reviews from Amazon.
- Store scraped data in a MySQL database.
- Simple web interface for searching products.
- Rotates user agents and proxies to avoid getting blocked.
- Python 3.x
- Flask
- BeautifulSoup4
- Requests
- MySQL Connector/Python
- Clone the repository:

      git clone https://github.com/yourusername/amazon-crawler.git
      cd amazon-crawler

- Create and activate a virtual environment:

      python3 -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

- Set up your MySQL database:

      CREATE DATABASE product;
- Configure your database credentials in the code. Update the following line in the `search` function:

      cnx = mysql.connector.connect(user='root', password='Your password', host='localhost', database='product')
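
As an alternative to editing credentials directly in the source, the connection settings could be read from environment variables. This is only a sketch: the helper `db_config_from_env` and the variable names `DB_USER`, `DB_PASSWORD`, `DB_HOST`, and `DB_NAME` are assumptions for illustration and do not exist in the project. The defaults mirror the values shown above.

```python
import os


def db_config_from_env(env=None):
    """Build keyword arguments for mysql.connector.connect() from environment variables.

    The variable names are hypothetical; the fallbacks match the README's example values.
    """
    env = os.environ if env is None else env
    return {
        "user": env.get("DB_USER", "root"),
        "password": env.get("DB_PASSWORD", ""),
        "host": env.get("DB_HOST", "localhost"),
        "database": env.get("DB_NAME", "product"),
    }


# Possible usage inside the search function (needs a running MySQL server):
#   import mysql.connector
#   cnx = mysql.connector.connect(**db_config_from_env())
```

Keeping the password out of the code means it is less likely to be committed to the repository by accident.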
- Run the Flask app:

      python app.py

- Open your browser and navigate to http://127.0.0.1:5000/.
- Enter a product name in the search box and submit.
- The application will scrape Amazon for the product details and store them in the MySQL database.
- Imports and Setup:

      from flask import Flask, render_template, request
      from bs4 import BeautifulSoup
      import requests
      import random
      import time
      import mysql.connector

      app = Flask(__name__)
- Headers and Proxies:
  - Lists of user-agent headers and proxies to avoid getting blocked by Amazon.
- `scrape_amazon` Function:
  - Scrapes product data from Amazon, including name, price, description, and customer reviews.
  - Rotates user-agent headers and proxies.
  - Handles multiple retries and exceptions.
- Routes:
  - `/`: Renders the search form.
  - `/search`: Handles the search form submission, scrapes Amazon, stores data in MySQL, and displays the results.
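
The rotation and retry behaviour described for `scrape_amazon` might look roughly like the sketch below. It is not the project's actual code: the function name `fetch_with_retries`, its parameters, and the placeholder user-agent and proxy values are all assumptions. The `fetch` argument exists only so the retry logic can be exercised without network access; when it is omitted, the sketch falls back to `requests.get`.

```python
import random
import time

# Placeholder values (assumptions): replace with real user-agent strings and proxies.
headers_list = [
    {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0"},
    {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0"},
]
proxies = [
    {"http": "http://proxy1.example.com:8080", "https": "http://proxy1.example.com:8080"},
    {"http": "http://proxy2.example.com:8080", "https": "http://proxy2.example.com:8080"},
]


def fetch_with_retries(url, fetch=None, max_retries=3, delay=1.0):
    """Return the page body, or None if every attempt fails."""
    if fetch is None:
        import requests  # imported lazily so the retry logic can be tested offline

        def fetch(url, headers, proxy):
            response = requests.get(url, headers=headers, proxies=proxy, timeout=10)
            response.raise_for_status()  # treat HTTP error statuses as failures
            return response.text

    for attempt in range(1, max_retries + 1):
        # Pick a fresh user agent and proxy for every attempt.
        headers = random.choice(headers_list)
        proxy = random.choice(proxies)
        try:
            return fetch(url, headers, proxy)
        except Exception as exc:  # deliberately broad for this sketch
            print(f"Attempt {attempt} failed: {exc}")
            if attempt < max_retries:
                time.sleep(delay)  # pause before retrying
    return None
```

The returned HTML could then be handed to `BeautifulSoup(html, "html.parser")` for parsing.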
`templates/index.html`:
- Simple form for entering the product name.
- Table: `product_info`
  - `id`: INT, Primary Key, Auto Increment
  - `product_description`: VARCHAR(500)
  - `price`: VARCHAR(25)
  - `category`: VARCHAR(500)
  - `customer_name`: VARCHAR(200)
  - `rating`: VARCHAR(500)
  - `comment`: VARCHAR(5000)
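
The schema above could be created and populated from Python along the following lines. This is a sketch under assumptions: the names `CREATE_TABLE_SQL`, `COLUMNS`, `INSERT_SQL`, and `row_params` are illustrative and not taken from the project, and the column types are copied from the list above. The `%s` placeholders follow the parameter style used by MySQL Connector/Python.

```python
# MySQL DDL matching the column list in this README.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS product_info (
    id INT AUTO_INCREMENT PRIMARY KEY,
    product_description VARCHAR(500),
    price VARCHAR(25),
    category VARCHAR(500),
    customer_name VARCHAR(200),
    rating VARCHAR(500),
    comment VARCHAR(5000)
)
"""

# Every column except the auto-incremented id, in insertion order.
COLUMNS = ("product_description", "price", "category", "customer_name", "rating", "comment")

# Parameterized insert; values are passed separately, never formatted into the SQL.
INSERT_SQL = "INSERT INTO product_info ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS))
)


def row_params(record):
    """Order a scraped record (dict) to match the placeholders; missing fields become None."""
    return tuple(record.get(column) for column in COLUMNS)


# Possible usage with an open mysql.connector connection `cnx`:
#   cursor = cnx.cursor()
#   cursor.execute(CREATE_TABLE_SQL)
#   cursor.execute(INSERT_SQL, row_params(record))
#   cnx.commit()
```

Passing scraped text as query parameters rather than building the SQL string by hand avoids quoting problems with review text that contains apostrophes.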
You can use the following SQL commands to interact with the database:
- Select the database:

      USE product;

- Show tables in the database:

      SHOW TABLES;

- Select all data from the `product_info` table:

      SELECT * FROM product_info;

- Drop the `product_info` table:

      DROP TABLE product_info;
- Make sure to update `headers_list` and `proxies` with valid user-agent strings and proxy addresses.
- Be aware of Amazon's terms of service regarding web scraping.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
For any questions or feedback, please contact [email protected].