Kushagra1taneja/AmazonCrawler

Amazon Crawler Project

This project is a web scraper built with Python and Flask. It extracts product data from Amazon, stores it in a MySQL database, and provides a web form for searching products and viewing their details. The scraper parses HTML with BeautifulSoup, retries failed requests, and rotates user agents and proxies to reduce the chance of being blocked by Amazon.

Features

  • Scrape product details including name, price, description, and customer reviews from Amazon.
  • Store scraped data in a MySQL database.
  • Simple web interface for searching products.
  • Rotate user agents and proxies to reduce the chance of being blocked.

Requirements

  • Python 3.x
  • Flask
  • BeautifulSoup4
  • Requests
  • MySQL Connector/Python

Installation

  1. Clone the repository:

    git clone https://github.com/Kushagra1taneja/AmazonCrawler.git
    cd AmazonCrawler
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate   # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt
  4. Set up your MySQL database:

    CREATE DATABASE product;
  5. Configure your database credentials in the code:

    Update the following lines in the search function:

    cnx = mysql.connector.connect(user='root',
                                  password='Your password',
                                  host='localhost',
                                  database='product')
  6. Run the Flask app:

    python app.py
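As an alternative to editing the password directly in the source (step 5), the connection settings could be read from environment variables. The following is a minimal sketch, not part of the project: the helper name `load_db_config` and the `PRODUCT_DB_*` variable names are assumptions chosen for illustration, and the defaults mirror the values shown in step 5.

```python
import os


def load_db_config(env=None):
    """Build keyword arguments for mysql.connector.connect() from environment variables.

    The PRODUCT_DB_* names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    return {
        "user": env.get("PRODUCT_DB_USER", "root"),
        "password": env.get("PRODUCT_DB_PASSWORD", ""),
        "host": env.get("PRODUCT_DB_HOST", "localhost"),
        "database": env.get("PRODUCT_DB_NAME", "product"),
    }


# Possible use inside the search function (requires mysql-connector-python):
#   import mysql.connector
#   cnx = mysql.connector.connect(**load_db_config())
```

This keeps the password out of version control; set `PRODUCT_DB_PASSWORD` in your shell before running `python app.py`.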

Usage

  1. Open your browser and navigate to:

    http://127.0.0.1:5000/
    
  2. Enter a product name in the search box and submit.

  3. The application scrapes Amazon for the product details, stores them in the MySQL database, and displays the results.
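The storage step can be sketched with a parameterized INSERT, which avoids building SQL from scraped text. This is an illustration under assumptions, not the project's actual code: the function name `store_reviews` and the idea that each scraped review arrives as a dict keyed by column name are hypothetical. The column names come from the Database Schema section, and `%s` is the placeholder style used by MySQL Connector/Python.

```python
COLUMNS = (
    "product_description",
    "price",
    "category",
    "customer_name",
    "rating",
    "comment",
)

# One %s placeholder per column; the driver escapes the values.
INSERT_SQL = (
    "INSERT INTO product_info "
    "(product_description, price, category, customer_name, rating, comment) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def store_reviews(cursor, rows):
    """Insert one product_info row per dict in `rows`; return how many were sent.

    Missing keys are stored as NULL. The caller is responsible for committing.
    """
    params = [tuple(row.get(col) for col in COLUMNS) for row in rows]
    if params:
        cursor.executemany(INSERT_SQL, params)
    return len(params)
```

With a real connection, this would be called as `store_reviews(cnx.cursor(), rows)` followed by `cnx.commit()`.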

Code Overview

app.py

  • Imports and Setup:

    from flask import Flask, render_template, request
    from bs4 import BeautifulSoup
    import requests
    import random
    import time
    import mysql.connector
    
    app = Flask(__name__)
  • Headers and Proxies:

    • Lists of user-agent headers and proxies to avoid getting blocked by Amazon.
  • scrape_amazon Function:

    • Scrapes product data from Amazon including name, price, description, and customer reviews.
    • Rotates user-agent headers and proxies.
    • Handles multiple retries and exceptions.
  • Routes:

    • / : Renders the search form.
    • /search : Handles the search form submission, scrapes Amazon, stores data in MySQL, and displays the results.
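The retry and rotation behaviour described for scrape_amazon can be sketched roughly as follows. This is not the project's implementation: the name `fetch_with_rotation`, its parameters, and the choice to pass the HTTP call in as a `fetch` callable (so the sketch can be tried without network access) are all assumptions made for illustration.

```python
import random
import time


def fetch_with_rotation(url, headers_list, proxies_list, fetch, max_retries=3, delay=1.0):
    """Call `fetch` up to `max_retries` times with a random header set and proxy each time.

    `fetch(url, headers, proxies)` must return an object with a `status_code`
    attribute, such as a requests.Response. Returns the first response with
    status 200, or None if every attempt fails.
    """
    for attempt in range(1, max_retries + 1):
        headers = random.choice(headers_list)
        proxies = random.choice(proxies_list) if proxies_list else None
        try:
            response = fetch(url, headers, proxies)
            if response.status_code == 200:
                return response
        except Exception as exc:  # a real app would catch requests.RequestException
            print(f"attempt {attempt} failed: {exc}")
        if attempt < max_retries:
            time.sleep(delay)  # brief pause before trying a different identity
    return None


# Possible wiring with Requests (placeholders must be replaced with real values):
#   import requests
#   headers_list = [{"User-Agent": "<a valid user-agent string>"}]
#   proxies_list = [{"http": "http://<host>:<port>", "https": "http://<host>:<port>"}]
#   fetch = lambda url, headers, proxies: requests.get(
#       url, headers=headers, proxies=proxies, timeout=10)
#   response = fetch_with_rotation(url, headers_list, proxies_list, fetch)
```

Each entry in `proxies_list` uses the scheme-to-URL dict shape that Requests accepts for its `proxies` argument. The returned response's `text` can then be handed to BeautifulSoup for parsing.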

HTML Templates

  • templates/index.html
    • Simple form for entering the product name.

Database Schema

  • Table: product_info
    • id: INT, Primary Key, Auto Increment
    • product_description: VARCHAR(500)
    • price: VARCHAR(25)
    • category: VARCHAR(500)
    • customer_name: VARCHAR(200)
    • rating: VARCHAR(500)
    • comment: VARCHAR(5000)
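The installation steps create the database but do not show how the product_info table is created. One way to do it, assuming the column definitions listed above, is a small helper that runs a CREATE TABLE statement through an open cursor. The helper name `ensure_schema` is hypothetical and `IF NOT EXISTS` is an addition so the call is safe to repeat.

```python
# DDL assembled from the schema listed above; IF NOT EXISTS is an assumption
# added so the statement can be run more than once.
CREATE_PRODUCT_INFO = """
CREATE TABLE IF NOT EXISTS product_info (
    id INT AUTO_INCREMENT PRIMARY KEY,
    product_description VARCHAR(500),
    price VARCHAR(25),
    category VARCHAR(500),
    customer_name VARCHAR(200),
    rating VARCHAR(500),
    comment VARCHAR(5000)
)
"""


def ensure_schema(cursor):
    """Create the product_info table if it does not exist yet."""
    cursor.execute(CREATE_PRODUCT_INFO)


# Possible use (requires mysql-connector-python and a running MySQL server):
#   import mysql.connector
#   cnx = mysql.connector.connect(user="root", password="...",
#                                 host="localhost", database="product")
#   ensure_schema(cnx.cursor())
```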

Database Operations

You can use the following SQL commands to interact with the database:

  • Select the database:

    USE product;
  • Show tables in the database:

    SHOW TABLES;
  • Select all data from the product_info table:

    SELECT * FROM product_info;
  • Drop the product_info table:

    DROP TABLE product_info;

Notes

  • Make sure to update the headers_list and proxies with valid user-agent strings and proxy addresses.
  • Be aware of Amazon's terms of service regarding web scraping.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

Contact

For any questions or feedback, please contact [email protected].
