PriceComparator

Price Comparator using Web Scraping with Python

Getting Started

The reason behind the project:

The goal is to automate searching, tabulating, and manipulating information from several sources, each with its own structure. In this case, the sources are commercial websites, each built with its own HTML and CSS. The project started when I realized how much time and effort it takes to collect all this information manually, which is neither easy nor efficient. So the idea was to automate the process.

How does it work?

It works by automating web browsing with:

  1. Selenium and WebDriver to open the browser and click or submit.
  2. BeautifulSoup to read the HTML page once the connection succeeds (HTTP status 200).
  3. Pandas and Matplotlib to analyze the collected data and plot it.
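
The browsing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `fetch_search_results_html`, the default search box name `"q"`, and the choice of Chrome are all assumptions, and each real website will need its own locators.

```python
def fetch_search_results_html(search_url, query, search_box_name="q"):
    """Open a browser, submit a search, and return the resulting page's HTML.

    Hypothetical helper: the URL, the query, and the name of the search box
    are placeholders that depend on the target website.
    """
    # Imported here so the module can be loaded even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Assumes Chrome and a matching driver are available on this machine.
    driver = webdriver.Chrome()
    try:
        driver.get(search_url)
        # Locate the search box by its "name" attribute (site specific).
        box = driver.find_element(By.NAME, search_box_name)
        box.send_keys(query)
        box.submit()
        # HTML of whatever page the browser is showing after the submit.
        return driver.page_source
    finally:
        # Always close the browser, even if a step above fails.
        driver.quit()
```

Pages that load their results with JavaScript would probably need an explicit wait before `page_source` is read; that is left out here to keep the sketch short.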

When the connection is stable, the script pulls the HTML content of the page. Once I have the HTML, I look for the elements I want to collect from the current page and store them in a list.
With Selenium moving through the website's pages and bs4 extracting the data I want, the automation is nearly complete.
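
As a rough sketch of the extraction step, the snippet below parses an HTML string with BeautifulSoup and stores each product in a list. The markup, the class names (`product`, `name`, `price`), and the function name `parse_products` are made up for illustration; every real shop uses its own tags and classes, so the selectors have to be adapted per site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded product listing page.
SAMPLE_HTML = """
<ul>
  <li class="product">
    <a class="name" href="https://example.com/p/1">Widget A</a>
    <span class="price">$ 19.90</span>
  </li>
  <li class="product">
    <a class="name" href="https://example.com/p/2">Widget B</a>
    <span class="price">$ 24.50</span>
  </li>
</ul>
"""


def parse_products(html):
    """Return a list of dicts (name, price, link) found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for item in soup.select("li.product"):
        link = item.select_one("a.name")
        price_text = item.select_one("span.price").get_text(strip=True)
        products.append({
            "name": link.get_text(strip=True),
            # Drop the currency symbol before converting to a number.
            "price": float(price_text.replace("$", "").strip()),
            "link": link["href"],
        })
    return products


products = parse_products(SAMPLE_HTML)
```

Keeping each product as a dictionary makes it straightforward to turn the list into a table later.
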
When the product list is finished, I build plots (histograms) with Matplotlib to analyze the collected data.
Then I read the price column into a Pandas DataFrame and manipulate it to obtain key values such as the mean and median, with other statistics planned for the future.
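
The analysis step might look something like the sketch below. The product values, the number of bins, and the output file name `price_histogram.png` are illustrative assumptions, not data or settings from the project.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data only; in practice this list comes from the scraping step.
products = [
    {"name": "Widget A", "price": 10.0, "link": "https://example.com/p/1"},
    {"name": "Widget B", "price": 20.0, "link": "https://example.com/p/2"},
    {"name": "Widget C", "price": 30.0, "link": "https://example.com/p/3"},
    {"name": "Widget D", "price": 60.0, "link": "https://example.com/p/4"},
]

df = pd.DataFrame(products)

# Summary statistics of the price column.
mean_price = df["price"].mean()
median_price = df["price"].median()

# Histogram of the price distribution.
fig, ax = plt.subplots()
ax.hist(df["price"], bins=5)
ax.set_xlabel("Price")
ax.set_ylabel("Number of products")
ax.set_title("Price distribution")
fig.savefig("price_histogram.png")
plt.close(fig)
```

The same DataFrame could then be written to a spreadsheet with `df.to_excel("products.xlsx", index=False)`, assuming an Excel writer such as openpyxl is installed.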

At the end, I have .xlsx files with all the product information (name, price, link…) and two statistical graphs for interpreting the data.

Histograms:





XLSX files:





Last Updates:

Additional Information: