pip install selenium
pip install bs4
pip install pyautogui
pip install webdrivermanager
pip install matplotlib
pip install pandas
git clone https://github.com/GabrielZuany/PriceComparator.git
The goal is to automate searching for, tabulating, and manipulating information from different data sources, each with its own specific structure. In this case, the sources are several commercial websites, and each of them builds its HTML and CSS differently from the others. The project started when I realized how much time and effort it took to collect all this information by hand, which was tedious and, overall, inefficient. So the idea was to automate the process.
Basically, it works by automating web browsing with the libraries installed above.
Once the connection is stable, the script pulls the HTML content of the current page. With the HTML in hand, I look for the elements I am interested in and store them in a list.
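As a rough sketch of that extraction step (the URL and the CSS selectors below are hypothetical, since every store lays out its HTML differently), the flow looks roughly like this:

```python
# Sketch only: the URL and the "div.product" / "span.price" selectors are
# placeholders; each real website needs its own selectors.
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://store.example/search?q=notebook")  # hypothetical search page

# Pull the HTML of the current page and hand it to bs4.
soup = BeautifulSoup(driver.page_source, "html.parser")

# Collect the elements of interest into a list.
products = []
for item in soup.find_all("div", class_="product"):
    products.append({
        "name": item.find("h2").get_text(strip=True),
        "price": item.find("span", class_="price").get_text(strip=True),
        "link": item.find("a")["href"],
    })

driver.quit()
```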
With Selenium navigating through the websites' pages and bs4 extracting the data I want, the automation part is almost 100% done.
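To give an idea of what going through the websites' pages means in practice, here is a hedged sketch of the navigation loop; the store URLs and the selector are placeholders, not the project's real targets:

```python
# Sketch of looping over several stores; the URLs and the selector are
# placeholders, not the project's real targets.
from selenium import webdriver
from bs4 import BeautifulSoup

search_urls = [
    "https://store-a.example/search?q=notebook",
    "https://store-b.example/search?q=notebook",
]

driver = webdriver.Chrome()
all_prices = []
for url in search_urls:
    driver.get(url)                                    # Selenium navigates
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for tag in soup.select("div.product span.price"):  # bs4 extracts
        all_prices.append(tag.get_text(strip=True))
driver.quit()
```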
When the product list is complete, I build plots (histograms) with Matplotlib to analyse the collected data.
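A minimal sketch of that plotting step, assuming the prices have already been collected into a plain list (the values below are made-up sample data):

```python
# Sketch only: the price values are made-up sample data.
import matplotlib.pyplot as plt

prices = [2499.90, 2599.00, 2350.50, 2780.00, 2410.99]

plt.hist(prices, bins=10)
plt.title("Price distribution")
plt.xlabel("Price")
plt.ylabel("Count")
plt.savefig("price_histogram.png")  # or plt.show() to display it
```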
Then I read the price column into a DataFrame (Pandas) and manipulate it to obtain useful statistics such as the mean, the median, and (in the future) other information.
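For the statistics step, a hedged sketch with Pandas might look like this (the "price" column name and the sample rows are assumptions):

```python
# Sketch only: the "price" column name and the sample rows are assumptions.
import pandas as pd

df = pd.DataFrame({
    "name": ["Product A", "Product B", "Product C"],
    "price": [2499.90, 2599.00, 2350.50],
})

mean_price = df["price"].mean()
median_price = df["price"].median()
print(f"mean: {mean_price:.2f} | median: {median_price:.2f}")
```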
In the end, I have .xlsx files with all the product information (name, price, link…) and two statistical graphs for interpreting the data.
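Writing the final table out to a spreadsheet can also be done with Pandas; this is only a sketch, and the filename and columns are assumptions (note that `to_excel` needs an Excel writer such as openpyxl installed):

```python
# Sketch only: the filename and columns are assumptions.
# pandas.DataFrame.to_excel needs an Excel writer such as openpyxl.
import pandas as pd

df = pd.DataFrame({
    "name": ["Product A", "Product B"],
    "price": [2499.90, 2599.00],
    "link": ["https://store-a.example/a", "https://store-b.example/b"],
})
df.to_excel("products.xlsx", index=False)
```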