BeautifulSoup is a Python library designed for quick turnaround projects like screen-scraping. Three features make it powerful:

  • BeautifulSoup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you want. It doesn’t take much code to write an application.
  • It automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don’t have to think about encodings, unless the document doesn’t specify an encoding and BeautifulSoup can’t detect one. Then you just have to specify the original encoding.
  • Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility.
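The sketch below illustrates the first and third points. It parses a small inline HTML string, which is made up for illustration and does not come from the econpy site. It uses Python’s built-in "html.parser" so it runs even without lxml installed; switching to lxml only means passing "lxml" as the second argument.

```python
from bs4 import BeautifulSoup

# A tiny, made-up document used only for illustration.
html = """
<html><body>
  <h1>Shop</h1>
  <ul>
    <li class="item">Apple</li>
    <li class="item">Pear</li>
  </ul>
</body></html>
"""

# Parse with the standard-library parser; pass "lxml" instead to use lxml.
soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute-style access returns the first matching tag.
title = soup.h1.get_text()

# Searching: find_all returns every matching tag.
items = [li.get_text() for li in soup.find_all("li", class_="item")]

# Modifying: replace a tag's text in place.
soup.h1.string = "Fruit Shop"

print(title)    # Shop
print(items)    # ['Apple', 'Pear']
print(soup.h1)  # <h1>Fruit Shop</h1>
```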

Pre-Requisite

Before scraping, we need:

  1. Python 3
  2. BeautifulSoup

    Open a command prompt and run:

    pip install beautifulsoup4
  3. lxml parser

    Open a command prompt and run:

    pip install lxml
  4. Requests (used below to download the page)

    Open a command prompt and run:

    pip install requests
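You can check the requirements with a short script that uses only the standard library. It reports which packages are importable, including requests, which the code in this tutorial uses. Note that import names can differ from pip names: beautifulsoup4 is imported as bs4. The function name check_requirements is our own.

```python
import importlib.util

# pip package name -> module name used in "import" statements
REQUIRED = {
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "requests": "requests",
}


def check_requirements(required=REQUIRED):
    """Return {pip_name: True/False} depending on whether the module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in required.items()
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'installed' if ok else 'missing, run: pip install ' + name}")
```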

Let’s scrape with BeautifulSoup!

In this tutorial, we will scrape data from the econpy website using BeautifulSoup.

First, import the required libraries:

import requests
from bs4 import BeautifulSoup

Now, we send a GET request to fetch the page’s HTML:

url = "http://econpy.pythonanywhere.com/ex/001.html"
response = requests.get(url)

Check the response status code:

print(response.status_code)

A status code of 200 means the request succeeded and the server returned the page. To store the body of the response in the html variable, use:

html = response.content
# Initialize a BeautifulSoup instance using the lxml parser
soup = BeautifulSoup(html, "lxml")
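The request, status check, and parse steps above can be combined into one helper. This is a minimal sketch. The name get_soup and the 10-second timeout are our own choices. Instead of printing the status code, it calls response.raise_for_status(), which raises requests.HTTPError for 4xx and 5xx responses.

```python
import requests
from bs4 import BeautifulSoup


def get_soup(url, parser="lxml", timeout=10):
    """Download url and return it parsed as a BeautifulSoup tree (hypothetical helper)."""
    # A timeout keeps the script from hanging forever on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # response.content is raw bytes, so Beautiful Soup handles the decoding.
    return BeautifulSoup(response.content, parser)
```

Usage: soup = get_soup("http://econpy.pythonanywhere.com/ex/001.html") gives the same soup object as the separate steps above.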

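Next, we need to find where the buyer information sits in the page. Right-click on the page and choose Inspect Element to open its HTML. Each buyer’s details are inside a div whose title attribute is “buyer-info”, so we select those divs: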

divs = soup.find_all("div", {"title": "buyer-info"})

The find_all method returns every div whose title is “buyer-info”. We then use a for loop to extract each buyer’s name and price:
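The sketch below shows how that find_all call behaves. It runs on a small hypothetical snippet that mirrors the structure the tutorial’s selectors target; the names and prices are placeholders, not real site data. It also shows an equivalent query written as a CSS selector with select(). Note that the nested div with title “buyer-name” is not matched, because only the exact title “buyer-info” is requested.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure the tutorial's selectors target;
# the names and prices are placeholders, not real data from the site.
SAMPLE = """
<div title="buyer-info">
  <div title="buyer-name">Alice Example</div>
  <span class="item-price">$10.00</span>
</div>
<div title="buyer-info">
  <div title="buyer-name">Bob Example</div>
  <span class="item-price">$20.50</span>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# find_all with an attribute dictionary: every <div> whose title is "buyer-info".
divs = soup.find_all("div", {"title": "buyer-info"})

# The same query written as a CSS selector.
divs_css = soup.select('div[title="buyer-info"]')

print(len(divs), len(divs_css))  # 2 2
```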

for div in divs:
    name = div.find("div", {"title": "buyer-name"}).text
    price = div.find("span", {"class": "item-price"}).text
    print(name)
    print(price)
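The loop above calls .text directly on the result of find(). find() returns None when nothing matches, so a buyer block missing a name or price would raise AttributeError. The following is a slightly more defensive variant. It skips incomplete blocks, strips surrounding whitespace with get_text(strip=True), and returns the rows as dictionaries instead of printing them. The function name extract_buyers and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def extract_buyers(soup):
    """Return a list of {"name", "price"} dicts, one per buyer-info block."""
    buyers = []
    for div in soup.find_all("div", {"title": "buyer-info"}):
        name_tag = div.find("div", {"title": "buyer-name"})
        price_tag = div.find("span", {"class": "item-price"})
        # find() returns None when nothing matches; skip incomplete blocks
        # rather than crashing with AttributeError on .text.
        if name_tag is None or price_tag is None:
            continue
        buyers.append({
            "name": name_tag.get_text(strip=True),  # strip surrounding whitespace
            "price": price_tag.get_text(strip=True),
        })
    return buyers


# Placeholder markup, not real site data; the second block has no price.
sample = """
<div title="buyer-info"><div title="buyer-name"> Alice Example </div>
<span class="item-price">$10.00</span></div>
<div title="buyer-info"><div title="buyer-name">Bob Example</div></div>
"""
print(extract_buyers(BeautifulSoup(sample, "html.parser")))
# [{'name': 'Alice Example', 'price': '$10.00'}]
```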

Running this prints each buyer’s name and price. The data is spread across five pages, so to fetch all buyers you can repeat these steps for each page in a while loop.
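A minimal sketch of that while loop follows. It assumes the five pages follow the numbering of the URL used in this tutorial: 001.html, 002.html, and so on up to 005.html. That pattern is our assumption, so check it in your browser first. The function name scrape_all_pages is hypothetical. scrape_page stands for any function you write that takes a URL and returns a list of rows, for example one built from the fetching and parsing steps above. A short pause between requests keeps the load on the server low.

```python
import time


def scrape_all_pages(scrape_page, base="http://econpy.pythonanywhere.com/ex/",
                     last_page=5, delay=1.0):
    """Call scrape_page(url) for pages 001..last_page and collect the results.

    Assumes (unverified) that pages are named 001.html, 002.html, ...
    """
    results = []
    page = 1
    while page <= last_page:
        url = f"{base}{page:03d}.html"  # 1 -> "001.html", 5 -> "005.html"
        results.extend(scrape_page(url))
        page += 1
        if page <= last_page:
            time.sleep(delay)  # pause between requests to be polite to the server
    return results
```

Passing the page scraper in as a function keeps the loop independent of how each page is fetched and parsed, and makes it easy to test without a network connection.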

The full source code, which also stores the data in a CSV file, is available on GitHub.
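The GitHub code is not reproduced here, so the following is only our own sketch of the CSV step, using the standard-library csv module. It assumes each buyer is a dict with "name" and "price" keys; those field names and the function name write_buyers_csv are hypothetical.

```python
import csv


def write_buyers_csv(buyers, path):
    """Write a list of {"name", "price"} dicts to a CSV file with a header row."""
    # newline="" is what the csv module documentation recommends for files it writes.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(buyers)
```

Usage: write_buyers_csv(rows, "buyers.csv") writes a header line "name,price" followed by one line per buyer.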


