BeautifulSoup code optimization (Python)

I have code that uses the BeautifulSoup library for parsing, but it is very slow. The code is written in such a way that threads cannot be used. Can anyone help me with this?

I use BeautifulSoup for parsing and then save the results to a database. If I comment out the save statement, it still takes a long time, so the database is not the problem.

def parse(self,text):                
    soup = BeautifulSoup(text)
    arr = soup.findAll('tbody')                

    for i in range(0,len(arr)-1):
        data=Data()
        soup2 = BeautifulSoup(str(arr[i]))
        arr2 = soup2.findAll('td')

        c=0
        for j in arr2:                                       
            if str(j).find("<a href=") > 0:
                data.sourceURL = self.getAttributeValue(str(j),'<a href="')
            else:  
                if c == 2:
                    data.Hits=j.renderContents()

            #and few others...

            c = c+1

            data.save()

Any suggestions?

Note: I already asked this question here, but it was closed due to incomplete information.

1 answer
soup2 = BeautifulSoup(str(arr[i]))
arr2 = soup2.findAll('td')

Don't re-parse like this. arr[i] is already a searchable tag, so call findAll on it directly: arr2 = arr[i].findAll('td').
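To get a feel for what the extra parse costs, here is a rough timing sketch. It assumes the newer bs4 package (where findAll is spelled find_all) and the built-in "html.parser" backend, and it uses a made-up table of 200 tbody blocks rather than the page from the question. The absolute numbers will differ from machine to machine.

```python
import timeit

from bs4 import BeautifulSoup  # assumption: bs4, not the BeautifulSoup 3 module

# Hypothetical test document: 200 <tbody> blocks with two cells each.
html = "<table>" + "".join(
    "<tbody><tr><td>%d</td><td>x</td></tr></tbody>" % n for n in range(200)
) + "</table>"

soup = BeautifulSoup(html, "html.parser")
bodies = soup.find_all("tbody")


def reparse():
    # Serialize every <tbody> and parse it again, as the question's code does.
    return sum(
        len(BeautifulSoup(str(body), "html.parser").find_all("td"))
        for body in bodies
    )


def direct():
    # Search the already-parsed Tag objects.
    return sum(len(body.find_all("td")) for body in bodies)


same_result = reparse() == direct()
t_reparse = timeit.timeit(reparse, number=5)
t_direct = timeit.timeit(direct, number=5)
print("re-parse: %.3fs  direct: %.3fs" % (t_reparse, t_direct))
```

Both functions find the same cells. The only difference is that the first one pays for serializing and parsing each block a second time.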


This will also be slow:

if str(j).find("<a href=") > 0:
    data.sourceURL = self.getAttributeValue(str(j),'<a href="')

Assuming getAttributeValue returns the href attribute, use this instead:

a = j.find('a', href=True)  # first <a> with an href attribute, or None
if a:
    data.sourceURL = a['href']
else:
    pass  # ....

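In general, you should not need to convert BeautifulSoup objects back to strings unless you actually want a plain-text representation. Because find and findAll return searchable objects, you can continue the search by calling find/findAll/etc. on the result.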
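Putting the pieces together, the loop from the question might look roughly like the sketch below. It makes several assumptions: it uses the bs4 package with the "html.parser" backend, a plain dict stands in for the Data class (which the question does not show), get_text replaces renderContents, and every tbody is processed, whereas the original range stops one short of the last.

```python
from bs4 import BeautifulSoup  # assumption: bs4, not the BeautifulSoup 3 module


def parse_rows(html):
    """Parse the document once, then search the returned Tag objects directly."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tbody in soup.find_all("tbody"):  # find_all is the bs4 name for findAll
        row = {"sourceURL": None, "Hits": None}  # hypothetical stand-in for Data()
        for index, td in enumerate(tbody.find_all("td")):
            link = td.find("a", href=True)  # no str() round trip, no re-parse
            if link is not None:
                row["sourceURL"] = link["href"]
            elif index == 2:
                row["Hits"] = td.get_text(strip=True)
        rows.append(row)
    return rows


# Made-up sample input, for illustration only.
sample = """
<table>
  <tbody><tr><td><a href="http://example.com/a">A</a></td><td>x</td><td>17</td></tr></tbody>
  <tbody><tr><td><a href="http://example.com/b">B</a></td><td>y</td><td>42</td></tr></tbody>
</table>
"""
rows = parse_rows(sample)
```

Each row is built without converting any tag to a string, and enumerate replaces the manual counter c.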
