The Genius site was updated: the lyrics header and contributor info were relocated into a LyricsContainer div, so that header text was prefixed to the returned lyrics. Resolved by finding the unwanted tags and extracting them from the HTML.
@@ -101,6 +101,10 @@ class Genius:
 html = BeautifulSoup(htm.unescape(scrape_text).replace('<br/>', '\n'), "html.parser")
+header_tags: Optional[ResultSet] = html.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'span'])
+if header_tags:
+    for tag in header_tags:
+        tag.extract()
 divs: Optional[ResultSet] = html.find_all("div", {"data-lyrics-container": "true"})

 if not divs:
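The effect of the change can be checked in isolation with a minimal sketch. The sample markup and the `strip_headers` helper below are hypothetical and not part of this commit. The sketch assumes the `beautifulsoup4` package is installed and that the unwanted header text sits in heading or `span` tags inside the lyrics container, as the commit message describes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the new page layout: header and contributor
# info now live inside the lyrics container, ahead of the lyric lines.
SAMPLE = (
    '<div data-lyrics-container="true">'
    "<h2>Song Title Lyrics</h2>"
    "<span>3 Contributors</span>"
    "First line<br/>Second line"
    "</div>"
)


def strip_headers(markup: str) -> str:
    """Return lyric text with heading and span tags removed (illustrative helper)."""
    soup = BeautifulSoup(markup.replace("<br/>", "\n"), "html.parser")
    # Detach every heading/span tag from the tree, as the commit does.
    for tag in soup.find_all(["h1", "h2", "h3", "h4", "h5", "span"]):
        tag.extract()
    divs = soup.find_all("div", {"data-lyrics-container": "true"})
    return "\n".join(div.get_text() for div in divs).strip()


print(strip_headers(SAMPLE))  # prints "First line" and "Second line" on separate lines
```

Note that removing every `span` also removes any lyric text that happens to be wrapped in a `span`, so this approach relies on the lyric lines themselves being plain text nodes.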