The Genius site was updated: the lyrics header/contributor info was relocated inside the LyricsContainer div, so that header text was being prefixed to the returned lyrics. Resolved by finding the unwanted header tags and extracting them from the HTML before reading the lyrics containers.

2025-04-07 11:08:07 -04:00
parent 8958636232
commit fed5307386
2 changed files with 7 additions and 3 deletions


@@ -101,6 +101,10 @@ class Genius:
html = BeautifulSoup(htm.unescape(scrape_text).replace('<br/>', '\n'), "html.parser")
header_tags: Optional[ResultSet] = html.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'span'])
if header_tags:
    for tag in header_tags:
        tag.extract()
divs: Optional[ResultSet] = html.find_all("div", {"data-lyrics-container": "true"})
if not divs:
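The hunk above is truncated after `if not divs:`. The sketch below is a self-contained illustration of the same technique: strip header-like tags with BeautifulSoup's `extract()`, then read the text of the `data-lyrics-container` divs. The function name `extract_lyrics`, the empty-string return when no container is found, and the newline join across containers are hypothetical choices for illustration, not what the `Genius` class actually does after this point.

```python
import html as htm

from bs4 import BeautifulSoup

# Tags this commit treats as header/contributor "garbage" inside the lyrics container.
UNWANTED_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'span']


def extract_lyrics(scrape_text: str) -> str:
    """Hypothetical helper: return lyrics text with header-like tags removed.

    Mirrors the patched lines; the return handling below is an assumption.
    """
    soup = BeautifulSoup(htm.unescape(scrape_text).replace('<br/>', '\n'), "html.parser")
    # find_all() returns a ResultSet (empty if nothing matches, never None);
    # extract() detaches each tag from the tree together with all of its contents.
    for tag in soup.find_all(UNWANTED_TAGS):
        tag.extract()
    divs = soup.find_all("div", {"data-lyrics-container": "true"})
    if not divs:
        # Assumption: treat a page with no lyrics container as "no lyrics".
        return ""
    # get_text() concatenates the remaining text nodes, including the
    # newlines that replaced <br/> above; containers are joined with a newline.
    return "\n".join(div.get_text().strip() for div in divs)


if __name__ == "__main__":
    sample = (
        '<div data-lyrics-container="true">'
        '<h2>3 Contributors</h2><span>Song Title Lyrics</span>'
        'First line<br/>Second line'
        '</div>'
    )
    print(repr(extract_lyrics(sample)))  # 'First line\nSecond line'
```

Because `extract()` removes a tag along with everything inside it, any lyric text that the page happens to wrap in a `span` is discarded along with the header text.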