
Downloading PDFs with Python

2019-11-24, 21:20

The following Python script makes it easy to download all PDFs linked from a web page.

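import os
import urllib.request
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

url = 'http://irgendeineurl.de'  # placeholder: the page to scan

r = requests.get(url)
r.raise_for_status()
soup = BeautifulSoup(r.text, 'html.parser')

# Only look at <a> tags that actually carry an href attribute
for link in soup.find_all('a', href=True):
    href = link['href']
    if href.lower().endswith('.pdf'):
        # Resolve relative links against the page URL
        pdf_url = urljoin(url, href)
        # Save under the file's base name in the current directory
        filename = os.path.basename(urlparse(pdf_url).path)
        urllib.request.urlretrieve(pdf_url, filename)
        print(pdf_url)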
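
The fragile part of a script like this is turning each href into a usable download URL, since links may be relative, root-relative, or already absolute. The following is a minimal sketch of that step in isolation, not part of the original script: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and the names PdfLinkParser and find_pdf_links as well as the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of all <a href="..."> targets ending in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if not href:
            return  # anchors without an href are skipped
        # Resolve relative links against the page URL
        absolute = urljoin(self.base_url, href)
        # Check the path only, so a query string does not hide the extension
        if urlparse(absolute).path.lower().endswith('.pdf'):
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Hypothetical helper: return all PDF URLs linked from the given HTML."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Made-up sample page covering relative, root-relative and absolute links
sample = '''
<a href="docs/a.pdf">A</a>
<a href="/files/b.PDF?download=1">B</a>
<a href="https://example.org/c.pdf">C</a>
<a name="top">no href</a>
<a href="page.html">HTML</a>
'''
links = find_pdf_links(sample, 'http://example.com/papers/')
for link in links:
    print(link)
```

Separating link collection from downloading makes it possible to check the list of URLs first, before anything is written to disk.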

Uwe



Tags: Python, BS4
Category: Python / SciPy / pandas
