download
- pythonning.web.download.clear_download_cache()
Delete every file that may have been cached, including files cached during previous sessions.
- pythonning.web.download.download_file(url: str, target_file: Path, use_cache: bool = False, step_callback: Callable[[int, int, int], object] | None = None, user_agent: str = 'Mozilla/5.0')
Download a single file from the web at the given url and display the download progress in the terminal.
You can cache the result when you know that you may call this function multiple times for the same url.
Tip to download Google Drive files:
Extract the file ID from the url you have and substitute it for YOURID in the following:
https://drive.usercontent.google.com/download?id=YOURID&export=download&authuser=0&confirm=t
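The substitution described in the tip can be sketched as a small helper. This is only an illustration: `drive_direct_url` is a hypothetical name that is not part of pythonning, and the two share-link shapes it recognises (`/file/d/<ID>/` and `?id=<ID>`) are assumptions about what Google Drive links commonly look like.

```python
import re

# Template taken from the tip above, with YOURID turned into a placeholder.
DIRECT_URL = (
    "https://drive.usercontent.google.com/download"
    "?id={file_id}&export=download&authuser=0&confirm=t"
)


def drive_direct_url(share_url: str) -> str:
    """Hypothetical helper: build the direct-download url from a Drive link."""
    # Assumed link shapes: ".../file/d/<ID>/..." or "...?id=<ID>".
    match = re.search(r"/file/d/([\w-]+)", share_url) or re.search(
        r"[?&]id=([\w-]+)", share_url
    )
    if match is None:
        raise ValueError(f"no Google Drive file ID found in {share_url!r}")
    return DIRECT_URL.format(file_id=match.group(1))
```

The returned string can then be passed as the `url` argument of `download_file()`.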
- Parameters:
url – url to download from, ensure it’s a file.
target_file – filesystem path to write the downloaded file to
use_cache – True to reuse the cached copy of the downloaded file. The cache entry is created on the first call.
step_callback – function called every time the download progresses by one step. Arguments for the function are (block_number, block_size, total_size)
user_agent – change the User-Agent header to fake the browser used for the connection
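As an illustration of the `step_callback` parameter, here is a minimal sketch of a callback that converts the `(block_number, block_size, total_size)` triple into a percentage. `make_percent_callback` is a hypothetical name, and the handling of an unknown size (assumed to arrive as zero or a negative value) is a guess rather than documented behaviour.

```python
from typing import Callable


def make_percent_callback(
    report: Callable[[float], object],
) -> Callable[[int, int, int], object]:
    """Hypothetical factory: build a step_callback reporting a percentage."""

    def step_callback(block_number: int, block_size: int, total_size: int) -> None:
        if total_size <= 0:
            # Assumption: an unknown total size is reported as 0 or negative,
            # in which case no percentage can be computed.
            return
        # The last block may overshoot the real size, so clamp to total_size.
        downloaded = min(block_number * block_size, total_size)
        report(100.0 * downloaded / total_size)

    return step_callback


# Possible usage (needs the pythonning package and network access):
# from pathlib import Path
# from pythonning.web.download import download_file
# download_file(url, Path("archive.zip"), step_callback=make_percent_callback(print))
```

Passing the reporting function in as an argument keeps the callback easy to exercise without performing a real download.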
- pythonning.web.download.download_file_smart(url: str, target_dir: Path, use_cache: bool = False, step_callback: Callable[[int, int, int], object] | None = None, user_agent: str = 'Mozilla/5.0') -> Path
Same as download_file(), but the function guesses the target filename from the url.
- Parameters:
url – url to download from, ensure it’s a file.
target_dir – filesystem path to an existing directory to download the file to.
use_cache – True to reuse the cached copy of the downloaded file. The cache entry is created on the first call.
step_callback – function called every time the download progresses by one step. Arguments for the function are (block_number, block_size, total_size)
user_agent – change the User-Agent header to fake the browser used for the connection
- Returns:
filesystem path of the downloaded file
- pythonning.web.download.get_url_content_type(url, **kwargs) -> str
Retrieve the Content-Type from the response header of the given url.
Examples:
image/jpeg
image/svg+xml
text/html; charset=utf-8
application/octet-stream
References
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
[2] http://www.iana.org/assignments/media-types/media-types.xhtml
- Parameters:
url – a valid web url whose request should return a header
kwargs – passed to urllib.request.urlopen()
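To make the header lookup concrete, the following sketch reads the Content-Type with the standard library alone and splits a value such as `text/html; charset=utf-8` into its media type and charset. It is not the implementation used by pythonning; `fetch_content_type` and `split_content_type` are hypothetical names, and raising `ValueError` when the header is missing is an assumption.

```python
from email.message import Message
from typing import Optional, Tuple
from urllib.request import urlopen


def fetch_content_type(url: str, **kwargs) -> str:
    """Hypothetical sketch: return the raw Content-Type header of a url."""
    # kwargs are forwarded to urllib.request.urlopen(), e.g. timeout=10.
    with urlopen(url, **kwargs) as response:
        value = response.headers.get("Content-Type")
    if value is None:
        # Assumption: a missing header is treated as an error here.
        raise ValueError(f"no Content-Type header returned for {url!r}")
    return value


def split_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("charset")
```

For example, `split_content_type("text/html; charset=utf-8")` yields the media type `text/html` and the charset `utf-8`.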
- pythonning.web.download.get_url_filename(url, **kwargs) -> str
Retrieve the filename from the response header of the given url.
Not every url defines a filename in its header; the function raises in that case.
- Parameters:
url – a valid web url whose request should return a header
kwargs – passed to urllib.request.urlopen()
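A filename carried in a response header usually sits in the Content-Disposition value. The sketch below shows one way to extract it from such a value with the standard library. `filename_from_content_disposition` is a hypothetical name, and `ValueError` is an assumed exception type, since the text above only states that the function raises.

```python
from email.message import Message


def filename_from_content_disposition(value: str) -> str:
    """Hypothetical sketch: extract the filename attribute from a header value."""
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() returns None when no filename attribute is present.
    filename = msg.get_filename()
    if not filename:
        # Assumption: the exception type is chosen for illustration only.
        raise ValueError("Content-Disposition header defines no filename")
    return filename
```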
- pythonning.web.download.guess_url_filename(url: str, **kwargs) -> str
Try to find the most plausible filename for the given url.
We try in the following order:
1. find it from the header’s Content-Disposition filename attribute.
2. extract a filename from the last component of the url string.
3. if the filename doesn’t have a file extension, try to guess it from the header’s Content-Type attribute.
If you know both of these methods would fail, you can at least try to get the file extension with get_url_content_type().
The robustness of the returned filename depends heavily on the url you give.
- Parameters:
url – a valid web url
kwargs – passed to urllib.request.urlopen()
- Returns:
a file name that may not have a file extension
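The fallback order described for guess_url_filename() can be sketched as a pure function over a url and an already-fetched header mapping. This is a simplified illustration under stated assumptions, not the library's code: `guess_filename` is a hypothetical name, the headers are passed in rather than requested, and the extension lookup relies on `mimetypes.guess_extension`, which may differ from what pythonning does.

```python
import mimetypes
import posixpath
from email.message import Message
from typing import Mapping
from urllib.parse import unquote, urlsplit


def guess_filename(url: str, headers: Mapping[str, str]) -> str:
    """Hypothetical sketch of the three-step filename guess."""
    filename = None

    # Step 1: the Content-Disposition filename attribute, if any.
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        filename = msg.get_filename()

    # Step 2: the last component of the url path (query string ignored).
    if not filename:
        filename = posixpath.basename(unquote(urlsplit(url).path))

    # Step 3: no extension yet, so try to derive one from the Content-Type.
    if filename and not posixpath.splitext(filename)[1]:
        media_type = headers.get("Content-Type", "").split(";")[0].strip()
        filename += mimetypes.guess_extension(media_type) or ""

    return filename
```

As in the description above, the result may still lack an extension, for instance when the Content-Type is absent or maps to no known extension.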