download

pythonning.web.download.clear_download_cache()

Delete any files that may have been cached during previous sessions.

pythonning.web.download.download_file(url: str, target_file: Path, use_cache: bool = False, step_callback: Callable[[int, int, int], object] | None = None, user_agent: str = 'Mozilla/5.0')

Download a single file from the web at the given url and display the download progress in the terminal.

You can cache the result when you know that you may call this function multiple times for the same url.

Tip to download Google Drive files:

Extract the file ID from the url you were given and substitute it for YOURID in the following: https://drive.usercontent.google.com/download?id=YOURID&export=download&authuser=0&confirm=t
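The substitution described in the tip can be sketched as a small helper. This is only an illustration: `google_drive_download_url` is a hypothetical function, not part of pythonning, and it assumes the shared link carries the ID either as a `/d/<ID>/` path segment or as an `id=<ID>` query parameter. The ID used in the usage comment is made up.

```python
import re

# Template taken from the tip above; only the file ID changes.
GOOGLE_DRIVE_TEMPLATE = (
    "https://drive.usercontent.google.com/download"
    "?id={file_id}&export=download&authuser=0&confirm=t"
)


def google_drive_download_url(shared_url: str) -> str:
    """Build a direct-download url from a Google Drive sharing url (hypothetical helper)."""
    # Assumption: the ID appears as ".../d/<ID>/..." or as "...?id=<ID>".
    match = re.search(r"/d/([\w-]+)", shared_url) or re.search(
        r"[?&]id=([\w-]+)", shared_url
    )
    if match is None:
        raise ValueError(f"no Google Drive file ID found in {shared_url!r}")
    return GOOGLE_DRIVE_TEMPLATE.format(file_id=match.group(1))


# Usage with a made-up ID:
# google_drive_download_url("https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing")
```

The resulting url can then be passed as the `url` argument of download_file().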

Parameters:
  • url – url to download from; make sure it points to a file.

  • target_file – filesystem path to write the downloaded file to

  • use_cache – True to reuse the cached download. The cache entry is created on the first call.

  • step_callback – function called every time the download progresses by one step. Arguments for the function are (block_number, block_size, total_size)

  • user_agent – change the User-Agent header to fake the browser used for the connection
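As an illustration of the step_callback parameter, the sketch below turns the three arguments into a progress line. It assumes the arguments behave like the reporthook of urllib.request.urlretrieve (blocks transferred so far, block size in bytes, total size in bytes, with a non-positive total when the size is unknown); that is an assumption, not something this page states. `format_progress`, `print_progress` and `download_example` are hypothetical names.

```python
from pathlib import Path


def format_progress(block_number: int, block_size: int, total_size: int) -> str:
    """Turn the callback arguments into a short human-readable progress string."""
    downloaded = block_number * block_size
    if total_size <= 0:
        # Assumption: a non-positive total means the server did not report a size.
        return f"{downloaded} bytes downloaded"
    # The last block may overshoot the real size, so clamp to 100%.
    percent = min(100.0, downloaded * 100 / total_size)
    return f"{percent:5.1f}% of {total_size} bytes"


def print_progress(block_number: int, block_size: int, total_size: int) -> None:
    """Callback with the (block_number, block_size, total_size) signature."""
    print(format_progress(block_number, block_size, total_size), end="\r")


def download_example(url: str, target_file: Path) -> None:
    """Hypothetical wrapper showing how the callback would be passed."""
    # Imported here so the helpers above stay usable without pythonning installed.
    from pythonning.web.download import download_file

    download_file(url, target_file, use_cache=True, step_callback=print_progress)
```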

pythonning.web.download.download_file_smart(url: str, target_dir: Path, use_cache: bool = False, step_callback: Callable[[int, int, int], object] | None = None, user_agent: str = 'Mozilla/5.0') → Path

Same as download_file(), but the function guesses the target filename from the url.

Parameters:
  • url – url to download from; make sure it points to a file.

  • target_dir – filesystem path to an existing directory to download the file to.

  • use_cache – True to reuse the cached download. The cache entry is created on the first call.

  • step_callback – function called every time the download progresses by one step. Arguments for the function are (block_number, block_size, total_size)

  • user_agent – change the User-Agent header to fake the browser used for the connection

Returns:

filesystem path of the downloaded file

pythonning.web.download.get_url_content_type(url, **kwargs) → str

Retrieve the Content-Type from the response header of the given url.

Examples:

image/jpeg
image/svg+xml
text/html; charset=utf-8
application/octet-stream

Parameters:
  • url – a valid web url whose request should return a header

  • kwargs – passed to urllib.request.urlopen()
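The example values above consist of a media type, optionally followed by parameters such as charset. A minimal sketch for splitting such a value once it has been retrieved is shown here; `split_content_type` is a hypothetical helper and not part of pythonning.

```python
def split_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into its media type and its parameters."""
    media_type, *raw_params = value.split(";")
    params = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        if key.strip():
            # Parameter names are case-insensitive; values may be quoted.
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


# Usage with one of the example values:
# split_content_type("text/html; charset=utf-8")
```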

pythonning.web.download.get_url_filename(url, **kwargs) → str

Retrieve the filename from the response header of the given url.

Not all urls define a filename in their header; the function raises in that case.

Parameters:
  • url – a valid web url whose request should return a header

  • kwargs – passed to urllib.request.urlopen()

pythonning.web.download.guess_url_filename(url: str, **kwargs) → str

Try to find the most plausible filename for the given url.

The following are tried in order:

  1. find it from the filename attribute of the header’s Content-Disposition.

  2. extract a filename from the last component of the url string

    • if the filename doesn’t have a file extension, try to guess it from the header’s Content-Type attribute

If you know both of these methods would fail, you can at least try to get the file extension with get_url_content_type().

The robustness of the returned output depends heavily on the url you gave.

Parameters:
  • url – a valid web url

  • kwargs – passed to urllib.request.urlopen()

Returns:

a file name that may not have a file extension
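The lookup order described for guess_url_filename() can be sketched offline as follows. This is not the library's implementation: `guess_filename_sketch` is a hypothetical function that takes already-fetched headers as a plain dict instead of performing a request, handles only the simple `filename=` form of Content-Disposition (the `filename*=` form is ignored), and uses the standard mimetypes module for the extension guess.

```python
import mimetypes
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def guess_filename_sketch(url: str, headers: dict) -> str:
    """Guess a filename from response headers, falling back to the url path."""
    # 1. Prefer the filename attribute of Content-Disposition.
    disposition = headers.get("Content-Disposition", "")
    match = re.search(r'filename="?([^";]+)"?', disposition)
    if match:
        return match.group(1).strip()

    # 2. Fall back to the last component of the url path (query string excluded).
    name = PurePosixPath(unquote(urlsplit(url).path)).name

    # If there is no extension, try to derive one from Content-Type.
    if name and not PurePosixPath(name).suffix:
        media_type = headers.get("Content-Type", "").split(";")[0].strip()
        extension = mimetypes.guess_extension(media_type) if media_type else None
        name += extension or ""
    return name


# Usage with made-up urls and headers:
# guess_filename_sketch("https://example.com/get", {"Content-Disposition": 'attachment; filename="report.pdf"'})
# guess_filename_sketch("https://example.com/files/photo.png?size=large", {})
```

As with the documented function, the result may be an empty or extension-less name when neither the headers nor the url carry enough information.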