aim.hathifiles.poll module

class aim.hathifiles.poll.NewFileHandler(new_files: list, store: list)[source]

Bases: object

notify_webhook()[source]

Sends the list of update files that have not been seen before to the Argo Events webhook for hathifiles.

replace_store(store_path: str = 'tmp/hathi_file_list_store.json')[source]

Replaces the store file with a list of hathifile update files.

Parameters:

store_path (str, optional) – path to hathifiles store file. Defaults to S.hathifiles_store_path.

property slim_store

Removes files that are over one year old from the store.

Returns:

list of update files that are less than one year old

Return type:

list
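
The one-year pruning described for slim_store can be sketched as follows. This is only an illustration under stated assumptions: it assumes each store entry is a file name that embeds an eight-digit YYYYMMDD date (for example hathi_upd_20240101.txt.gz), and the function name slim and its today parameter are hypothetical, not part of this module.

```python
import re
from datetime import date, datetime, timedelta


def slim(store: list, today: date) -> list:
    """Keep only entries whose embedded YYYYMMDD date is within the last 365 days."""
    cutoff = today - timedelta(days=365)
    kept = []
    for name in store:
        match = re.search(r"(\d{8})", name)
        if match is None:
            continue  # assumption: entries without a recognizable date are dropped
        file_date = datetime.strptime(match.group(1), "%Y%m%d").date()
        if file_date >= cutoff:
            kept.append(name)
    return kept
```

Passing the reference date in explicitly, rather than calling date.today() inside the function, keeps the sketch deterministic and easy to test.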

aim.hathifiles.poll.check_for_new_update_files(latest_update_files: list | None = None, store: list | None = None, new_file_handler_klass: Type[NewFileHandler] = NewFileHandler)[source]

Gets the latest list of hathifiles from hathitrust.org, loads the store file, and compares the two. If there are new files, it triggers the Argo Events webhook and updates the store. If there are no new files, it exits.

Parameters:
  • latest_update_files (list | None, optional) – list of latest update files. This will call get_latest_update_files() when None is given.

  • store (list | None, optional) – list of hathifiles update files that have been seen before. This will call get_store() if None is given.

  • new_file_handler_klass (Type[NewFileHandler], optional) – Class that handles new update files. Defaults to NewFileHandler.
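
The compare-then-notify flow and the role of new_file_handler_klass can be sketched as below. This is a self-contained illustration, not the module's implementation: RecordingHandler and check are hypothetical names, the stub records calls instead of contacting a webhook or writing a file, and the order of the two handler calls is an assumption based on the description above.

```python
class RecordingHandler:
    """Hypothetical stand-in for NewFileHandler that records calls instead of doing I/O."""

    calls: list = []

    def __init__(self, new_files: list, store: list):
        self.new_files = new_files
        self.store = store

    def notify_webhook(self):
        RecordingHandler.calls.append(("notify", list(self.new_files)))

    def replace_store(self):
        RecordingHandler.calls.append(("replace", self.store + self.new_files))


def check(latest_update_files: list, store: list, handler_klass=RecordingHandler) -> None:
    # Files present upstream but absent from the store are considered new.
    new_files = [name for name in latest_update_files if name not in store]
    if not new_files:
        return  # nothing new, so no webhook call and no store update
    handler = handler_klass(new_files=new_files, store=store)
    handler.notify_webhook()  # assumption: notify first, then update the store
    handler.replace_store()
```

Accepting the handler class as a parameter is what makes this kind of substitution possible in tests: the comparison logic can be exercised without any network or file access.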

aim.hathifiles.poll.create_store_file(store_path: str = 'tmp/hathi_file_list_store.json') → None[source]

Creates a store file from the current list of update files on hathitrust.org if a store file does not already exist.

Parameters:

store_path (str, optional) – path to store file. Defaults to S.hathifiles_store_path.
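
A minimal sketch of the create-only-if-missing behavior follows. It assumes the store is a JSON array of file names, which the .json default path suggests but the documentation does not confirm. The name create_if_missing, the update_files parameter, and the boolean return value are hypothetical; the documented function takes only store_path and returns None.

```python
import json
from pathlib import Path


def create_if_missing(store_path: str, update_files: list) -> bool:
    """Write update_files as JSON to store_path unless the file already exists."""
    path = Path(store_path)
    if path.exists():
        return False  # an existing store is left untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(update_files), encoding="utf-8")
    return True
```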

aim.hathifiles.poll.filter_for_update_files(hathi_file_list: list) → list[source]

Takes the full hathi_file_list and filters it down to the file names of update files.

Parameters:

hathi_file_list (list) – full list of current hathifiles from hathitrust.org

Returns:

flat list of update file names

Return type:

list
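
The filtering step might look like the sketch below. The entry shape is an assumption: each dictionary is taken to have a "filename" string and a boolean "full" flag that is false for update files. The function name update_file_names is hypothetical, and the actual implementation may select update files differently (for example, by file name pattern).

```python
def update_file_names(hathi_file_list: list) -> list:
    """Return the file names of entries that are not full dumps."""
    # Assumption: "full" is True for full dumps and False for update files.
    return [entry["filename"] for entry in hathi_file_list if not entry["full"]]
```

For example, given one full file and one update file, only the update file's name is returned.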

aim.hathifiles.poll.get_hathi_file_list() → list[source]

Gets the current list of hathifiles from hathitrust.org.

Returns:

list of dictionaries that describe hathifiles

Return type:

list

aim.hathifiles.poll.get_latest_update_files()[source]

Gets the current list of hathifiles from hathitrust.org and filters it down to a list of update files.

Returns:

flat list of update file names

Return type:

list

aim.hathifiles.poll.get_store(store_path: str = 'tmp/hathi_file_list_store.json') → list[source]

Loads the store file that contains the list of all hathifile update files that have been seen before.

Parameters:

store_path (str, optional) – path to the store file. Defaults to S.hathifiles_store_path.

Returns:

list of hathifile update files that have been seen before

Return type:

list
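
Loading the store could be sketched as follows, again assuming the store file holds a JSON array of file names. The name load_store and the type check are illustrative additions, not documented behavior of get_store.

```python
import json


def load_store(store_path: str) -> list:
    """Read the store file and return its contents as a list."""
    with open(store_path, encoding="utf-8") as handle:
        store = json.load(handle)
    if not isinstance(store, list):
        # Assumption: anything other than a JSON array indicates a corrupt store.
        raise ValueError(f"expected a JSON array in {store_path}")
    return store
```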