src.master package

Submodules

src.master.main module

src.master.main.get_combo_string(records: pd.DataFrame, cols: list)

Paste column values together, separated by ‘_’.

Parameters:
  • records (pd.DataFrame) – Input dataset.
  • cols (list) – Columns to be pasted together.
Returns:

List of pasted column values.

Return type:

list
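
The pasting step described above can be sketched as follows. This is a minimal illustration, not the module's actual source, which is not shown here: the name combo_string_sketch and the sample data are hypothetical, and the sketch assumes every value is converted with str() before joining.

```python
import pandas as pd


def combo_string_sketch(records: pd.DataFrame, cols: list) -> list:
    """Hypothetical stand-in for get_combo_string (assumed behaviour)."""
    # Walk the selected columns row by row, convert each value to a
    # string, and join the values of one row with "_".
    rows = records[cols].itertuples(index=False, name=None)
    return ["_".join(map(str, row)) for row in rows]


# Made-up sample data for illustration only.
records = pd.DataFrame(
    {
        "country_territory_area": ["United States of America", "France"],
        "date_start": ["2020-01-01", "2020-02-15"],
        "measure": ["a", "b"],
    }
)

combo = combo_string_sketch(records, ["country_territory_area", "date_start"])
print(combo)
```

Columns not listed in cols (here the made-up measure column) do not contribute to the pasted strings.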

src.master.main.get_new_records(records: pd.DataFrame, previous_update: pd.DataFrame, cols: list)

Identify records in a newly updated dataset that are not present in the previous update.

Records are compared using an identifier formed by pasting the values of cols together.

Example

Given cols = [‘country_territory_area’, ‘date_start’], the values in these columns are pasted together for each record. The result is referred to as a “combo string”.

Any record in records whose “combo string” also appears in previous_update will not be recognised as a new record.

For example, “United States of America_2020-01-01” == “United States of America_2020-01-01” means that the records match.

Parameters:
  • records (pd.DataFrame) – Newly updated data.
  • previous_update (pd.DataFrame) – Previously updated data.
  • cols (list) – Columns to be considered when merging records.
Returns:

New records not present in previous_update.

Return type:

pd.DataFrame
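
The comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the module's actual source: the names combo_string_sketch and new_records_sketch and the sample data are hypothetical, and the sketch assumes that a record counts as new exactly when its combo string is absent from previous_update.

```python
import pandas as pd


def combo_string_sketch(records: pd.DataFrame, cols: list) -> list:
    """Hypothetical stand-in for get_combo_string (assumed behaviour)."""
    rows = records[cols].itertuples(index=False, name=None)
    return ["_".join(map(str, row)) for row in rows]


def new_records_sketch(
    records: pd.DataFrame, previous_update: pd.DataFrame, cols: list
) -> pd.DataFrame:
    """Hypothetical stand-in for get_new_records (assumed behaviour)."""
    # Combo strings already seen in the previous update.
    seen = set(combo_string_sketch(previous_update, cols))
    # Keep only rows whose combo string has not been seen before.
    mask = [key not in seen for key in combo_string_sketch(records, cols)]
    return records.loc[mask]


cols = ["country_territory_area", "date_start"]

# Made-up sample data for illustration only.
previous_update = pd.DataFrame(
    {
        "country_territory_area": ["United States of America"],
        "date_start": ["2020-01-01"],
    }
)
records = pd.DataFrame(
    {
        "country_territory_area": [
            "United States of America",
            "United States of America",
            "France",
        ],
        "date_start": ["2020-01-01", "2020-03-01", "2020-01-01"],
    }
)

new = new_records_sketch(records, previous_update, cols)
print(new)
```

In this made-up data the first row of records matches the previous update on both columns, so only the other two rows are returned. Matching on a pasted string means that a value differing only in formatting (for example a date written another way) would be treated as a new record under this sketch.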

Module contents