src.processing package

Submodules

src.processing.ACAPS module

src.processing.ACAPS.transform(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: pd.DataFrame)[source]

Apply transformations to ACAPS records.

Parameters:
  • record (dict) – Input record.
  • key_ref (dict) – Reference for key mapping.
  • country_ref (pd.DataFrame) – Reference for WHO accepted country names.
  • who_coding (pd.DataFrame) – Reference for WHO coding.
Returns:

Record with transformations applied.

Return type:

dict

src.processing.CDC_ITF module

src.processing.CDC_ITF.add_date_end(record: dict)[source]

Set date_end to date_start if measure_stage is “Lift”.

Parameters: record (dict) – Input record.
Returns: Record with date_end changed conditionally, or original record.
Return type: dict

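
The rule described above can be sketched as follows. This is a minimal illustration, not the package's actual implementation; it assumes the stage label is compared exactly, and the "New" stage used in the example is a hypothetical value.

```python
def add_date_end(record: dict) -> dict:
    """Copy date_start into date_end when measure_stage is "Lift"."""
    # Assumption: the comparison is exact and case-sensitive.
    if record.get("measure_stage") == "Lift":
        record["date_end"] = record.get("date_start")
    return record


# A lifted measure ends on the day it starts; other stages are left untouched.
lifted = add_date_end(
    {"measure_stage": "Lift", "date_start": "2020-05-01", "date_end": None}
)
ongoing = add_date_end(
    {"measure_stage": "New", "date_start": "2020-03-15", "date_end": None}
)
```
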
src.processing.CDC_ITF.area_covered_national(record: dict)[source]

Replace area_covered values of “national” with None.

Parameters: record (dict) – Input record.
Returns: Record with area_covered changed.
Return type: dict

src.processing.CDC_ITF.join_comments(record: dict)[source]

Combine comments from “Concise Notes” and “Notes” fields.

Both will be stored in comments column of output dataset.

Parameters: record (dict) – Input record.
Returns: Record with merged comments.
Return type: dict

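
One way the merge could look is sketched below. The field names “Concise Notes” and “Notes” and the comments output key come from the description above; the single-space separator and the handling of blank values are assumptions made for illustration.

```python
def join_comments(record: dict, sep: str = " ") -> dict:
    """Merge the two provider note fields into a single comments value."""
    parts = [record.get("Concise Notes"), record.get("Notes")]
    # Assumption: None and whitespace-only values are dropped before joining.
    kept = [part.strip() for part in parts if isinstance(part, str) and part.strip()]
    record["comments"] = sep.join(kept) if kept else None
    return record


both = join_comments({"Concise Notes": "Schools closed.", "Notes": "Applies to all regions."})
only_notes = join_comments({"Concise Notes": None, "Notes": "Borders closed."})
neither = join_comments({"Concise Notes": "", "Notes": None})
```
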
src.processing.CDC_ITF.transform(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: pd.DataFrame)[source]

Apply transformations to CDC_ITF records.

Parameters:
  • record (dict) – Input record.
  • key_ref (dict) – Reference for key mapping.
  • country_ref (pd.DataFrame) – Reference for WHO accepted country names.
  • who_coding (pd.DataFrame) – Reference for WHO coding.
Returns:

Record with transformations applied.

Return type:

dict

src.processing.JH_HIT module

src.processing.JH_HIT.apply_prov_measure_filter(record: dict, prov_measure_filter: pd.DataFrame)[source]

Keep only records whose prov_measure and prov_category values are accepted.

Only some JH_HIT codings are accepted.

Relies on prov_measure_filter defined in config.

Parameters:
  • record (dict) – Input record.
  • prov_measure_filter (pd.DataFrame) – Config of which codings to drop. Defined in config directory.
Returns:

If coding is included in WHO PHSM dataset, record, else None.

Return type:

dict or None

src.processing.JH_HIT.blank_record_and_url(record: dict)[source]

Assign who_code 11 and the label ‘Not enough to code’ to records with no comments and no URL.

Parameters: record (dict) – Input record.
Returns: Record with coding altered.
Return type: dict

src.processing.JH_HIT.fill_not_enough_to_code(record: dict)[source]

Add the “not enough to code” label when comments are blank.

Parameters: record (dict) – Input record.
Returns: Record with prov_measure and prov_category values altered conditionally.
Return type: dict

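
A hedged sketch of this conditional labelling is shown below. The prov_measure and prov_category keys come from the description above; the comments key, the exact label text, and the definition of “blank” (None or whitespace only) are assumptions.

```python
def fill_not_enough_to_code(record: dict, label: str = "not enough to code") -> dict:
    """Label a record as uncodable when it carries no comments."""
    comments = record.get("comments")
    # Assumption: None and whitespace-only strings both count as blank.
    if comments is None or str(comments).strip() == "":
        record["prov_measure"] = label
        record["prov_category"] = label
    return record


blank = fill_not_enough_to_code(
    {"comments": "  ", "prov_measure": "School closure", "prov_category": "Social"}
)
filled = fill_not_enough_to_code(
    {"comments": "Schools closed.", "prov_measure": "School closure", "prov_category": "Social"}
)
```
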
src.processing.JH_HIT.transform(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: pd.DataFrame, prov_measure_filter: pd.DataFrame)[source]

Apply transformations to JH_HIT records.

Parameters:
  • record (dict) – Input record.
  • key_ref (dict) – Reference for key mapping.
  • country_ref (pd.DataFrame) – Reference for WHO accepted country names.
  • who_coding (pd.DataFrame) – Reference for WHO coding.
  • prov_measure_filter (pd.DataFrame) – Reference for filtering by prov_measure values.
Returns:

Record with transformations applied.

Return type:

dict

src.processing.check module

src.processing.check.check_missing_iso(record: dict)[source]

DEPRECATED by output check?

Check for missing ISO codes.

Note: will not throw an error for “unknown” values, which must be handled later.

src.processing.check.check_missing_who_code(record: dict)[source]

DEPRECATED by output check?

Check for null WHO codes.

Note: will not throw an error for “unknown” values, which must be handled later.

src.processing.check.check_record_keys_agree(record: dict, blank_record: dict)[source]

DEPRECATED by output check?

Parameters:
  • record (dict) – Description of parameter record.
  • blank_record (dict) – Description of parameter blank_record.
Returns:

Description of returned object.

Return type:

type

src.processing.main module

main.py

Functions to apply dataset-specific transformers to individual records.

Needed:
  • Individual transformers for each dataset
  • Shared methods moved to utils.py
  • Comprehensive testing
  • Documentation
  • Logging
  • General checks for record numbers etc.

src.processing.main.process(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: dict, prov_measure_filter: dict, no_update_phrase: dict)[source]

Unify individual dataset transformers.

Applies different transformations for records from different datasets.

Parameters:
  • record (dict) – Input record.
  • key_ref (dict) – Reference for key mapping.
  • country_ref (pd.DataFrame) – Reference for WHO accepted country names.
  • who_coding (dict) – Reference for WHO coding.
  • prov_measure_filter (dict) – Reference for filtering by prov_measure values.
  • no_update_phrase (dict) – Reference for “no update” phrases.
Returns:

Record with transformations applied.

Return type:

dict
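
The per-dataset dispatch described for process can be sketched as a lookup table of transformers. Everything below is illustrative: the "dataset" record key, the stub transformers, and the error raised for unknown providers are assumptions, and the real transform functions take additional reference arguments.

```python
from typing import Callable, Dict


def _stub_transformer(name: str) -> Callable[[dict], dict]:
    # Hypothetical stand-in for a module-level transform function.
    def transform(record: dict) -> dict:
        return {**record, "processed_by": name}

    return transform


TRANSFORMERS: Dict[str, Callable[[dict], dict]] = {
    name: _stub_transformer(name) for name in ("ACAPS", "CDC_ITF", "JH_HIT")
}


def process(record: dict) -> dict:
    """Route a record to the transformer registered for its dataset."""
    dataset = record.get("dataset")  # Assumption: provider name lives under "dataset".
    try:
        transformer = TRANSFORMERS[dataset]
    except KeyError:
        raise ValueError(f"No transformer registered for dataset {dataset!r}") from None
    return transformer(record)


result = process({"dataset": "CDC_ITF", "comments": "Content"})
```

A table keeps the routing in one place, so adding a provider means registering one more entry rather than extending a chain of conditionals.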

src.processing.utils module

src.processing.utils.add_admin_level(record: dict)[source]

Set admin_level values to “national” or “other”.

If area_covered is blank: “national”, else: “other”.

Parameters: record (dict) – Input record.
Returns: Record with admin_level added.
Return type: dict

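
The rule stated above is small enough to sketch directly. The treatment of whitespace-only strings as blank is an assumption; the source only says “blank”.

```python
def add_admin_level(record: dict) -> dict:
    """Set admin_level to "national" when area_covered is blank, else "other"."""
    area = record.get("area_covered")
    # Assumption: None, "" and whitespace-only strings all count as blank.
    is_blank = area is None or str(area).strip() == ""
    record["admin_level"] = "national" if is_blank else "other"
    return record


national = add_admin_level({"area_covered": None})
subnational = add_admin_level({"area_covered": "Lombardy"})
```
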
src.processing.utils.apply_key_map(new_record: dict, old_record: dict, key_ref: dict)[source]

Apply key mapping between two records based on a key reference.

Example:

Given key_ref: {‘column1’:’column2’}.

Extracts values from old_record[‘column1’] to new_record[‘column2’].

Parameters:
  • new_record (dict) – Record with WHO PHSM keys.
  • old_record (dict) – Record with provider keys.
  • key_ref (dict) – Reference for mapping keys between records.
Returns:

Record with new values applied to specified keys.

Return type:

dict
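
The documented example can be sketched as follows, assuming key_ref maps provider keys to WHO PHSM keys as in {‘column1’: ’column2’}. Skipping provider keys that are absent from old_record is an assumption, not confirmed behaviour.

```python
def key_map(new_record: dict, old_record: dict, new_key: str, old_key: str) -> dict:
    """Copy one value from old_record[old_key] to new_record[new_key]."""
    # Assumption: a provider key missing from old_record is skipped silently.
    if old_key in old_record:
        new_record[new_key] = old_record[old_key]
    return new_record


def apply_key_map(new_record: dict, old_record: dict, key_ref: dict) -> dict:
    """Apply every old-key to new-key pair in key_ref."""
    for old_key, new_key in key_ref.items():
        new_record = key_map(new_record, old_record, new_key, old_key)
    return new_record


mapped = apply_key_map(
    new_record={"column2": None},
    old_record={"column1": "value"},
    key_ref={"column1": "column2"},
)
```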

src.processing.utils.assign_id(records: pandas.DataFrame, min_id: int = 1)[source]

Assign a unique ID to each record.

IDs are assigned in the format DATASET_NUMBER, e.g. ACAPS_1234.

Parameters:
  • records (pandas.DataFrame) – Dataframe of records which will have ID numbers added.
  • min_id (int) – Number to begin incrementing IDs from.
Returns:

Dataframe with IDs added.

Return type:

pandas.DataFrame

src.processing.utils.assign_who_coding(record: dict, who_coding: pd.DataFrame, missing_value: str = 'unknown')[source]

Assign WHO coding to a record.

Adds: who_code, who_measure, who_subcategory, who_category.

Optionally adds: targeted, non_compliance, enforcement.

Transforms provider coding of interventions to WHO PHSM coding.

Parameters:
  • record (dict) – Input record.
  • who_coding (pd.DataFrame) – Dataframe of WHO PHSM intervention mappings.
  • missing_value (str) – Value to add if name mapping fails - defaults to “unknown”. This value is recognized by output checks.
Returns:

Record with WHO PHSM code mapping applied.

Return type:

dict

src.processing.utils.assign_who_country_name(record: dict, country_ref: pd.DataFrame, missing_value: str = 'unknown')[source]

Assign country names by ISO code.

Also adds: who_region, country_territory_area, iso_3166_1_numeric.

WHO recognizes standard country names, which are looked up from the ISO codes assigned to provider country names.

Parameters:
  • record (dict) – Input record.
  • country_ref (pd.DataFrame) – Dataframe of country name mappings.
  • missing_value (str) – Value to add if name mapping fails - defaults to “unknown”. This value is recognized by output checks.
Returns:

Record with country name mapping applied.

Return type:

dict

src.processing.utils.create_id(dataset: str, length: int = 6)[source]

Create a random id of characters and numbers.

DEPRECATED?

Parameters:
  • dataset (str) – Dataset to which ids will be added.
  • length (int) – Length of new ID number.
Returns:

New ID number.

Return type:

str

src.processing.utils.generate_blank_record()[source]

Generate a blank record with the correct WHO PHSM keys.

Other objects requiring the same selection of keys descend from here.

Returns: A blank record with keys in WHO PHSM column format.
Return type: dict

src.processing.utils.get_min_id(fn: str, id_column: str = 'who_id')[source]

Open a file and extract the maximum numeric ID value.

This will be the new minimum ID, from which the ID field is incremented.

Future: should be replaced by a set difference of existing IDs and an arbitrary ID sequence.

Example:

Extracts the numeric value of an ID: ACAPS_1234 -> 1234.

Parameters:
  • fn (str) – Filename to reference dataset.
  • id_column (str) – ID column name in reference dataset.
Returns:

Maximum numeric ID value.

Return type:

int
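
The extraction step can be sketched with the standard library. This is an illustration only: it assumes the reference dataset is a CSV file, that the numeric part follows the last underscore, and that 0 is returned when no numeric ID is found. The helper max_numeric_id is hypothetical.

```python
import csv
from typing import Iterable


def max_numeric_id(ids: Iterable) -> int:
    """Return the largest numeric suffix among IDs shaped like DATASET_NUMBER."""
    numbers = []
    for value in ids:
        # Split on the last underscore so dataset names may contain underscores.
        _, _, suffix = str(value).rpartition("_")
        if suffix.isdigit():
            numbers.append(int(suffix))
    # Assumption: fall back to 0 when no ID carries a numeric suffix.
    return max(numbers) if numbers else 0


def get_min_id(fn: str, id_column: str = "who_id") -> int:
    """Read the ID column of a CSV file and return its maximum numeric value."""
    with open(fn, newline="", encoding="utf-8") as handle:
        return max_numeric_id(row[id_column] for row in csv.DictReader(handle))


highest = max_numeric_id(["ACAPS_1234", "CDC_ITF_87", "JH_HIT_2001"])
```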

src.processing.utils.key_map(new_record: dict, old_record: dict, new_key: str, old_key: str)[source]

Implements key mapping from old_record to new_record.

For more information see apply_key_map.

Parameters:
  • new_record (dict) – Record with WHO PHSM keys.
  • old_record (dict) – Record with provider keys.
  • new_key (str) – Key in new_record.
  • old_key (str) – Corresponding key in old_record.
Returns:

Record with information mapped from old_key to new_key.

Return type:

dict

src.processing.utils.new_id(dataset: str, length: int = 6, existing_ids: list = [None])[source]

Create a unique ID given a list of existing IDs.

DEPRECATED?

Parameters:
  • dataset (str) – Dataset to which IDs will be added.
  • length (int) – Length of new ID number.
  • existing_ids (list) – List of existing IDs.
Returns:

New ID number.

Return type:

str

src.processing.utils.none_to_empty_str(s)[source]

Convert None values to an empty string.

Useful for normalizing None values so that WHO coding can be mapped smoothly.

Parameters: s (str or None) – String to be converted.
Returns: Output string: ‘’ if the input is None, otherwise the original string.
Return type: str

src.processing.utils.parse_date(record: dict)[source]

Parse the record date format.

Currently relies on the parsing behaviour of pandas.to_datetime. NOTE: This is vulnerable to US-format dates (month first) being parsed as EU-format dates (day first).

DEPRECATED?

Parameters: record (dict) – Dataset record.
Returns: Dataset record.
Return type: dict

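
The ambiguity noted above can be demonstrated with the standard library alone: the same string is a valid date under both conventions. The sample date and the parse_date_strict helper are hypothetical; pandas.to_datetime exposes dayfirst and format arguments that serve the same purpose of making the convention explicit.

```python
from datetime import datetime

raw = "03/04/2020"

# The same text yields two different dates depending on the assumed convention.
as_us = datetime.strptime(raw, "%m/%d/%Y")  # month first
as_eu = datetime.strptime(raw, "%d/%m/%Y")  # day first


def parse_date_strict(record: dict, fmt: str, key: str = "date_start") -> dict:
    """Parse one date field with an explicit format instead of guessing."""
    # Assumption: the date lives under "date_start" and is stored as a string.
    record[key] = datetime.strptime(record[key], fmt).date()
    return record


parsed = parse_date_strict({"date_start": raw}, "%d/%m/%Y")
```
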
src.processing.utils.remove_tags(record: dict, keys: list = ['comments'])[source]

Remove HTML tags from defined columns.

Some datasets (CDC_ITF) provide comments that are enclosed in HTML tags for display on the web.

Identifies content inside of HTML tags and returns content only.

Example:

“<p>Content</p>” -> “Content”

Parameters:
  • record (dict) – Input record.
  • keys (list) – List of keys to which HTML tag removal should be applied.
Returns:

Record with HTML tags removed from the defined keys.

Return type:

dict
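
A minimal sketch of tag removal is given below. It uses a simple regular expression, which is an assumption about the approach and is only adequate for well-formed markup such as the example above; an HTML parser would be more robust for arbitrary input.

```python
import re

# Matches anything between "<" and the next ">", e.g. "<p>" or "</p>".
TAG_PATTERN = re.compile(r"<[^>]+>")


def remove_tags(record: dict, keys=("comments",)) -> dict:
    """Strip HTML tags from the string values stored under the given keys."""
    for key in keys:
        value = record.get(key)
        # Non-string values (None, numbers) are left as they are.
        if isinstance(value, str):
            record[key] = TAG_PATTERN.sub("", value)
    return record


cleaned = remove_tags({"comments": "<p>Content</p>", "link": "<b>kept</b>"})
```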

src.processing.utils.replace_conditional(record: dict, field: str, value: str, replacement: str)[source]

Conditionally replace a value in a field.

Parameters:
  • record (dict) – Input record.
  • field (str) – Key of field to be conditionally altered.
  • value (str) – Value to identify and replace.
  • replacement (str) – Value to insert on replacement.
Returns:

Record with the specified field altered if record[field] == value. Otherwise, the original record is returned.

Return type:

dict
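
The behaviour described above can be sketched in a few lines. Allowing None as a replacement is an assumption, motivated by CDC_ITF.area_covered_national replacing “national” with None.

```python
from typing import Optional


def replace_conditional(
    record: dict, field: str, value: str, replacement: Optional[str]
) -> dict:
    """Replace record[field] with replacement only when it equals value."""
    if record.get(field) == value:
        record[field] = replacement
    return record


changed = replace_conditional({"area_covered": "national"}, "area_covered", "national", None)
unchanged = replace_conditional({"area_covered": "Lombardy"}, "area_covered", "national", None)
```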

src.processing.utils.replace_country(record: dict, country_name: str, area_name: str)[source]

Replace country name with an area_covered name.

Promote a string in area_covered to country_territory_area.

Applies to records where a WHO recognised country is defined as an administrative region of a different country.

Parameters:
  • record (dict) – Input record.
  • country_name (str) – Country name to be matched.
  • area_name (str) – Area name to be matched.
Returns:

Record with country area_covered promotion applied.

Return type:

dict
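
The promotion step might look like the sketch below. The key names come from the descriptions in this module; requiring both names to match and clearing area_covered afterwards are assumptions, and the place names are placeholders.

```python
def replace_country(record: dict, country_name: str, area_name: str) -> dict:
    """Promote area_name from area_covered to country_territory_area."""
    matches = (
        record.get("country_territory_area") == country_name
        and record.get("area_covered") == area_name
    )
    if matches:
        record["country_territory_area"] = area_name
        # Assumption: the promoted area no longer needs to appear in area_covered.
        record["area_covered"] = None
    return record


promoted = replace_country(
    {"country_territory_area": "Country A", "area_covered": "Territory B"},
    country_name="Country A",
    area_name="Territory B",
)
```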

src.processing.utils.replace_sensitive_regions(record)[source]

Replace a selection of commonly occurring admin level issues.

WHO recognizes certain administrative definitions that differ from ISO conventions.

Future: Move specific region definitions to config directory.

Parameters: record (dict) – Input record.
Returns: Record with sensitive regions changed.
Return type: dict

src.processing.utils.shift_sensitive_region(record: dict, original_name: str, new_name: str)[source]

Demote sensitive country names from country_territory_area to area_covered.

Parameters:
  • record (dict) – Input record.
  • original_name (str) – Original country name from provider dataset.
  • new_name (str) – New WHO-recognised country name.
Returns:

Record with sensitive countries changed.

Return type:

dict
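
The demotion is the reverse of the promotion done by replace_country and can be sketched as follows. Overwriting any existing area_covered value is an assumption, and the names used are placeholders.

```python
def shift_sensitive_region(record: dict, original_name: str, new_name: str) -> dict:
    """Move original_name into area_covered and set the WHO-recognised country."""
    if record.get("country_territory_area") == original_name:
        # Assumption: an existing area_covered value is overwritten.
        record["area_covered"] = original_name
        record["country_territory_area"] = new_name
    return record


shifted = shift_sensitive_region(
    {"country_territory_area": "Region X", "area_covered": None},
    original_name="Region X",
    new_name="Country Y",
)
```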

Module contents