src.processing package
Submodules
src.processing.ACAPS module
src.processing.ACAPS.transform(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: pd.DataFrame)
Apply transformations to ACAPS records.
Parameters: - record (dict) – Input record.
- key_ref (dict) – Reference for key mapping.
- country_ref (pd.DataFrame) – Reference for WHO accepted country names.
- who_coding (pd.DataFrame) – Reference for WHO coding.
Returns: Record with transformations applied.
Return type: dict
src.processing.CDC_ITF module
src.processing.CDC_ITF.add_date_end(record: dict)
Function to set date_end equal to date_start if measure_stage is “Lift”.
Parameters: record (dict) – Input record.
Returns: Record with date_end changed conditionally, or the original record.
Return type: dict
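A minimal sketch of the rule described above, assuming records are plain dicts carrying `measure_stage`, `date_start` and `date_end` keys. The name `add_date_end_sketch` is hypothetical; this illustrates the documented behaviour and is not the package's own code.

```python
def add_date_end_sketch(record: dict) -> dict:
    """Hypothetical sketch: copy date_start into date_end for "Lift" records."""
    # Only records whose measure_stage is exactly "Lift" are changed
    if record.get("measure_stage") == "Lift":
        record["date_end"] = record.get("date_start")
    return record


lifted = add_date_end_sketch(
    {"measure_stage": "Lift", "date_start": "2020-06-01", "date_end": None}
)
print(lifted["date_end"])  # 2020-06-01
```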
src.processing.CDC_ITF.area_covered_national(record: dict)
Function to remove area_covered == “national” by replacing it with None.
Parameters: record (dict) – Input record.
Returns: Record with area_covered changed.
Return type: dict
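A short sketch of the replacement, assuming an exact, case-sensitive match on the string "national". The function name is hypothetical.

```python
def area_covered_national_sketch(record: dict) -> dict:
    """Hypothetical sketch: drop the literal value "national" from area_covered."""
    # Exact match only; any other area name is left untouched
    if record.get("area_covered") == "national":
        record["area_covered"] = None
    return record


print(area_covered_national_sketch({"area_covered": "national"}))  # {'area_covered': None}
```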
src.processing.CDC_ITF.join_comments(record: dict)
Combine comments from the “Concise Notes” and “Notes” fields.
Both are stored in the comments column of the output dataset.
Parameters: record (dict) – Input record.
Returns: Record with merged comments.
Return type: dict
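One way the merge could look, as a sketch only. It assumes the raw record still holds the provider keys "Concise Notes" and "Notes", writes the result to a `comments` key, and joins with a single space; the key names, separator and function name are all assumptions.

```python
def join_comments_sketch(record: dict, sep: str = " ") -> dict:
    """Hypothetical sketch: merge "Concise Notes" and "Notes" into "comments"."""
    parts = [record.get("Concise Notes"), record.get("Notes")]
    # Skip missing or empty parts so the result has no stray separators
    record["comments"] = sep.join(p for p in parts if p)
    return record


merged = join_comments_sketch({"Concise Notes": "Schools closed.", "Notes": "Until further notice."})
print(merged["comments"])  # Schools closed. Until further notice.
```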
src.processing.CDC_ITF.transform(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: pd.DataFrame)
Apply transformations to CDC_ITF records.
Parameters: - record (dict) – Input record.
- key_ref (dict) – Reference for key mapping.
- country_ref (pd.DataFrame) – Reference for WHO accepted country names.
- who_coding (pd.DataFrame) – Reference for WHO coding.
Returns: Record with transformations applied.
Return type: dict
src.processing.JH_HIT module
src.processing.JH_HIT.apply_prov_measure_filter(record: dict, prov_measure_filter: pd.DataFrame)
Filter only some prov_measure and prov_category values.
Only some JH_HIT codings are accepted.
Relies on prov_measure_filter defined in config.
Parameters: - record (dict) – Input record.
- prov_measure_filter (pd.DataFrame) – Config of which codings to drop. Defined in config directory.
Returns: The record if its coding is included in the WHO PHSM dataset, else None.
Return type: dict or None
src.processing.JH_HIT.blank_record_and_url(record: dict)
Assign who_code = 11 and ‘Not enough to code’ to records that have no comments AND no URL.
Parameters: record (dict) – Input record.
Returns: Record with coding altered.
Return type: dict
src.processing.JH_HIT.fill_not_enough_to_code(record: dict)
Function to add a “not enough to code” label when comments are blank.
Parameters: record (dict) – Input record.
Returns: Record with prov_measure and prov_category values altered conditionally.
Return type: dict
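A sketch of the conditional labelling, assuming the comments live under a `comments` key, that "blank" covers None, empty and whitespace-only strings, and that both prov_measure and prov_category receive the same label text. The constant and function name are hypothetical.

```python
NOT_ENOUGH_TO_CODE = "not enough to code"  # assumed label text


def fill_not_enough_to_code_sketch(record: dict) -> dict:
    """Hypothetical sketch: label records whose comments are blank."""
    comments = record.get("comments")
    # None, "" and whitespace-only comments are all treated as blank (assumption)
    if comments is None or not str(comments).strip():
        record["prov_measure"] = NOT_ENOUGH_TO_CODE
        record["prov_category"] = NOT_ENOUGH_TO_CODE
    return record


print(fill_not_enough_to_code_sketch({"comments": "   "})["prov_measure"])  # not enough to code
```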
src.processing.JH_HIT.transform(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: pd.DataFrame, prov_measure_filter: pd.DataFrame)
Apply transformations to JH_HIT records.
Parameters: - record (dict) – Input record.
- key_ref (dict) – Reference for key mapping.
- country_ref (pd.DataFrame) – Reference for WHO accepted country names.
- who_coding (pd.DataFrame) – Reference for WHO coding.
- prov_measure_filter (pd.DataFrame) – Reference for filtering by prov_measure values.
Returns: Record with transformations applied.
Return type: dict
src.processing.check module
src.processing.check.check_missing_iso(record: dict)
DEPRECATED by output check?
Function to check for missing ISO codes.
Note: this will not throw an error for “unknown” values, which must be handled later.
src.processing.main module
main.py
Functions that combine dataset-specific transformers and apply them to individual records.
Needed:
- Individual transformers for each dataset
- Put shared methods in utils.py
- Comprehensive testing
- Documentation
- Logging
- General checks for record numbers, etc.
src.processing.main.process(record: dict, key_ref: dict, country_ref: pd.DataFrame, who_coding: dict, prov_measure_filter: dict, no_update_phrase: dict)
Unify individual dataset transformers.
Applies different transformations for records from different datasets.
Parameters: - record (dict) – Input record.
- key_ref (dict) – Reference for key mapping.
- country_ref (pd.DataFrame) – Reference for WHO accepted country names.
- who_coding (dict) – Reference for WHO coding.
- prov_measure_filter (dict) – Reference for filtering by prov_measure values.
- no_update_phrase (dict) – Reference for “no update” phrases.
Returns: Record with transformations applied.
Return type: dict
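The dispatch idea ("different transformations for records from different datasets") can be sketched as a lookup table of per-dataset callables. This assumes each record names its provider under a `dataset` key; `process_sketch`, `tag_source` and the registry are hypothetical, and `tag_source` merely stands in for a real `transform`. Reference tables such as key_ref or country_ref could be bound to each transformer with `functools.partial`, which keeps the dispatcher's signature independent of what each dataset needs (JH_HIT, for instance, takes an extra prov_measure_filter).

```python
from functools import partial
from typing import Callable, Dict, Optional

# A transformer takes one record and returns a new record, or None to drop it
Transformer = Callable[[dict], Optional[dict]]


def process_sketch(record: dict, transformers: Dict[str, Transformer]) -> Optional[dict]:
    """Hypothetical sketch: route a record to its dataset-specific transformer."""
    dataset = record.get("dataset")  # assumed key holding the provider name
    if dataset not in transformers:
        raise ValueError(f"No transformer registered for dataset {dataset!r}")
    return transformers[dataset](record)


def tag_source(record: dict, label: str) -> dict:
    # Stand-in for a real transform: copies the record and notes who handled it
    return {**record, "handled_by": label}


# Per-dataset reference arguments are bound up front with partial
transformers = {
    "ACAPS": partial(tag_source, label="ACAPS"),
    "CDC_ITF": partial(tag_source, label="CDC_ITF"),
}
print(process_sketch({"dataset": "CDC_ITF"}, transformers)["handled_by"])  # CDC_ITF
```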
src.processing.utils module
src.processing.utils.add_admin_level(record: dict)
Set admin_level to “national” or “other”.
If area_covered is blank, admin_level is “national”; otherwise it is “other”.
Parameters: record (dict) – Input record.
Returns: Record with admin_level added.
Return type: dict
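The rule above fits in a few lines. This sketch assumes that "blank" covers None as well as empty or whitespace-only strings; the function name is hypothetical.

```python
def add_admin_level_sketch(record: dict) -> dict:
    """Hypothetical sketch: derive admin_level from area_covered."""
    area = record.get("area_covered")
    # None, "" and whitespace-only values count as blank (assumption)
    is_blank = area is None or not str(area).strip()
    record["admin_level"] = "national" if is_blank else "other"
    return record


print(add_admin_level_sketch({"area_covered": "Ontario"})["admin_level"])  # other
```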
src.processing.utils.apply_key_map(new_record: dict, old_record: dict, key_ref: dict)
Apply key mapping between two records based on a key reference.
Example:
Given key_ref {'column1': 'column2'}, the value of old_record['column1'] is copied to new_record['column2'].
Parameters: - new_record (dict) – Record with WHO PHSM keys.
- old_record (dict) – Record with provider keys.
- key_ref (dict) – Reference for mapping keys between records.
Returns: Record with new values applied to the specified keys.
Return type: dict
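A sketch of the mapping direction in the example above, where key_ref keys are provider (old) column names and values are WHO PHSM (new) column names. The `_sketch` functions are hypothetical, and filling missing provider keys with None is an assumption.

```python
def key_map_sketch(new_record: dict, old_record: dict, new_key: str, old_key: str) -> dict:
    # Copy one provider value into its WHO PHSM key; a missing key gives None (assumption)
    new_record[new_key] = old_record.get(old_key)
    return new_record


def apply_key_map_sketch(new_record: dict, old_record: dict, key_ref: dict) -> dict:
    # key_ref maps provider (old) keys to WHO PHSM (new) keys
    for old_key, new_key in key_ref.items():
        new_record = key_map_sketch(new_record, old_record, new_key, old_key)
    return new_record


print(apply_key_map_sketch({}, {"column1": 42}, {"column1": "column2"}))  # {'column2': 42}
```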
src.processing.utils.assign_id(records: pandas.DataFrame, min_id: int = 1)
Function to assign a unique ID to each record.
IDs are assigned in the format DATASET_NUMBER, e.g. ACAPS_1234.
Parameters: - records (pandas.DataFrame) – Dataframe of records to which ID numbers will be added.
- min_id (int) – Number to begin incrementing IDs from.
Returns: Dataframe with IDs added.
Return type: pandas.DataFrame
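To illustrate the DATASET_NUMBER format without a pandas dependency, this sketch works on a list of dicts rather than a DataFrame. It assumes each row has a `dataset` key, writes to a `who_id` column (borrowed from get_min_id's default id_column), and uses one counter shared by all datasets; all three are assumptions, and the function name is hypothetical.

```python
from typing import List


def assign_id_sketch(records: List[dict], min_id: int = 1) -> List[dict]:
    """Hypothetical sketch: number rows consecutively from min_id as DATASET_NUMBER."""
    for offset, record in enumerate(records):
        # One counter for every row, regardless of dataset (assumption)
        record["who_id"] = f"{record['dataset']}_{min_id + offset}"
    return records


rows = assign_id_sketch([{"dataset": "ACAPS"}, {"dataset": "CDC_ITF"}], min_id=1234)
print([r["who_id"] for r in rows])  # ['ACAPS_1234', 'CDC_ITF_1235']
```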
src.processing.utils.assign_who_coding(record: dict, who_coding: pd.DataFrame, missing_value: str = 'unknown')
Assign WHO coding to a record.
Adds: who_code, who_measure, who_subcategory, who_category.
Optionally adds: targeted, non_compliance, enforcement.
Transforms provider coding of interventions to WHO PHSM coding.
Parameters: - record (dict) – Input record.
- who_coding (pd.DataFrame) – Dataframe of WHO PHSM intervention mappings.
- missing_value (str) – Value to add if name mapping fails - defaults to “unknown”. This value is recognized by output checks.
Returns: Record with WHO PHSM code mapping applied.
Return type: dict
src.processing.utils.assign_who_country_name(record: dict, country_ref: pd.DataFrame, missing_value: str = 'unknown')
Function to assign country names by ISO code.
Also adds: who_region, country_territory_area, iso_3166_1_numeric.
WHO recognizes standard country names, which are looked up from the ISO codes assigned to provider country names.
Parameters: - record (dict) – Input record.
- country_ref (pd.DataFrame) – Dataframe of country name mappings.
- missing_value (str) – Value to add if name mapping fails - defaults to “unknown”. This value is recognized by output checks.
Returns: Record with country name mapping applied.
Return type: dict
src.processing.utils.create_id(dataset: str, length: int = 6)
Create a random ID of characters and numbers.
DEPRECATED?
Parameters: - dataset (str) – Dataset to which IDs will be added.
- length (int) – Length of new ID number.
Returns: New ID number.
Return type: str
src.processing.utils.generate_blank_record()
Generate a blank record with the correct WHO PHSM keys.
Other objects requiring the same selection of keys descend from here.
Returns: A blank record with keys in WHO PHSM column format.
Return type: dict
src.processing.utils.get_min_id(fn: str, id_column: str = 'who_id')
Function to open a file and extract the maximum numeric ID value.
This will be the new min id to be incremented for the ID field.
Future: should be replaced by a set difference of existing IDs and an arbitrary ID sequence.
Example: extracts the numeric value of ID ACAPS_1234 -> 1234.
Parameters: - fn (str) – Filename of the reference dataset.
- id_column (str) – ID column name in reference dataset.
Returns: Maximum numeric ID value.
Return type: int
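A sketch of the ID parsing, assuming the reference dataset is a CSV file and IDs end in "_NUMBER". It takes an open file object instead of a filename so it can be exercised with `io.StringIO`; in practice it would be called as `with open(fn, newline="") as f: get_min_id_sketch(f)`. Returning 0 for an empty file is an assumption, and both function names are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO


def max_numeric_id(ids: Iterable[str]) -> int:
    """Return the largest numeric suffix, e.g. "ACAPS_1234" -> 1234 (0 if there are no IDs)."""
    # Split on the last underscore so names containing "_" (CDC_ITF) still parse
    return max((int(i.rsplit("_", 1)[-1]) for i in ids), default=0)


def get_min_id_sketch(f: TextIO, id_column: str = "who_id") -> int:
    # Reads a CSV with a header row from an already-open file object (assumed format)
    return max_numeric_id(row[id_column] for row in csv.DictReader(f))


sample = io.StringIO("who_id,comments\nACAPS_12,a\nCDC_ITF_1234,b\n")
print(get_min_id_sketch(sample))  # 1234
```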
src.processing.utils.key_map(new_record: dict, old_record: dict, new_key: str, old_key: str)
Implements key mapping from old_record to new_record.
For more information see apply_key_map.
Parameters: - new_record (dict) – Record with WHO PHSM keys.
- old_record (dict) – Record with provider keys.
- new_key (str) – Key in new_record.
- old_key (str) – Corresponding key in old_record.
Returns: Record with information mapped from old_key to new_key.
Return type: dict
src.processing.utils.new_id(dataset: str, length: int = 6, existing_ids: list = [None])
Function to create a unique ID given a list of existing IDs.
DEPRECATED?
Parameters: - dataset (str) – Dataset to which IDs will be added.
- length (int) – Length of new ID number.
- existing_ids (list) – Vector of existing IDs.
Returns: New ID number.
Return type: str
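A sketch of create_id and new_id working together: draw random IDs until one is not already taken. The "DATASET_" prefix, the uppercase-plus-digits alphabet and both function names are assumptions. The sketch also uses an immutable default for existing_ids; a mutable default such as `[None]` is created once and shared across calls in Python, which is a common source of bugs.

```python
import random
import string
from typing import Iterable, Optional

ALPHABET = string.ascii_uppercase + string.digits  # assumed character set


def create_id_sketch(dataset: str, length: int = 6, rng: Optional[random.Random] = None) -> str:
    """Hypothetical sketch: DATASET_ followed by `length` random letters and digits."""
    rng = rng if rng is not None else random.Random()
    suffix = "".join(rng.choice(ALPHABET) for _ in range(length))
    return f"{dataset}_{suffix}"


def new_id_sketch(dataset: str, length: int = 6, existing_ids: Iterable[str] = ()) -> str:
    """Hypothetical sketch: retry until the generated ID is not in existing_ids."""
    existing = set(existing_ids)
    while True:  # only loops forever if every possible ID is already taken
        candidate = create_id_sketch(dataset, length)
        if candidate not in existing:
            return candidate


print(new_id_sketch("ACAPS", existing_ids=["ACAPS_AAAAAA"]))
```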
src.processing.utils.none_to_empty_str(s)
Convert None values to an empty string.
Useful for replacing None values so that WHO coding maps smoothly.
Parameters: s (str or None) – String to be converted.
Returns: Output string: '' if s is None, otherwise the original string.
Return type: str
src.processing.utils.parse_date(record: dict)
Function to parse the record date format.
Currently relies on the parsing behaviour of pandas.to_datetime. NOTE: this is vulnerable to US-format dates being parsed as EU-format dates.
DEPRECATED?
Parameters: record (dict) – Dataset record.
Returns: Dataset record.
Return type: dict
Remove HTML tags from defined columns.
Some datasets (CDC_ITF) provide comments that are enclosed in HTML tags for display on the web.
Identifies content inside HTML tags and returns only the content.
Example:
“<p>Content</p>” -> “Content”
Parameters: - record (dict) – Input record.
- keys (list) – List of keys to which HTML tag replacement should be applied.
Returns: Record with HTML tags replaced in the defined keys.
Return type: dict
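A regex-based sketch of the tag stripping described above. The function name `strip_html_sketch` is hypothetical, since this entry's own function name is not shown here. A pattern like `<[^>]+>` is a simplification: it does not decode entities such as `&amp;` and can misfire on a `>` inside an attribute value. Python's `html.parser` module is the more robust option when comments contain complex markup.

```python
import re
from typing import List

TAG_PATTERN = re.compile(r"<[^>]+>")  # any "<...>" run, i.e. an opening or closing tag


def strip_html_sketch(record: dict, keys: List[str]) -> dict:
    """Hypothetical sketch: remove HTML tags from the selected keys of a record."""
    for key in keys:
        value = record.get(key)
        # Skip missing or non-string values such as None
        if isinstance(value, str):
            record[key] = TAG_PATTERN.sub("", value)
    return record


print(strip_html_sketch({"comments": "<p>Content</p>"}, ["comments"]))  # {'comments': 'Content'}
```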
src.processing.utils.replace_conditional(record: dict, field: str, value: str, replacement: str)
Function to conditionally replace a value in a field.
Parameters: - record (dict) – Input record.
- field (str) – Key of field to be conditionally altered.
- value (str) – Value to identify and replace.
- replacement (str) – Value to insert on replacement.
Returns: Record with the specified field altered if record[field] == value; otherwise, the original record.
Return type: dict
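A sketch of the documented contract: the field is rewritten only on an exact match, and the record is returned either way. The function name and example values are hypothetical.

```python
def replace_conditional_sketch(record: dict, field: str, value: str, replacement: str) -> dict:
    """Hypothetical sketch: set record[field] = replacement only if it equals value."""
    if record.get(field) == value:
        record[field] = replacement
    return record


print(replace_conditional_sketch({"measure_stage": "lift"}, "measure_stage", "lift", "Lift"))
# {'measure_stage': 'Lift'}
```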
src.processing.utils.replace_country(record: dict, country_name: str, area_name: str)
Replace a country name with an area_covered name.
Promote a string in area_covered to country_territory_area.
Applies to records where a WHO-recognised country is defined as an administrative region of a different country.
Parameters: - record (dict) – Input record.
- country_name (str) – Country name to be matched.
- area_name (str) – Area name to be matched.
Returns: Record with the area_covered promotion applied.
Return type: dict
src.processing.utils.replace_sensitive_regions(record)
Replace a selection of commonly occurring admin level issues.
WHO recognizes certain administrative definitions that differ from ISO conventions.
Future: Move specific region definitions to config directory.
Parameters: record (dict) – Input record.
Returns: Record with sensitive regions changed.
Return type: dict
src.processing.utils.shift_sensitive_region(record: dict, original_name: str, new_name: str)
Function to demote sensitive country names from country_territory_area to area_covered.
Parameters: - record (dict) – Input record.
- original_name (str) – Original country name from provider dataset.
- new_name (str) – New WHO-recognised country name.
Returns: Record with sensitive countries changed.
Return type: dict