File location: P:\VDEC\source\analysis\tabulation\create_descriptive_table.sas
Creates a descriptive table in HTML format.
If writing to HTML: Must call %open_output_file(title=, file=) beforehand and %close_output_file(file=) afterwards. It uses offset-v0_1_0.css (or later) for styling, so copy these to the project folder as well.
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\analysis\tabulation\create_rates_table-v0_1_3.sas
Create a rates table in HTML format
Must call %open_output_file(title=, file=) beforehand and %close_output_file(file=) afterwards. It uses rates_offset-v0_1_0.css (or later) for styling, so copy this to the project folder as well.
There is a lengthy to-do list. Feel free to contribute to improve the macro.
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\perform_attrition-v0_2_2.sas
This macro allows the creation of a text-based attrition diagram and sets an inclusion flag to off in a dataset.
###Notes Ensure the input datasets (attrition_ds=work.attrition_steps and inclusion_ds=work.included_groups in example) look right before running the macro. They can be create using (look at actual source): * N.B. Use spaces instead of tabs for spacing; * N.B. Use parentheses around logical conditions; * Attrition steps; data work.attrition_steps; infile datalines truncover; input step_name $1-48 step_exclusion $49-150;
datalines; Main hospital is rural (main_hosp_type = ‘03’ or main_hosp_type = ‘04’ or main_hosp_type = ‘05’) Main hospital is out of province or unknown (main_hosp_type = ‘12’ or main_hosp_type = ‘99’) No valid C and S results exposure_status = ‘NDT’ No hospital pharmacy data exposure_status = ‘NHP’ No ET exposure_status = ‘NET’ ; * Included groups; data work.included_groups; infile datalines truncover; input group_name $1-48 group_inclusion $49-150;
datalines; Appropriate ET (exposure_status = ‘AET’ or exposure_status = ‘AAE’) Inappropriate ET (exposure_status = ‘IET’ or exposure_status = ‘AIE’) C and S does not cover ET exposure_status = ‘UET’ ;
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\sasfmts_to_stata_value_label.sas
This macro is to convert sas formats to STATA value labels and label the values as they were in SAS dataset. This macro takes in formats from work and common library and create a .do file at the desired location with “label define” and “label val” statements.
input_ds = Input dataset name output_file_loc = Location where one wants to have the output .do file. No quotes reqd. e.g. P:\flu_eval\tests , output_file_name = Output .do file name. No quotes reqd. e.g. label_vars.do,
Example code: %sasfmts_to_stata_value_label( input_ds=project.test ,output_file_loc=P:\VDEC\test suites\SAS macros ,output_file_name=test_labels.do );
SAS formats with ranges are not converted to STATA value labels. If SAS format has missing or other value, no labels will be created for those values.
Please use notsorted option in SAS proc format statement to have label define statment created in same sort order. e.g. proc format value $rha_f (notsorted) ‘WP’= ‘Winnipeg’ ‘IE’= ‘Interlake-Eastern’ ‘NO’= ‘Northern’ ‘SO’= ‘Southern’ ‘WE’= ‘Prairie Mountain’ ‘PC’= ‘Public Trustee / In CFS care’; run; Running this macro creates ouput as ‘label define rha_f_decode 0 “WP” 1 “IE” 2 “NO” 3 “SO” 4 “WE” 5 “PC”‘. If ‘notsorted’ is not used, output line will be ‘label define rha_f_decode 0 “IE” 1 “NO” 2 “PC” 3 “SO” 4 “WE” 5 “WP”’
Since STATA doesn’t allow value labels to character variables, so new variable will replace the old variable with numeric values and assign appropriate value labels.
To use value labels in your STATA script: Call the .do file in the script
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\conditions\import_condition_db.sas
This macro imports the standard format condition database into sas from 3 csv files based on the standard spreadsheet format.
Due to SAS limits only the first 32,767 rows are evaluated to determine the format of the columns. This can lead to problems when your data has more observations.
File location: P:\VDEC\source\data preparation\drugs\characterize_drug_usage.sas
New macro for characterizing drug use. Allows for both single ATC drugs and drug groups. Calculation of duration accounts for the full history of each person’s drug use.
Non-repeated mode - unequal length periods: %describe_drug_use( drug_filepath = “P:\VDEC\test suites\SAS macros\Test_drug_macro\docs\drug_trees3.csv”, periods = (0,1)(1,5)(5,10), periodunit = year direction_ = -1, cohortds = cohort, entry_datevar = subject_entry_date, exit_datevar = index_date, dpdds = health.mhdp_udm_1998apr_v3, dpinds = project.cll_epi_orig_dpin, outputds = test_output, mode_configfile = “U:\Documents\cll_epi\source\mode_config.csv”, postfix = _a );
Repeated mode - equal length periods %describe_drug_use( drug_filepath = “P:\VDEC\test suites\SAS macros\Test_drug_macro\docs\drug_trees3.csv”, periods = (0,5), periodunit = year direction_ = -1, cohortds = cohort, entry_datevar = subject_entry_date, exit_datevar = index_date, dpdds = health.mhdp_udm_1998apr_v3, dpinds = project.cll_epi_orig_dpin, outputds = test_output, mode_configfile = “U:\Documents\cll_epi\source\mode_config.csv”, postfix = _a, repeated_mode = 1, numperiods = 10 );
Input parameters: * drug_filepath = The path of the CSV file specifying drug structures of interest. * periods_ = Periods of interest * Non-repeated mode: (period_1 start, period_1 end)(period_2 start, period_2 end)… * Repeated mode: (overall start, repeated period length) * periodunit = Unit of the periods, default is year. * direction_ = The direction of the periods, i.e. 1 for forward and -1 for backward. * cohortds = The name of the dataset containing the cohort of interest. * entry_datevar = The name of the entry date variable. * exit_datevar = The name of the index date variable. * dpdds = The name of the DPD dataset. * dpinds = The name of the DPIN dataset. * outputds = The name of the output dataset. * id_var = identification variable, default scrphin. * drugtree_toolpath = The path of the drugtree_tool macro. (Default path is P:\VDEC\source\data preparation\drugs\drug_tree_tools.sas) * mode_configfile = The configuration file path specifying the mode of each drug class. * 1: full mode –> number of prescriptions, ever-use, duration and dose information will be extracted. * 0: ever-use mode –> ever-use information and number of prescriptions only. * postfix = (optional) The postfix needed to name the generated variables, e.g. _a for drug usage after the index date. * repeated_mode = Used for a number of periods with equal length, default is 0. * numperiods = The number of equal length of period, default is 1 and used in combination with repeated_mode = 1. * output_name_table = Export a lookup table for the generated variables and the original drug names, default is 1. * drug_var_table_nm = The name of the drug name look up table, default is drug_var_names.
Output dataset: * For each of the period, the duration, total dose in DDD and ever-use of each drug group are calculated. * The generated variables are: * ever_{drug group name}p{period order number}, * duration{drug group name}p{period order number}, * ddd{drug group name}p{period order number}. * period order number are determined according to the input order in the periods parameter.
Drug tree file (csv): search_for,group_under,active,notes,last_updated,updated_by C10AA01,simva,1,,2017-09-20, C10AA02,lova,1,,, C10AA03,prava,1,,, C10AA04,fluva,1,,, C10AA05,atorva,1,,, C10AA06,ceriva,1,,, C10AA07,rosuva,1,,, prava,hydro,1,,, rosuva,hydro,1,,, simva,hipoml,1,,, atorva,hipoml,1,,, fluva,lopoml,1,,, lova,lopoml,1,,, hydro,anystatin,1,,, hipoml,anystatin,1,,, lopoml,anystatin,1,,, C10AA06,anystatin,1,,,
Mode configuration file (csv): drug_class,full_mode,drug_label simva,0,“Simvastatin” lova,1,“Lovastatin” fluva,1,“Fluvastatin” prava,0,“Pravastatin” atorva,1,“Atorvastatin” ceriva,0,“Cerivastatin” rosuva,0,“Rosuvastatin”
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\drugs\find_drug_strength_in_description.sas
Find strength and unit information in the drug description variable. Modify the DPD data with the desired information.
%find_strength_in_description(input_dpdds = health.mhdp_udm_1998apr_v3, output_dpdds = dpd_cmb_metformin, output_existingatc = available_metformins, atc_patternds = cmb_metforminatc, keyword = metformin, DDD = 2, DDD_unit = “G”);
File location: P:\VDEC\source\data preparation\drugs\impute_DPIN_costs.sas
This macro imputes the drug costs for DPIN records, using a modified version of MCHP’s algorithm. It outputs the imputed cost (in reference year $).
The imputed cost is the sum of the drug ingredients cost and the dispensing fee. See in-code comments for details
CPI data can be prepared as below: proc sql noprint;
select Manitoba_all_items, Manitoba_prescribed_medicines into :Manitoba_base_CPI_all, :Manitoba_base_CPI_drugs from project.vdec_cpi_2000_2015 (Updated consumer price index data can be downloaded from statcan for all items and prescribed medicines) where year = &BASE_DOLLAR_YEAR;
create table cpi (label = “CPI ratios for comparison of health care costs”) as select year, &BASE_DOLLAR_YEAR as base_year label=“Base year for comparison”, &Manitoba_base_CPI_all./Manitoba_all_items as all_items_multiplier label = “Manitoba CPI ratio for all items”, &Manitoba_base_CPI_drugs./Manitoba_prescribed_medicines as drugs_multiplier label = “Manitoba CPI ratio for prescription drugs” from project.vdec_cpi_2000_2015;
quit;
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\health utilization\health_utilization.sas
This has four macros which imputes the physician visit & hospitalization, cost of visit & hospitalization. phys_visits: Number of physician claims for office visits(MH prefix=7). Multiple claims on one day is counted as 1 visit. phys_cost: Cost of physician claims. Multiple claims on one day is counted as 1 visit. Multiple conditions can occur during the same visit, so a distinct on scrphin, net_fee, servdt, md, prefix, tariff is applied. hosp_visits: Number of hospitalizations. Multiple conditions occuring during the same admission is counted as one. hosp_cost: Hospitalization cost.
phys_ds = The input Medical Services dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix nobs_phys), date_var_cohort = The index date for the person (e.g. index_date), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Medical Services dataset (e.g. %str( tariff not= 4344) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort.
phys_ds = The input Medical Services dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix cost_phys), date_var_cohort = The index date for the person (e.g. index_date), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Medical Services dataset (e.g. %str( tariff not= 4344) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort, CPI_db = The dataset with consumer price index (CPI) information. This set should at least include a variable year for the year and &CPI_var . e.g. cpi, CPI_var = The CPI-based multiplier, i.e. (all items CPI for &CPI_base_year)/(all items CPI for year of interest) e.g all_items_multiplier, CPI_base_year = The base year for the CPI comparisons. [This will only be used to generate the variable label]
hosp_ds = The input Hospital Abstract dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix nobs_hosp), date_var_cohort = The index date for the person (e.g. index_date), date_var_hosp = The relevant date in Hospital abstract (e.g. admdt), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Hospital Abstract dataset (e.g. %str( transact = ‘1’) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort.
hosp_ds = The input Hospital Abstract dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix cost_hosp), date_var_cohort = The index date for the person (e.g. index_date), date_var_hosp = The relevant date in Hospital abstract (e.g. admdt), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Hospital Abstract dataset (e.g. %str( transact = ‘1’) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort, costing_vars=Variables which have costing information (e.g. %STR(riw,inptriw) ).Always use %str()., cpwc_dollars=cost per weighted case or cost of a standard hospital stay(e.g. 6152),
The number of physician visits is restricted to just office visits i.e prefix=‘7’. The imputed cost for physician claim is sum of net_fee multiplied by CPI-based multiplier for each year. Time period of (0,1) includes both dates. The imputed cost for hospitalization is sum of Resource Intensive Weights(RIW) & Day procedure Groups(DPG) multiplied by cost of a standard hospital stay(cpwc). For physcian claim costs, the columns net_fee, servdt, md, prefix, tariff are required in physician dataset. Run MCHP %net_med macro on physician dataset (if not ran during data extraction or else) before using phys_visits & phys_cost macros
If date_var_cohort is 2012-03-25 and duration_period is (-1,1) and duration_unit is year then For physician claim visits or hospitalization: Duration will be 2011-03-25 <= date_var_cohort < 2013-03-25 i.e includes start of duration_period but less than end of duration_period. For physician claim costs or hospitalization costs: Duration will be 2011-03-25 <= date_var_cohort <= 2013-03-25 i.e includes both start of duration_period and end of duration_period.
CPI data can be prepared as below: proc sql noprint;
select Manitoba_all_items, Manitoba_prescribed_medicines into :Manitoba_base_CPI_all, :Manitoba_base_CPI_drugs from project.vdec_cpi_2000_2015 (Updated consumer price index data can be downloaded from statcan for all items and prescribed medicines) where year = &BASE_DOLLAR_YEAR;
create table cpi (label = “CPI ratios for comparison of health care costs”) as select year, &BASE_DOLLAR_YEAR as base_year label=“Base year for comparison”, &Manitoba_base_CPI_all./Manitoba_all_items as all_items_multiplier label = “Manitoba CPI ratio for all items”, &Manitoba_base_CPI_drugs./Manitoba_prescribed_medicines as drugs_multiplier label = “Manitoba CPI ratio for prescription drugs” from project.vdec_cpi_2000_2015;
quit;
cpwc_dollars can be found on CIHI website for specific year.
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\matching\create_case_control_sets-v3_0_0.sas
This file provides macros to perform cost-based risk set sampling to go from survival data to a nested case control setup
The current version calculates the cost based on index date and a caller-supplied list of variables.
For some reason SAS cannot centrally set a random stream. It needs to be done for each datastep. Because there are two randomization steps in the sampling process and one in matching risk set macro, three different random seeds need to be provided. Run the following code thrice to get random numbers for the seeds (ALWAYS RERUN THIS FOR NEW PROJECTS): %let MAX_SEED = %eval(2*31-1); %let TIME_NOW = %sysfunc( time() ); %let FRAC_PART_TIME = %sysevalf( &TIME_NOW - %sysfunc(floor(&TIME_NOW)) ); %let TIME_SEED = %sysfunc(round( %sysevalf(&MAX_SEED&FRAC_PART_TIME) )); %put; %put Your seed is: &TIME_SEED;
Data needs to be prepared as survival time data before using this macro. Each record should have a person identifier, a subject entry date, an exit date, and a flag indicating whether or not it is a case.
The follow-up time is sampled for potential control index dates based on the prepared survival dataset.
Example code: %sample_follow_up_time( input_set = person_survival_list, output_set = sampled_subjects, random_seed_A = &RANDOM_SEED_N_SAMPLES, random_seed_B = &RANDOM_SEED_SAMPLING, person_id_var = scrphin, time_per_sample = &DELTA_T, case_var = is_case, entry_var = subject_entry_date, exit_var = subject_exit_date, debug_mode = 0 );
input parameters: * input_set = Input survival dataset * output_set = Sampled output dataset * random_seed_A, random_seed_B = Two different random seeds (see above) * person_id_var = variable that uniquely identifies a person (default is scrphin) * time_per_sample = Number of days per sample (e.g., 365 means number of sampled index dates = years of follow-up [with random rounding]) * case_var = Variable for the case flag (default = is_case) * entry_var = Name of the entry date variable (default = subject_entry_date) * exit_var = Name of the exit date variable (default = subject_exit_date) * debug_mode = If set to 1, does not delete macro data-sets for troubleshooting. [optional]
The output are subjects with an index date which is either the case date (if case) or randomly sampled from the follow-up time (if not case). The output subjects are persons on a specific date. Only the sampled dates will be considered for matching.
Add whatever additional covariates (for matching) based on the person-index date combinations.
Gets the best matches based on the sampled index dates Example code: %match_risk_sets( input_set = sampled_subjects_with_match_vars, output_set = sampled_risk_set_list, random_seed_C = &RANDOM_SEED_SAMPLING_C, controls_per_case = &N_CONTROL, exact_match_vars = sex favourite_pet, max_cost = &max_cost, index_var = index_date, costing_vars = birthdt covered_days, case_var = is_case, set_var = _set, cost_var = _cost, person_id_var = scrphin, entry_var = subject_entry_date, debug_mode = 0 );
input parameters: * input_set = Input sampled subject list (%sample_follow_up_time output + additional vars) * output_set = Risk sets for case control study * random_seed_C = Different random seed from random_seed_A or random_seed_B (see above) * controls_per_case = Number of controls to sample per case * exact_match_vars = variables for which case and control values must be equal (separated by spaces) * max_cost = maximum cost allowed for matching (e.g., 365) * index_var = the index date variable, e.g. index_date * costing_vars = variables used to evaluate cost of a match (excluding index_date, which is added automatically). * case_var = variable in which the case flag is stored (default = is_case) * set_var = variable in which the set number is stored (default is _set) * cost_var = variable in which the match cost is stored (default is _cost) * person_id_var = variable that uniquely identifies a person (default is scrphin) * entry_var = variable that contains the subject’s entry date. * debug_mode = If set to 1, does not delete macro data-sets for troubleshooting. [optional]
The output are the sampled risk sets with matching costs.
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\matching\diagnostic.sas
This file provides diagnostics like controls per match type etc. for case control matching.
Example code: %diagnostic( indata=sampled_risk_set_list, case_var=is_case, set_var=_set, controls_per_case=5, type=match_type, exact_match_vars=sex, continuous_var=age, categorical_var=non_pan_flu_vaccine filename_and_location=P:\flu_eval\tests\diagnostic.txt);
input parameters: * indata = Input dataset with cases and controls * case_var = variable in which the case flag is stored (default = is_case) * set_var = variable in which the set number is stored (default is _set) * controls_per_case = Number of controls to sample per case * type = variable in which match type is stored like match done on 1) residence 2) Physician etc (optional) * exact_match_vars = variables for which case and control values must be equal (separated by spaces) (optional) * continuous_var = Variables for which one want to have mean, median and other stats. Should be continuous variable like age (separated by spaces)(optional) * categorical_var = Should be categorical variable like rha (separated by spaces)(optional) * title = Title of the table, default is “Diagnostics” * filename_and_location = provide location and filename. * suppress_number = Suppress the cells with individuals or events if they are less than a particular value. Default is 6.
The output is the rosewood generated text file * Number of cases & controls(regardless of set) * Number of sets(cases) by number of controls in each set i.e number of cases with 0,1,.. matches. * For continuous variable, average RMS(root mean square) value over all sets. * Stratified average RMS value by exact match variables * For categorical variable, calculate the number of set where 1,2,3,.. controls have concordant, discordant or missing values for controls.
Note: * Cases with no controls are excluded from continuous variable and categorical variable analysis.
TODO: * Handle suppression better
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\data preparation\process original databases\combine_ICU_data.sas
This macro combines the ICU records from both the CCIC database and the hospital abstract to generate one combined ICU dataset.
File location: P:\VDEC\source\data preparation\process original databases\identify_immunizations-v1_0_4.sas
This macro identifies immunization status as covariates. Currently it outputs the status for: * Influenza vaccine in the previous 12 month * Pneumococcal vaccine
File location: P:\VDEC\source\data preparation\process original databases\process_hospital_abstract.sas
This macro processes the original Hospital Abstracts database by: * Creating a combined hospital visit from multiple hospital records, when periods overlap or are sequential. * Getting the admitting and main hospital for the visit.
The output is the same as the input except * admdt is the admission for the combined visit, admdt_abstract is the admission date for the original abstract row * sepdt is the discharge for the combined visit, sepdt_abstract is the discharge date for the original abstract row * first_hosp, first_hosp_type are the hospital and type for the initial admission for the visit * main_hosp, main_hosp_type are the main hospital and type for the visit
File location: P:\VDEC\source\data preparation\process original databases\process_MH_registry.sas
This macro processed the full (2+ records per year per person) original Manitoba Health (MH) registry by: * Selecting the most latest birthday and gender by acquisition date from all entries for a person * Identifying MH coverage periods * Identifying address periods (an address is the combination of municipal code and postal code) * Calculating several additional variables (covariates) based on MCHP macros
The output is the same as the input except * birthdt and gender is picked of the latest acquisition date for a person and unique for each scrphin * coverage_start is the start date of the combined/unified coverage period * coverage_end is the end date of the combined/unified coverage period * address_start is the start date at the address * address_end is the end date at the address * rha_codeSee/use the MCHP format $rhal. format for code mapping, “PC” means public trustee or in CFS care * rhad_codeSee/use the MCHP format $rhal. format for code mapping, “PC” means public trustee or in CFS care * CA_codeSee/use the MCHP format $rhal. format for code mapping * NC_codeSee/use the MCHP format $rhal. format for code mapping * locality_residenceOutput as R(ural), U(rban), N(ot found) * income_quintileOutput as number 1-5 (or NF for not found) * sefi * material_deprivation * social_deprivation
File location: P:\VDEC\source\data preparation\vaccination completeness\vaccination.sas
It tells about immunization completeness status.
input_cohort_ds = The cohort for the study, input_ds = Input dataset e.g. MIMS or CPCSSN dataset. It should be in general form and atleast contain variables with names as scrphin, service_dt, code and coding_system. service_dt is the date variable for the date when vaccine is administered. code e.g. tariff(MIMS) or ATC code(CPCSSN) variable. coding_system e.g. ATC or MH-Tariff., vaccines_of_interest = Vaccine and duration to know the status for, input_vaccine_composition = Vaccine name linking to antigen. Default is vdec_vaccine_composition, input_vaccine_agent = Code linking to antigens. Default is vdec_vaccine_agent, input_childhood_immun_schedule = Schedule details when policy implemented etc.. Default is vdec_vaccine_childhood_schedule, input_childhood_immun_dosage = Contains doses reqd and at what age etc.. Default is vdec_vaccine_childhood_dosage, PHIN_var = Default is scrphin, birth_dt_var = Default is birthdt, gender_var = Default is gender. values as created by registry macro, output_widefmt_binary_ds = Output dataset with one row per phin with vaccines as columns and complete vaccination as 1=yes and 0=no, output_widefmt_date_ds = Output dataset with one row per phin with vaccines as columns and date when the vaccination completes, output_longfmt_ds = Output dataset with one row for each vaccine. days_gap = The max number of days between consecutive vaccine service dates to count as one visit. Default is 0. debug_mode = Default is 0, keep_intermediate_schedule_data = Keep immunization_schedule dataset. Default is 0.
Example: data indata; infile cards dlm=‘,’ dsd pad; length vaccine $20. duration $15.; input vaccine $ duration $; cards; DTaP,2 year DTaP,7 year PCV,2 year MMR,2 year Varicella,2 year ; RUN;
%vaccination_schedule( input_cohort_ds = &output_set_cohort., input_ds = &input_set_mims., vaccines_of_interest = indata, input_vaccine_composition = vdec_vaccine_composition, input_vaccine_agent = vdec_vaccine_agent, input_childhood_immun_schedule = vdec_vaccine_childhood_schedule, input_childhood_immun_dosage = vdec_vaccine_childhood_dosage, PHIN_var = scrphin, birth_dt_var = birthdt, gender_var = gender, output_widefmt_binary_ds = vaccine_widefmt, output_widefmt_date_ds =, output_longfmt_ds =, days_gap = 3, debug_mode = 0, keep_intermediate_schedule_data = 1 );
####TODO Lot of things like structure the output properly, HPV and Hep-B are two dose vaccine given after a certain time after 1st dose etc.
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\documentation\code_diary.sas
The main file and included scripts need to incorporate specially formatted comment blocks to be included in the output file. An example is: The special comment block is opened with:/** The special comment block is closed with:/ / An example of use inside the blocks is: @main :title The best documentation ever @main :authors Author One; Author Two; Author Three @main :org Our Glorious Institute @main :version 1.2.3 @main A general comment @def ABC = The alphabet @stat The p-values are calculated using the carrot test for bunnies @note I’m writing this example on a Friday afternoon… A line without keyword
Parsable single line comments are also supported: The line starts with: ** The line ends with: ;
The lines are all grabbed and can be organized by using @keyword for the same themes (e.g. @def in the example, will put all @def lines together). Use . as a seperator for multi-level keywords, e.g. @covar.cost. There are a few special tags (e.g. @main :title) that grab the document metadata. If a line within a comment block does not have a keyword it is seen as a continuation of the previous line (in the same block)
The output can be converted to any format (e.g. pdf [needs latex installed], word, html) using pandoc. (See Google for details) Go to the directory in the command window and use * markdown to pdf: pandoc infile.txt –toc –latex-engine=pdflatex -o outfile.pdf * markdown to word: pandoc -s -S infile.txt –toc -o outfile.docx * markdown to html: pandoc -s infile.txt –toc -o outfile.htm
The following example demonstrates the use of the macro, the section_: inputs should have the same variable names as the example: data work.alias_list; infile datalines; input short_keyword $1-10 long_keyword $11-50;
datalines; cond condition cov covariate covar covariate def definition descr descriptive incl inclusion excl exclusion exp exposure out outcome var variable ;
data work.order_list; infile datalines; input keyword $1-15 order_no 16-20;
datalines; todo -999 assert -998 main -100 definition -94 variable -92 note -90 inclusion -85 exclusion -80 condition -70 exposure -60 outcome -50 covariate -40 descriptive -30 rate -20 stat -10 no_keyword 0 ;
data work.header_list; infile datalines; input keyword $1-15 header $16-50;
datalines; condition Condition criteria covariate Covariates definition Definitions descriptive Descriptive tables exclusion Exclusion criteria exposure Exposures inclusion Inclusion criteria main Main no_keyword General note Notes outcome Outcomes rate Rate tables stat Statistics todo Task list variable Variables ;
data work.scrub_list; infile datalines; input keyword $1-15;
datalines; assert todo ;
%parse_comments(
input_main_file = &SOURCE_ROOT\main_sas.sas,
out_dir = P:\project
out_file = workplan_coding.txt,
out_file_scrubbed = workplan_output.txt,
section_aliases = work.alias_list,
section_order = work.order_list,
section_headers = work.header_list,
sections_scrubbed = work.scrub_list
);
Authors: Christiaan Righolt Copyright © 2016 Vaccine and Drug Evaluation Centre, Winnipeg.
File location: P:\VDEC\source\documentation\convert_markdown_to_html.sas
Macro to convert a markdown style document to html. This is not supposed to supplant proper tools like pandoc, but only to aid visualization on a closed system that does not allow the installation of software.
The tool automatically creates a table of contents
The tool supports the following markdown markup: * Pandoc-style metadata prefixed with % * Headers prefixed with N### depending on the level (e.g. #### is 4th level) * Unordered lists prefixed with * * Order lists prefixed with n.
Not supported is: * Nested lists * Bold/italic text * Links, images etc.
Authors: Christiaan Righolt Copyright © 2016 Vaccine and Drug Evaluation Centre, Winnipeg.
File location: P:\VDEC\source\documentation\generate_macro_documentation.sas Script to parse the macro and script documentation for the VDEC source folder
File location: P:\VDEC\source\documentation\macro_diary.sas
Macro to parse documentation for all sas macros and scripts in a root directory.
See sample_macro_documentation for markup of files.
Authors: Christiaan Righolt Copyright © 2016 Vaccine and Drug Evaluation Centre, Winnipeg.
File location: P:\VDEC\source\documentation\sample_macro_documentation.sas
This file is a sample/template for VDEC macro documentation that describes what sections should be included. This documentation is written in markdown.
General comments on how to use it. (e.g. macro’s to call before and or after this one)
A list of parameters (and their meaning) used in the macro. * parameter_1 = First parameter * parameter_2 = Second parameter [optional]
Other detailed information about how to use the macro properly (e.g. details on the format of certain parameters)
List of known issues/warnings/etc. for the macro. * This file is not really code, just documentation
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\general\rosewood-v1-style.sas
A macro to write output to Rosewood, a structured table language. The tables create by Rosewood are human readible. Use Carpenter to translate these tables into various other formats, including Word, HTML and Latex.
Rosewood is a macro that writes directly to a text file. The file and related content are referred to by Rosewood handles, which contains four global macro variables. These can be defined as follows: %global current_line_T1 control_T1 row_T1 column_T1; %let rw_handles = file_name = ‘C:\project dir\Table 1.txt’, current_line_macro = current_line_T1, control_macro = control_T1, row_num = row_T1, column_num = column_T1;
All options are shown with their relevant macro calls as examples.
There are two options: * Write a fulll data set directly to a Rosewood file. * Manually control what is written, in this case the order is: Open, title, header cells (cell, number ci), end_header, main cells (cell, number ci), footer, close. Also use next_row and control were appropriate.
%rosewood( &rw_handles, function = write, title = , footer = , dataset_name = work.dataset_to_write ); The input dataset_name should include the library name.
%rosewood(&rw_handles, function=open);
%rosewood(&rw_handles, function=title, title_text=“Table 1: Example of a table title”);
%rosewood(&rw_handles, function=cell, content=“Content of cell”, [options, see below]); Cell options: * rows = Number of rows for cell to span [optional, default is 1] * columns = Number of columns for cell to span [optional, default is 1] * width = Width of cell in characters (will not show up in control section) [optional, default is 32] * align = Alignment of the cell [optional: either centre, center, left or right] * indent = Indent the cell contents [optional set to 1 if yes] * bold = Bold the cell contents [optional set to 1 if yes] * emphasize = Emphasize the cell contents [optional set to 1 if yes] * plain = Have the cell contents formatted plain [optional set to 1 if yes]
%rosewood(&rw_handles, function=number, value=3.14, [options, see below]); Cell options: * All options for a string cell (see above) * fmt = Format to print the number
%rosewood(&rw_handles, function=ci, value=5, lower=4.3, upper=5.7, [options, see below]); Cell options: * All options for a string cell (see above) * fmt = Format to print the number and CI
%rosewood(&rw_handles, function=next_row);
%rosewood(&rw_handles, function=end_header);
%rosewood(&rw_handles, function=footer, footer_text=“Example footer text”);
%rosewood(&rw_handles, function=control, control_line=“Example of a control line”);
%rosewood(&rw_handles, function=close);
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\testing\case_control_check.sas
Record number of records/cases/controls change after each data/proc step. Check for duplicates in cases and controls.
macro_num= Start with 1 and provide number in the increment of 1 for each macro call. case_definition= Variables which defines a case i.e is_case = 1. case_change_reason= If you know that number of cases will change from last step, please put a reason for future reference. Use %str. case_primary_key= Variables which define a unique case i.e scrphin index_date. It will be used in conjunction with case_definition parameter. control_definition= Variables which defines a control i.e is_case = 0. Use %str. Optional. control_change_reason= If you know that number of controls will change from last step, please put a reason for future reference. Optional. control_primary_key= Variables which define a unique control i.e scrphin index_date. It will be used in conjunction with control_definition parameter. Optional. filenameprefix=Name of output file and should be same in each call of the macro within same project. Datetime timestamp will be suffixed automatically. No special characters, spaces, hyphen.”” is preferred. file_location=Location of the output file and should be same in each call of the macro within same project.
This macro will create a text file with following columns: Dataset = Name of the dataset No. of cases = Number of cases based on case_definition parameter Cases changed = Change in number of cases from previous step Cases change reason = Reason of change if provided in case_change_reason parameter Cases dups = Duplicates in cases using case_primary_key and case_definition parametres No. of controls = Number of controls based on control_definition parameter Controls changed = Change in number of controls from previous step Controls change reason = Reason of change if provided in control_change_reason parameter Controls dups = Duplicates in controls using control_primary_key and control_definition parametres
%case_control_check(macro_num=1,case_definition=%str(is_case = 1),case_change_reason=first run,case_primary_key=scrphin index_date,control_definition=,control_change_reason=,control_primary_key=,filename_prefix=chk,file_location=P:\VDEC\temp); %case_control_check(macro_num=2,case_definition=%str(is_case = 1),case_change_reason=,case_primary_key=scrphin index_date,control_definition=%str(is_case = 0),control_change_reason=first run,control_primary_key=scrphin index_date,filename_prefix=chk,file_location=P:\VDEC\temp);
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\testing\data_quality-v1_1_0.sas
This macro is used to produce a html file containing : 1) Proc contents output along with missing percentage & top 10 values of each variable. 2) If table has variable BIRTHDT, it will compare every other date variable with BIRTHDT to check if any date is less than BIRTHDT. 3) Give percentage of top 10 values. 4) Create fiscal year trend graphs for dates.
This macro can be used for a specific dataset or group of datests with same prefix.
Run it using SAS Enterprise Guide(SAS EG)
ds = Name of input dataset, could one or two level STUDY_START_YR = Start Year (Default is 1985) STUDY_END_YR = Study End year (Default is current year)
example: %let dq_dir=\ras-fileprint-1\users\gurpree2\Documents\Gurpreet; %data_quality(ds=project.tst, STUDY_START_YR=1995, STUDY_END_YR=2014);
Copyright © Vaccine and Drug Evaluation Centre, Winnipeg. All rights reserved.
File location: P:\VDEC\source\testing\test_framework.sas
From within a ‘main’ testing script:
%run_test_suite( test_suite_name=Extracted data validation, test_suite_file_path=‘U:\projects\mims_emr\source\testing\validate_extracted_data.sas’, output_file_path=U:\projects\mims_emr\results\testing\Data extraction validation.txt );
The test suite file will be a SAS script consisting of valid assert statements. For example, U:\projects\mims_emr\source\testing\validate_extracted_data.sas may contain the following assert statements:
%assert_not_empty(‘Population registry is not empty’, &DS_LIBRARY..&PROJECT_CODE._orig_reg);
%let two_years_prior_date = intnx(‘year’, &STUDY_START_DATE, -&YEARS_DATA_PRIOR_TO_STUDY_START, ‘same’); %assert_field_witin_range(‘Registry contains children born 2 years prior to study start date and up to study end date’, &DS_LIBRARY..&PROJECT_CODE._orig_reg, birthdt, &two_years_prior_date, &STUDY_END_DATE); %assert_field_witin_range(‘Registry coverage dates begin 2 years prior to study start date and go up to study end date’, &DS_LIBRARY..&PROJECT_CODE._orig_reg, covdt, &two_years_prior_date, &STUDY_END_DATE);
See specific assert macro signatures below for additional valid assert statements. Assert statements need to be on a single line all by itself. No other statements can follow an assert macro call.
Authors: Barret Monchka Copyright © 2018 Vaccine and Drug Evaluation Centre, Winnipeg.