index > Software References > RAS library reference > Parsed documentation

Analysis

Tabulation

create_descriptive_table

File location: P:\VDEC\source\analysis\tabulation\create_descriptive_table.sas

Summary

Creates a descriptive table in HTML format.

Usage

If writing to HTML: Must call %open_output_file(title=, file=) beforehand and %close_output_file(file=) afterwards. It uses offset-v0_1_0.css (or later) for styling, so copy these to the project folder as well.

Parameters

title = Title of the descriptives table. The title will appear directly above the table.
dataset = The fully qualified name of the dataset (e.g. work.my_data).
data_formats: The file path of the script containing all the formats used in the provided dataset. Do not enclose the file path in quotation marks. (e.g. P:\project_name\source_dir\formats.sas).
row_vars_title = The text that will be displayed in the header of the column that lists all the row variables. This text should be unquoted. (e.g. Disease Outcomes).
row_vars = The fully qualified name of a dataset containing a character column named row_var which lists all variables, in the provided dataset, that will be listed vertically in the descriptives table.
column_var_title = The text that will be displayed in the header above the column variable values. This text should be unquoted. (e.g. Income Quintiles).
column_var = The name of the column, in the provided dataset, that will be listed horizontally in the descriptives table.
percentage_column = Show percentages in a separate colum (0 or 1).
percentage_type = The type of percentage that will be dislayed (column_percent, row_percent).
percentage_precision = The amount of decimal places that percentages will be rounded to (e.g. 1). The default and VDEC standard is to display only a single decimal place.
suppression_threshold = Any frequency counts lower than this threshold value will be suppressed. The default value and MCHP standard is 6. (For example, if suppression_threshold is set to 4, then “< 4” will be displayed in every cell that has a frequency count of 0,1,2, or 3. The corresponding percentage will also be suppressed.)
show_total_counts_in_headers = The total for the column (if percentage_type=column_percent) or row (if percentage_type=row_percent) is shown in the header (0 or 1).
show_total_percentages = The total percentage (100%) id shown as well (0 or 1).
show_total_accross = Show a total column (if percentage_type=column_percent) or total row (if percentage_type=row_percent) (0 or 1). N.B. Only the column option is currently implented.
show_yes_no_both = Do not condens binary variables (Yes/No or 0/1) to a one-liner (set to 1 if wanted).
debug_mode = Set to 1 if you want to print notes and keep intermediate datasets.
file = Location where the output file will be written (e.g. P:\project_name\output_dir\descriptives.html”).
file_type = html for a html file, rw or rosewood for a Rosewood file

Known issues

This macro is in development and many features are partially implemented
Suppression is handled fully for the core content, but not all combinations of options are fully tested.
Some features are only implemented for either row or column percentage_type.

Requirements

This macro needs Rosewood to be available in the project to work

create_rates_table-v0_1_3

File location: P:\VDEC\source\analysis\tabulation\create_rates_table-v0_1_3.sas

Summary

Create a rates table in HTML format

Usage

Must call %open_output_file(title=, file=) beforehand and %close_output_file(file=) afterwards. It uses rates_offset-v0_1_0.css (or later) for styling, so copy this to the project folder as well.

Parameters

title = Title of the rates table. The title will appear directly above the table.
dataset = The fully qualified name of the input dataset (e.g. work.my_data). The columns are the row_vars, a numerator value column and a denominator value column.
data_formats = The file path of the script containing all the formats used in the provided dataset. Enclose the file path in quotation marks. (e.g. “P:\project_name\source_dir\formats.sas”).
row_vars_title = The text that will be displayed in the header of the column that lists all the row variables. This text should be unquoted. (e.g. Disease Outcomes).
row_vars = The fully qualified name of a dataset containing a character column named row_var which lists all variables, in the provided dataset, that will be listed vertically in the rates table.
numerator_var = The variable name in dataset that contains the numerator values (e.g. count)
denominator_var = The variable name in dataset that contains the denominator values (e.g. population, person_time)
precision = Number of digits to the right of the decimal point
show_total_rate = Add an entry with the totals for the full population [0=No, 1=Yes]
print_ratio = Print ratios [0=No, 1=Yes]. WARNING: Very limited implementation at the moment
suppression_threshold = Any frequency counts lower than this threshold value will be suppresed. The default value is 6. (For example, if suppression_threshold is set to 4, then “< 4” will be displayed in every cell that has a frequency count of 0,1,2, or 3. The corresponding rates will also be suppresed.)
denominator_multiplier = Rate per N. (e.g. 1000 means rate per 1,000 denominator units)
CI_value = Confidence interval size (e.g. 95 means 95% confidence intervals will be printed)
numerator_column_title = Title for the numerator column
denominator_column_title = Title for the denominator colum
denominator_units = Units used for denominator_var
crude_rate_title = Title used for the crude rate column
offset_row_var_labels =
file = Location where the output file will be written (e.g. “P:\project_name\output_dir\descriptives.html”).

Notes

There is a lengthy to-do list. Feel free to contribute to improve the macro.

Version history

v0.1.2; Christiaan Righolt, August 2016; Fix for binary rate ratio
v0.1.1; Christiaan Righolt, August 2016; Implemented documentation house style.
v0.1.0; Christiaan Righolt, June 2016; Generalization, confidence intervals
v0.0.1; Barret Monchka, 2016; Initial version

Data Preparation

perform_attrition-v0_2_2

File location: P:\VDEC\source\data preparation\perform_attrition-v0_2_2.sas

Summary

This macro allows the creation of a text-based attrition diagram and sets an inclusion flag to off in a dataset.

Parameters

dataset_in = The input dataset with all &flag_var set to 1
dataset_out = The output dataset with the attrition rows’ &flag_var set to 0
flag_var = The variable that indicated whether a certain row should be included in the final analysis
attrition_ds = The attrition steps (include library name)
inclusion_ds = The final included groups (can be empty ds; include library name)
table_file = he file name for the output markdown attrition diagram
table_title = Table title for the attrition diagram
units_name = Name of units counted (plural), e.g. cases, patients, etc..
debug_mode = Set to 1 to run in debug mode. (This does not delete macro data-sets for troubleshooting) [optional]

###Notes Ensure the input datasets (attrition_ds=work.attrition_steps and inclusion_ds=work.included_groups in example) look right before running the macro. They can be create using (look at actual source): * N.B. Use spaces instead of tabs for spacing; * N.B. Use parentheses around logical conditions; * Attrition steps; data work.attrition_steps; infile datalines truncover; input step_name $1-48 step_exclusion $49-150;

datalines; Main hospital is rural (main_hosp_type = ‘03’ or main_hosp_type = ‘04’ or main_hosp_type = ‘05’) Main hospital is out of province or unknown (main_hosp_type = ‘12’ or main_hosp_type = ‘99’) No valid C and S results exposure_status = ‘NDT’ No hospital pharmacy data exposure_status = ‘NHP’ No ET exposure_status = ‘NET’ ; * Included groups; data work.included_groups; infile datalines truncover; input group_name $1-48 group_inclusion $49-150;

datalines; Appropriate ET (exposure_status = ‘AET’ or exposure_status = ‘AAE’) Inappropriate ET (exposure_status = ‘IET’ or exposure_status = ‘AIE’) C and S does not cover ET exposure_status = ‘UET’ ;

Requirements

This macro needs Rosewood to be available in the project to work

Version history

v0.2.1; Christiaan Righolt, October 2016; Minor Rosewood export bugfixes
v0.2.0; Christiaan Righolt, October 2016; Output to Rosewood
v0.1.1; Christiaan Righolt, August 2016; Implemented documentation house style.
v0.1.0; Christiaan Righolt, July 2016; Initial version

sasfmts_to_stata_value_label

File location: P:\VDEC\source\data preparation\sasfmts_to_stata_value_label.sas

Summary

This macro is to convert sas formats to STATA value labels and label the values as they were in SAS dataset. This macro takes in formats from work and common library and create a .do file at the desired location with “label define” and “label val” statements.

Parameters

input_ds = Input dataset name output_file_loc = Location where one wants to have the output .do file. No quotes reqd. e.g. P:\flu_eval\tests , output_file_name = Output .do file name. No quotes reqd. e.g. label_vars.do,

Usage

Example code: %sasfmts_to_stata_value_label( input_ds=project.test ,output_file_loc=P:\VDEC\test suites\SAS macros ,output_file_name=test_labels.do );

Notes

SAS formats with ranges are not converted to STATA value labels. If SAS format has missing or other value, no labels will be created for those values.

Please use notsorted option in SAS proc format statement to have label define statment created in same sort order. e.g. proc format value $rha_f (notsorted) ‘WP’= ‘Winnipeg’ ‘IE’= ‘Interlake-Eastern’ ‘NO’= ‘Northern’ ‘SO’= ‘Southern’ ‘WE’= ‘Prairie Mountain’ ‘PC’= ‘Public Trustee / In CFS care’; run; Running this macro creates ouput as ‘label define rha_f_decode 0 “WP” 1 “IE” 2 “NO” 3 “SO” 4 “WE” 5 “PC”‘. If ‘notsorted’ is not used, output line will be ‘label define rha_f_decode 0 “IE” 1 “NO” 2 “PC” 3 “SO” 4 “WE” 5 “WP”’

Since STATA doesn’t allow value labels to character variables, so new variable will replace the old variable with numeric values and assign appropriate value labels.

To use value labels in your STATA script: Call the .do file in the script

TODO:

We may not want formats for few variables in STATA. One way is to remove format before calling this macro or we can create a new parameter with format names not required in the output.
Export the sas dataset.

Version history

v3.0.0; Gurpreet Pabla, Oct 2017; Bug fix and removed warning message
v2.0.0; Gurpreet Pabla, August 2017; Totally revamped
v0.1.0; Gurpreet Pabla, April 2017; Initial version

Conditions

import_condition_db

File location: P:\VDEC\source\data preparation\conditions\import_condition_db.sas

Summary

This macro imports the standard format condition database into sas from 3 csv files based on the standard spreadsheet format.

Parameters

conditions_csv = The csv file with conditions
conditions_criteria_csv = The csv file for condition criteria
conditions_match_patterns_csv = The csv file for pattern codes to match
db_name_cond = Should be project.{proj_name}_ref_cond
db_name_cond_criteria = Should be project.{proj_name}_ref_cond_criteria
db_name_cond_match_patterns = Should be project.{proj_name}_ref_cond_match_patterns

Notes

Due to SAS limits only the first 32,767 rows are evaluated to determine the format of the columns. This can lead to problems when your data has more observations.

Drugs

characterize_drug_usage

File location: P:\VDEC\source\data preparation\drugs\characterize_drug_usage.sas

Summary

New macro for characterizing drug use. Allows for both single ATC drugs and drug groups. Calculation of duration accounts for the full history of each person’s drug use.

Usage

Example code

Non-repeated mode - unequal length periods: %describe_drug_use( drug_filepath = “P:\VDEC\test suites\SAS macros\Test_drug_macro\docs\drug_trees3.csv”, periods = (0,1)(1,5)(5,10), periodunit = year direction_ = -1, cohortds = cohort, entry_datevar = subject_entry_date, exit_datevar = index_date, dpdds = health.mhdp_udm_1998apr_v3, dpinds = project.cll_epi_orig_dpin, outputds = test_output, mode_configfile = “U:\Documents\cll_epi\source\mode_config.csv”, postfix = _a );
Repeated mode - equal length periods %describe_drug_use( drug_filepath = “P:\VDEC\test suites\SAS macros\Test_drug_macro\docs\drug_trees3.csv”, periods = (0,5), periodunit = year direction_ = -1, cohortds = cohort, entry_datevar = subject_entry_date, exit_datevar = index_date, dpdds = health.mhdp_udm_1998apr_v3, dpinds = project.cll_epi_orig_dpin, outputds = test_output, mode_configfile = “U:\Documents\cll_epi\source\mode_config.csv”, postfix = _a, repeated_mode = 1, numperiods = 10 );

Input and output

Input parameters: * drug_filepath = The path of the CSV file specifying drug structures of interest. * periods_ = Periods of interest * Non-repeated mode: (period_1 start, period_1 end)(period_2 start, period_2 end)… * Repeated mode: (overall start, repeated period length) * periodunit = Unit of the periods, default is year. * direction_ = The direction of the periods, i.e. 1 for forward and -1 for backward. * cohortds = The name of the dataset containing the cohort of interest. * entry_datevar = The name of the entry date variable. * exit_datevar = The name of the index date variable. * dpdds = The name of the DPD dataset. * dpinds = The name of the DPIN dataset. * outputds = The name of the output dataset. * id_var = identification variable, default scrphin. * drugtree_toolpath = The path of the drugtree_tool macro. (Default path is P:\VDEC\source\data preparation\drugs\drug_tree_tools.sas) * mode_configfile = The configuration file path specifying the mode of each drug class. * 1: full mode –> number of prescriptions, ever-use, duration and dose information will be extracted. * 0: ever-use mode –> ever-use information and number of prescriptions only. * postfix = (optional) The postfix needed to name the generated variables, e.g. _a for drug usage after the index date. * repeated_mode = Used for a number of periods with equal length, default is 0. * numperiods = The number of equal length of period, default is 1 and used in combination with repeated_mode = 1. * output_name_table = Export a lookup table for the generated variables and the original drug names, default is 1. * drug_var_table_nm = The name of the drug name look up table, default is drug_var_names.

Output dataset: * For each of the period, the duration, total dose in DDD and ever-use of each drug group are calculated. * The generated variables are: * ever_{drug group name}p{period order number}, * duration{drug group name}p{period order number}, * ddd{drug group name}p{period order number}. * period order number are determined according to the input order in the periods parameter.

Examples of supplemantal files

Drug tree file (csv): search_for,group_under,active,notes,last_updated,updated_by C10AA01,simva,1,,2017-09-20, C10AA02,lova,1,,, C10AA03,prava,1,,, C10AA04,fluva,1,,, C10AA05,atorva,1,,, C10AA06,ceriva,1,,, C10AA07,rosuva,1,,, prava,hydro,1,,, rosuva,hydro,1,,, simva,hipoml,1,,, atorva,hipoml,1,,, fluva,lopoml,1,,, lova,lopoml,1,,, hydro,anystatin,1,,, hipoml,anystatin,1,,, lopoml,anystatin,1,,, C10AA06,anystatin,1,,,

Mode configuration file (csv): drug_class,full_mode,drug_label simva,0,“Simvastatin” lova,1,“Lovastatin” fluva,1,“Fluvastatin” prava,0,“Pravastatin” atorva,1,“Atorvastatin” ceriva,0,“Cerivastatin” rosuva,0,“Rosuvastatin”

Macro strucure

validate inputs datasets with required vars
read and structure input drug hierarchy
read mode configurations
for each active drug element
for each period
flag whether prescriptions are within the period
determine the effective portion of prescriptions
calculate ever-use, duration, dose and number of prescriptions
label generated vars

TODO

Search by DIN.

find_drug_strength_in_description

File location: P:\VDEC\source\data preparation\drugs\find_drug_strength_in_description.sas

Summary

Find strength and unit information in the drug description variable. Modify the DPD data with the desired information.

Parameters

imput_dpdds = The name of the original DPD dataset
output_dpdds = The name of the modified DPD dataset
output_existingatc = The name of a dataset containing ATC codes of interest available in the DPD dataset
atc_patternds = The dataset contains all the ATC codes/patterns of interest
keyword_ = Keyword for the ingredient of intest, e.g. metformin for combined metformin drugs
DDD = The manually defined DDD
DDD_unit = The manually defined DDD unit

Note

The DDD and DDD unit information are still missed for combined drugs. Manually setting these variables are required. For the metformin example, the DDD for pure metformin is used.

Outputs

output_dpdds: The modified DPD dataset, with strength and strength_unit filled
output_existingatc: A dataset containing ATC codes found in the DPD dataset

Example

%find_strength_in_description(input_dpdds = health.mhdp_udm_1998apr_v3, output_dpdds = dpd_cmb_metformin, output_existingatc = available_metformins, atc_patternds = cmb_metforminatc, keyword = metformin, DDD = 2, DDD_unit = “G”);

impute_DPIN_costs

File location: P:\VDEC\source\data preparation\drugs\impute_DPIN_costs.sas

Summary

This macro imputes the drug costs for DPIN records, using a modified version of MCHP’s algorithm. It outputs the imputed cost (in reference year $).

Parameters

dpin_ds = The input DPIN dataset
cohort_ds = The cohort for the study
output_ds = The output costing dataset
date_var_DPIN = The relevant date for the prescription in DPIN (e.g. prvddt)
date_var_cohort = The index date for the person (e.g. index_date)
start_dt_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(),
end_dt_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(),
selection_criteria = Put a condition on DPIN dataset (e.g. %str( din not= ‘00000001’) ).This macro variable resolves within a data step. Always use %str(),
CPI_db = The dataset with consumer price index (CPI) information. This set should at least include a variable year for the year and &CPI_var. e.g. cpi.
CPI_var = The CPI-based multiplier, i.e. (prescription drug CPI for &CPI_base_year)/(prescription drug CPI for year of interest) e.g drugs_multiplier.
CPI_base_year = The base year for the CPI comparisons. [This will only be used to generate the variable label]

Notes

The imputed cost is the sum of the drug ingredients cost and the dispensing fee. See in-code comments for details

CPI data can be prepared as below: proc sql noprint;

select Manitoba_all_items, Manitoba_prescribed_medicines into :Manitoba_base_CPI_all, :Manitoba_base_CPI_drugs from project.vdec_cpi_2000_2015 (Updated consumer price index data can be downloaded from statcan for all items and prescribed medicines) where year = &BASE_DOLLAR_YEAR;

create table cpi (label = “CPI ratios for comparison of health care costs”) as select year, &BASE_DOLLAR_YEAR as base_year label=“Base year for comparison”, &Manitoba_base_CPI_all./Manitoba_all_items as all_items_multiplier label = “Manitoba CPI ratio for all items”, &Manitoba_base_CPI_drugs./Manitoba_prescribed_medicines as drugs_multiplier label = “Manitoba CPI ratio for prescription drugs” from project.vdec_cpi_2000_2015;

quit;

Health Utilization

health_utilization

File location: P:\VDEC\source\data preparation\health utilization\health_utilization.sas

Summary

This has four macros which imputes the physician visit & hospitalization, cost of visit & hospitalization. phys_visits: Number of physician claims for office visits(MH prefix=7). Multiple claims on one day is counted as 1 visit. phys_cost: Cost of physician claims. Multiple claims on one day is counted as 1 visit. Multiple conditions can occur during the same visit, so a distinct on scrphin, net_fee, servdt, md, prefix, tariff is applied. hosp_visits: Number of hospitalizations. Multiple conditions occuring during the same admission is counted as one. hosp_cost: Hospitalization cost.

Parameters for Physician Visits

phys_ds = The input Medical Services dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix nobs_phys), date_var_cohort = The index date for the person (e.g. index_date), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Medical Services dataset (e.g. %str( tariff not= 4344) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort.

Parameters for Physician costs

phys_ds = The input Medical Services dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix cost_phys), date_var_cohort = The index date for the person (e.g. index_date), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Medical Services dataset (e.g. %str( tariff not= 4344) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort, CPI_db = The dataset with consumer price index (CPI) information. This set should at least include a variable year for the year and &CPI_var . e.g. cpi, CPI_var = The CPI-based multiplier, i.e. (all items CPI for &CPI_base_year)/(all items CPI for year of interest) e.g all_items_multiplier, CPI_base_year = The base year for the CPI comparisons. [This will only be used to generate the variable label]

Parameters for Number of Hospitalization

hosp_ds = The input Hospital Abstract dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix nobs_hosp), date_var_cohort = The index date for the person (e.g. index_date), date_var_hosp = The relevant date in Hospital abstract (e.g. admdt), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Hospital Abstract dataset (e.g. %str( transact = ‘1’) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort.

Parameters for Hospitalization costs

hosp_ds = The input Hospital Abstract dataset, cohort_ds = The cohort for the study, output_ds = The output dataset which will be all the variables from cohort_ds + new variables created(prefix cost_hosp), date_var_cohort = The index date for the person (e.g. index_date), date_var_hosp = The relevant date in Hospital abstract (e.g. admdt), start_dt_var_cohort=Start date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), end_dt_var_cohort=End date in cohort dataset (optional). If not blank, it should be the date variable or intnx statement using %str(), duration_unit = Units of time period for which number of visits are required (e.g. year, month, day). It accepts all values accepted by SAS INTNX function, duration_period = Time period for which number of visits are required. Time periods should be given within brackets (e.g. (0,1)(1,2) to calculate number of visits between 0 & 1{duration_unit} and 1 & 2{duration_unit}), selection_criteria = Put a condition on Hospital Abstract dataset (e.g. %str( transact = ‘1’) ).This macro variable resolves within a SQL query. Always use %str() and prefix variable name with “a.” if the same variable exists in the cohort, costing_vars=Variables which have costing information (e.g. %STR(riw,inptriw) ).Always use %str()., cpwc_dollars=cost per weighted case or cost of a standard hospital stay(e.g. 6152),

Notes

The number of physician visits is restricted to just office visits i.e prefix=‘7’. The imputed cost for physician claim is sum of net_fee multiplied by CPI-based multiplier for each year. Time period of (0,1) includes both dates. The imputed cost for hospitalization is sum of Resource Intensive Weights(RIW) & Day procedure Groups(DPG) multiplied by cost of a standard hospital stay(cpwc). For physcian claim costs, the columns net_fee, servdt, md, prefix, tariff are required in physician dataset. Run MCHP %net_med macro on physician dataset (if not ran during data extraction or else) before using phys_visits & phys_cost macros

If date_var_cohort is 2012-03-25 and duration_period is (-1,1) and duration_unit is year then For physician claim visits or hospitalization: Duration will be 2011-03-25 <= date_var_cohort < 2013-03-25 i.e includes start of duration_period but less than end of duration_period. For physician claim costs or hospitalization costs: Duration will be 2011-03-25 <= date_var_cohort <= 2013-03-25 i.e includes both start of duration_period and end of duration_period.

CPI data can be prepared as below: proc sql noprint;

quit;

cpwc_dollars can be found on CIHI website for specific year.

Version history

v1.0.0; Gurpreet Pabla, October 2016; Initial version

Matching

create_case_control_sets-v3_0_0

File location: P:\VDEC\source\data preparation\matching\create_case_control_sets-v3_0_0.sas

Summary

This file provides macros to perform cost-based risk set sampling to go from survival data to a nested case control setup

The current version calculates the cost based on index date and a caller-supplied list of variables.

Usage

Random seeds

For some reason SAS cannot centrally set a random stream. It needs to be done for each datastep. Because there are two randomization steps in the sampling process and one in matching risk set macro, three different random seeds need to be provided. Run the following code thrice to get random numbers for the seeds (ALWAYS RERUN THIS FOR NEW PROJECTS): %let MAX_SEED = %eval(2*31-1); %let TIME_NOW = %sysfunc( time() ); %let FRAC_PART_TIME = %sysevalf( &TIME_NOW - %sysfunc(floor(&TIME_NOW)) ); %let TIME_SEED = %sysfunc(round( %sysevalf(&MAX_SEED&FRAC_PART_TIME) )); %put; %put Your seed is: &TIME_SEED;

Data preparation

Data needs to be prepared as survival time data before using this macro. Each record should have a person identifier, a subject entry date, an exit date, and a flag indicating whether or not it is a case.

Sampling the follow-up time

The follow-up time is sampled for potential control index dates based on the prepared survival dataset.

Example code: %sample_follow_up_time( input_set = person_survival_list, output_set = sampled_subjects, random_seed_A = &RANDOM_SEED_N_SAMPLES, random_seed_B = &RANDOM_SEED_SAMPLING, person_id_var = scrphin, time_per_sample = &DELTA_T, case_var = is_case, entry_var = subject_entry_date, exit_var = subject_exit_date, debug_mode = 0 );

input parameters: * input_set = Input survival dataset * output_set = Sampled output dataset * random_seed_A, random_seed_B = Two different random seeds (see above) * person_id_var = variable that uniquely identifies a person (default is scrphin) * time_per_sample = Number of days per sample (e.g., 365 means number of sampled index dates = years of follow-up [with random rounding]) * case_var = Variable for the case flag (default = is_case) * entry_var = Name of the entry date variable (default = subject_entry_date) * exit_var = Name of the exit date variable (default = subject_exit_date) * debug_mode = If set to 1, does not delete macro data-sets for troubleshooting. [optional]

The output are subjects with an index date which is either the case date (if case) or randomly sampled from the follow-up time (if not case). The output subjects are persons on a specific date. Only the sampled dates will be considered for matching.

Add additional covariates

Add whatever additional covariates (for matching) based on the person-index date combinations.

Match selection

Gets the best matches based on the sampled index dates Example code: %match_risk_sets( input_set = sampled_subjects_with_match_vars, output_set = sampled_risk_set_list, random_seed_C = &RANDOM_SEED_SAMPLING_C, controls_per_case = &N_CONTROL, exact_match_vars = sex favourite_pet, max_cost = &max_cost, index_var = index_date, costing_vars = birthdt covered_days, case_var = is_case, set_var = _set, cost_var = _cost, person_id_var = scrphin, entry_var = subject_entry_date, debug_mode = 0 );

input parameters: * input_set = Input sampled subject list (%sample_follow_up_time output + additional vars) * output_set = Risk sets for case control study * random_seed_C = Different random seed from random_seed_A or random_seed_B (see above) * controls_per_case = Number of controls to sample per case * exact_match_vars = variables for which case and control values must be equal (separated by spaces) * max_cost = maximum cost allowed for matching (e.g., 365) * index_var = the index date variable, e.g. index_date * costing_vars = variables used to evaluate cost of a match (excluding index_date, which is added automatically). * case_var = variable in which the case flag is stored (default = is_case) * set_var = variable in which the set number is stored (default is _set) * cost_var = variable in which the match cost is stored (default is _cost) * person_id_var = variable that uniquely identifies a person (default is scrphin) * entry_var = variable that contains the subject’s entry date. * debug_mode = If set to 1, does not delete macro data-sets for troubleshooting. [optional]

The output are the sampled risk sets with matching costs.

Version history

v3.0.0; Gurpreet Pabla, June 2017; Added a “sort the input dataset” step in sample_follow_up_time macro, so output won’t vary by any sorting done by the user.
v2.0.0; Gurpreet Pabla, May 2017; Added a “sort the input dataset” step in match_risk_sets macro, so output won’t vary by any sorting done by the user.
v1.4.0; Gurpreet Pabla, April 2017; Assign random number to a control, so if the cost of two controls are same and we have to chose one; it will be chosen randomly.
v1.3.0; Patrick Niesink, March 2017; Added support for caller-specified costing variables.
v1.2.0; Gurpreet Pabla, February 2017; Increased efficiency
v1.1.0; Christiaan Righolt, January 2017; Bugfixes
v1.0.0; Christiaan Righolt, January 2017; Initial version, cost based on index date and birth date only.

diagnostic

File location: P:\VDEC\source\data preparation\matching\diagnostic.sas

Summary

This file provides diagnostics like controls per match type etc. for case control matching.

Usage

Diagnostics

Example code: %diagnostic( indata=sampled_risk_set_list, case_var=is_case, set_var=_set, controls_per_case=5, type=match_type, exact_match_vars=sex, continuous_var=age, categorical_var=non_pan_flu_vaccine filename_and_location=P:\flu_eval\tests\diagnostic.txt);

input parameters: * indata = Input dataset with cases and controls * case_var = variable in which the case flag is stored (default = is_case) * set_var = variable in which the set number is stored (default is _set) * controls_per_case = Number of controls to sample per case * type = variable in which match type is stored like match done on 1) residence 2) Physician etc (optional) * exact_match_vars = variables for which case and control values must be equal (separated by spaces) (optional) * continuous_var = Variables for which one want to have mean, median and other stats. Should be continuous variable like age (separated by spaces)(optional) * categorical_var = Should be categorical variable like rha (separated by spaces)(optional) * title = Title of the table, default is “Diagnostics” * filename_and_location = provide location and filename. * suppress_number = Suppress the cells with individuals or events if they are less than a particular value. Default is 6.

The output is the rosewood generated text file * Number of cases & controls(regardless of set) * Number of sets(cases) by number of controls in each set i.e number of cases with 0,1,.. matches. * For continuous variable, average RMS(root mean square) value over all sets. * Stratified average RMS value by exact match variables * For categorical variable, calculate the number of set where 1,2,3,.. controls have concordant, discordant or missing values for controls.

Note: * Cases with no controls are excluded from continuous variable and categorical variable analysis.

TODO: * Handle suppression better

Version history

v3.0.0; Gurpreet Pabla, August 2017; Small bug fix and new parameter added “suppress_number”.
v2.0.0; Gurpreet Pabla, July 2017; New parameter added “number of controls to sample per case”.
Geng Zhang, July 2017; Format adjusted, create control parameters for column widths.
v1.0.0; Gurpreet Pabla, June 2017; Initial version.

Process Original Databases

combine_ICU_data

File location: P:\VDEC\source\data preparation\process original databases\combine_ICU_data.sas

Summary

This macro combines the ICU records from both the CCIC database and the hospital abstract to generate one combined ICU dataset.

Parameters

input_ccic_ds = The original CCIC dataset
input_hosp_ds = The original hospital abstract
output_icu_ds = The output combined ICU dataset
n_scu_records = The number of different SCU records in the hospital abstract [optional]
debug_mode = Set to 1 to run in debug mode. (This does not delete macro data-sets for troubleshooting) [optional]

Notes

Ensure SCU info is included in the hospital abstract for your project.
See @def comment for merging details.

Version history

v1.0.1; Christiaan Righolt, October 2016; Fix bug, add debug mode
v1.0.1; Christiaan Righolt, August 2016; Implemented documentation house style.
v1.0.0; Christiaan Righolt, July 2016; Initial version

identify_immunizations-v1_0_4

File location: P:\VDEC\source\data preparation\process original databases\identify_immunizations-v1_0_4.sas

Summary

This macro identifies immunization status as covariates. Currently it outputs the status for: * Influenza vaccine in the previous 12 month * Pneumococcal vaccine

Parameters

dataset_in = The input dataset that should at least include the scrphin and index_date
dataset_out = The output dataset with immunization status added
data_set_MIMS = The MIMS dataset for the project
index_date = The index date (date of interest on which the status is determined)
influenza_var = Output variable name for the influenza immunization status [optional]
pneumococcal_var = Output variable name for the pneumococcal immunization status [optional]

Version history

v1.0.4; Gurpreet Pabla, January 2017; Removed condition_id variable. Code reviewed as well.
v1.0.3; Christiaan Righolt, August 2016; Implemented documentation house style.
v1.0.2; Christiaan Righolt, July 2016; Added/changed labels, pneumococcal label includes start year of data
v1.0.0; Christiaan Righolt, June 2016; Initial version

process_hospital_abstract

File location: P:\VDEC\source\data preparation\process original databases\process_hospital_abstract.sas

Summary

This macro processes the original Hospital Abstracts database by: * Creating a combined hospital visit from multiple hospital records, when periods overlap or are sequential. * Getting the admitting and main hospital for the visit.

Parameters

input_hosp_ds = The original Hospital Abstracts database
output_hosp_ds = The filtered verion of Hospital Abstracts database
hosp_visit_max_days_gap = The max number of days between seperation and discharge to count as one visit (1 means next day)

Notes

The output is the same as the input except * admdt is the admission for the combined visit, admdt_abstract is the admission date for the original abstract row * sepdt is the discharge for the combined visit, sepdt_abstract is the discharge date for the original abstract row * first_hosp, first_hosp_type are the hospital and type for the initial admission for the visit * main_hosp, main_hosp_type are the main hospital and type for the visit

process_MH_registry

File location: P:\VDEC\source\data preparation\process original databases\process_MH_registry.sas

Summary

This macro processed the full (2+ records per year per person) original Manitoba Health (MH) registry by: * Selecting the most latest birthday and gender by acquisition date from all entries for a person * Identifying MH coverage periods * Identifying address periods (an address is the combination of municipal code and postal code) * Calculating several additional variables (covariates) based on MCHP macros

The output is the same as the input except * birthdt and gender is picked of the latest acquisition date for a person and unique for each scrphin * coverage_start is the start date of the combined/unified coverage period * coverage_end is the end date of the combined/unified coverage period * address_start is the start date at the address * address_end is the end date at the address * rha_codeSee/use the MCHP format $rhal. format for code mapping, “PC” means public trustee or in CFS care * rhad_codeSee/use the MCHP format $rhal. format for code mapping, “PC” means public trustee or in CFS care * CA_codeSee/use the MCHP format $rhal. format for code mapping * NC_codeSee/use the MCHP format $rhal. format for code mapping * locality_residenceOutput as R(ural), U(rban), N(ot found) * income_quintileOutput as number 1-5 (or NF for not found) * sefi * material_deprivation * social_deprivation

Parameters

input_reg_ds = The original MH registry. This dataset name must be fully qualified by prefixing it with the library name (ie. work.project_registry).
output_reg_ds = The processed version of the MH registry
rha_macro = The version of the MCHP rha macro to use (e.g. _rha12)
rha_district_macro = The version of the MCHP rha district macro to use (e.g. _rhad12)
census_year = The census year for which the income quintile, SEFI, material and social depreviation are calculated. Use MCHP format (e.g. 2006 = 06). *Must be between 79 and 13, inclusive.
get_summary = Run the code to determine the birthdays and periods [Optional, set to 0 if not required.]
get_address_info = Run the code to determine the locality information derived from the birthday [Optional, set to 0 if not required.]
debug_mode = Set to 1 to run in debug mode. (This does not delete macro data-sets for troubleshooting) [optional]

Version history

Gurpreet Pabla, Jan 2017; Bug fix around address dates. Impacted population due to this change is .01%. Gender is to be picked by latest acquire date.
v2.0.5; Gurpreet Pabla, July 2017; Changed the way how coverage end is defined when there is coverage after cancel code. Bug fix for complex address changes.
v2.0.4; Gurpreet Pabla, March 2017; Bug fix: phins with registration changes leads to multiple rows with different address start and end dates when there is no change in address.
v2.0.3; Patrick Niesink, March 2017; added warning if census_year is invalid (must be between 79 and 13)
v2.0.2; Gurpreet Pabla, December 2016; Code review done.
v2.0.2; Christiaan Righolt, September 2016; Coverage can only start at birth, add RHA districts, minor items
v2.0.1; Christiaan Righolt, September 2016; Fix issue with coverage dates not matching between records for one person.
v2.0.0; Christiaan Righolt, September 2016; Made code modular. Change options
v1.1.4; Christiaan Righolt, August 2016; Bug fixes: Error when there are no duplicate birth days. Allow skipping of coverage search. Only get areas in Winnipeg for WRHA.
v1.1.3; Christiaan Righolt, August 2016; Implemented documentation house style.
v1.1.0; Christiaan Righolt, June 2016; Added income quintiles, sefi, mat and soc depreviation. Added rural/urban flag (v1.1.1). Label changes (v1.1.2).
v1.0.0; Christiaan Righolt, May 2016; Initial version

Vaccination Completeness

vaccination

File location: P:\VDEC\source\data preparation\vaccination completeness\vaccination.sas

Summary

It tells about immunization completeness status.

Parameters for Physician Visits

input_cohort_ds = The cohort for the study, input_ds = Input dataset e.g. MIMS or CPCSSN dataset. It should be in general form and atleast contain variables with names as scrphin, service_dt, code and coding_system. service_dt is the date variable for the date when vaccine is administered. code e.g. tariff(MIMS) or ATC code(CPCSSN) variable. coding_system e.g. ATC or MH-Tariff., vaccines_of_interest = Vaccine and duration to know the status for, input_vaccine_composition = Vaccine name linking to antigen. Default is vdec_vaccine_composition, input_vaccine_agent = Code linking to antigens. Default is vdec_vaccine_agent, input_childhood_immun_schedule = Schedule details when policy implemented etc.. Default is vdec_vaccine_childhood_schedule, input_childhood_immun_dosage = Contains doses reqd and at what age etc.. Default is vdec_vaccine_childhood_dosage, PHIN_var = Default is scrphin, birth_dt_var = Default is birthdt, gender_var = Default is gender. values as created by registry macro, output_widefmt_binary_ds = Output dataset with one row per phin with vaccines as columns and complete vaccination as 1=yes and 0=no, output_widefmt_date_ds = Output dataset with one row per phin with vaccines as columns and date when the vaccination completes, output_longfmt_ds = Output dataset with one row for each vaccine. days_gap = The max number of days between consecutive vaccine service dates to count as one visit. Default is 0. debug_mode = Default is 0, keep_intermediate_schedule_data = Keep immunization_schedule dataset. Default is 0.

Example: data indata; infile cards dlm=‘,’ dsd pad; length vaccine $20. duration $15.; input vaccine $ duration $; cards; DTaP,2 year DTaP,7 year PCV,2 year MMR,2 year Varicella,2 year ; RUN;

%vaccination_schedule( input_cohort_ds = &output_set_cohort., input_ds = &input_set_mims., vaccines_of_interest = indata, input_vaccine_composition = vdec_vaccine_composition, input_vaccine_agent = vdec_vaccine_agent, input_childhood_immun_schedule = vdec_vaccine_childhood_schedule, input_childhood_immun_dosage = vdec_vaccine_childhood_dosage, PHIN_var = scrphin, birth_dt_var = birthdt, gender_var = gender, output_widefmt_binary_ds = vaccine_widefmt, output_widefmt_date_ds =, output_longfmt_ds =, days_gap = 3, debug_mode = 0, keep_intermediate_schedule_data = 1 );

Notes: Use P:\VDEC\ref\load_vaccination_lookup_tables.sas to load vaccine reference files.

Version history

v2.0.0; Gurpreet Pabla, August 2018; Updated tomake it more generalized, add warnings and output options.
v1.0.0; Gurpreet Pabla, October 2017; Initial version

####TODO Lot of things like structure the output properly, HPV and Hep-B are two dose vaccine given after a certain time after 1st dose etc.

Documentation

code_diary

File location: P:\VDEC\source\documentation\code_diary.sas

Summary

This macro parses a main file for include statements and will read specifically formatted comments to create code-generated documentation.
It does reads include files recursively as long as they have the format: %include “C:\dir\file.sas”;
The output is formatted as a markdown file, which can be transfered/translated in your file format of choice.
The macro also reads included Stata files as long as the stata main is called using SAS’ x command window and all relative path macro variables are identically defined in both SAS and Stata.

Usage

Comment block

The main file and included scripts need to incorporate specially formatted comment blocks to be included in the output file. An example is: The special comment block is opened with:/** The special comment block is closed with:/ / An example of use inside the blocks is: @main :title The best documentation ever @main :authors Author One; Author Two; Author Three @main :org Our Glorious Institute @main :version 1.2.3 @main A general comment @def ABC = The alphabet @stat The p-values are calculated using the carrot test for bunnies @note I’m writing this example on a Friday afternoon… A line without keyword

Single line comments

Parsable single line comments are also supported: The line starts with: ** The line ends with: ;

Document structure

The lines are all grabbed and can be organized by using @keyword for the same themes (e.g. @def in the example, will put all @def lines together). Use . as a seperator for multi-level keywords, e.g. @covar.cost. There are a few special tags (e.g. @main :title) that grab the document metadata. If a line within a comment block does not have a keyword it is seen as a continuation of the previous line (in the same block)

The output can be converted to any format (e.g. pdf [needs latex installed], word, html) using pandoc. (See Google for details) Go to the directory in the command window and use * markdown to pdf: pandoc infile.txt –toc –latex-engine=pdflatex -o outfile.pdf * markdown to word: pandoc -s -S infile.txt –toc -o outfile.docx * markdown to html: pandoc -s infile.txt –toc -o outfile.htm

Parameters

input_main_file = Is the main file for the sas project tree, all files/scripts called from this main will be read recursively. (e.g P:\project\source\main.sas)
out_dir = Is the output folder (e.g. P:\project)
out_file = Is the resulting markdown file in which the results are written with script and line information. (e.g. workplan_coding.txt)
out_file_scrubbed = Is the resulting markdown file in which the comments without script and line information is written. [Optional (e.g. workplan_output.txt’)]
debug_mode = Set to 1 to run in debug mode. (This does not delete macro data-sets for troubleshooting) [optional]
section_aliases = This is the dataset with keyword aliases to cause multiple keywords to map to the same section [optional]. See example for structure.
section_order = This is the dataset to overwrite the order of sections (all values should be negative, with the lowest order number coming first) [optional]. See example for structure.
section_headers = This is the dataset to determine section headers [optional]. See example for structure.
sections_scrubbed = This is the dataset with a list of sections to omit from the scrubbed file [optional]. See example for structure.

Notes

There is a practical limit to the number of include files because of the creation of the dataset includes&curr_script_no_text. When this exceeds 32 characters it will cause an error, because of internal sas limits.
The maximum of several fields is hard-coded under the comment “Define character lengths” with several %let statements. Adjust these if needed for longer comments.
The warning
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks.
Is turned off for the duration of the macro.
Keyword comments are all saved in their own intermediate datasets, which need to meet SAS 32 character limit. This should not cause any issues as long as keywords are 20 chars or less. (Use section_headers entries to define longer headings in the created report.)

Example

The following example demonstrates the use of the macro, the section_: inputs should have the same variable names as the example: data work.alias_list; infile datalines; input short_keyword $1-10 long_keyword $11-50;

datalines; cond condition cov covariate covar covariate def definition descr descriptive incl inclusion excl exclusion exp exposure out outcome var variable ;

data work.order_list; infile datalines; input keyword $1-15 order_no 16-20;

datalines; todo -999 assert -998 main -100 definition -94 variable -92 note -90 inclusion -85 exclusion -80 condition -70 exposure -60 outcome -50 covariate -40 descriptive -30 rate -20 stat -10 no_keyword 0 ;

data work.header_list; infile datalines; input keyword $1-15 header $16-50;

datalines; condition Condition criteria covariate Covariates definition Definitions descriptive Descriptive tables exclusion Exclusion criteria exposure Exposures inclusion Inclusion criteria main Main no_keyword General note Notes outcome Outcomes rate Rate tables stat Statistics todo Task list variable Variables ;

data work.scrub_list; infile datalines; input keyword $1-15;

datalines; assert todo ;

%parse_comments( input_main_file = &SOURCE_ROOT\main_sas.sas, out_dir = P:\project
out_file = workplan_coding.txt, out_file_scrubbed = workplan_output.txt, section_aliases = work.alias_list, section_order = work.order_list, section_headers = work.header_list, sections_scrubbed = work.scrub_list );

Final

convert_markdown_to_html

File location: P:\VDEC\source\documentation\convert_markdown_to_html.sas

Summary

Macro to convert a markdown style document to html. This is not supposed to supplant proper tools like pandoc, but only to aid visualization on a closed system that does not allow the installation of software.

Usage

The tool automatically creates a table of contents

The tool supports the following markdown markup: * Pandoc-style metadata prefixed with % * Headers prefixed with N### depending on the level (e.g. #### is 4th level) * Unordered lists prefixed with * * Order lists prefixed with n.

Not supported is: * Nested lists * Bold/italic text * Links, images etc.

Parameters

in_file_md = = The input markdown file with documentation (e.g. “C:\source\source_documentation.md”“)
out_file_html = = The output html file with documentation (e.g. “C:\source\source_documentation.htm”“)
debug_mode = Use debug mode or not [0 or 1, optional]

Final

generate_macro_documentation

File location: P:\VDEC\source\documentation\generate_macro_documentation.sas Script to parse the macro and script documentation for the VDEC source folder

macro_diary

File location: P:\VDEC\source\documentation\macro_diary.sas

Summary

Macro to parse documentation for all sas macros and scripts in a root directory.

Usage

See sample_macro_documentation for markup of files.

Parameters

source_dir = The source root directory (e.g. C:\source), the folder &source_dir._archive is skipped
out_file_md = The output markdown file with documentation (e.g. ‘C:\source\source_documentation.txt’)
debug_mode = Use debug mode or not [0 or 1, optional]

Final

sample_macro_documentation

File location: P:\VDEC\source\documentation\sample_macro_documentation.sas

Summary

This file is a sample/template for VDEC macro documentation that describes what sections should be included. This documentation is written in markdown.

Usage

General comments on how to use it. (e.g. macro’s to call before and or after this one)

A list of parameters (and their meaning) used in the macro. * parameter_1 = First parameter * parameter_2 = Second parameter [optional]

Notes

Other detailed information about how to use the macro properly (e.g. details on the format of certain parameters)

Known issues

List of known issues/warnings/etc. for the macro. * This file is not really code, just documentation

Version history

v1.1.1; Programmer Two, August 2016; Implemented documentation house style.
v1.1.0; Programmer Two, June 2016; A way better version with added functionality XYZ
v1.0.0; Programmer One, May 2016; Initial version

General

rosewood-v1-style

File location: P:\VDEC\source\general\rosewood-v1-style.sas

Summary

A macro to write output to Rosewood, a structured table language. The tables create by Rosewood are human readible. Use Carpenter to translate these tables into various other formats, including Word, HTML and Latex.

Descriptions

Rosewood is a macro that writes directly to a text file. The file and related content are referred to by Rosewood handles, which contains four global macro variables. These can be defined as follows: %global current_line_T1 control_T1 row_T1 column_T1; %let rw_handles = file_name = ‘C:\project dir\Table 1.txt’, current_line_macro = current_line_T1, control_macro = control_T1, row_num = row_T1, column_num = column_T1;

Syntax

All options are shown with their relevant macro calls as examples.

There are two options: * Write a fulll data set directly to a Rosewood file. * Manually control what is written, in this case the order is: Open, title, header cells (cell, number ci), end_header, main cells (cell, number ci), footer, close. Also use next_row and control were appropriate.

Write a full dataset to Rosewood

%rosewood( &rw_handles, function = write, title = , footer = , dataset_name = work.dataset_to_write ); The input dataset_name should include the library name.

Open a Rosewood table

%rosewood(&rw_handles, function=open);

Create the table title

%rosewood(&rw_handles, function=title, title_text=“Table 1: Example of a table title”);

Write a string to the next cell in the table

%rosewood(&rw_handles, function=cell, content=“Content of cell”, [options, see below]); Cell options: * rows = Number of rows for cell to span [optional, default is 1] * columns = Number of columns for cell to span [optional, default is 1] * width = Width of cell in characters (will not show up in control section) [optional, default is 32] * align = Alignment of the cell [optional: either centre, center, left or right] * indent = Indent the cell contents [optional set to 1 if yes] * bold = Bold the cell contents [optional set to 1 if yes] * emphasize = Emphasize the cell contents [optional set to 1 if yes] * plain = Have the cell contents formatted plain [optional set to 1 if yes]

Write a number to the next cell in the table

%rosewood(&rw_handles, function=number, value=3.14, [options, see below]); Cell options: * All options for a string cell (see above) * fmt = Format to print the number

Write a value with confidence interval to the next cell in the table

%rosewood(&rw_handles, function=ci, value=5, lower=4.3, upper=5.7, [options, see below]); Cell options: * All options for a string cell (see above) * fmt = Format to print the number and CI

Go to the next row in the table

%rosewood(&rw_handles, function=next_row);

End the header of the table

%rosewood(&rw_handles, function=end_header);

%rosewood(&rw_handles, function=footer, footer_text=“Example footer text”);

Write any line to the control section of the table

%rosewood(&rw_handles, function=control, control_line=“Example of a control line”);

Close a Rosewood table

%rosewood(&rw_handles, function=close);

Version history

v1.2.2; Geng Zhang, May 2017; Bug fixed for space_needed < left_space
v1.2.0; Christiaan Righolt, November 2016; Change column mergers
v1.1.0; Gurpreet Pabla, November 2016; Bug fixes; Add title & footer arguments to rosewood write option
v1.0.0; Christiaan Righolt, October 2016; Initial version

Testing

case_control_check

File location: P:\VDEC\source\testing\case_control_check.sas

Summary

Record number of records/cases/controls change after each data/proc step. Check for duplicates in cases and controls.

Parameters

macro_num= Start with 1 and provide number in the increment of 1 for each macro call. case_definition= Variables which defines a case i.e is_case = 1. case_change_reason= If you know that number of cases will change from last step, please put a reason for future reference. Use %str. case_primary_key= Variables which define a unique case i.e scrphin index_date. It will be used in conjunction with case_definition parameter. control_definition= Variables which defines a control i.e is_case = 0. Use %str. Optional. control_change_reason= If you know that number of controls will change from last step, please put a reason for future reference. Optional. control_primary_key= Variables which define a unique control i.e scrphin index_date. It will be used in conjunction with control_definition parameter. Optional. filenameprefix=Name of output file and should be same in each call of the macro within same project. Datetime timestamp will be suffixed automatically. No special characters, spaces, hyphen.”” is preferred. file_location=Location of the output file and should be same in each call of the macro within same project.

Notes

This macro will create a text file with following columns: Dataset = Name of the dataset No. of cases = Number of cases based on case_definition parameter Cases changed = Change in number of cases from previous step Cases change reason = Reason of change if provided in case_change_reason parameter Cases dups = Duplicates in cases using case_primary_key and case_definition parametres No. of controls = Number of controls based on control_definition parameter Controls changed = Change in number of controls from previous step Controls change reason = Reason of change if provided in control_change_reason parameter Controls dups = Duplicates in controls using control_primary_key and control_definition parametres

Example

%case_control_check(macro_num=1,case_definition=%str(is_case = 1),case_change_reason=first run,case_primary_key=scrphin index_date,control_definition=,control_change_reason=,control_primary_key=,filename_prefix=chk,file_location=P:\VDEC\temp); %case_control_check(macro_num=2,case_definition=%str(is_case = 1),case_change_reason=,case_primary_key=scrphin index_date,control_definition=%str(is_case = 0),control_change_reason=first run,control_primary_key=scrphin index_date,filename_prefix=chk,file_location=P:\VDEC\temp);

Version history

v1.0.0; Gurpreet Pabla, July 2017; Initial version

data_quality-v1_1_0

File location: P:\VDEC\source\testing\data_quality-v1_1_0.sas

Summary

This macro is used to produce a html file containing : 1) Proc contents output along with missing percentage & top 10 values of each variable. 2) If table has variable BIRTHDT, it will compare every other date variable with BIRTHDT to check if any date is less than BIRTHDT. 3) Give percentage of top 10 values. 4) Create fiscal year trend graphs for dates.

This macro can be used for a specific dataset or group of datests with same prefix.

Usage

Run it using SAS Enterprise Guide(SAS EG)

ds = Name of input dataset, could one or two level STUDY_START_YR = Start Year (Default is 1985) STUDY_END_YR = Study End year (Default is current year)

example: %let dq_dir=\ras-fileprint-1\users\gurpree2\Documents\Gurpreet; %data_quality(ds=project.tst, STUDY_START_YR=1995, STUDY_END_YR=2014);

Version history

v1.1.0; Gurpreet Pabla, November 2016; Check if value of any date is less than BIRTHDT.
v1.0.0; Gurpreet Pabla, October 2016; Initial version

test_framework

File location: P:\VDEC\source\testing\test_framework.sas

Summary

This test framework executes a test suite and displays the results in a SAS output window. Callers also have the option of saving the test results to a Rosewood table.

Parameters

test_suite_name = The test suite name that will be displayed in the outputted results.
test_suite_file_path = A string containing the file path of the test suite to execute. This test suite must contain valid assert statements.
output_file_path = The location to save the test results in Rosewood format [optional].

Notes

In order to display the line numbers of the assert statements in the outputted results, an additional parameter is needed at the end of each assert signature. For example, if a desired_nobs assert statement is %assert_not_empty(‘Population registry is not empty’, project.prefix_orig_reg) then the actual macro signature must have an additional line number parameter appended: %macro assert_not_empty(test_description, dataset_name, source_line_number). However, the calling statement should ignore the source_line_number parameter as this softare automatically determines the correct value for this parameter depending upon the assert call’s location in the calling script. In order for the assert line numbers to work, it’s necessary to have a ‘main’ script with a series of %run_test_suite calls.

Usage

From within a ‘main’ testing script:

%run_test_suite( test_suite_name=Extracted data validation, test_suite_file_path=‘U:\projects\mims_emr\source\testing\validate_extracted_data.sas’, output_file_path=U:\projects\mims_emr\results\testing\Data extraction validation.txt );

The test suite file will be a SAS script consisting of valid assert statements. For example, U:\projects\mims_emr\source\testing\validate_extracted_data.sas may contain the following assert statements:

%assert_not_empty(‘Population registry is not empty’, &DS_LIBRARY..&PROJECT_CODE._orig_reg);

%let two_years_prior_date = intnx(‘year’, &STUDY_START_DATE, -&YEARS_DATA_PRIOR_TO_STUDY_START, ‘same’); %assert_field_witin_range(‘Registry contains children born 2 years prior to study start date and up to study end date’, &DS_LIBRARY..&PROJECT_CODE._orig_reg, birthdt, &two_years_prior_date, &STUDY_END_DATE); %assert_field_witin_range(‘Registry coverage dates begin 2 years prior to study start date and go up to study end date’, &DS_LIBRARY..&PROJECT_CODE._orig_reg, covdt, &two_years_prior_date, &STUDY_END_DATE);

See specific assert macro signatures below for additional valid assert statements. Assert statements need to be on a single line all by itself. No other statements can follow an assert macro call.

Dependencies

Assumes a macro variable MACRO_ROOT is set with the location of the VDEC SAS library.
Requires SAS Rosewood library
Requires general SAS library

Final

Files without documentation

P:\VDEC\source\analysis\graphing\kmplot-v1_1_0.sas
P:\VDEC\source\analysis\graphing\kmplot_failure_template-v1_1_0.sas
P:\VDEC\source\analysis\graphing\kmplot_survival_template-v1_1_0.sas
P:\VDEC\source\analysis\graphing\Work in Progress\kmplot_diff_functionality.sas
P:\VDEC\source\analysis\graphing\Work in Progress\suvivalplot.sas
P:\VDEC\source\analysis\graphing\Work in Progress\suvivalplot_v2.sas
P:\VDEC\source\analysis\modeling\cox-v1_0_0.sas
P:\VDEC\source\analysis\modeling\create_cox_models-v1_0_0.sas
P:\VDEC\source\analysis\modeling\create_cox_models_foreach_var-v1_0_0.sas
P:\VDEC\source\data preparation\conditions\identify_conditions.sas
P:\VDEC\source\data preparation\conditions\prepare_data_for_condition_identification.sas
P:\VDEC\source\data preparation\covariates\create_sibling_covar.sas
P:\VDEC\source\data preparation\drugs\drug_tree_tools.sas
P:\VDEC\source\data preparation\exposures\identify_exposures.sas
P:\VDEC\source\data preparation\exposures\prepare_data_for_exposure_identification.sas
P:\VDEC\source\data preparation\maternal\clean_babyfirst_ds.sas
P:\VDEC\source\data preparation\process original databases\process_cancer.sas
P:\VDEC\source\data preparation\size for gestational age\size_gest_age.sas
P:\VDEC\source\general\general.sas
P:\VDEC\source\interactive command environment\ice-v0_1_0.sas
P:\VDEC\source\interactive command environment\Work in Progress\temp.sas
P:\VDEC\source\sassy\sassify-v0_1_0.sas
P:\VDEC\source\testing\test_framework_window_footer_template.sas
P:\VDEC\source\testing\test_framework_window_header_template.sas
P:\VDEC\source\testing\test framework usage examples\main_testing.sas
P:\VDEC\source\testing\test framework usage examples\validate_extracted_data.sas

Analysis

Tabulation

create_descriptive_table

Summary

Usage

Parameters

Known issues

Requirements

create_rates_table-v0_1_3

Summary

Usage

Parameters

Notes

Version history

Data Preparation

perform_attrition-v0_2_2

Summary

Parameters

Requirements

Version history

sasfmts_to_stata_value_label

Summary

Parameters

Usage

Notes

TODO:

Version history

Conditions

import_condition_db

Summary

Parameters

Notes

Drugs

characterize_drug_usage

Summary

Usage

Example code

Input and output

Examples of supplemantal files

Macro strucure

TODO

find_drug_strength_in_description

Summary

Parameters

Note

Outputs

Example

impute_DPIN_costs

Summary

Parameters

Notes

Health Utilization

health_utilization

Summary

Parameters for Physician Visits

Parameters for Physician costs

Parameters for Number of Hospitalization

Parameters for Hospitalization costs

Notes

Version history

Matching

create_case_control_sets-v3_0_0

Summary

Usage

Random seeds

Data preparation

Sampling the follow-up time

Add additional covariates

Match selection

Version history

diagnostic

Summary

Usage

Diagnostics

Version history

Process Original Databases

combine_ICU_data

Summary

Parameters

Notes