Data checklist

Data process/checklist

Working notes

Use Edit->Preferences for setup; you can change appearance, fonts, etc.
- General colour scheme
- Results, e.g., classic
- Can save preferences (Edit->Preferences->Save preference set)
Use the Tab key for variable-name completion when possible
Use PgUp and PgDn for going through command history
Use compress to reduce the dataset's storage size without losing information
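A minimal sketch of compress in practice. The auto dataset shipped with Stata and the variable price_copy are stand-ins chosen for illustration, not part of these notes.

```stata
* Load an example dataset (assumption: any dataset in memory works the same way)
sysuse auto, clear

* Hypothetical variable stored wider than it needs to be
generate double price_copy = price

describe price_copy     // storage type before: double
compress                // demotes storage types only where no information is lost
describe price_copy     // storage type after compress
```

compress never changes the values themselves, so it is safe to run before every save.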
Short and descriptive names for variables/folders/files
- old, tmp
Log exploration in a log file: log using abc-12may2021, text replace
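The logging step might look like the sketch below. The file name comes from the note above; the dataset and the summarize call are placeholders for whatever exploration is being logged.

```stata
* Close any log left open by an earlier run; capture suppresses the error if none is open
capture log close

* Start a plain-text log, overwriting an existing file of the same name
log using abc-12may2021, text replace

sysuse auto, clear          // placeholder dataset (assumption)
summarize price mpg         // placeholder exploration

log close
```

The text option writes a plain .log file that can be opened in any editor, and replace avoids an error when the do-file is rerun on the same day.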
Limit to one version of the data
Check variables
- codebook
- duplicates report
- Evaluate duplicates: reasonable or not (context dependent)
- Watch for stacked files
- missings report (or the older, now obsolete nmissing)
- Histogram (hist) of continuous variables
- Check time trends -> Fix simple date errors in processing file
- Use summarize and table (by covariates)
- Always browse to confirm that what you think happened actually happened in the data
- collapse dataset as relevant (e.g., weekly counts)
- The critical variables are the exposure, the outcome, and those determining the included population
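The variable checks above can be sketched as follows. This is only an illustration: the auto dataset, the toy daily data, and the names dup, visit_date, week, one, and n_visits are assumptions. The built-in misstable summarize stands in for missings report, which is community-contributed and has to be installed separately.

```stata
sysuse auto, clear

codebook rep78 foreign, compact         // ranges, unique values, labels
duplicates report make                  // copies by the supposed identifier
duplicates tag make, generate(dup)      // flag duplicates so they can be inspected
misstable summarize                     // built-in missing-value report
histogram price, frequency name(price_hist, replace)
summarize price, detail
bysort foreign: summarize price mpg     // summaries by a covariate
table foreign rep78                     // cell counts across covariates
list make price mpg in 1/5              // or browse when working interactively

* Toy daily data (hypothetical) collapsed to weekly counts to inspect a time trend
clear
set obs 60
generate visit_date = mdy(1, 1, 2021) + _n - 1
format visit_date %td
generate week = wofd(visit_date)        // Stata weeks: week 1 starts on 1 January
format week %tw
generate one = 1
collapse (sum) n_visits = one, by(week)
list
```

A week with an unexpectedly low or high count in the collapsed listing is often the first sign of a date-entry error or a stacked file.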
Create small utility programs/snippets/templates to help
- Use setup file to set everything you need to do exploration
- Know difference between do and ado
- See syntax for how to
- In editor: CTRL+A and then Edit->Advanced->Re-indent for proper indenting of the selected code
- Use which to identify which version of an ado program you’re using; use program dir to list programs loaded from do files
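A hedged sketch of a setup file with one small utility program. The file name setup.do and the program nmiss_count are hypothetical names invented for this example.

```stata
* setup.do (hypothetical name): run once at the start of an exploration session
clear all
set more off
capture log close

* Defined in a do-file, the program exists only after the do-file has been run.
* Saved as nmiss_count.ado on the ado-path, it would load automatically on first use.
capture program drop nmiss_count
program define nmiss_count, rclass
    syntax varname                      // accept exactly one existing variable
    quietly count if missing(`varlist')
    local n = r(N)
    display as text "`varlist': " as result `n' as text " missing"
    return scalar n = `n'
end

sysuse auto, clear                      // placeholder dataset (assumption)
nmiss_count rep78

which codebook      // path and version line of an ado program
program dir         // programs currently loaded in memory
```

Here syntax does the argument parsing, so the program rejects a misspelled variable name before any counting happens.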
Document all choices
- Reason for dropping variables (e.g., all missing)
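One way to record such a choice is to have the do-file state the reason as it drops each variable, so the reason ends up in the log. A minimal sketch, assuming the auto dataset and a hypothetical all-missing variable empty_var created for the demonstration:

```stata
sysuse auto, clear
generate empty_var = .      // hypothetical all-missing variable

* Drop variables with no non-missing values and print the reason for each
foreach v of varlist _all {
    quietly count if !missing(`v')
    if r(N) == 0 {
        display as text "Dropping `v': all values missing"
        drop `v'
    }
}
```

With a log open, each displayed line documents which variable was dropped and why.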