Data checklist
Data process/checklist
Working notes
- Edit->Preferences for setup; can change appearance, fonts, etc.
- General colour scheme
- Results, e.g., classic
- Can save preferences (Edit->Preferences->Save preference set)
- Use the Tab key for variable name completion when possible
- Use PgUp and PgDn for going through command history
- Use `compress`
- Short and descriptive names for variables/folders/files
- Log exploration in a log file: `log using abc-12may2021, text replace`
- Limit to one version of the data
- Check variables
- `codebook`
- `duplicates report`
- Evaluate duplicates: reasonable or not (context dependent)
- Watch for stacked files
- `missings report` (or the obsolete `nmissing`)
- Histogram (`hist`) of continuous vars
- Check time trends -> Fix simple date errors in processing file
- Use `summarize` and `table` (by covariates)
- Always `browse` to confirm that what you think happened actually happened in the data
- `collapse` dataset as relevant (e.g., weekly counts)
- The critical variables are the exposure, the outcome, and the variable(s) determining the included population
- Create small utility programs/snippets/templates to help
- Use setup file to set everything you need to do exploration
- Know difference between do and ado
- See `syntax` for how to
- In editor: CTRL+A and then Edit->Advanced->Re-indent for proper indenting of the selected code
- Use `which` to identify which version of ado programs you’re using / `program dir` for programs loaded from do files
- Document all choices
- Reason for dropping variables (e.g., all missing)
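
The variable checks listed above could be strung together in a do-file roughly as follows. This is a minimal sketch, not a fixed recipe: it assumes Stata's bundled auto dataset as a stand-in for real data, the log file name is hypothetical, and the built-in `misstable summarize` is swapped in for `missings report` because the latter is a community-contributed command that has to be installed first.

```stata
* Sketch of the exploration checks, run on Stata's bundled auto dataset.
* The log file name and the choice of variables are illustrative assumptions.
clear all
sysuse auto, clear

log using "explore-example", text replace   // hypothetical log name

compress                      // store each variable in the smallest safe type
codebook, compact             // one-line overview per variable
duplicates report make        // make is expected to identify each row
misstable summarize           // built-in missing-value report

histogram mpg                 // distribution of a continuous variable
summarize price mpg, detail
bysort foreign: summarize price
tabulate foreign rep78, missing

* Aggregate to one row per group (stand-in for, e.g., weekly counts)
preserve
collapse (count) n_cars = price (mean) mean_price = price, by(foreign)
list
restore

log close
```

Wrapping `collapse` in `preserve` and `restore` keeps the full dataset in memory afterwards, so the aggregated view can be inspected without losing the original rows.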
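
The notes above on utility programs, `syntax`, `which`, and `program dir` can be illustrated with a small program defined inside a do-file. This is only a sketch: the program name `listmiss` is hypothetical, and the auto dataset is again used as placeholder data.

```stata
* Sketch: a small utility program defined in a do-file.
* The name listmiss is hypothetical, chosen for illustration only.
capture program drop listmiss
program define listmiss
    syntax varlist              // parse and validate the variable list
    foreach v of local varlist {
        quietly count if missing(`v')
        display as text "`v'" _col(20) as result r(N)
    }
end

sysuse auto, clear
listmiss price rep78   // number of missing values per variable
program dir            // programs currently in memory, including listmiss
which summarize        // reports whether a command is built in or where its ado-file lives
```

A program defined this way exists only for the current session and shows up under `program dir`; saving the same definition as `listmiss.ado` on the ado-path would make it load automatically, and `which listmiss` would then report the file's location.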