Data checklist

Data process/checklist

Working notes

  • Edit->Preferences for setup: change appearance, fonts, etc.
    • General colour scheme
    • Results, e.g., classic
    • Can save preferences (Edit->Preferences->Save preference set)
  • Use the Tab key for variable-name completion when possible
  • Use PgUp and PgDn for going through command history
  • Use compress
  • Short and descriptive names for variables/folders/files
    • old, tmp
  • Log exploration in a log file: log using abc-12may2021, text replace
  • Limit to one version of the data
  • Check variables
    • codebook
    • duplicates report
    • Evaluate whether duplicates are reasonable or not (context dependent)
    • Watch for stacked files
    • missings report (or the older, now obsolete nmissing)
    • Histograms (hist) of continuous variables
    • Check time trends -> Fix simple date errors in processing file
    • Use summarize and table (by covariates)
    • Always browse to confirm that what you think happened actually happened in the data
    • collapse dataset as relevant (e.g., weekly counts)
    • Most critical: the exposure and outcome variables, and the variable(s) determining the included population
  • Create small utility programs/snippets/templates to help
    • Use a setup file to set everything you need for exploration
    • Know the difference between do-files and ado-files
    • See syntax for how to
    • In editor: CTRL+A and then Edit->Advanced->Re-indent for proper indenting of the selected code
    • Use which to identify which version of an ado program you’re using; use program dir to list programs loaded from do-files
  • Document all choices
    • Reason for dropping variables (e.g., all missing)
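
The checks listed above could be strung together roughly as follows. This is only a sketch: it uses the auto dataset that ships with Stata as a stand-in for real data, and the log name, the variables, and the grouping by foreign (instead of weekly counts) are illustrative assumptions. Since missings is a community-contributed command that may not be installed, the sketch falls back to the built-in misstable summarize.

```stata
* Sketch of a first-pass data check; dataset, log name and variables are assumptions
capture log close
log using "checks-example", text replace

sysuse auto, clear          // example data shipped with Stata
compress                    // store variables in the smallest suitable type

* Overview of variables and duplicates
codebook, compact
duplicates report           // duplicates across all variables
duplicates report make      // duplicates on a presumed identifier

* Missing values: use missings if installed, otherwise built-in misstable
capture which missings
if _rc == 0 {
    missings report
}
else {
    misstable summarize
}

* Distributions, overall and by a covariate
histogram price, name(h_price, replace)
summarize price mpg, detail
bysort foreign: summarize price mpg
tabulate foreign rep78, missing

* Non-interactive stand-in for browse: look at the first few rows
list make price mpg foreign in 1/5

* Collapse to group-level counts without losing the data in memory
preserve
collapse (count) n_cars = price (mean) mean_price = price, by(foreign)
list
restore

* Check where a command comes from
which codebook

log close
```

Wrapping collapse in preserve and restore keeps the full dataset in memory, which fits the note about limiting yourself to one version of the data.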
12/05/2021