Standard dir structure for VDEC data analysis projects

Issue: currently all SAS, Stata and even supporting files are typically stored in one folder. For large projects, this makes it harder for reviewers and collaborators to understand the code.

Proposal: organize the source dir using the following convention.

source\
    main.sas
    main.do
    prep\  [holds all scripts and supporting files needed for importing, cleaning and merging data]
        \sas
        \st
        \config [holds lookup files, etc]
    analyze\  [holds all scripts and supporting files needed for analyzing and producing results]
        \sas
        \st
        \config
    shared\  [holds all generic macros and ados specific to this project and shared between prep and analyze]
        \sas
        \st
        \config
    lib\  [holds all external (imported) macros and ados]
    \alima
        \ras-lib
        \other-awesome-lib
    older\   [working versions and code no longer in use]
        \prep
            \sas
            \config
        \analyze
            \st
            \config
        \shared
            \sas
            \config
  • the language subdirs (eg sas, st, r) are optional and are added as needed (ie no need to create an ‘st’ folder if there are no Stata files).
  • main.sas, main.do etc are saved directly under source.

  • advantages:

    • cleaner, more predictable structure that is easier to understand, leading to file names that are easier to remember and use, eg source\prep\sas\registry.sas instead of source\prep_registry.sas
    • the use of language-specific subdirs simplifies tooling, eg one can move, document or lint just the sas or st subdir without checking each file’s extension.
  • disadvantages:

    • slightly more verbose file paths in source files. This can be mitigated by defining path variables (globals in Stata), eg global prep "$root\prep\st"

      then run $prep\registry
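
The path-variable idea above could look roughly like this in main.do. This is a minimal sketch, not a prescribed template: the root path is a placeholder, the analyze and shared globals are assumed by analogy with prep, and registry is the example script name used earlier.

```stata
* main.do (saved directly under source\)
* Sketch only: the root path below is a placeholder, not a real location.
global root "C:\projects\myproject\source"

* One global per stage, each pointing at that stage's Stata subdir.
* "prep" comes from the proposal; "analyze" and "shared" are assumed by analogy.
global prep    "$root\prep\st"
global analyze "$root\analyze\st"
global shared  "$root\shared\st"

* Put the project-specific ados in shared\st on the ado search path.
adopath ++ "$shared"

* Run a stage script by its short name (Stata appends .do when no extension is given).
do "$prep\registry"
```

With the root defined in one place, only main.do needs to change when the project moves. One caveat: Stata does not expand a macro that directly follows a backslash, so a path such as "$root\$stage" should use a forward slash instead; Stata accepts forward slashes on Windows as well.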