Programming guidelines
Philosophy
- No change for change’s sake
- Readable code
- Specify requirements, especially for reusable code
- Use GitHub issues for discussions and prior to any work on developing reusable code (macros, programs, etc.)
- Test-driven development: document desired results and assumptions about the data through test cases
- Tests should be easily reproducible (not commented out in the code)
- Reproducible: running the main script should reproduce all data for a project
- Easy for others to take over or assist on a project
- Use generic components as much as possible (reduce redundancy, easier maintenance, more reliable software)
- Review any results from data analysis internally with Christiaan and/or Salah before presenting to other collaborators
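As a minimal sketch of the test-case idea above, a data assumption can be written as a small step that runs every time the script runs and reports failures in the log. The dataset work.births, its variables, and the plausible weight range are hypothetical and only serve as illustration.

```sas
* Hypothetical example data (all names and values are assumptions);
data work.births;
	input infant_id birth_weight_grams;
	datalines;
1 3400
2 2950
3 4100
;
run;

* Assumed plausible range, declared as constants;
%let min_weight_grams = 300;
%let max_weight_grams = 7000;

* Test case: birth weight is never missing and lies within the plausible range;
data _null_;
	set work.births end = last;
	if missing(birth_weight_grams)
		or birth_weight_grams < &min_weight_grams.
		or birth_weight_grams > &max_weight_grams. then do;
		n_failed + 1;
		putlog "ERROR: birth weight test failed for " infant_id=;
	end;
	if last and n_failed = 0 then putlog "NOTE: birth weight test passed.";
run;
```

Because the check is ordinary code rather than a commented-out fragment, it is rerun with the rest of the project, and a failed assumption shows up as an ERROR line in the log.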
Best practices
- Use a common directory structure across all projects. See “P:\VDEC\sample project” on RAS for an example.
- Have a main.sas/main.do script for SAS and/or Stata. Running these scripts should reproduce all datasets/results.
- Have a formats.sas file in the source directory
- Use git version control (Tortoise git on RAS)
- Use VDEC macros and reference lists from the source and ref repositories
- Comment only on interesting or complicated non-standard functionality (e.g., algorithms); do not comment trivial coding tasks.
- Write code that is easy to understand; avoid “smart” solutions
- Write modular code when possible (reduce code redundancy for easier code maintenance).
- No user-specific subfolders; use git branches when several people work on a project together.
Coding standards
The following standards should be followed unless there is good reason to deviate from them.
General
- Use lower case for field names and markup
- Use short descriptive names
- Use snake case (spaces allowed in dir names)
- Normalize databases when reasonable
- Do not put initials on datasets.
- Label all permanent datasets with descriptive labels.
- Give variables descriptive labels and include units (weeks, grams, etc.)
- Use constants (do not leave hard-coded values throughout the code without proper labels)
- Declare constants centrally at the beginning of a script or macro/function
- Use Code Diary and its notation to generate documentation for a project.
- Minimize the use of block comments (/* */) in the middle of a function/script/macro. (This allows for easier commenting out of large sections of code for debugging purposes.)
- If you encounter block comments in the middle of SAS code, you can still disable the surrounding section by wrapping it in a macro that is never called: write “%macro junk();” at the start of the section and “%mend;” at the end of the section you want to comment out.
- Use TODOs to document any tasks that still need to be done (makes it easy for someone else to take over or assist on a project). Preferably use @todo tags in Code Diary comment blocks to generate this task list in the workplan as well.
- Use tabs instead of spaces
- Indent code in code blocks
- Document the order in which scripts run; one way to do this is to run them all from a central main script.
- Use white space around assignment operators.
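The following sketch illustrates two of the points above in SAS: constants declared centrally at the top of a script, and a section disabled by wrapping it in a macro that is never called. The dataset names, variable names, and constant values are hypothetical.

```sas
* Constants declared centrally at the top of the script (hypothetical names and values);
%let min_age_years = 18;
%let followup_weeks = 52;

* Hypothetical example data;
data work.cohort;
	input patient_id age_years;
	datalines;
1 17
2 34
3 52
;
run;

data work.eligible;
	set work.cohort;
	where age_years >= &min_age_years.;
	followup_days = &followup_weeks. * 7;
	label followup_days = "Planned follow-up (days)";
run;

* Section disabled for debugging: the macro is defined but never called;
%macro junk();
	data work.eligible_check;
		set work.eligible; /* an existing block comment no longer gets in the way */
		if followup_days > 0;
	run;
%mend;
```

Removing the %macro junk(); and %mend; lines re-enables the section without touching the comment inside it.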
RAS
- Create as few permanent datasets as necessary in the project library. Use temp datasets in the work library instead.
SAS
- Use SAS date functions (INTNX, YRDIF) instead of manually performing date math
- Use blank lines between code blocks
- Macros should be defined in a separate script file (unless it is a very small, specialized macro)
- Declare includes at top of a script
- Always end a data/proc step with a run/quit statement
- Always specify the dataset a proc or data step is using
- Do not use function-style macro calls (always use the % sign in a macro call)
- Do not place macros in auto-call library
- Use the date11. format
- Do not use statement-style macros (IMPLMAC). Turn off IMPLMAC for better performance: statement-style calls are difficult to read and to recognize as macro calls, and they slow down the system significantly.
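A minimal sketch of several of the SAS rules above, assuming a hypothetical dataset work.visits with made-up dates: date arithmetic is delegated to YRDIF and INTNX, dates are displayed with the date11. format, every step names its dataset and ends with run, and statement-style macro invocation is switched off with the NOIMPLMAC system option.

```sas
* Turn off statement-style macro invocation;
options noimplmac;

* Hypothetical example data (names and dates are assumptions);
data work.visits;
	input patient_id birth_date :date9. visit_date :date9.;
	* Age at the visit in years, computed by YRDIF instead of dividing days by 365.25;
	age_years = yrdif(birth_date, visit_date, 'AGE');
	* Same day of the month three months after the visit, computed by INTNX;
	next_visit_date = intnx('month', visit_date, 3, 'same');
	format birth_date visit_date next_visit_date date11.;
	label
		age_years = "Age at visit (years)"
		next_visit_date = "Next scheduled visit (date)";
	datalines;
1 15MAR1980 15JAN2024
2 29FEB2000 30NOV2023
;
run;

proc print data = work.visits label;
run;
```

The second row is included on purpose: leap days and month ends are the cases where manual date math tends to go wrong and the built-in functions are most useful.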
SAS macros
- The project macro folder should follow the same structure as the VDEC library (automatic if this is a git clone)
- Scripts and macros should delete temporary datasets in the work library after use (especially true for callable macros)
- Add a comment header block (/*~ ~*/) to every macro, explaining the parameters and purpose of the macro according to the guidelines in P:\VDEC\source\documentation\sample_macro_documentation.sas on the RAS. Make sure other programmers can call the macro based on the information provided in the header.
- Create submacros instead of large macros or macros with nested macros. Separating the code allows the pieces to be tested more thoroughly.
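The points above can be sketched as a small macro with a header block that cleans up its own temporary dataset. The macro name, parameters, and data are hypothetical, and the layout inside the header is illustrative only; the authoritative notation is the one in sample_macro_documentation.sas.

```sas
/*~
	Hypothetical example macro. The layout of this header is illustrative only.
	Purpose:    mean weight per site, written to a new dataset
	Parameters: data = input dataset with variables site and weight_grams
	            out  = output dataset with one row per site
~*/
%macro summarize_weight(data =, out =);
	%* Temporary dataset in the work library;
	proc sort data = &data. out = work._tmp_sorted;
		by site;
	run;

	proc means data = work._tmp_sorted noprint;
		by site;
		var weight_grams;
		output out = &out. mean = mean_weight_grams;
	run;

	%* Delete the temporary dataset after use;
	proc datasets library = work nolist;
		delete _tmp_sorted;
	quit;
%mend summarize_weight;

* Hypothetical example data and call;
data work.samples;
	input site $ weight_grams;
	datalines;
a 10
a 14
b 20
;
run;

%summarize_weight(data = work.samples, out = work.weight_summary);
```

In a project, the macro definition would live in its own script file in the macro folder and the calling script would include it at the top; the definition and the call are shown together here only to keep the sketch self-contained.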
Stata
- Use common Stata syntax when writing programs
- Use assert statements often
- Write modular code using do (or, if required, run) instead of include: Rationale, examples and tips