Programming guidelines

Philosophy

  • No change for change’s sake
  • Readable code
  • Specify requirements up front, especially for reusable code
  • Use GitHub issues for discussions and prior to any work on developing reusable code (macros, programs, etc.)
  • Test-driven development: document the desired results and the assumptions about the data through test cases
    • Tests should be easily reproducible (not commented out in the code)
  • Reproducible: running the main script should reproduce all data for a project
  • Easy for others to take over or assist on a project
  • Use generic components as much as possible (reduce redundancy, easier maintenance, more reliable software)
  • Review any results from data analysis internally with Christiaan and/or Salah before presenting to other collaborators

Best practices

  • Use a common directory structure across all projects. See “P:\VDEC\sample project” on RAS for an example.
  • Have a main.sas/main.do script for SAS and/or Stata. Running these scripts should reproduce all datasets/results.
  • Have a formats.sas file in the source directory
  • Use git version control (TortoiseGit on RAS)
  • Use VDEC macros and reference lists from the source and ref repositories
  • Comment only on interesting or complicated non-standard functionality (e.g., algorithms); do not comment on trivial coding tasks.
  • Write code that is easy to understand; avoid “smart” solutions.
  • Write modular code when possible (reduce code redundancy for easier code maintenance).
  • Do not create user-specific subfolders; use git branches when several people work on a project together.

Coding standards

The following standards should be followed unless there is good reason to deviate from them.

General

  • Use lower case for field names and markup
  • Use short descriptive names
  • Use snake_case (spaces are allowed in directory names)
  • Normalize databases when reasonable
  • Do not put initials on datasets.
  • Label all permanent datasets with descriptive labels.
  • Give variables descriptive labels and include units (weeks, grams, etc.)
  • Use constants (do not leave hard-coded values throughout the code without proper labels)
  • Declare constants centrally at the beginning of a script or macro/function
  • Use Code Diary and its notation to generate documentation for a project.
  • Minimize the use of block comments (/* */) in the middle of a function/script/macro, so that large sections of code can easily be commented out for debugging.
    • If you encounter code with embedded block comments in SAS, you can work around it by writing “%macro junk();” at the start and “%mend;” at the end of the section you want to disable.
  • Use TODOs to document any tasks that still need to be done (makes it easy for someone else to take over or assist on a project). Preferably use @todo tags in Code Diary comment blocks to generate this task list in the workplan as well.
  • Use tabs instead of spaces
  • Indent code in code blocks
  • Document the order in which scripts run; one way to do this is to run them all from a central main script.
  • Use white space around assignment operators.
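
As an illustration of several points above (central constants, labels with units, white space around assignment operators, and the “%macro junk();” workaround), here is a minimal sketch. The dataset work.births_raw, the variable names, and the threshold values are hypothetical and only serve the example.

```sas
/* Constants declared centrally at the top of the script.
   Names and values are hypothetical. */
%let min_gest_age_weeks = 20;
%let max_birth_weight_grams = 6000;

data work.births_clean;
	set work.births_raw;
	where gest_age_weeks >= &min_gest_age_weeks
		and birth_weight_grams <= &max_birth_weight_grams;
	label
		gest_age_weeks = "Gestational age at birth (weeks)"
		birth_weight_grams = "Birth weight (grams)";
run;

/* Temporarily disable a section that itself contains block comments:
   the macro is defined but never called, so its body never runs. */
%macro junk();

	proc print data = work.births_clean (obs = 10); /* quick look */
	run;

%mend;
```

Changing a threshold then means editing one line at the top of the script rather than searching the code for hard-coded values.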

RAS

  • Create as few permanent datasets as possible in the project library. Use temporary datasets in the work library instead.

SAS

  • Use SAS date functions (INTNX, YRDIF) instead of manually performing date math
  • Use blank lines between code blocks
  • Macros should be defined in a separate script file (unless it is a very small, specialized macro)
  • Declare includes at top of a script
  • Always end a data/proc step with a run/quit statement
  • Always specify the dataset a proc or data step is using
  • Do not use function-style macro calls (always use % sign in macro call)
  • Do not place macros in auto-call library
  • Use the date11. format
  • Do not use statement-style macros (IMPLMAC): they make macro calls difficult to recognize and read, and they slow down the system significantly. Turn off IMPLMAC for better performance.
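
The following sketch shows how several of the SAS standards above might look together: date functions instead of manual date math, the date11. format, an explicit input dataset on every step, and a closing run/quit. The dataset work.patients and its variables birth_date and visit_date are assumptions made for the example.

```sas
/* Hypothetical input: work.patients with SAS date variables
   birth_date and visit_date */
data work.patient_ages;
	set work.patients;

	/* Age in completed years; YRDIF with the 'AGE' basis accounts for leap years */
	age_years = floor(yrdif(birth_date, visit_date, 'AGE'));

	/* Same day of the month, 6 months after the visit */
	followup_date = intnx('month', visit_date, 6, 'same');

	format birth_date visit_date followup_date date11.;
	label
		age_years = "Age at visit (years)"
		followup_date = "Scheduled follow-up date";
run;

proc sort data = work.patient_ages out = work.patient_ages_sorted;
	by followup_date;
run;

proc sql;
	create table work.followup_counts as
	select followup_date, count(*) as n_patients
	from work.patient_ages
	group by followup_date;
quit;
```

Naming the input dataset on every step avoids relying on the most recently created dataset, which can change silently when code is reordered.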

SAS macros

  • The project macro folder should follow the same structure as the VDEC library (automatic if this is a git clone)
  • Scripts and macros should delete their temporary datasets in the work library after use (this is especially important for callable macros)
  • Put a comment header block (/*~ ~*/) on every macro that explains its purpose and parameters, following the guidelines in P:\VDEC\source\documentation\sample_macro_documentation.sas on the RAS. Other programmers should be able to call the macro based on this information alone.
  • Create submacros instead of large macros or macros with nested macros. Separating the code allows the pieces to be tested more thoroughly.
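
A small sketch of a callable macro that follows the points above: a header block, a name-style call with the % sign, and cleanup of its own temporary dataset. The macro name, its parameters, and the header layout are hypothetical; the authoritative header template is the sample_macro_documentation.sas file mentioned above.

```sas
/*~
	Hypothetical header layout, for illustration only.

	Purpose:
		Count observations per group in a dataset.
	Parameters:
		data = input dataset (e.g. work.visits)
		by   = grouping variable
		out  = output dataset with one row per group and a count column n
~*/
%macro count_by_group(data=, by=, out=);

	proc sort data = &data out = work._cbg_sorted;
		by &by;
	run;

	data &out;
		set work._cbg_sorted;
		by &by;
		if first.&by then n = 0;
		n + 1;
		if last.&by then output;
		keep &by n;
	run;

	/* Remove the temporary dataset created by this macro */
	proc datasets library = work nolist;
		delete _cbg_sorted;
	quit;

%mend count_by_group;

/* Example call; work.visits and clinic_id are assumed to exist */
%count_by_group(data=work.visits, by=clinic_id, out=work.visits_per_clinic);
```

Prefixing the temporary dataset name (here _cbg_) is one way to reduce the chance of overwriting a caller's dataset in the work library.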

Stata

  • Use common Stata syntax when writing programs
  • Use assert statements often
  • Write modular code using do (or, if required, run) instead of include: Rationale, examples and tips
24/08/2018