Red Hen Coding Standards
Red Hen Lab is a collaborative with a long-term vision. We want you to be able to build on our existing code without spending a lot of time figuring out the idiosyncrasies of the previous coder. It should be enough for you to familiarize yourself with our coding standards, spelled out below. Similarly, if you adhere to these standards, it will be much simpler for others to extend your code and make it part of a great and ongoing project of incremental improvements. We therefore ask you to take care to write clear and transparent code that follows our standards and is clearly documented.
These standards are themselves in an incipient state. We want your suggestions for improvements, simplifications, and elaborations.
Last updated 2018-05-16.
Python Coding Standards
- Follow the official Python Style Guide
- Verify that your code plays nice with these standards by running it through yapf
File and data management
Keep your home directory free of data files, downloads, uncompressed file contents, test files, etc. Instead, place them in appropriately named directories. We should be able to tell what these directories are for, and when they were created, by their names. For example, "audio_test_20170101" is a good name. Please also keep data, downloads, tests, experiments, etc., out of code directories. Code directories should be GitHub ready.
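As a minimal sketch of the directory naming advice above, the helper below builds a name that states both the purpose and the creation date. The function name dated_dir_name and its arguments are illustrative assumptions, not part of any existing Red Hen code.

```python
import datetime


def dated_dir_name(purpose, date_created):
    """Return a directory name such as 'audio_test_20170101'."""

    # Hypothetical helper: purpose first, then the date as YYYYMMDD.

    return "{}_{}".format(purpose, date_created.strftime("%Y%m%d"))


print(dated_dir_name("audio_test", datetime.date(2017, 1, 1)))
```

Putting the date last in YYYYMMDD form keeps directories with the same purpose sorted chronologically in a listing.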
Keep third-party code trees separate from Red Hen directory trees. When customizing third-party code, make new files instead of editing existing files if possible. For example, "install.sh" can be copied to "install_red_hen.sh" or "install_case_hpc.sh" and code changes can be made in the new files.
Keep your imports alphabetized, and sensibly grouped with blank lines if the list gets long. Please keep "from X import Y" and "import Z" forms separate.
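One possible layout for the import rule above is sketched here. Placing the "import Z" group before the "from X import Y" group is an assumption; the standard only asks that the two forms be kept separate and alphabetized. The summary variable exists only so that the imports are used.

```python
import json
import os
import sys

from collections import OrderedDict
from datetime import date

# Illustrative use of the imports; the keys are hypothetical.

summary = OrderedDict()
summary["created"] = date(2018, 5, 16).isoformat()
summary["python_major"] = sys.version_info[0]
summary["output_file_name"] = os.path.join("audio_test_20170101", "summary.json")

print(json.dumps(summary))
```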
For the names of directories, files, Python modules, Python functions / methods, function arguments, and Python variables / attributes, use the following guidelines.
- Use "media_name" to represent the Red Hen base name (slug) of a video and its associated files, without extensions. For instance "2015-08-07_0050_US_FOX-News_US_Presidential_Politics".
- Use variables with names like "*_file" for file objects, not file names. The form "*_file_name" can be used for system file names, whether or not they include paths.
- Use a trailing slash when passing paths as arguments.
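The three naming guidelines above might look like this in practice. This is only a sketch: the directory "transcripts/", the "txt" extension, and the helper build_file_name are hypothetical, and an in-memory io.StringIO object stands in for a real opened file so the example runs without touching the file system.

```python
import io
import os

media_name = "2015-08-07_0050_US_FOX-News_US_Presidential_Politics"

# A directory path passed as an argument carries a trailing slash.

transcript_dir = "transcripts/"


def build_file_name(dir_path, media_name, ext):
    """Join a directory, a media base name, and an extension."""

    return os.path.join(dir_path, media_name + "." + ext)


# "*_file_name" holds a system file name (a string).

transcript_file_name = build_file_name(transcript_dir, media_name, "txt")

# "*_file" holds a file object; a real script would call open() here.

transcript_file = io.StringIO("example text\n")
first_line = transcript_file.readline()

print(transcript_file_name)
```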
In general, make names as consistent as possible, both within code files and across code files - use the principle of least surprise.
Please avoid abbreviations in names except for common ones like "dir" (directory) and "ext" (extension), or if a name is very long and the abbreviations will be easily understood. Please try to choose between abbreviations and full words as consistently as possible, both within files and across files. A legend of abbreviations may be useful.
Please make names as long as needed to be unambiguous, but no longer. A person with domain knowledge but only a little knowledge of programming should be able to tell what a named entity does. For example, two good names are dia_to_speaker_file() and dia_to_speaker_data(). They use an abbreviation, but the abbreviation is necessary, readable, and consistent. The two names make the difference between "file" and "data" clear.
Use English that is as simple as possible, but no simpler. Please use singular and plural forms properly.
If two or more variables are similar, consider putting nouns before adjectives. For example, use "time_start" and "time_end" instead of "start_time" and "end_time". This might not make sense when working with files.
Please use CamelCaseWithInitialUpperCase for class names and lower_case_with_underscores for directory, file, module, function / method, argument, and variable / attribute names. An exception is if part of a module name is a domain name without dashes (use lowercasewithoutunderscores).
If an acronym occurs in CamelCase, please CAPITALIZE all the letters of the acronym, as in "HTTPResponse". Remember that this only applies to CamelCase, not the lower case naming standards, in which all letters of an acronym should be lower case.
Please use UPPER_CASE_WITH_UNDERSCORES for constants, symbols, states, and other "special values".
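The capitalization and word-order rules above can be seen together in a short sketch. All names here (MAX_CLIP_SECONDS, HTTPResponseLog, clip_duration) are invented for illustration and do not refer to existing Red Hen code.

```python
# Constant: UPPER_CASE_WITH_UNDERSCORES.

MAX_CLIP_SECONDS = 30.0


class HTTPResponseLog:
    """CamelCase class name; the acronym HTTP is fully capitalized."""

    def __init__(self):

        # In lower case names the acronym is lower case too.

        self.http_status_codes = []

    def add_status_code(self, http_status_code):
        self.http_status_codes.append(http_status_code)


def clip_duration(time_start, time_end):
    """Noun-first names keep the two related arguments visibly paired."""

    return min(time_end - time_start, MAX_CLIP_SECONDS)


print(clip_duration(5.0, 12.5))
```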
HTML IDs should be in lower-case-with-dashes if possible.
EXCEPTIONS: The Red Hen Lab uses CamelCaseWithInitialUpperCase for GitHub repository names and data directory names, but not code directory names. Red Hen coders sometimes use capitalized abbreviations for imported module names, such as "import speaker.recognition as SR".
We realize that Python itself is not 100% consistent in its use of capitalization and underscores, especially with C-like function abbreviations, standard library modules such as logging and threading, and multiple-word function names that are still relatively short, especially operator-like functions.
Use a blank line after (at most) every two or three related lines of code. Even a single code statement, multiple line or otherwise, can stand on its own when it does a lot, contains many function arguments, or contains nested parentheses. Make a liberal but sensible and consistent use of blank lines.
Put a space on both sides of infix operators (including = and ==, except for = in function arguments, as in keyword arguments and default values), and after separators such as commas, unless the separator is at the end of a line.
Use a space after the # in a comment line to keep the text readable. Use a blank line before and after comments.
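A small sketch of the spacing, blank line, and comment rules above, using a hypothetical function scale_values:

```python
def scale_values(values, factor=2, offset=0):
    """Return each value multiplied by factor, plus offset."""

    scaled_values = []

    # Spaces surround infix operators such as =, *, and +.

    for value in values:
        scaled_value = value * factor + offset
        scaled_values.append(scaled_value)

    return scaled_values


# No spaces around = in function arguments; a space follows each comma.

result = scale_values([1, 2, 3], factor=10, offset=1)

print(result)
```

Note that the default values in the function definition and the keyword arguments in the call are the one place where = is written without surrounding spaces.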