Red Hen Coding Standards

Introduction

Red Hen Lab is a collaborative with a long-term vision. We want you to be able to build on our existing code without spending a lot of time figuring out the idiosyncrasies of the previous coder. It should be enough for you to familiarize yourself with our coding standards, spelled out below. Similarly, if you adhere to these standards, it will be much simpler for others to extend your code and make it part of a great and ongoing project of incremental improvements. We therefore ask you to take care to write clear and transparent code that follows our standards and is clearly documented.

These standards are themselves in an incipient state. We want your suggestions for improvements, simplifications, and elaborations.

Last updated 2018-05-16.

General
  • Follow the official Python Style Guide
  • Verify that your code plays nice with these standards by running it through yapf
Red Hen naming conventions
  • Use "media_name" to represent the Red Hen base name (slug) of a video and its associated files, without extensions.  For instance "2015-08-07_0050_US_FOX-News_US_Presidential_Politics".
  • Use variables with names like "*_file" for file objects, not file names.  The form "*_file_name" can be used for system file names, whether or not they include paths.
  • Use a trailing slash when passing paths as arguments.
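
The naming conventions above can be illustrated with a minimal sketch. The helper build_file_name() and the temporary directory are hypothetical and exist only for this example; only the "media_name", "*_file", and "*_file_name" patterns and the trailing slash come from the conventions themselves.

```python
import os
import tempfile


def build_file_name(media_dir, media_name, ext):
    """Join a directory path, a Red Hen base name, and an extension."""

    # media_dir must end with a trailing slash, so plain concatenation works.
    return media_dir + media_name + "." + ext


# The base name (slug) of a video, without any extension.
media_name = "2015-08-07_0050_US_FOX-News_US_Presidential_Politics"

with tempfile.TemporaryDirectory() as temp_dir:

    # os.path.join(path, "") appends the trailing separator if it is missing.
    media_dir = os.path.join(temp_dir, "")

    # "*_file_name" holds a system file name (a string) ...
    transcript_file_name = build_file_name(media_dir, media_name, "txt")

    # ... while "*_file" holds the open file object.
    with open(transcript_file_name, "w") as transcript_file:
        transcript_file.write("example line\n")
```

Because every path argument ends with a slash, a function that receives one does not need to check for the separator before building a file name.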

 

 

From here on, "naming" and "names" refer to the names of directories, files, Python modules, Python functions / methods, function arguments, and Python variables / attributes, unless otherwise specified.

 

In general, make names as consistent as possible, both within code files and across code files. Follow the principle of least surprise.

 

Please avoid abbreviations in names except for common ones like "dir" (directory) and "ext" (extension), or if a name is VERY long and the abbreviations will be easily understood.  Please try to choose between abbreviations and full words as consistently as possible, both within files and across files.  A legend of abbreviations might even be a good idea.

 

Please make names as long as needed to be unambiguous, but no longer.  A person with domain knowledge but only a little knowledge of programming should be able to tell what a named entity does.  For example, two good names are dia_to_speaker_file() and dia_to_speaker_data().  They use an abbreviation, but the abbreviation is necessary, readable, and consistent.  The two names make the difference between "file" and "data" clear.

 

Use English that is as simple as possible, but no simpler.  Please use singular and plural forms properly.

 

If two or more variables are similar, consider putting nouns before adjectives.  For example, use "time_start" and "time_end" instead of "start_time" and "end_time".  This might not make sense when working with files.

 

Please use CamelCaseWithInitialUpperCase for class names and lower_case_with_underscores for directory, file, module, function / method, argument, and variable / attribute names.  An exception is if part of a module name is a domain name without dashes (use lowercasewithoutunderscores).

 

If an acronym occurs in CamelCase, please CAPITALIZE all the letters of the acronym, as in "HTTPResponse".  Remember that this only applies to CamelCase, not the lower case naming standards, in which all letters of an acronym should be lower case.

 

Please use UPPER_CASE_WITH_UNDERSCORES for constants, symbols, states, and other "special values".
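
As a rough illustration of the capitalization and word-order rules above, here is a small sketch. The class HTTPDownloadRecord and the STATE_* constants are hypothetical names invented for this example and are not taken from any Red Hen module.

```python
# Constants and other "special values": UPPER_CASE_WITH_UNDERSCORES.
STATE_DONE = "DONE"
STATE_FAILED = "FAILED"


# Class name: CamelCase, with the acronym "HTTP" fully capitalized.
class HTTPDownloadRecord:
    """Hypothetical record of one download, used only to show the naming."""

    def __init__(self, http_status, time_start, time_end):

        # Lower case names: the acronym becomes "http", and the noun "time"
        # comes before the adjective so the two related names line up.
        self.http_status = http_status
        self.time_start = time_start
        self.time_end = time_end

    def duration_seconds(self):
        return self.time_end - self.time_start

    def state(self):
        return STATE_DONE if self.http_status == 200 else STATE_FAILED


download_record = HTTPDownloadRecord(http_status=200, time_start=1.0, time_end=3.5)
```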

 

HTML IDs should be in lower-case-with-dashes if possible.

 

EXCEPTIONS: The Red Hen Lab uses CamelCaseWithInitialUpperCase for GitHub repository names and data directory names, but not code directory names.  Red Hen coders sometimes use capitalized abbreviations for imported module names, such as "import speaker.recognition as SR".

 

Some people like to see as much code as possible on one screen, but some people have trouble reading a lot of code at once.  Please use a blank line after (at most) every two or three related lines of code.  Even a single code statement, multiple line or otherwise, can stand on its own when it does a lot, contains many function arguments, or contains nested parentheses.  In other words, PLEASE MAKE LIBERAL BUT SENSIBLE AND CONSISTENT USE OF BLANK LINES.

 

Please put a space on both sides of infix operators (including = and ==, unless they are in function arguments), and after separators, unless the separator is at the end of a line.
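
The spacing rule might look like this in practice. The function scale_and_shift() is a made-up example, not existing Red Hen code.

```python
# No spaces around "=" in the default arguments ...
def scale_and_shift(value, scale=2, shift=0):

    # ... but spaces around infix operators such as *, +, = and ==.
    result = value * scale + shift
    is_zero = result == 0

    # A space follows each separator (comma), except at the end of a line.
    return result, is_zero


# No spaces around "=" in keyword arguments either.
result, is_zero = scale_and_shift(3, shift=1)
```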

 

Please keep your imports alphabetized, and sensibly grouped with blank lines if the list gets long.  Please keep "from X import Y" and "import Z" forms separate.
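
One possible layout that satisfies the import guideline is sketched below, using only standard library modules. The function describe_run() is hypothetical and is there only so that every import is used.

```python
import json
import os
import sys

from collections import OrderedDict
from datetime import date


def describe_run():
    """Return a small JSON summary that uses each import above."""

    summary = OrderedDict()
    summary["date"] = date.today().isoformat()
    summary["platform"] = sys.platform
    summary["working_dir"] = os.getcwd()

    return json.dumps(summary)
```

The "import Z" lines form one alphabetized group and the "from X import Y" lines form another, separated by a blank line.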

 

LESS IMPORTANT: MP recommends the use of tabs rather than spaces so coders can set their own indentation sizes, but if the project you're working on uses spaces, it's best to conform.

 

LESS IMPORTANT: MP recommends keeping all tokens with the same function at the same level of indentation, but as close to the left as possible, and using shorter lines, as in:

 

def align(
    audio_file_name, transcript_file_name, alignment_file_name,
    format='lines', num_threads=1, conservative=False,
    disfluency=True, disfluencies=disfluencies
):

 

... and:

 

entry = \
    '|' + end_time_str + '|' + \
    primary_tag + '|' + attribute + speaker + \
    '|Log Likelihood=' + str(log_likelihood) + '\n'

 

We realize that Python itself is not 100% consistent in its use of capitalization and underscores, especially with C-like function abbreviations, standard library modules such as logging and threading that use camelCase method names, and multiple-word function names that are still relatively short, especially operator-like functions.

 

No set of guidelines covers all cases, and some of these guidelines differ from the Python style guide, but if you make exceptions to what is accepted on a project, please have a good reason for doing so, and be prepared to explain it if asked :)

 

Please keep your home directory free of data files, downloads, uncompressed file contents, test files, etc.  Please put things like this in appropriately named directories.  We should be able to tell what these directories are for, and when they were created, by their names.  For example, "audio_test_20170101" is a good name.  Please also keep data, downloads, tests, experiments, etc., out of code directories.  Code directories should be GitHub ready.
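
A directory name of the form shown above could be generated with a small helper like the following sketch. The function dated_dir_name() is a hypothetical name; the only thing taken from the guideline is the "purpose_YYYYMMDD" pattern of "audio_test_20170101".

```python
import datetime


def dated_dir_name(purpose, date=None):
    """Return a directory name such as "audio_test_20170101"."""

    # Default to today's date when the caller does not supply one.
    if date is None:
        date = datetime.date.today()

    return purpose + "_" + date.strftime("%Y%m%d")


example_dir_name = dated_dir_name("audio_test", datetime.date(2017, 1, 1))
```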

 

Please keep third-party code trees separate from Red Hen directory trees.  When customizing third-party code, make new files instead of editing existing files if possible.  For example, "install.sh" can be copied to "install_red_hen.sh" or "install_case_hpc.sh" and code changes can be made in the new files.

 

Please document code clearly and well.  Use a space after the # in a comment line to keep the text readable.  Use a blank line before and after comments.
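
A short sketch of the comment style described above, combined with the earlier advice on blank lines. The function count_words() is a made-up example.

```python
def count_words(text):
    """Count the whitespace-separated words in a string."""

    # Split on any whitespace so that repeated spaces do not create empty words.

    words = text.split()

    # Return the number of words found.

    return len(words)


word_count = count_words("clear and  transparent code")
```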