Guidelines for people who join or create a project team, especially one involving coding.
The entire Red Hen dataset of nearly 500,000 video and text files from television news recordings is available inside the Case HPC cluster, mounted on /mnt/rds/redhen/gallina/tv.
Files are organized by date, in directories nested by year, month, and day (for example, tv/2018/2018-05/2018-05-22). Each recording of a television news program has a set of files with the same base name and different extensions, for instance:
drwxr-xr-x 2 tna tna 57344 May 22 06:49 2018-05-22_0600_US_KMEX_Noticias_34_Edición_Nocturna.img
-rw-r--r-- 1 tna tna 380158 May 22 06:49 2018-05-22_0600_US_KMEX_Noticias_34_Edición_Nocturna.jpg
-rw-r--r-- 1 tna tna 124433419 May 22 06:43 2018-05-22_0600_US_KMEX_Noticias_34_Edición_Nocturna.mp4
-rw-r--r-- 1 tna tna 13980 May 22 07:01 2018-05-22_0600_US_KMEX_Noticias_34_Edición_Nocturna.ocr
-rw-r--r-- 1 tna tna 47207 May 22 06:30 2018-05-22_0600_US_KMEX_Noticias_34_Edición_Nocturna.txt
For a detailed description of the data, see Red Hen data format.
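As a rough illustration of how the shared base name can be taken apart in a script, the sketch below splits it on underscores. The function name and the field labels (date, time, country, network, title) are assumptions read off the example listing above, not an official specification; the Red Hen data format page remains the authoritative description.

```shell
# Hypothetical helper: split a recording's base name into underscore-separated fields.
# The field labels are guesses based on the example listing, not a documented schema.
parse_recording_name () {
  local base="${1##*/}"    # drop any leading directory
  base="${base%.*}"        # drop the extension (.txt, .mp4, ...)
  local rec_date rec_time country network title
  # The last variable receives the remainder, so underscores in the title are kept
  IFS=_ read -r rec_date rec_time country network title <<< "$base"
  printf '%s\n' "date=$rec_date" "time=$rec_time" "country=$country" "network=$network" "title=$title"
}
```

Calling parse_recording_name on any of the five files listed above prints the same fields, since only the extension differs.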
To navigate in the Gallina tree, add this function to your ~/.bashrc file:
# Move to the tv storage directory for a given date, month, or number of days ago
function day () {
  local DAY DIR
  # No argument means today; otherwise keep at most the first 10 characters (YYYY-MM-DD)
  if [ -z "$1" ] ; then DAY=0 ; else DAY=${1:0:10} ; fi
  if [[ "$1" =~ ^[0-9]+$ ]] ; then DAY="$1"    # N days ago
  elif [ "${#1}" -eq 7 ] ; then cd "/mnt/rds/redhen/gallina/tv/${1%-*}/$1" ; DAY=""    # YYYY-MM: month directory
  elif [ "$1" = "here" ] ; then DAY="$( pwd )" ; DAY=${DAY##*/}    # the day of the current directory
    DAY=$(( ( $(date +%s) - $(date -ud "$DAY" +%s) ) / 86400 ))
  elif [ "$1" = "+" ] ; then DAY="$( pwd )" ; DAY=${DAY##*/}    # $2 days after the current directory
    DAY=$(( ( $(date +%s) - $(date -ud "$DAY" +%s) ) / 86400 )) ; DAY=$(( DAY - $2 ))
  elif [ "$1" = "-" ] ; then DAY="$( pwd )" ; DAY=${DAY##*/}    # $2 days before the current directory
    DAY=$(( ( $(date +%s) - $(date -ud "$DAY" +%s) ) / 86400 )) ; DAY=$(( DAY + $2 ))
  elif [ "${#DAY}" -eq 10 ] ; then DAY=$(( ( $(date +%s) - $(date -ud "$DAY" +%s) ) / 86400 ))    # YYYY-MM-DD
  elif [ -n "$1" ] ; then echo "$1?" ; DAY=""    # unrecognized argument: do nothing
  fi
  if [ -n "$DAY" ] ; then
    DIR="/mnt/rds/redhen/gallina/tv/$(date -ud "-$DAY day" +%Y)/$(date -ud "-$DAY day" +%Y-%m)/$(date -ud "-$DAY day" +%F)"
    if [ -d "$DIR" ] ; then cd "$DIR" ; else echo "No $DIR" ; fi
  fi
}
Save the file and issue "source ~/.bashrc" to activate. To go to a particular day, issue "day" with the date or the number of days ago:
day 2018-02-04
day 4
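For scripts that only need the path and should not change directory, the date-to-directory mapping that "day" relies on can be sketched as a small function. The name tv_dir_for and the TV_ROOT override are hypothetical additions for illustration; the default root is the mount point given above.

```shell
# Sketch: print the directory that holds the recordings for a YYYY-MM-DD date.
# tv_dir_for and TV_ROOT are illustrative names, not part of the Red Hen tooling.
tv_dir_for () {
  local d="$1" root="${TV_ROOT:-/mnt/rds/redhen/gallina/tv}"
  # Accept only the YYYY-MM-DD shape; calendar validity is not checked
  if [[ ! "$d" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] ; then
    echo "expected YYYY-MM-DD, got '$d'" >&2 ; return 1
  fi
  # Year directory, then year-month, then the full date
  echo "$root/${d:0:4}/${d:0:7}/$d"
}
```

For example, ls "$(tv_dir_for 2018-02-04)" lists that day's files without leaving the current directory.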
To navigate between dates, use
day + 5
day - 30
You can also use this in a loop, for instance:
module load ffmpeg
for DAY in {08..31} ; do
  day 2018-01-$DAY
  for FIL in *_CN_*.txt ; do
    [ -e "$FIL" ] || continue    # skip days with no matching files
    echo "$FIL" ; grep 'DUR|' "$FIL" ; ffprobe "${FIL%.*}.mp4"
  done
done
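If only the value after "DUR|" is wanted rather than the whole line, it can be cut out of the match. This is a minimal sketch that assumes the .txt file contains a line beginning with DUR| followed by the value, with | as the field separator; the function name recording_duration is hypothetical, and the Red Hen data format page should be checked for the exact layout.

```shell
# Sketch: print the text after "DUR|" from the first line that starts with it.
# Assumes a "DUR|<value>" line; prints nothing if no such line exists.
recording_duration () {
  grep -m 1 '^DUR|' "$1" | cut -d '|' -f 2
}
```

Inside the loop above, recording_duration "$FIL" could stand in for the grep call.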
Storage in your home directory is limited, but there is ample space on gallina, which is to say, /mnt/rds/redhen/gallina. Please create a directory on gallina where you can store your output and possibly your code, and symlink to it from your home directory.
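One way to set this up is sketched below. The helper name make_linked_workdir is hypothetical, and the directory to use under gallina is deliberately left as a placeholder, since this page does not say where personal directories belong; ask your team which location to use.

```shell
# Sketch: create a work directory and a symlink that points to it.
# make_linked_workdir is an illustrative name, not an existing tool.
make_linked_workdir () {
  local target="$1" link="$2"
  mkdir -p "$target" || return 1
  # Refuse to overwrite anything already at the link location (including broken links)
  if [ -e "$link" ] || [ -L "$link" ] ; then
    echo "$link already exists" >&2 ; return 1
  fi
  ln -s "$target" "$link"
}
# Example (replace the placeholder with the directory agreed on for your project):
# make_linked_workdir /mnt/rds/redhen/gallina/<your-directory> ~/gallina
```

After this, anything written under the link in your home directory is stored on gallina.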