WebGrab++

Introduction

WebGrab+Plus is a multi-site incremental XMLTV EPG grabber. It collects TV program guide data from selected TV guide sites. Red Hen uses it to download broadcast schedule data for multiple countries.

Missing cookies.txt problem

cannot find /home/csa/wg++/elcinema.com_cookies.txt !

loadcookie failed! ... cannot update this channel

Solution: creating an empty file /home/csa/wg++/elcinema.com_cookies.txt on vila fixed it.
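
The fix above can be sketched as a small shell helper. This is only an illustration: ensure_cookie_file is a made-up name, not part of WebGrab+Plus, and it leaves an existing cookie file untouched.

```shell
#!/bin/bash
# ensure_cookie_file is a hypothetical helper name (not part of WebGrab+Plus).
# It creates an empty cookie file only if nothing exists at that path yet.
ensure_cookie_file() {
    local cookie_file="$1"
    if [ ! -e "$cookie_file" ]; then
        touch "$cookie_file"
    fi
}

# Example call, using the path from the error message above:
# ensure_cookie_file /home/csa/wg++/elcinema.com_cookies.txt
```

The same helper should work for any other site that complains about a missing cookie file, assuming the grabber only needs the file to exist.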

Setting the offset for wg++ (copied from csa@dola crontab)

# Time offset can be set by editing on vila /nest/cfg/wg++/CC.config.xml, where CC = cz, pl, ru, us

# Examples and instructions on how to set the offset are on vila in /nest/cfg/wg++/cz.config.xml

# Also, excellent instructions on editing wg++ config are at http://webgrabplus.com/node/30

# The cz schedule offset is currently (Jan 2017) set to +1

# which means that 1 hour is added to cz program start times

# The pl schedule does not currently require setting an offset

# To test that the offset is set correctly, perform the following steps

# 1. edit on vila /nest/cfg/wg++/CC.config.xml, where CC = cz, pl, ru, us

# 2. as csa@vila issue grab-tv-schedule.sh CC (you can check the format by issuing crontab -l)

# 3. wait until the grabbing process is finished (it may take a few minutes)

# 4. as csa@dola issue xmltv-convert-04.sh CC (you can check the format by issuing crontab -l)

# 5. as csa@dola issue, for example, schedule ČT1 "Události" 0 888 "National evening news" 19:00-20:00 s

# 6. check whether the schedule is correct. If not - perform steps 1-5 again
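
To make the meaning of the offset concrete, here is a minimal sketch of the arithmetic only. It does not show how WebGrab+Plus applies the offset internally; shift_start_time is a hypothetical helper, and it ignores date rollover (it wraps around midnight but does not change the day).

```shell
#!/bin/bash
# shift_start_time is a hypothetical helper for illustration only.
# Usage: shift_start_time HH:MM OFFSET_HOURS
shift_start_time() {
    local hours minutes offset shifted
    hours=${1%%:*}
    minutes=${1##*:}
    hours=${hours#0}      # drop one leading zero so 08 is not read as octal
    hours=${hours:-0}     # "0" becomes empty after stripping, so restore it
    offset=$2
    # Add 24 before the final modulo so negative offsets wrap correctly
    shifted=$(( (hours + ($offset) % 24 + 24) % 24 ))
    printf '%02d:%s\n' "$shifted" "$minutes"
}
```

With an offset of +1, as currently set for cz, shift_start_time 19:00 1 prints 20:00. If the schedule command in step 5 finds the program an hour off, this is the kind of shift to expect.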

Fixing tv schedule grab on Vila

Well OK - here's the procedure for fixing the tv schedule grab (if Jacek is unavailable). It's so simple and clever I can't help myself.

Just log in as csa@vila and look inside the crontab to find, for example:

00 01 * * * grab-tv-schedule.sh us

30 01 * * * grab-tv-schedule.sh ru

00 02 * * * grab-tv-schedule.sh pl

30 02 * * * grab-tv-schedule.sh cz

00 03 * * * grab-tv-schedule.sh be

20 03 * * * grab-tv-schedule.sh de

40 03 * * * grab-tv-schedule.sh fr

00 04 * * * grab-tv-schedule.sh it

20 04 * * * grab-tv-schedule.sh es

40 04 * * * grab-tv-schedule.sh pt

Run the "suspect" manually (for example: grab-tv-schedule.sh es).

If it's not grabbing properly (you'll see from the output), run the exquisite help Francis made (and I helped): grab-tv-schedule.sh -h

And ALL will be well.

Yes, blowing this horn, as always.

Best,

J

Schedule troubleshooting. Problem: schedule does not find a program it should find

Solution: make sure the ID parameter in the lineup is the same as the xmltv_id in the vila:/nest/cfg/wg++ config file for the given country.
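
One way to see which xmltv_id values a config file actually contains, so they can be compared with the ID in the lineup, is sketched below. list_xmltv_ids is a hypothetical helper name, and the sketch assumes the channel lines use double-quoted attributes as in the examples on this page.

```shell
#!/bin/bash
# list_xmltv_ids is a hypothetical helper, not an existing Red Hen script.
# Usage: list_xmltv_ids CONFIG_FILE
# Prints each distinct xmltv_id value found in the file, one per line.
list_xmltv_ids() {
    grep -o 'xmltv_id="[^"]*"' "$1" \
        | sed -e 's/^xmltv_id="//' -e 's/"$//' \
        | sort -u
}

# Example call (path pattern taken from the offset notes above):
# list_xmltv_ids /nest/cfg/wg++/cz.config.xml
```

The comparison has to be exact: spacing and capitalization in the ID matter when matching strings this way.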

At the Portugal Capture Station, for example, there was no need to use WebGrab++. But in Poland we ran into some difficulties obtaining the tv data and WG++ proved useful. Just Google it - the first link should work. I used http://www.webgrabplus.com/ and followed the instructions there. I had to modify them a little (I will explain in a moment) and it worked. The output was not quite the right format, so Francis wrote a converter program to (in his words) 'massage the output xmltv data a little'. He can do things like that. He looks like an ordinary respectable professor but don't be fooled. Be cautious around him: lose focus for one moment and he will write you an elegant script for anything, on the spot.

WebGrab++ captures tv schedules from selected internet sites (in any country, not only Poland) and puts them in a file in a special format called xmltv.

To make it work for Polish Internet sites, I had to modify the instructions on http://webgrabplus.com/node/324 in the following way:

"Next in the .channels.xml file you will see that there are <channel ...... > entries for every channel on that site. Just copy the channel lines you want, into the WebGrab++.config.xml file. " - Don't do it. You have to edit the lines from channels.xml before you put them into the WebGrab++.config.xml file.

Here's an example:

you have to change the line:

<channel update="i" site="programtv.interia.pl" site_id="stacja-tvp-1,page,pl-name,cid,20823112" xmltv_id="TVP 1">TVP 1</channel>

into:

<channel site="programtv.interia.pl" site_id="stacja-tvp-1,page,pl-name,cid,20823112" update="i" xmltv_id="TVP 1">TVP 1</channel>

and then put it into WebGrab++.config.xml file.
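
In the example above the only change is the attribute order: update="i" moves from the front to just after site_id. For many channel lines, the edit could be sketched with sed as below. reorder_channel_attributes is a hypothetical helper, and it assumes every line has exactly the update, site, site_id order shown in the example.

```shell
#!/bin/bash
# reorder_channel_attributes is a hypothetical helper for illustration.
# It reads channel lines on standard input and moves the update="..."
# attribute so that it follows site_id="...", as in the example above.
reorder_channel_attributes() {
    sed -E 's/<channel (update="[^"]*") (site="[^"]*") (site_id="[^"]*") /<channel \2 \3 \1 /'
}

# Example call (file names are placeholders):
# reorder_channel_attributes < some.channels.xml > edited-lines.xml
```

Lines that do not match the expected attribute order pass through unchanged, so it is worth checking the result by eye before pasting it into WebGrab++.config.xml.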

It may look complicated, but when you follow all the instructions on http://www.webgrabplus.com/ one by one, it becomes rather simple.

Yesterday (07.10.2016) we managed to install WebGrab+Plus on a Raspberry Pi and it works great: now we can capture Polish, Czech and Russian (possibly more) tv schedules automatically. The only difficult part of installing it on the RPI was getting the right version of an application called "mono", which allows running Windows (.NET) programs in a Linux environment (and how clever does that sound? I can barely understand myself). And the problem was solved by Francis in one mighty lash of his incredible intellect (accompanied by 4 pages of well organized explanation), all done in 20 minutes or so. His excellent solution to the mono problem is HERE. And now, on RPI, I can just issue

sudo mono WebGrab+Plus.exe "$(pwd)"

or put it in crontab, or put it in a script and then in crontab (the scheduler). And that's what I did today (dummy prouuud). I think I can even explain what it means (or how it seems to me): 'sudo' gives you super-power: if you can sudo, you can do anything (and they let ME use it, how careless can you get? must be the American way: living on the edge, Wild West, etc.); 'mono' is described above: it allows a Windows program (WeBgRaB+PlUs.exe - this irregular size lettering is driving me nuts already) to run on RPI; and I very much suspect the "$(pwd)" is a path parameter, namely the current directory (I learned already that pwd means "print working directory" and the dollar sign says that we want the value of something, not just the form of it). Had Ferdinand de Saussure known this convention, he would have used cat (signifiant) and $cat (signifié) instead of wasting time on drawing a cat.
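
The difference between the form and the value can be tried out directly in the shell. This is a small sketch; the variable names here and animal are just for illustration.

```shell
#!/bin/bash
# Single quotes: the text is taken literally, so this prints the word pwd
echo 'pwd'

# $( ... ) is command substitution: the shell runs pwd and puts its
# output (the current directory) in place of the whole expression
here="$(pwd)"
echo "$here"

# $name is variable expansion: it gives the value stored under that name
animal=cat
echo "$animal"
```

So in the command above, WebGrab+Plus.exe never sees the text $(pwd); the shell replaces it with the directory the command is run from before mono starts.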

Anyway, by way of a summary of the installation procedure: apart from the 'mono problem', you do on RPI everything you do in Windows. You can even use the same config file and it works. So perhaps if (like myself) you don't feel very confident in the little black linux box (shell, bash, psshhh, plush, hush, whatever they call it), install WebGrab++ in Windows and make it work there first, and then you'll know what to do on RPI.

Well, I have also learned recently that excessive sudoing is unhealthy, and if you put the script in a proper directory, etc., you don't have to, so now we just have

mono WebGrab+Plus.exe "*the directory webgrab++ exe is in*" (in our case: /home/csa/wg++/)

I do realize that using a Windows program in a linux environment is sort of taboo, certainly frowned upon by the linux elite - repairing a Swiss watch with a hammer sort of thing - but we have a working system and soon I'll be playing my favourite Witcher 3 (do you know it is made in Poland?) on a Raspberry (what? - not enough processing power? - pish posh, details).

We also wanted to run WebGrab with different config files - for different countries, etc. Theoretically webgrab can be run with a directory parameter specifying the location of the config file, but that requires creating multiple directories which have to contain more than just the config, so Francis suggested a much more elegant solution called a symbolic link. In the script it looks like this:

# Create a symbolic link to the Polish configuration file ('#' is SHIFT 3 on keyboard and means it's a comment)

ln -sf /home/csa/wg++/pl.config.xml /home/csa/wg++/WebGrab++.config.xml (no idea what -sf stands for; will Google it one day)

# Run the grabber

mono /home/csa/wg++/WebGrab+Plus.exe "/home/csa/wg++"

So now WebGrab is fooled into running on pl.config.xml, thinking it is its default config file WebGrab++.config.xml. Is this brilliant or what?
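
For the record, in ln -sf the -s asks for a symbolic link (instead of a hard link) and the -f replaces the link if it already exists, which is what lets the same name point to a different country file on each run. A minimal sketch of the trick with a safety check is below; use_config is a hypothetical helper name, and the CC.config.xml naming follows the examples on this page.

```shell
#!/bin/bash
# use_config is a hypothetical helper for illustration.
# Usage: use_config WG_DIR CC
# Points WG_DIR/WebGrab++.config.xml at WG_DIR/CC.config.xml.
use_config() {
    local wg_dir="$1" cc="$2"
    local config="$wg_dir/$cc.config.xml"
    if [ ! -f "$config" ]; then
        echo "missing config: $config" >&2
        return 1
    fi
    # -s: make a symbolic link; -f: replace an existing link of that name
    ln -sf "$config" "$wg_dir/WebGrab++.config.xml"
}

# Example call, with the directory used on this page:
# use_config /home/csa/wg++ pl
```

Checking that the country file exists first avoids leaving WebGrab++.config.xml pointing at nothing when the country code is mistyped.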

And again - if Ferdinand de Saussure had been as clever as I am now- he would have simply said:

ln -sf /home/CAT /home/$CAT

which would have symbolically linked CAT with the value of CAT (assuming of course his cat was at /home/ at the time).

So let me finish this WebGrab saga by show-casing (showing off) a script I wrote today, 09.10.16 (of course mostly copying Francis's ideas):

#!/bin/bash

# Country to grab program from (adjective)

COUNTRY=Czech

# Country code (like us, pl, en, ru, etc.)

CC=cz

echo "Jacek's first ever shell script!"

echo "Well sort of anyway - Francis corrected and improved it"

echo "grabbing $COUNTRY tv program"

# Output file

FIL=/home/csa/wg++/guide-$CC.xml

echo "The output xmltv file is $FIL"

echo "The channels to grab, the number of days, etc. can be modified by editing /home/csa/wg++/$CC.config.xml"

echo "Greetings to Francis and Mark"

# Create a symbolic link to the configuration file

ln -sf /home/csa/wg++/$CC.config.xml /home/csa/wg++/WebGrab++.config.xml

# Run the grabber

mono /home/csa/wg++/WebGrab+Plus.exe "/home/csa/wg++"

# Receipt

echo -e "\n\tThe $COUNTRY schedule is available in $FIL\n"

And it sits in crontab now, grabbing a fresh Czech tv schedule daily. Shame it can't grab some of the famous Czech Budweiser (have to ask Francis and Mark how to write a separate script for Bud: grab_some_budweiser.sh).
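
A quick way to check that a daily grab actually produced something is to count the programme entries in the output file. The sketch below assumes the output is standard xmltv, where each show is a programme element; count_programmes is a hypothetical helper name.

```shell
#!/bin/bash
# count_programmes is a hypothetical helper for illustration.
# Usage: count_programmes XMLTV_FILE
# Prints how many <programme ...> elements the file contains.
count_programmes() {
    # grep -o prints every match on its own line, so wc -l counts matches
    # even when several programme elements share one line; the arithmetic
    # expansion strips any padding that wc adds
    echo $(( $(grep -o '<programme ' "$1" | wc -l) ))
}

# Example call, with the output file from the script above:
# count_programmes /home/csa/wg++/guide-cz.xml
```

A count of 0 after a run suggests the grab failed or the channel list in the config file is empty.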

Building WebGrab++

WebGrab++ is a mono application, see http://www.webgrabplus.com/. For the first build, we used this mono repository:

http://plugwash.raspbian.org/mono4

In July 2017, Raspbian upgraded from Jessie to Stretch. We may be able to use this mono repository with Stretch:

http://download.mono-project.com/repo/debian/dists/raspbianstretch/