How to automatically update a list of publications from Pure on a research group’s website?
Official route
Ask the Pure people for the UUID of the research group and use UvA Dare, which looks like:
http://dare.uva.nl/search?org-uuid=1f129de9-e2f4-41a0-a223-94f32e993ac1&smode=iframe
but only validated publications will appear, which I find unsatisfactory. Fortunately, there is an unofficial route.
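For the official route, a quick command-line check that the iframe URL responds for your group’s UUID can be useful (a minimal sketch using the example UUID above; it only verifies the HTTP status code, not the content of the list):
# expect "200" if the search page for this org-uuid is reachable
curl -s -o /dev/null -w "%{http_code}\n" "http://dare.uva.nl/search?org-uuid=1f129de9-e2f4-41a0-a223-94f32e993ac1&smode=iframe"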
Unofficial route
Make a report in Pure, schedule it to be emailed, receive it, and process it. This will include not-yet-validated publications, but it gets messy:
– Make a report
See the video and use report type ‘Listing’:
http://www.atira.dk/en/pure/screencasts/how-to-get-familiar-with-reporting-4.12.html
– Schedule it to be emailed
Schedule the report to be sent in HTML format to a Gmail account dedicated to this purpose:

– Receive it
Set your Gmail name and password in the script below and use it to install and configure ‘fetchmail’, ‘procmail’ and ‘mpack’. This has only been tested on Ubuntu Linux.
# based on: https://outhereinthefield.wordpress.com/2015/06/14/scripting-gmail-download-and-saving-the-attachments-with-fetchmail-procmail-and-munpack/
####################################
email=my-gmail-name
password=my-gmail-password
####################################
### install software
sudo apt-get install fetchmail procmail mpack
### config fetchmail
echo "poll pop.gmail.com
protocol pop3
timeout 300
port 995
username \"${email}@gmail.com\" password \"${password}\"
keep
mimedecode
ssl
sslcertck
sslproto TLS1
mda \"/usr/bin/procmail -m '$HOME/.procmailrc'\"" > $HOME/.fetchmailrc
chmod 700 $HOME/.fetchmailrc
### config procmail
echo "LOGFILE=/home/${USER}/.procmail.log
MAILDIR=/home/${USER}/
VERBOSE=on
:0
Maildir/" > $HOME/.procmailrc
mkdir -p $HOME/Maildir/process
mkdir -p $HOME/Maildir/process/landing
mkdir -p $HOME/Maildir/process/extract
mkdir -p $HOME/Maildir/process/store
mkdir -p $HOME/Maildir/process/archive
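Before scheduling anything, it is worth running fetchmail once by hand (for example after sending a test mail to the Gmail account) to confirm that procmail delivers the message into the maildir:
### test the setup manually: fetch verbosely, then check for delivered mail
fetchmail -v
ls -l $HOME/Maildir/new/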
Then use this script in a cron job to copy the ‘.html’ attachment to the target file (the email with the report is expected around 1:00 am; a sample crontab entry follows the script):
#!/bin/bash
####################################
targetfile=/var/www/publications.html
####################################
DIR=$HOME/Maildir
LOG=$HOME/Maildir/getpublications.log
date +%r-%-d/%-m/%-y >> $LOG
# fetch new mail; procmail delivers it into $DIR/new/
fetchmail
mv $DIR/new/* $DIR/process/landing/
cd $DIR/process/landing/
shopt -s nullglob
for i in *
do
echo "processing $i" >> $LOG
mkdir $DIR/process/extract/$i
cp $i $DIR/process/extract/$i/
echo "saving backup $i to archive" >> $LOG
mv $i $DIR/process/archive
echo "unpacking $i" >> $LOG
munpack -C $DIR/process/extract/$i -q $DIR/process/extract/$i/$i
find $DIR/process/extract/$i -name '*.html' -exec cp {} ${targetfile} \;
done
shopt -u nullglob
echo "finishing.." >> $LOG
mv $DIR/process/extract/* $DIR/process/store/
echo "done!" >> $LOG
– Process it
Add this to the script above to clean up the report and add links:
# remove header and footer
perl -i -0pe 's/<h1 class="ReportTitle">.*?<br>//igs' ${targetfile}
perl -i -0pe '$datestring = localtime(); s/<span class="body">.*?<br>.*?<br>/<span class="body">updated $datestring<\/span>/igs' ${targetfile} # insert update time
perl -i -0pe 's/<h2 class="ReportElementTitle">.*?<\/h2>//igs' ${targetfile}
perl -i -0pe 's/<p class="reportdescription">.*?<\/p>//igs' ${targetfile}
# remove paragraph counts
perl -i -0pe 's/(<h3 class="ListGroupingTitle1">).*?\. /$1/igs' ${targetfile}
# add links
perl -i -0pe 's/(?<total><strong>(?<title>.*?)<\/strong>)/<a href="https:\/\/www.google.nl\/#q=%22$+{title}%22">$+{total}<\/a>/igs' ${targetfile}