Pure publication lists

How to automatically update a list of publications from Pure on a research group’s website?

Official route

Ask the Pure administrators for the UUID of the research group and use UvA Dare, which looks like:

http://dare.uva.nl/search?org-uuid=1f129de9-e2f4-41a0-a223-94f32e993ac1&smode=iframe

but only validated publications will appear, which I find unsatisfactory. Fortunately, there is an unofficial route.
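If the validated-only list is acceptable, the `smode=iframe` parameter suggests the URL is meant to be embedded directly. A minimal sketch (the width and height are arbitrary choices, not prescribed by UvA Dare):

```html
<!-- embed the validated publication list on the group's website -->
<iframe src="http://dare.uva.nl/search?org-uuid=1f129de9-e2f4-41a0-a223-94f32e993ac1&amp;smode=iframe"
        width="100%" height="600" frameborder="0"></iframe>
```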

Unofficial route

Make a report in Pure, schedule it to be emailed, receive it, and process it. This will include not-yet-validated publications and looks like:

http://amlab.science.uva.nl/publications/

but the links now simply forward to a Google search. It should be possible to instead use an XLS report that includes the UID, so the entries can link to UvA Dare, but this is future work. Here are the steps:

– Make a report

See the video and use report type ‘Listing’:
http://www.atira.dk/en/pure/screencasts/how-to-get-familiar-with-reporting-4.12.html

– Schedule it to be emailed

Schedule the report to be sent in HTML format to a Gmail account dedicated to this purpose:
SchedulePureReport

– Receive it

Set your Gmail username and password in the script below and use it to install and configure ‘fetchmail’, ‘procmail’ and ‘mpack’. This has only been tested on Ubuntu Linux.

# based on: https://outhereinthefield.wordpress.com/2015/06/14/scripting-gmail-download-and-saving-the-attachments-with-fetchmail-procmail-and-munpack/
####################################
email=my-gmail-name
password=my-gmail-password
####################################

### install software
sudo apt-get install fetchmail procmail mpack

### config fetchmail
echo "poll pop.gmail.com
protocol pop3
timeout 300
port 995
username \"${email}@gmail.com\" password \"${password}\"
keep
mimedecode
ssl
sslcertck
sslproto TLS1
mda \"/usr/bin/procmail -m '$HOME/.procmailrc'\"" > $HOME/.fetchmailrc

chmod 700 $HOME/.fetchmailrc

### config procmail
echo "LOGFILE=/home/${USER}/.procmail.log
MAILDIR=/home/${USER}/
VERBOSE=on

# deliver every message to ~/Maildir/ (the trailing slash selects maildir format)
:0
Maildir/" > $HOME/.procmailrc

mkdir -p $HOME/Maildir/process
mkdir -p $HOME/Maildir/process/landing
mkdir -p $HOME/Maildir/process/extract
mkdir -p $HOME/Maildir/process/store
mkdir -p $HOME/Maildir/process/archive

Then use this script in a cron job to copy the ‘.html’ attachment to the target file (the email with the report is expected around 1:00 am):

#!/bin/bash
####################################
targetfile=/var/www/publications.html
####################################

DIR=$HOME/Maildir
LOG=$HOME/Maildir/getpublications.log
date +%r-%-d/%-m/%-y >> "$LOG"
fetchmail
mv "$DIR"/new/* "$DIR"/process/landing/
cd "$DIR"/process/landing/
shopt -s nullglob
for i in *
do
  echo "processing $i" >> "$LOG"
  mkdir "$DIR/process/extract/$i"
  cp "$i" "$DIR/process/extract/$i/"
  echo "saving backup $i to archive" >> "$LOG"
  mv "$i" "$DIR"/process/archive
  echo "unpacking $i" >> "$LOG"
  munpack -C "$DIR/process/extract/$i" -q "$DIR/process/extract/$i/$i"
  find "$DIR/process/extract/$i" -name '*.html' -exec cp {} "${targetfile}" \;
done
shopt -u nullglob
echo "finishing.." >> "$LOG"
mv "$DIR"/process/extract/* "$DIR"/process/store/
echo "done!" >> "$LOG"
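To run the script automatically, a crontab entry such as the one below can be used. The 2:00 am time is an assumption that leaves a margin after the expected 1:00 am report email, and the script path is a placeholder:

```
# run the publication-fetch script daily at 2:00 am (add via 'crontab -e')
0 2 * * * $HOME/getpublications.sh >> $HOME/Maildir/cron.log 2>&1
```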

– Process it

Add this to the script above to clean up the report and add links:

# remove header and footer
perl -i -0pe 's/<h1 class="ReportTitle">.*?<br>//igs' ${targetfile}
perl -i -0pe '$datestring = localtime(); s/<span class="body">.*?<br>.*?<br>/<span class="body">updated $datestring<\/span>/igs' ${targetfile} # insert update time
perl -i -0pe 's/<h2 class="ReportElementTitle">.*?<\/h2>//igs' ${targetfile}
perl -i -0pe 's/<p class="reportdescription">.*?<\/p>//igs' ${targetfile}

# remove paragraph counts
perl -i -0pe 's/(<h3 class="ListGroupingTitle1">).*?\. /$1/igs' ${targetfile}

# add links
perl  -i -0pe 's/(?<total><strong>(?<title>.*?)<\/strong>)/<a href="https:\/\/www.google.nl\/#q=%22$+{title}%22">$+{total}<\/a>/igs' ${targetfile}