August 06, 2003

Here you go then

After a bit of messing about (and realising that it would take longer for me to remember how variables in regular expressions worked than it would to do it by hand) I present the following sample document for your delight:

House of Commons Hansard Debates for 8 Jul 2003 (pt 3) WITH kludged working permalinks

Posted by Tom Dolan at August 6, 2003 10:21 AM

So getting that done would be the 'easy' bit. And would be a step in the right direction.

However, if we want comments and trackback we're going to need (girds loins) an XML feed. Which is where James Crabtree et al's knowledge of the government machine would need to come in.

Anyway, this is distracting from the serious business of a) job applications, b) my usual frivolous blog entries about jam I like and stuff.

Posted by: Tom Dolan at August 6, 2003 10:36 AM

Oh that's brilliant! I'd forgotten how entertaining Dennis Skinner was.

Posted by: pHile at August 7, 2003 03:42 AM

A few years back I tried to scrape Hansard in an effort to index when different MPs spoke -- so you could be notified when *your* MP spoke (the notification idea coming from http://www.byliner.com ). The problem I had then was that Hansard's formatting was very non-standard -- an MP might be referred to by their full name, or part of their name, or by their ministerial title for example. Also, the text of speeches is broken up by dates and column numbers and split over different pages -- all adding to the fun!
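To make the naming problem concrete, here is a minimal sketch of one way to map the different forms a speaker label can take onto a single canonical MP name. It assumes a hand-maintained alias table; the table contents, the `normalise` and `resolve_speaker` names, and the "Minister for Examples" entry are all made up for illustration and are not taken from Hansard or from the scraper described above.

```python
import re

# Hypothetical alias table: keys are normalised labels, values are canonical names.
# Every entry is an invented example, not real Hansard data.
ALIASES = {
    "dennis skinner": "Dennis Skinner",
    "mr skinner": "Dennis Skinner",
    "the minister for examples": "Jane Example",  # made-up title and MP
    "jane example": "Jane Example",
}


def normalise(label):
    """Lowercase a label, drop punctuation and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z\s]", "", label.lower())
    return " ".join(cleaned.split())


def resolve_speaker(label):
    """Return the canonical MP name for a label, or None if it is unknown."""
    return ALIASES.get(normalise(label))


print(resolve_speaker("Mr. Skinner:"))
print(resolve_speaker("The Minister for Examples"))
print(resolve_speaker("Somebody Else"))
```

A lookup table like this only covers the labels someone has already seen, so unknown labels come back as `None` and would need a human to look at them. It also does nothing about speeches split across columns and pages.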

But your plan, sensibly, sounds more do-able -- good luck!

Posted by: Phil Gyford at August 11, 2003 12:43 PM

So the idea would be to import the whole thing into Movable Type or similar? And have each excerpt you've assigned a permalink to in your example as a post? And each post assigned to multiple categories (MP's name, Debate title) along with permalinks, comments and trackback? And then some cleverly architected MT templates to make the whole thing usable?
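As a rough sketch of that shape, each excerpt could be modelled as a small record that carries its own categories and can build its own permalink. This is not Movable Type's data model. The `Excerpt` class, its field names, the URL pattern and the column number in the usage lines are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Excerpt:
    """One bloggable chunk of a debate (hypothetical structure)."""
    speaker: str
    debate_title: str
    date: str      # ISO date string, e.g. "2003-07-08"
    column: int    # Hansard column number (placeholder values in this sketch)
    text: str
    categories: list = field(default_factory=list)

    def __post_init__(self):
        # Default categories: the MP's name and the debate title.
        if not self.categories:
            self.categories = [self.speaker, self.debate_title]

    def permalink(self):
        # Assumed URL pattern: /<date>/col-<column>-<speaker-slug>
        slug = re.sub(r"[^a-z0-9]+", "-", self.speaker.lower()).strip("-")
        return f"/{self.date}/col-{self.column}-{slug}"


post = Excerpt("Dennis Skinner", "Piracy", "2003-07-08", 900, "(speech text)")
print(post.categories)
print(post.permalink())
```

Two speeches by the same MP in the same column would collide under this pattern, so a real scheme would need something extra, such as a paragraph counter.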

This could really be interesting...

Posted by: Robin Grant at August 11, 2003 12:56 PM

Ideally I'd like to have something that works as a fully functioning blog, with trackback and comments and all that, but I suspect that might be a little too much to do in one go. And it's tons of work without much knowledge of the final userbase.

Comments will take years to achieve - it throws open the whole editorial authority of the publication. I remember all the problems the Beeb had with messageboards. Trackback will probably be easier editorially, but harder technically. And it would probably have to go to EU procurement. :-)

So, in as much as there is a plan, it's to get Hansard bloggable (a few hours' work - figuratively), and then to show that it is being blogged. And then a cost/benefit exercise on what features could be delivered next.

Posted by: Tom at August 11, 2003 04:17 PM

Cool idea! And I like that extract... they are actually talking about real piracy (robbery at sea) rather than what everyone believes to be piracy (which is actually just copyright infringement).

Posted by: Tom Morris at August 11, 2003 05:19 PM

Great site guys... Keep up the good work :)

Posted by: Mitch Bruke at May 12, 2004 04:22 PM

