Monday 5 December 2011

The End of DITA

So the last DITA session was today and it's farewell to this part of the programme.
We were looking at Information Architectures and it brought back a lot of memories in a strange way. Websites that were permanently Under Construction, dozens of broken links on one page, no way to find what you were looking for even though you knew it was there. Ah, the internet of the mid-to-late 1990s, we shall never see your like again.
With a few exceptions, back then no one seemed to have any idea how to build a website, but everyone was building one. I remember that to build your free one on AOL you were invited to post your pictures to them via Royal Mail; they would scan and upload them, then post them back. Remarkable.
These days, thanks to a better, maybe wider, understanding of architecture born from a trillion rubbish pages, a visitor is less likely to have the same issues. Yes, things can still be hard to find, and yes, there can be too many bells and whistles (MySpace in its early iterations being a good example), but on the whole things are much better and more uniform in a good way. Drop someone onto a random website and eight times out of ten they will be able to navigate it successfully and with some nuance.
Websites like SourceForge, Tesco and Amazon all let you sign in and, presumably, personalise your experience to a greater or lesser degree. An interesting element is where and how prominent the ‘log in / sign in’ link is.
Amazon and SourceForge are seemingly designed for the casual browser. They’re not asking you to sign in right away; they encourage browsing. Stroll around, see if anything takes your fancy. Tesco, on the other hand, puts sign-in front and centre. They want you to sign in because every purchase builds up a greater and more in-depth picture of who you are and what you want. Amazon has a similar function, but as Tesco has such a wide range of products, including day-to-day items, the data it holds on you is far richer. And to them that means better adverts, which keep the cash rolling in.
And what of the mystery vegetable? It looks very familiar, like something that I would be forced to eat by a wife. I name it Pak Choi, or Chinese Cabbage. To find this I looked at what the picture file was called – mysteryvegetable. Searching for that led me to a website dedicated to answering questions about which veg was which. In fact, the more I looked, the more websites dedicated to veg identification I found. Is this really what the WWW is here to do?
Grocery websites seemed to have trouble helping me identify it, as I didn’t really know what I was after. But I suppose the need for tuber identification is already taken care of by the crackpot community, so there's no need for the corporations to spend money on it.
When it comes to designing information architectures then, a clear plan is necessary...
You need to know what you need to know (do you need to know where your visitors are coming from? Do you need to know their age? Do you need to know anything which would be considered personal?).
You need to know what they need to know (what will they be looking for? Do they need a quick contact page? Or an FAQ? If they need an FAQ is it because your site is too complicated?).
You need to know what you need to know about them (should they be able to set up an account? How much detail will you need? How much of the site can they personalise? How individual will their experience be?).
This is really just the first ten feet of the iceberg as well; the subtle things and the nuances will all change depending on who you are, what you are selling (or discussing, or presenting) and to whom. As a collective experience, humanity (not all of it of course, just the long-term wired parts) seems to have evolved a generally better innate sense of information architecture, and now the complexity lies not in a website that works for all, but in one that works for you as an individual surfer.

Monday 28 November 2011

Open This


Searching on the Royal Society website was a little trickier than I thought it would be. It was like being given a key to an extensive and ancient archive without really knowing what the archive held, where the shelves were, what order things were kept in, and so on.

As they have a biological component I looked for any articles on GRIDS, or Gay Related Immune Deficiency Syndrome / Symptoms, which was a forerunner to HIV/AIDS: before we knew what the latter was, epidemiology suggested that a specific set of symptoms associated with a particular set of people pointed to an underlying medical condition.

The RS seems to have nothing though. Maybe I’m not searching right, or perhaps they don’t cover it, but either way I couldn’t find anything. So I had a browse to see what kind of material they do have and looked up Darwin instead. Plenty of hits there.

For the Open Source Software part of the session I looked at Freecode and GitHub. GitHub sounds like a dating website for self-confessed gits, or a nastier version of mysinglefriend.com, and lo and behold there is a picture of a smug, self-satisfied bloke on the main page. He even describes himself as a ‘git instructor’. Remarkable.

Anyhoo, I preferred the older Freecode site as it wasn’t full of meaningless pictures. I understand they are trying to humanise technology, especially anything social-media related, but Freecode felt more professional and laid out its listings well. Plus the gits seem to be trying to flog you something straight away instead of letting you see whether you need training, etc.

The names of the software – vifm, burp, dbeaver, kwave, sunflower et al., all peppered with three-, four- or five-digit version numbers of course – are meaningless. They could be nuclear release codes for all I know, or ways to hack into my bank and add £45,000 to my account risk free, but it all appears to be so much frippery.

In the interests of fairness I went over to SourceForge as well and that was much nicer. Set up like an ‘older’ system, it presented options in far easier ways. So out with the new, in with the old I cry!

For the open data mashup I looked at borough data and the rate of male hospital admissions attributable to alcohol per hundred thousand population (2005–06). Tower Hamlets came in at a respectable 1,130 (rounded up). Less than Islington (1,218) and Hounslow (1,194) but more than Richmond upon Thames (785) and Sutton (692).

If this data was mapped over to cover male homelessness and demographics as well we could perhaps see two things. First of all, we could see if boroughs with high homelessness had higher rates of male hospital admissions attributable to alcohol.

Secondly, if we took the demographic data, sorted it by age so we could see the 16–22 age group, and then looked at male hospital admissions attributable to alcohol, we might get an idea of whether it is ‘students’ (used as a catch-all term here) in the population who drive high levels of admission due to binge drinking.

By looking at Alcohol Non-Consumption Zones we might then be able to see whether areas with these zones had lower rates of alcohol abuse resulting in hospital admission and homelessness than areas without.

With this data we might then be able to see whether ANCZs would be useful in cutting down male admissions due to alcohol in two key groups: one vulnerable and the other, often, just a bit silly.
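If the borough datasets were loaded into a simple database, the first comparison could be a single join keyed on borough. The table and column names below are my own invention for illustration, not taken from the actual datasets...

select a.borough, a.admissions_per_100k, h.homeless_count
from alcohol_admissions a
join homelessness h on h.borough = a.borough
order by a.admissions_per_100k desc;

Sorting by the admissions rate would make any relationship with the homelessness figures visible at a glance.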

Just put the MBE in the post, I don't have to come to the Palace to pick it up at the moment.


Monday 21 November 2011

Some Antics, Some Antics, they’re up to Semantics!


The semantic web clearly has great uses in areas like epidemiology. Having database-type information provided through a more intelligent system can vastly speed up the process of identifying correlations (though not causes) and associated information.

In most other areas though, the time and money cost of implementing semantic ideals outweighs the benefits. When I was in the Quality Team at Nokia, one issue that came up was – no surprises – quality. One reason the company was continually late in delivering software to schedule was that the engineers wanted it to be perfect. All well and good, but unacceptable when that dedication caused manufacturing deadlines to slip and product releases to be delayed.

So a VP came up with a campaign based around ‘Good Enough is Good Enough’. The simple message was to finish and ship the product. On time. Not when it was ‘ready’, but according to the schedule. Clients, partners and customers wanted a product by the contracted date, not a perfect product at some future point. The former can be delivered and then improved; the latter never arrives.

And I think this is a major argument and force against the semantic web having the momentum to really pick up outside of limited areas. Who’s going to want to spend the time creating a far ‘better’ WWW when the one we have now works so well? Only those with a very clear and present need for what it offers.

As a quick aside, perhaps if the adult entertainment / pornography business got behind the idea it would take off. That multi-billion dollar industry has arguably shaped the internet as a place where you can securely buy products, watch high-quality video and stream live feeds, and it drove broadband uptake just as it led the charge in the move from film to VHS in the years before.

Back to DITA and away from the filth merchants: the below is what some of us came up with in relation to RDF triples, taxonomy and ontology...

If X does Y then Z.

In relation to a library the formula could be expressed as...
IF library patron BORROWS library book #765 THEN book status changed to issued.
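Strictly speaking, an RDF triple is subject – predicate – object rather than IF / THEN, so the same event could also be written as something like (book #765, has status, issued) – my own rephrasing rather than anything official.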

Clear? Jolly good.


 

Monday 14 November 2011

I, Library Science Robot.

"...create a design for a mobile application to support your learning on DITA."

The App which struck me as most useful for my needs, given my sheer lack of time, is one which will act as a ‘runner’: scouting out locations to visit, suggesting and where possible retrieving information, and continually scanning information repositories to assist me in assignment creation.

It is based on several key elements, and each element has a number of tasks that the App will perform. Relational databases will play a large role in what it does, as it will be interrogating a large number of information repositories on a regular basis.

Mapping

Generate the location of libraries within my vicinity and display them with blue markers.
Generate the location of libraries within my vicinity which have relevant collections and display them with red markers.
Suggest libraries 25 miles away or more which have collections of great relevance.
Within the libraries, the location of relevant material will be mapped: the App will read the library OPAC online, match this against the digital reading list, see what is in and out, place reservations where appropriate and guide you to the right shelf location for the items that are in.
Information
The App allows essay titles and questions to be input, and will then look for matches to the question and related keywords across databases and linked OPACs (a rough sketch of such a query follows after this list).
The App will read RSS feeds and scan blogs for pre-set keywords, creating a list of pages to view. It will also scan for likely matches based on other searches and present them in an ancillary list.
The App will be logged into specific journals and databases, scanning them for new articles which would be of use or which relate to the keywords.
It will read previous essays and dissertations which are stored online (such as on the British Library EThOS service) to suggest likely works, paragraphs or sections to read.
Pointing the mobile device camera at a page will copy the information to the App, reflow it into a word-processing format and tag the bibliographic details to ensure these are not lost for referencing purposes later.
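As a rough sketch of the sort of query the App might run against a linked catalogue – the table and column names are invented for illustration, and a real OPAC would be interrogated through its own interface rather than raw SQL...

select title, shelf_mark, loan_status
from opac_record
where title like '%epidemiology%'
or subject like '%epidemiology%';

Run against every linked OPAC in turn, results like these would feed the mapping and reservation functions above.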
Assessment
The App will allow essays to be uploaded and will check them for plagiarism and copyright infringement.
Overview
In this way the ‘runner’ will allow the user to set up parameters and will then go off and work on these in the background. When a critical mass of information has been retrieved it will be presented to the user and e-mailed to them as well.
This information will consist of places to visit, suggested books, suggested short readings (blogs, essays, dissertations, etc.), suggested articles and suggested others (databases, unclassified new information resources, newspaper articles, etc.).
When visiting the suggested locations the App will guide the user directly to the right shelf or location for the material, and the light beam service combined with the camera will allow easy capture of bibliographic data.
Finally, the assignment can be assessed by the App to check for plagiarism, copyright and associated academic standards.
....
I could also imagine this App being banned by every academic institution in the world shortly after it was created.

Tuesday 8 November 2011

Ketchup and Mash

So, what has happened since the last update?

As you can see, the first assignment was handed in. I passed, so that's all good. Some useful feedback from the marker as well. I could have sworn I'd put in references and a list of what I'd read, but apparently not. Ho hum. Not a terrible mark but no prizes will be won.

DITA session 5 was on Web 2.0. Even that term sounds slightly old fashioned now. Soon we may move to names that are iterations rather than numbers, as the versions upgrade so often. Web Social, Web Primitive, Web Multimedia, Self Created Web, etc, etc.

We talked of things such as Facebook and Friends Reunited. I made some excellent points in the discussion in the labs, all of which I have forgotten other than that FR made people look like a happy success whereas FB shows that all is chaos and failure. 

We  had reading week here and the records for that period must remain sealed and not appear on this blog until 25 years after my death. 

Session 6 was on APIs and web services. We were back to some of our light programming and I saw just how much of it I remembered. Thank heaven for copy and paste.

I did knock up something of a mashup page so I too can start polluting the WWW. I won't link to it here as it is too tragic. The fact that the University IT systems seemed to go down around the time the class was doing this is doubtless a coincidence. Behold our XML skills and despair! 

Sunday 30 October 2011

DITA Assignment

No matter how extensive a library collection is, if the required information cannot be retrieved then it swiftly becomes of much lower value or even useless. Any librarian has to evaluate technologies and manage data to get the best results for their users.

Evaluation

Vannevar Bush’s comment in 1945 that “...The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear...”[1] has become even truer as the World Wide Web (WWW) first became popular and then an unstoppable force on the Internet.

For libraries, the Internet – and even more so the WWW – has been a deeply disruptive technology (first changing and then overturning how they previously held, shared and presented data), and while it makes more information available to more people, the way in which the WWW has done this often makes relevant information difficult to find.

Even though the WWW was intended to have a clear structure, the 11.45 billion web pages in existence as of 25 October 2011[2] present a baffling amount of data. The URL of a page has a clear structure and can sometimes reveal information about the page – what country it comes from, whether it belongs to a university, a brand or company name, etc – but it is imperative to use appropriate search methods and techniques and not rely on decoding protocols.

Via search engines, Broder’s Taxonomy[3] gives us assistance with Navigational (I want to go to this one place), Transactional (I want to buy this) and Informational (what do you have on this) queries, but we also have the option of using keywords (‘attack 911 New York’), natural language (‘the attack on the trade towers’) and the Boolean techniques of AND / OR / NOT. The more specific a library user can be the better, and information retrieval query types such as Known Item (I want this one thing), Fact (what is this one thing), Subject (all examples of this thing) and Exploratory (what things do you have) all help narrow the field.
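To give an invented example of those Boolean techniques in action, a query such as (‘twin towers’ OR ‘world trade center’) AND attack NOT film uses all three operators at once: synonyms are included, an unwanted sense is excluded.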

In a database system like SQL many of the same concepts apply. A library user on a catalogue database may require resources about Subject A but be open to whether they come in book or journal form. Any effective search system must be able to check all the available resources in all formats to see if the search keywords match, and then display the results effectively. A relational database using SQL is a solution to this, as it allows users to search across a variety of linked topics or keywords.

Within the database there will be tables of information, each representing a different ‘thing’ such as author name, ISBN, etc. By matching primary keys against foreign keys, the database (hidden behind a Graphical User Interface, or GUI) searches through the different tables and presents the results which match the search query. As each ‘thing’ is kept in a separate table, inconsistencies should be kept to an absolute minimum, because the same data (an author's name, say) exists in only one table instead of several. In this way complex searches can be completed and the right book found in a catalogue containing millions of volumes.
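As a minimal sketch of how two such tables might be declared – the layout is my own invention, not from the module materials...

create table author (
author_id integer primary key,
name varchar(100)
);

create table titles (
title_id integer primary key,
title varchar(200),
author_id integer references author(author_id)
);

Each author exists exactly once in the author table, and every book points to its author through the author_id foreign key.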

For example, in a (fictional) SQL database for a library the command
select * from author where name = 'kumar'
would return every record from the author table where the name is Kumar.
For a more complex search the command
select title, author from titles where title like '%medicine%'
would show the title and author of every book with ‘medicine’ anywhere in the title.
Tim Berners-Lee said “One of the things computers have not done for an organization is to be able to store random associations between disparate things...”[4] Information retrieval systems like SQL databases allow complex and multifaceted searches across related things (publisher, author, year of publication), but they require effective interaction, instructions and strictures to perform these tasks. Hence the need for the searcher to use the most effective search techniques.

Managing

Despite the advantages and opportunities identified above, search results must always be assessed rather than taken at face value. Similarly, the techniques and technology available need to be managed correctly to get the best results.

Within an SQL database, for example, a robust entity-relationship design is vital. If there are no unique identifiers for the keys then a search will fail or contain unnecessary results. If the tables contain more than one ‘thing’ (e.g. author, title and publisher) then searches will fail or return invalid results – all books published by Elsevier regardless of author, as opposed to all Elsevier books by Kumar, for example.
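To make the Elsevier / Kumar example concrete, a correctly normalised database could answer it with a single query. This assumes invented author and publisher tables and a publisher_id foreign key on titles, in the spirit of the sketch above...

select t.title
from titles t
join author a on a.author_id = t.author_id
join publisher p on p.publisher_id = t.publisher_id
where a.name = 'kumar'
and p.name = 'elsevier';

Because each entity lives in its own table, filtering on both author and publisher at once is unambiguous.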

Similarly, an SQL database must be asked the right questions in the right places. If the system is being accessed through only a basic GUI then, unless the grammar and syntax of the query are perfect, unreadable, incorrect or irrelevant results will occur.

On the WWW the question of relevance is even more pronounced and the value of what is returned is critical, as what the user is searching for may not have a Yes / No answer. A simple calculation[5]

relevance = relevant documents retrieved ÷ total documents retrieved

can provide a quantitative evaluation of how successful the search was and allows us to compare search techniques and engines.
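As an invented worked example: if a search returns 20 documents and 8 of them turn out to be relevant, then 8 ÷ 20 gives a score of 0.4, which can be compared with the same query run through a different engine or technique.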

The web pages themselves must also be carefully managed if they are to be accessible and searchable, with the protocol (http), the DNS-resolved domain (city.ac.uk) and a path to the server folder (/library) making a clear and logical structure. Within the web pages, Hyper Text Markup Language (HTML) allows the user to move between pages and out onto the WWW. Efficient use of HTML is how a librarian can give web pages a more relational feel, by linking in a more ‘human’ fashion.
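Put together, a full address such as http://www.city.ac.uk/library (used here purely as an illustration) breaks down into exactly those three parts: protocol, domain name and path.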

The library user must always evaluate what has been returned and not just accept anything. Relevance can have a binary answer, but a graded result is more likely, as returns may be objectively or only subjectively relevant.

The WWW and HTML have their limitations, however. Despite all the efforts made, a true universal standard does not yet exist, and differences in how browsers interpret HTML mean a page can render differently from one to another – problems viewing a page designed for Internet Explorer in Firefox, for example.

As WWW search techniques and engines become more complex and subtle there may continue to be a place for Boolean and other techniques, though perhaps a shrinking one as search engines become more sophisticated. Much will depend on the field the library user is working in, but natural language and keyword searches can and do return highly relevant information.





[1] Bush, V. (1945) As We May Think. The Atlantic. (Online) Available at: http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/3881/?single_page=true (Accessed 24 October 2011).
[2] The size of the World Wide Web (The Internet). (Online) Available at: http://www.worldwidewebsize.com/ (Accessed 24 October 2011).
[3] Broder, A. (2002) A taxonomy of web search. SIGIR Forum. (Online) Available at: http://www.sigir.org/forum/F2002/broder.pdf (Accessed 23 October 2011).
[4] Berners-Lee, T. (1998) The World Wide Web: A very short personal history. (Online) Available at: http://www.w3.org/People/Berners-Lee/ShortHistory.html (Accessed 28 October 2011).

[5] MacFarlane, A. (2011) Session 04 – Information Retrieval. INM348: Digital Information Technologies and Architectures. (Online) (Accessed 28 October 2011).