
Sunday, 26 July 2009

Case Study: Django + Agile = Sportsgirl redevelopment

I've decided to write this one up because there isn't much out there yet on large-scale, high-speed Django development, and it's all still fresh in my head, so it's worth getting down on paper (or screen, as it were).

The agency I work for, Citrus, works with Sportsgirl, an iconic Australian fashion retailer, and we were commissioned to help them build a community component for their site to create a social shopping experience. The store was already in place, built as a bespoke Flash / .NET application, and we had the opportunity to sit the new component on a different box in the data centre anyway. This looked like a fantastic opportunity to use Django - it's exactly what the framework is designed for.

Architecturally we are on a LAMP stack: RHEL 5, Apache 2, MySQL (yes, I know, but it's down to hosting) and obviously Django. Process-wise we use an agency flavour of Agile that allows a collaborative effort between Designers, Application Developers and User Interface Developers.

Overall we were on a fixed deadline: the production phase was less than 8 weeks from sign-off to go-live, covering production of the site, interface and design, then lock-downs for content population and testing.

To make this work, everything was based around the platform. As a base we chose Django 1.0 and then layered in a stripped-down version of Pinax (we currently use v0.5.1, the current official release, with updated apps) providing user profiles, avatar and Gravatar functionality, Photologue photo / image management, a blog, pyBB forums, user voting and commenting.

With an established platform, all three teams could start working concurrently much more effectively. This is one of the biggest benefits of Django and frameworks and platforms like it: code can be prototyped so fast on a development build that everyone can quickly see what they have to play with. Thanks to @jtauber and the team at Pinax for that as well.

From there it was a case of lots of designing, interface creation, development and review to get the site into its final state, ready for testing.

During this time we also worked on the Flash home page, produced by our Flash master, complete with nice collision detection and full modularity, so maintaining it is a creative job rather than a development one every time there's a refresh (which happens very often on this brand). We'll cover this in more detail at some point.

The final phase saw deployment to the live environment, which for launch we did on Amazon EC2. We chose EC2 primarily for scalability: the launch was going to be pretty large and promoted both online and offline.

As part of our final testing we also performed a lot of optimisation. This centred on the queries Django was making to the DB at both ends, and we then rolled out our delivery optimisations.

The first part of this was to implement memcached, simply one of the best pieces of software presently available for data-driven applications. On launch day we had a cache hit rate of over 80%, meaning only 20% of all possible queries went through to the database. With a couple of hundred thousand people visiting the site during the launch phase, this was instrumental in keeping RAM usage low on the DB server as well as removing latency bottlenecks to the database.
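For context, wiring memcached into a Django 1.0-era project is mostly configuration. This is a minimal sketch, not our production setup - the server address, timeout and view name are illustrative only:

```python
# settings.py -- point Django's cache framework at memcached
# (Django 1.0-style CACHE_BACKEND setting; address is illustrative)
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

# views.py -- cache an expensive, mostly-read view for five minutes
from django.views.decorators.cache import cache_page

@cache_page(60 * 5)
def product_list(request):
    ...  # expensive DB-backed view; served from memcached on a hit
```

The nice part is that nothing else in the code has to change: on a cache hit the view never runs and the database is never touched.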

We used nginx alongside Apache to deliver all the static files on the site (not least because the imagery is so hi-res it was killing Apache to serve it!). I'd often wondered how well this would work on a reasonably trafficked site, and I wasn't disappointed. nginx dropped the load on the Apache server - which had been struggling for both CPU and memory, even with static files served outside of Django - from pre-live peaks of 90% CPU and 70% of available RAM + swap down to peaks of 25% CPU and 30% RAM, which is what Django was using to deliver pages plus Apache's overhead.
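The split itself is just configuration; a minimal nginx sketch of that kind of arrangement (paths, ports and domain are illustrative, not the real values) looks like this:

```nginx
server {
    listen 80;
    server_name example.com;

    # Serve the high-resolution imagery and other static assets directly
    location /static/ {
        root /var/www/site;
        expires 7d;
    }

    # Everything else is proxied through to Apache running Django
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

nginx's event-driven model means each static request costs almost nothing, while Apache's heavyweight processes are reserved for the pages that actually need Django.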

The site went live on July 8, 2009, coinciding with a very large in-store, offline and online campaign that drove quite a bit of traffic to the site. The server functioned exactly as required and, with the optimisations, peaked at only about 60% utilisation.

Overall this was a great project to work on, not least because of the Agile process coupled with a technical foundation that allowed us to work even more collaboratively. 8 weeks for a major site launch is hard work for everyone at all levels, whatever their involvement. A great team helps, but having fantastic Open Source platforms to get our clients into market makes it even more achievable. Even less than two years ago I'm not sure I'd have attempted what the team achieved.

Wednesday, 22 July 2009

The Golden Age of mobile? Soon maybe...

Some would say it already happened in the heady days of GSM data and WAP; some would say it stalled when European clients pulled all funding from mobile internet apps in the post-dot-com-crash GPRS days; some would say that with the advent of the iPhone we're there, in all its shiny-coverflow-enabled-finger-waggling goodness.

It seems like every second person is now wielding some kind of internet-enabled device, and in Europe and the US penetration is even higher than in Oz, although we are racing for a frontline position - showing that reasonable access matters more than either coverage or cost.

Halfway through 2009, it's interesting to look at some of the predictions for this year - particularly where mobile is concerned - and take stock.

The big 3 of last year (Apple, MS, RIM) are now well and truly the big 6, with highly competitive offerings from Android (who we all knew had aspirations), the re-emergence of Palm with a life-recharging elixir known as the Pre, and of course Nokia firmly touting its Maemo platform, which has been in development for many years and is arguably the most stable and feature-rich of all.

Costs for data access across the globe are plummeting, with Vodafone in the UK offering the first truly unlimited data packages on phones - a sign that data is becoming a near-free commodity. EU laws capping the charges for call and data roaming will see uptake rise further as people start using their phones across countries as well.

Applications obviously make up a huge part of what our mobile experiences are like, and I think if anything 2009 will go down in history as the year of the widget or micro-app. Whilst the iPhone still only supports Objective-C and Cocoa, and Apple's iron control is starting to hinder progress on this front, there are enough people keen to try and make a buck that the ecosystem around applications is phenomenal, with over 50,000 available at last count.

Nokia, MS, RIM and Palm all have app stores, but these are fledgling compared to Apple's, and of the other players Android is the only one that can be considered a contender, with approaching 20,000 apps available, the vast majority of which are free. Android takes a very hands-off approach to curation, so it's interesting to see what makes it through compared to Apple's more militant approach. Being Java-based is also helping Android become the fastest-growing development community, as it's super quick to get up and running.

So where will we be in another 6 months? Will we look back and think 2009 is where it all started?

I think that's a little premature. We are really only at the start right now. Much of what we are doing on phones isn't much more than we were doing 5-6 years ago, just with a bigger screen and prettier graphics.

My money's on 2010, when we'll see a real rise of augmented reality applications on phones. This is the area that will truly show what carrying the entire Internet around in your pocket can do, and it has been the spur this year for a part of computer science that had languished for over a decade.

When my phone can alert me when my friends are nearby, interact with environmental sensors, buzz me when a store within 500m has a sale on an item I'd previously shown interest in, automatically adjust its settings depending on where I am and the privacy level I want to adopt, and filter all of the information on the Internet onto a 3-inch screen in a way that is contextual and meaningful - then I think we'll be able to say the Golden Age has started.

Monday, 28 April 2008

Easy product or class rating system

So you've got a lovely little ratings system going on your site. All of a sudden, though, you get slashdotted or dugg, or your marketing just starts working, and you have thousands of users all rating your products / services / systems / posts / videos etc, and your pages start to creak.

"It's the shared web space you're on," say your techies, "it can't handle the users," and they duly bounce you to a better hosting environment at triple the cost, along with the migration charges.

From time to time I come across this problem, either when I've picked up code from someone else or when a techie asks me how to optimise a page that's running really slowly. In this particular instance it was caused by a ratings system in the style of Amazon or YouTube - basically a product is displayed and people rate whether it's any good. The real problem came when there was a list of products, each of which had its individual rating displayed.

The cause of this very slow page, however, had nothing to do with shared hosting or direct server load - it was all down to some naive coding executing what my old CS lecturer would call an O(n²) process.

What the coder had done was get a list of products, then for each product gone back to the database, fetched every ranking ever made and averaged them out. Nice and simple, but frightfully inefficient, and exactly what caused the problem I've highlighted.

This isn't the first time I've seen this, and I've been asked how to build such systems numerous times as well, so here's a well-optimised method of doing it in general terms.

Consider first that calculating the average when you insert into the database is computationally cheaper than calculating it every time you perform a select when a user hits the page. This sounds obvious, but it's stunning how often it's overlooked.

Add two extra fields to your product table, one called average and the other called user_count or similar. On insert of a rating into the ratings table, run a trigger or add some code that updates the product table with the new count and an average recalculated from the ratings data.

Now when you select the product data you pull down the average and user count as part of that select and they are just simple static fields, thus adding no more computational load than the original select or view does already.
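The approach above can be sketched in a few lines. This is a minimal, generic illustration using SQLite - the table and column names are my own, and a real deployment might well use a database trigger instead of application code:

```python
import sqlite3

# Denormalised-average sketch: product carries a cached average + count.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT,
        average REAL DEFAULT 0,
        user_count INTEGER DEFAULT 0
    );
    CREATE TABLE rating (
        product_id INTEGER REFERENCES product(id),
        score INTEGER
    );
""")

def add_rating(conn, product_id, score):
    """Insert the rating, then refresh the cached average and count.

    Doing this work once per INSERT keeps the product SELECTs cheap."""
    conn.execute("INSERT INTO rating (product_id, score) VALUES (?, ?)",
                 (product_id, score))
    conn.execute("""
        UPDATE product SET
            average = (SELECT AVG(score) FROM rating WHERE product_id = ?),
            user_count = (SELECT COUNT(*) FROM rating WHERE product_id = ?)
        WHERE id = ?""", (product_id, product_id, product_id))
    conn.commit()

conn.execute("INSERT INTO product (id, name) VALUES (1, 'widget')")
for score in (3, 4, 4):
    add_rating(conn, 1, score)

avg, count = conn.execute(
    "SELECT average, user_count FROM product WHERE id = 1").fetchone()
print(round(avg, 2), count)  # → 3.67 3
```

Listing a page of products now reads two static columns per row - no per-product aggregate query at all.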

This gives you a nice little rating system that isn't heavy on processor load. However, we can improve things one step further if you aren't interested in keeping the underlying data.

The option I'm providing below is good if you are just after a running average and don't care about the individual ratings being kept. I did a project recently where we weren't worried about keeping individual ratings data because the site wasn't going to be up for very long and it didn't add anything to our system to have it.

This option uses a running weighted average in order to just update the data in the product table without requiring a ratings table at all.

Some useful background maths though:

If I have the set {3, 4, 4} and take its average, I add the numbers and divide by the number of entries. This set's average is thus (3+4+4)/3 = 3.67.

Now suppose I've precalculated this average as suggested above and stored it without the individual ratings, and I now want to add another rating, 2, to the set.

Intuition says to do something like (2 + 3.67)/2 = 2.83, which is actually wrong. Looking at the set {3, 4, 4, 2} we can guesstimate that the average is going to land between 3 and 4, not between 2 and 3 as we've just calculated.

Thankfully a technique from statistics gives us an option here: a weighted average. It lets us combine sets with different numbers of elements while preserving the overall average, by weighting each set's average in proportion to its size.

The general formula for this is:

Avgw = (Avg1 * (n1 / (n1 + n2))) + (Avg2 * (n2 / (n1 + n2)))

Where:

Avg1 is the average of the first set
Avg2 is the average of the second set
n1 is the number of elements in the first set
n2 is the number of elements in the second set

In our example this simplifies even further because our second set is actually only one item. So let's work it through:

Avgw = (3.67 * (3/(3+1))) + (2 * (1/(3+1)))

= (3.67 * 3/4) + (2 * 1/4)

= 2.75 + 0.5

= 3.25

Which is the answer we're after for our average.

As we know the baseline average data in the product table and we know the value of the rating we're adding, it's a very simple function to update these fields instead of doing another insert into a ratings table - and we just keep doing it for every new rating.
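As a sketch, folding a new rating into the stored pair takes a couple of lines (the function name is mine, not from any particular library):

```python
def running_average(avg, count, new_rating):
    """Fold one new rating into a stored (average, count) pair using the
    weighted-average formula - no ratings table required."""
    new_avg = avg * (count / (count + 1)) + new_rating * (1 / (count + 1))
    return new_avg, count + 1

# The worked example from above: {3, 4, 4} has average 11/3; add a 2.
avg, count = running_average(11 / 3, 3, 2)
print(round(avg, 2), count)  # → 3.25 4
```

The update reads the two cached fields, computes new_avg, and writes both back - constant work per rating regardless of how many ratings have come before.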

Computationally this is a very inexpensive process and whilst I'm more than happy to be shown otherwise I think this is about as good as it gets in terms of optimisation.

The key thing is we've now reduced an O(n²) operation to O(n), which is a drastic improvement as n tends towards infinity.

Tuesday, 22 April 2008

Phorm over function?

Phorm is, and I think will continue to be for some time, a hugely divisive issue online. The BBC have another story today about it, this time having spoken to the various security companies like F-Secure and McAfee about whether they will flag a message to the user when Phorm has been enabled.

Phorm management have come out saying "it's only a cookie", the same as many other sites use to provide tracking (such as Google Analytics), interactivity (such as shopping carts or ID maintenance on numerous retail sites), or a small amount of memory (configuration information for the BBC home page for example).

The difference, though, is that the information is being used differently because data is being shared.

This is what got the Information Commissioner's Office's back up, because sharing data between companies without users opting in is a breach of the Data Protection Act. "But not if it's anonymous data," say the legal eagles from Phorm - and technically they are correct. This is a case of adopting the letter of the law rather than the spirit of it.

Tim Berners-Lee came out saying he would move ISP if he found out his was using Phorm, and whilst I admire his line, I fear the vast majority of consumers won't care - or rather just won't be bothered to switch. Just see how many people actually switch bank or utility companies.

For me this is a case of the slow erosion of privacy at the hands of our ISPs. In a massively competitive market where margins are being squeezed ever tighter, the sale of their user data to Phorm must have seemed like the proverbial golden goose.

It won't take long for someone to cotton onto the flip side of this and market aggressively on the privacy front. Talk Talk made huge inroads as an ISP on the back of their "The Internet should be free" campaign with regard to price (being bundled as it was with other services). Who will be the first to play the "Internet should be private" card and sign up to a deal not using Phorm or other tracking software?

In my cynical world view, the security firms have realised this too, and it's 99% of the reason they are looking at it at all - the anti-spyware, anti-malware and anti-virus business is worth billions.

In real terms Phorm isn't actually that clever a piece of technology - most of what has been achieved is in the brokering of deals between ISPs and content owners and then a bit of clever gluing in the middle.

In the end, Phorm will either be a great white elephant and slip off the radar the way many technologies and companies have, or it may actually spur privacy legislation to catch up with our digital behaviour. How long that would take is the question, as government is typically a long way behind technology when it comes to law-making.

Monday, 21 April 2008

Can Yahoo really get things so wrong?

Update - The guys at Yahoo came to our rescue after we traced through the "network" somewhat to find someone who knows someone at Yahoo. Unfortunately their techies couldn't explain why we'd been blacklisted either, but we are now officially on their whitelist, so big thanks to the guys for helping us out.

Yahoo are one of the original dotcoms. They've been around for a long time, so they should know their business. Imagine my surprise, then, when one of my clients starts complaining that their confirmation emails to Yahoo accounts are permanently being binned - as is everything else they send, including personal communications.

Like most mail providers, free or otherwise, Yahoo have a spam policy that will look at an inbound email and then drop it in your inbox or spam folder depending on how it is classified.

As with most techies, I have about a dozen email addresses at various providers precisely to test these sorts of issues, especially given that the goalposts are changing all the time.

Sure enough, even a personally addressed confirmation email was killed as it came into my Yahoo account. "Ah ha," said I, "they've been blacklisted." So off one goes to check the various blacklisting sites - and there's nothing there. Hmmm.

It transpires that Yahoo have simply taken it upon themselves to block that domain. Weirdly, though, a personally addressed mail to me from the client with only the word "test" in the subject line is still considered spam, yet an email from some random no-reply address, containing several instances each of the words "penis", "cock", "viagra" and "cialis", made it through to my inbox completely unscathed. At this point the phrase about arses and elbows definitely comes to mind.

Trying to get Yahoo to do anything about this is similarly problematic, as there are no feedback channels to deal with the problem at all.

So overall we've just had to advise people not to use Yahoo, or to check their junk mail folder periodically and read the mail there.

Tuesday, 12 February 2008

Why industries can still be revolutionised on the web

I'm a bit of a cynic really. Anyone that's trawled through the depths of this blog will know that I have a fairly acid tongue when it comes to technology. I am a walking example of the phrase "familiarity breeds contempt".

One of the projects I've been involved in recently has started generating press just by virtue of being better than anything that has preceded it in this particular industry. I personally would have preferred them to be commenting on the content, but any press is good press, as they say.

By rights I should have a nice warm fuzzy feeling about having a site people talk about, and it's always great to receive recognition for a job well done - especially for my more junior staff, who have worked damned hard on the site. However, it is disappointing that we still exist in an age online where just applying some good design, good information architecture and some well-balanced technology is enough to turn an entire sector on its head.

Apologists will hold up their hands and say "we're a young form of media - it's going to take time". I however am not in this camp - how much time do we need?

Personally I find it untenable that there are still sites being built with non-standards-based HTML and CSS; that sites beyond a couple of holding pages are built in tools like Dreamweaver rather than content-managed; that good structural web design still amazes people rather than being the norm; and that information architecture still hasn't found its way into the hearts of 95% of the digital agencies that service the web.

I am constantly lamenting the state of most industries' websites generally. Take a tour around the leisure industry and find a website for a hotel anywhere in the world. Look at most ecommerce sites for even big retailers and certainly go anywhere online in the government, volunteering or political sectors and you are sure to be assaulted by bad design, bad technology and most importantly bad information architecture.

Even five years ago there were excuses that bore merit - changing web standards and platforms, varying internet connection speeds and different levels of web penetration in different markets. These excuses don't exist any more. And to be honest, why is it that when I was learning my craft as a developer all those years ago I was told about usability, information design and, later, information architecture, but today's junior developers and designers are not?

This is why there are still industries to revolutionise if you have the contacts, the desire or the contracts to do it. Here is my short list of the biggest problem industries:

1. Tourism and leisure - get some good design and photos, don't use bog-standard templates, and for goodness sake stop sending my credit card details in unencrypted email.

2. Holiday / travel booking - get some fuzzy logic in your scripting. If I can't fly tomorrow but I can fly the next day tell me without making me guess. Also make it easy for me to bounce back and forth between different trips without having to start again. Remember all those lectures about how to maintain the state of a system in Computer Science... this is what they were for.

3. Retail - keep your site updated with accurate stock levels. I also shouldn't have to reach the end of the checkout process to find out what the shipping charges are. Do a detection on my regional settings or IP address, take a best guess, and say it's a guess. 95% of the time you'll be right, and I'll stop having to go back and forth.

4. Service media - when will you learn that a Flash site turns off most people, as does a splash page? At least have an alternative HTML site so I can find your phone number, contact email or address. Also remember that table-based design was around in 1997 - time to get with the times, guys.

5. Volunteering / politics - yes, I know you are on a budget, but just because someone you know or your favourite intern happens to have a copy of Dreamweaver doesn't make them a professional web designer or developer. More harm than good is done by casual development. Find some budget, find someone aligned to your cause - they'll do it cheaper, or for the kudos - and develop a site worth looking at.

6. Government - just because a turd is shiny doesn't make it worth anything. Above all, make sure someone in the procuring department knows the difference between HTML and CSS and you won't get shafted. Government expenditure online is extortionate for the value achieved. Given the amount of paperwork done for any bit of government work, it is amazing that information architecture isn't put right at the centre of the brief. How many people using direct.gov.uk would that help?

So get stuck in, and let's see some other industries and sectors turned on their heads. It's about time the biggest information resource in history got a bit of spit and polish and had all the kinks straightened out.

Thursday, 31 January 2008

The state of Oz technology

Well, rarely does an entire country entice me to start ranting (and at this point I'll point out that I am in fact Australian), but by crikey, Australian technology hasn't really moved in the last 5 years.

Now, I appreciate this is a sweeping statement, and I'll point out that the technology I'm talking about is primarily media-based - mobile / web / internet. I have also had the benefit of living in London for the better part of 10 years, so I've been at the hub of what is going on.

What I don't understand is why a nation that was at the forefront of new media ten years ago is now in a position where nothing has shifted for the last 5. SMS is still massively underutilised, and the idea of an SMS shortcode in Australia is a joke - at 8 digits it's only 2 shorter than a mobile number, so it's hardly short! Indeed, everything to do with mobile is still more expensive, slower and less polished than we are used to in Europe. I went to Vodafone when I got here and asked for a pay-as-you-go SIM card with pay-as-you-go data on it... I was met with blank stares - Telstra and Optus were both the same.

General internet access is similarly expensive and slow compared to what we are used to in Europe. Given a relatively modern telecommunications infrastructure, it's hard to see why the telcos are flogging ADSL instead of fibre / cable - especially when you remember how many roads were dug up in the capital cities to lay cable in the late 80s and early 90s.

What is also interesting is the lack of FOSS out here. Linux is relatively popular, but nowhere near as popular as in Europe. Indeed, corporate America has its laser telescopic sight firmly trained on the Australian market, and even getting Linux hosting is nowhere near as simple as getting a site hosted on a Windows server. Linux certification and knowledge is still seen as a specialist skill.

Overall I'm disappointed that Australia hasn't maintained its lead in internet technologies. In part, people like me are to blame for starting our careers here and then being drawn to the brighter lights of the UK and the US, where visas are easily come by, pay levels are higher and opportunities to work on cutting-edge technologies are plentiful.

Perhaps we are on the verge of a change in Australia and I hope that some of the ground lost can be regained over the next five years.

Wednesday, 9 January 2008

The warm glow of site launch

I've been in this game a long time, but there is still nothing sweeter than launching a site after spending months building it with your team and the client. As a TD, site launch brings a mix of emotions: fatigue from the lack of sleep in the 10 days prior to launch; relief that the site is launching on time and on budget and that the client seems happy with it all; and finally worry about whether the thing will work as expected, what everyone else will think of it, and - by god - I hope the server doesn't fall over on day 1 under load...

My grandfather was an engineer for Philips, and he described the same feelings when they were launching a new product, so I have a sense that, irrespective of discipline, team-based engineering endeavours always foster the same heady mix of emotion fuelled by relief, adrenaline and fatigue.

Whilst I am an old hand at this in the industry these days, having been here since the dawning, it is great to watch the members of the team for whom this is the first of many site launches in their career - their happiness that it is done and their complete pride in their work.

Having seen photos of workers completing railways and other major constructions in the 19th and early 20th centuries, one can't help noticing the parallels with young engineers completing a job, regardless of whether they are working with steel, glass or lines of code.

Friday, 21 December 2007

My top 5 jQuery seasonal wishes

I've waxed lyrical about jQuery before. I've been using it a lot for worker code that I just can't be bothered to hand-write any more, not least because jQuery handles all the little browser inconsistencies for me, so the code I actually put into a page is infinitely more maintainable - especially if someone follows behind who maybe isn't so up to speed with JavaScript as I am.

However, use a tool for long enough and, as they say, familiarity breeds contempt. In that vein (and regular readers will know I don't do complimentary very often), and in the spirit of the seasonal "list programmes" of every style, here are the top 5 things I'd like to see incorporated into jQuery in the next year.

5. Documentation - starting off slowly and easily, I'd definitely like to see some better documentation. Ideally, new sub-libraries wouldn't be included until their documentation is properly up to scratch. Some areas are very well documented; others are sketchy at best.

4. wait(msecs, callback) - part of the effects sub-library. We have all kinds of effects to make objects slide, fade and animate, but we don't have a wait command. What I would give for a command that you can append to a sequence of animations to wait for a period of time before calling another function or stepping to the next instruction.

As you can see from my jQuery slideshow, the common way to do this is to call animate() with the same instruction as your last step, with a callback. It's not big or clever, but it does the job.

3. fadeToggle(speed) - again part of the effects sub-library. We have slideToggle, which is a great bit of code: call it and the object either slides open or shut depending on its state. It would be great to have the same thing for fades, rather than writing detection code and then calling fadeIn or fadeOut.

2. State detection - another worker function would be really useful here, to determine whether an object is on or off in display terms. I am fully aware I can use document.getElementById(objname).style.display or equally $().css('display'); however, this will return "none" if it's off, but could also return "block", "inline", "table", "table-cell", "list-item" etc depending on what it is.

Ideally I'd like $().displayState() to return "on" or "off" - or indeed true or false as a boolean - making display logic even easier.
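For what it's worth, wishes 2 and 3 can be sketched together. This is a hypothetical illustration (the names displayState and fadeToggle are my own, not jQuery API of the day), and the core state test is just "is display set to none?":

```javascript
// Reduce any CSS display value to a simple boolean:
// true = visible ("on"), false = hidden ("off").
function displayState(cssDisplay) {
  return cssDisplay !== 'none';
}

// With that, a fade toggle is a one-line branch. $el is assumed to be
// a jQuery object; this function is illustrative and never called here.
function fadeToggle($el, speed) {
  return displayState($el.css('display'))
    ? $el.fadeOut(speed)
    : $el.fadeIn(speed);
}
```

Putting the "none" check in one place also means the detection code stops being copy-pasted around every show/hide decision.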

And finally,

1. Cast to DOM object - one of the best things about jQuery is its query language. Using selectors from the CSS and XPath specifications, pulling objects out of the document is so much easier than using DOM traversal methods.

However, sometimes the jQuery functions just aren't enough and we need to cast an object back to real JavaScript to play with it. A simple method of doing this would give us the power of a great interrogation language along with the ability to get at a real DOM object.

I fully expect someone to come and kick me now, telling me I can do some or all of these things and that the functions I'm asking for already exist - however, as mentioned in number 5, the documentation is lacking in some areas, so it isn't obvious what is doable.

Obviously this is all a little tongue-in-cheek: if I were that worried about these issues, I'd write the code myself and submit it to the team for inclusion in the next version. Indeed, perhaps that could form the basis of one of my New Year's technology resolutions.

Happy Holidays all.

Wednesday, 19 December 2007

SMS Bamboozlement...

I'm doing some work at the moment for a client whose industry is particularly technophobic. The absolute cutting edge is a bit of YouTube video thrown willy-nilly into a page. I'd also point out that design is something that rarely makes an appearance in this particular industry.

So it was pretty refreshing when we went to them with a series of ideas from the more commercial sectors of new media, and one of the things they latched onto was SMS. Cue annoyance, though, when we had everything ready to go bar pushing the big green "launch" button, and another company got involved and started talking about location-aware services, high-end data capture and so on.

At this point the client dissolved into a mess of indecision. "Why weren't we doing all of this?" was the question, to which the answer was: "Because you don't need to - primarily because your text messaging service is built around raising revenue through donations!"

I've had this happen in the past, notably with SEO companies. I do pity the poor clients who get stuck in these situations, where they've finally decided to push their technology base along but then get waylaid by all the glittery, flashing, hypnotic LEDs.

At the end of the day it is important to remember why you are doing something and not get sidetracked (and not get ripped off). Once a strong foundation of technology is laid there is always something new you can build - you don't have to have every shiny present under the tree to have a great Christmas.

Tuesday, 11 December 2007

.NET / XSLT and how to import an external XML document

I work with XML and XSLT every day of the week. Indeed, working for a company called XML Infinity, you can imagine how much we use it. This afternoon I had one of those incredibly frustrating moments that one typically has when dealing with badly documented parts of .NET or XSLT.

The annoyance in question was to do with loading a document into an XSL template on the fly. 99.9% of the time you don't bother with this, as you have a master XML document which you transform according to the XSL template assigned to it. All your XML processing is usually done before you get to this point.

There is, though, an XSL function called document() which you can use to load an external XML doc into the XSL template and then do work on it. I've used this before, but the damn thing wouldn't work. Why not? Because our transformation engine wasn't using a loose enough resolver to be able to deal with externally referenced files... grrr. I know why MS did this: it's so the parsing engine doesn't go loading every document under the sun and potentially crashing.

That's great but they could have documented it a bit better.

The resolution, by the way, is to create an XmlUrlResolver, give it some credentials (in my case setting it to DefaultCredentials, which allows you to access the http://, file:// and https:// protocols) and then pass that into your Transform() method.

Job done.

Not quite.

Having finally been given access to an external XML document I then had to contend with XSL's arcane methods of dealing with XML fragments. Again documentation was the issue here.

Looking online there are some ridiculously complex ways of parsing an external document, when by rights it should be as simple as dropping the doc into a variable and then processing according to that variable. People were using recursive templates with xsl:copy and all kinds of things.

Turns out the way to do it is a little known second parameter.

If you do this:

<xsl:variable name="var1" select="document('http://example.com/file.xml')"/>

All you'll end up with is the text nodes. Not very useful.

If you do this, however (note the second parameter):

<xsl:variable name="var1" select="document('http://example.com/file.xml', /)"/>

You'll end up with a fully fledged XML document, complete with nodes and everything, put into your $var1 variable, and you can then use it to select data with standard XPath constructs.

If you don't want the whole document you can pass the second argument as an XPATH query and it will just return that nodeset - much easier to deal with.

In all the time I've been dealing with XML / XSL I didn't know about this, and it was a great pain to figure out. Typically, the only reason I was doing this was to mock something up for a client quickly, and it then turned into a mammoth effort. Knowing now, though, will save time subsequently, I guess...

Monday, 19 November 2007

Fuzzy logic could book more flights

I've talked about fuzzy logic for use by the retail sector in the past, and the project I'm involved in there is maturing nicely. This week I've really realised how, as software engineers, we need to grasp the nettle and move a lot of service-based software toward fuzzy systems for usability reasons.

Nearly everyone these days has booked a flight online, and when it came time to book a holiday to Australia this winter, the first thing I did was fire up a browser and head to Expedia and Travelocity.

If I was planning to fly on specific dates I would be well catered for and I could get a list of prices and book a flight in a few easy steps.

I wasn't planning on flying on a specific date though. I work for myself so can take time off whenever I want in a general sense. Really what I wanted was the cheapest flight from London to Sydney in December.

After typing a few different dates in manually I did the sensible thing and called a human travel agent, who was very helpful. Unfortunately, as helpful as she was, she only had access to the same systems I did, so couldn't tell me the info I needed to know. Mentioning this to friends got the usual "you can't do that" response. Can't do it?! I'm the customer; I can book when I want.

Most airlines operate through the SABRE booking network, which is basically a massive database of flights from point to point with availability and prices per leg. It sits on top of a nice mature API, which makes it easy to program against, and that's where the developers leave it.

But as a customer this doesn't fulfill my requirements and this is where engineers need to spend more time thinking fuzzy.

In these days of multi-processor and multi-threaded OSes it is not that difficult to build offline agents that could go and find this information out for a customer and then email it back to them. Indeed, I wouldn't mind registering to use this sort of service - now the company has my personal details and can market to me.

The agent wouldn't even need to respond with all the availability. It could just give me the cheapest 10 or 20, all from a specific operator, or those flights routing through Hong Kong as a stopover, for example. It also doesn't need to be fast. A deprioritised thread could take a day to get this sort of information, and if I'm being that vague then time is hardly an issue.
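To make the idea concrete, here's a hedged sketch of such an agent in JavaScript. Everything here is hypothetical - getQuote() is a stand-in for whatever call a real implementation would make against the booking network:

```javascript
// Hypothetical agent: scan every departure date in a month and keep
// only the cheapest few quotes. getQuote() is a stand-in for a real
// call out to the booking network - nothing here is a real API.
function cheapestFares(getQuote, route, year, month, keep) {
    // month is 1-based here; Date(year, month, 0) gives the month's last day
    var days = new Date(year, month, 0).getDate();
    var quotes = [];
    for (var day = 1; day <= days; day++) {
        quotes.push({ day: day, price: getQuote(route, year, month, day) });
    }
    // cheapest first, then trim to the 10 or 20 the customer asked for
    quotes.sort(function (a, b) { return a.price - b.price; });
    return quotes.slice(0, keep);
}
```

Filtering by operator or by stopover would just be another predicate applied before the sort, and since none of this needs to be fast, the loop could happily run in a deprioritised background thread.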

If someone reads this from the travel industry please ask your techies to build this feature. If you are a venture capitalist then give me a call and we can revolutionise the online travel sector!

The web has brought us an always-on, on-demand, service-based method of interacting with our information, but the casualty of this has been flexibility. The days of fuzzification are soon to be upon us, and coupled with automated agents, some amazing new systems will become available that will give us back our flexibility.

Monday, 5 November 2007

Bye bye OpenMoko

Google announced today that they would be partnering with a load of other companies, including Samsung, Motorola and LG, to produce a new phone "software stack". For those of us in the technology game this basically means Google plans to release a mobile phone operating system to rival Microsoft, Symbian and the various Linux flavours already out there.

What I find most annoying about this is that Google has for years feasted upon the fruits of the open source community, using many of their projects to enable additional features and indeed to make their core search facilities work. While it may be argued that the Summer of Code gives back to that community, there is a sense that rather than sponsoring an existing project like OpenMoko (a Linux-based, open source version of what Google has announced), they've decided to go out on their own and start from scratch.

Given Google's tremendous resources it won't be long before we see the platform hit the market.

Within the commercial market there is already Maemo (Nokia's Internet Tablet platform, which they actually open sourced) and Qtopia, a commercial package available on the Greenphone, which is a development kit and is mostly open source too.

My guess as to why Google didn't run with any of these options is that there are already thriving communities surrounding them, and working within existing communities makes it difficult for the Google techies to throw their weight around.

Hey ho. As a developer, mobile development is already a nightmare: having to support various versions of Symbian, MS Windows Mobile and BlackBerry, as well as smaller (but vocal) numbers of Maemo users, and now the iPhone from Apple, adding a "Google Phone OS" isn't that much more work.

For me, having had a mobile phone for the better part of 15 years and a data-capable phone for nearly 10, I've watched OSes come and go, killer apps be talked about every 6 months and the market mature - and the only two things ever to take off properly on a mobile have been SMS and now e-mail.

I've got a Nokia E65 and it is the best phone I've ever owned. Why? Because the web browser works seamlessly on standard web sites and the email is easy to use, even without a full keyboard. Oh, and it doesn't crash, as most of the rest do.

In my opinion, Google spending all this time and money is absolute folly, but then they have virtually limitless cash reserves and many thousands of staff across the world whom they have to retain doing something - they may as well be making a phone OS as anything else.

Who knows - this might end speculation that we are about to have a Google OS on our desktops next year as well.

Saturday, 3 November 2007

CSS Structure - what a mess

James posted a message on my blog some weeks ago and it's only now that the penny has dropped in my mind about what we need to deal with the issue of structure in CSS - the problem is we have none. As James points out, you end up with a flat mess in which, with all the best will in the world, definitions are hard to find.

I've ranted before about the annoyances of CSS - particularly the lack of variables or constant definitions without recourse to server-side scripting, and the nature of the W3C CSS working group not being well represented by techies - especially as CSS is nearly a language in its own right, the same way regular expressions are.

As the web building fraternity finally weans itself off Dreamweaver and table-based design and adopts a more semantic, HTML-lite way of building sites, the CSS files are getting bigger and bigger all the time.

At the moment, to get a degree of specificity one has to redeclare selectors:

div#header div.nav ul {}
div#header div#logo img {}


for example.

Many CSS zealots would say "But you can get rid of div#header altogether", and I can in this instance, but what happens if my div#logo doesn't always appear in div#header on a page? And it's certainly not uncommon to have navigation in a header as well as a sidebar.

As can be seen, in order to get specificity we increase verbosity. Anyone fully converted to CSS design will tell you this; it's the casual "div stackers", who just declare a new class for every element in the document, who ruin the HTML.

My solution then W3C if you're listening is this. Cascade in the style sheet, cascade in the CSS file.

In many programming languages there is a keyword to get you down to the level of an object whose properties you are going to manipulate numerous times in one go - for example, "with" in VB. Thus I could say:

with myobject
    .property1 = x;
    .property2 = y;
endwith


The CSS equivalent would be:


div#header {
    div#logo img { css }
    div#nav {
        css
        ul { css }
    }
}


This gives you the specificity required, removes the redundancy and creates a cascade-like structure in the document that would also make it much easier to debug what is going on.

Structural CSS, along with variables, would make a massive contribution to CSS as a language and could revolutionise the way we use it on the web.

Sunday, 14 October 2007

JQuery Slideshow

It seems jQuery is definitely gaining some traction as a useful library - not least because of the development of the ThickBox gallery library by Cody Lindley, which is seeing huge amounts of use around the web at the moment as a means of displaying product or photo galleries without being constrained by the page template you are building for, while maintaining the semantic integrity of the HTML you have put into the page. The last cool feature is that you don't have to use the dreaded pop-up, which brings into play all the pop-up-blocker issues.

It seems redundant to talk about the ThickBox stuff other than to say it's a great bit of kit and well worth checking out if you need gallery display functionality. Instead, I've got my own little bit of jQuery code to document here.

This came about due to a client wanting a gallery, then not wanting a gallery because they didn't want to maintain all the thumbnails etc, and so it evolved into a "slideshow". They didn't want to use Flash due to the cost, but they were already using jQuery for other parts of their site anyway. As such I decided to have a go at building a jQuery slideshow with the animation API.

For this example I'm assuming some degree of JavaScript familiarity so I can get straight to the guts of the code.

Obviously you'll need the jQuery library - I'm using the compressed build of the current 1.2.1 version, so it's a light download.

Next up we need a page with an image in it with an id of "bigimage".

We also need some JavaScript to set up an array of the image names we want to load, so let's do that:

var imagearray = new Array("image1.jpg", "image2.jpg", "image3.jpg");

We need to trap the moment the document becomes ready to work with so we set up the special document ready function:

$(document).ready(function(){
    // now we get the image and attach an onload function to it
    $("#bigimage").css({opacity: 0});
    var theimage = document.getElementById("bigimage");
    addEvent(theimage, 'load', anim, false);
});


What this function does is set the opacity of the image to 0 (i.e. invisible), then get a reference to it in standard JavaScript, and finally attach a handler to the image's onLoad event (more about this in a minute).

The addEvent function is given below and is a worker function to add an event handler for a particular object.

function addEvent(elm, evType, fn, useCapture) {
    if (elm.addEventListener) {
        // W3C standard event model
        elm.addEventListener(evType, fn, useCapture);
        return true;
    } else if (elm.attachEvent) {
        // Internet Explorer's event model
        var r = elm.attachEvent('on' + evType, fn);
        return r;
    } else {
        // last resort: assign the handler property directly
        elm['on' + evType] = fn;
    }
}


Why do we want to add an event for the onLoad of the image? The answer to this lies in how we want to do the animation. Potentially we could have hundreds of images in an array. This slideshow fades an image in, displays it for several seconds, fades out, loads the next image and starts again.

By trapping the onLoad event of the image we can use it to start the animation sequence, which finishes with an instruction to load the next image. Only once that image is fully loaded does the sequence begin again.

So our document ready method sets up the onLoad event handler, anim(), which is listed below:

function anim() {
    $("#bigimage")
        .animate({opacity: 1.0}, 1500)                    // fade in over 1.5 seconds
        .animate({opacity: 1.0}, 5000)                    // hold at full opacity for 5 seconds
        .animate({opacity: 0}, 1500, "linear", animNext); // fade out, then load the next image
}


This function is called every time a new image has finished loading, bringing the image from 0 opacity to 100% over a 1500 msec interval. Next it holds the opacity at 100% for 5 seconds, and finally fades out over 1.5 seconds, after which it calls the function animNext().

animNext() is a function that determines the next image in the sequence (in my case, wrapping back to the start when we reach the end) and then displays it simply by changing bigimage's src property. This is pretty straightforward JavaScript so I'll leave it to the reader.
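For completeness, here's a minimal sketch of what animNext() could look like, assuming the imagearray and "bigimage" id from the setup above (the variable and helper names here are just my choices):

```javascript
// Index of the image currently on show; starts at the first image.
var currentimage = 0;

// Pure wraparound logic: step forward, returning to 0 at the end.
function nextIndex(current, length) {
    return (current + 1) % length;
}

function animNext() {
    currentimage = nextIndex(currentimage, imagearray.length);
    // Changing src makes the browser load the new image; when it has
    // loaded, the onLoad handler fires anim() again and the show loops.
    document.getElementById("bigimage").src = imagearray[currentimage];
}
```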

The key thing here is that by adding an event handler to a low-level object in the document, along with a couple of animation commands, a reasonable slideshow effect was created which works well for the users and was good for the client, as it is maintainable and didn't cost the huge sum it would have done in Flash.

It's the ability of jQuery to expose enough basic features that lets you do this very quickly and easily. I have no doubt that after 10 years of writing JavaScript I'd be able to do this all by hand. The questions are "Do I want to?" and "Is it good value for the client if I do?" - my answer to both is "not on your nelly".

Saturday, 25 August 2007

JQuery saves the day?

If you haven't come across it yet, there is a JavaScript library called jQuery, developed as an open source project and designed to give us better control over our web pages and the things we can do with them.

Thankfully John Resig, Karl Swedberg and the others have steered slightly away from the profligacy of AJAX libraries doing the rounds at the moment and produced a library that actually deals with some of the problems you face as a web developer or a designer - namely things like clients saying "I'd really like the first paragraph after each header to be blue instead of black".

Now, before I get shot down in a burst of "you can do that using classes in your p-tags", I'll say this - I don't want to, I shouldn't have to, and it makes for ugly and unmaintainable code. Doing this just papers over the gaping holes left in CSS and makes your HTML even less semantic than it already is.

This is where jQuery comes in. The biggest area of development in this library has been "content selectors", similar to the CSS selector specification. The brilliant thing about these selectors is that we don't have to wait for browsers with CSS 3 to turn up before we can use them - thus saving us about 5-6 years of waiting time.
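To make the "blue paragraph" example concrete, here's a hedged sketch: the jQuery call is shown as a comment (it needs a page and the library to run), and the plain function below it just illustrates the adjacent-sibling matching rule the selector uses:

```javascript
// With jQuery on the page, "the first paragraph after each header goes
// blue" is one line - an adjacent-sibling selector, no extra classes:
//
//     $("h2 + p").css("color", "blue");
//
// The matching rule behind "+" is simple enough to show in plain
// JavaScript: given tag names in document order, find each tag that
// immediately follows a given sibling.
function adjacentSiblings(tags, first, second) {
    var hits = [];
    for (var i = 1; i < tags.length; i++) {
        if (tags[i] === second && tags[i - 1] === first) {
            hits.push(i);
        }
    }
    return hits;
}
```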

I'm a fan of JavaScript in small doses. I'm not a fan of large-scale AJAX where it is pointless to be loading information that you can get on a click anyway; 99% of my clients want to ensure accessibility, and often JavaScript breaks that. On a UI where responsiveness is key, AJAX is 100% appropriate, but for the majority of sites it's a gimmick.

However, in this context we have a JavaScript library that can add depth to the interface, plus consistency and markers that could previously only be achieved with a lot of proprietary hacks. This benefits usability without sacrificing accessibility and portability. If JavaScript is switched off you lose nothing that wasn't there before anyway; if it's on, you get a whole lot more texture to the site.

Watch this space as I think there will be a lot of development on this library over the next 12 months.