Tuesday 27 November 2007

Adding Cron Jobs to a QNAP server

If you haven't come across them yet, QNAP make these amazing little NAS boxes that are perfect for home or SME use. I've got mine running as a home server but might get one for the office, as our old server is on its last legs and a fully tricked out 1U Dell server is a bit of overkill for a glorified file server.

The best thing about these devices, though, is that they run a Linux OS utilising Debian Essential, and as such they can be configured to do almost anything you want. Out of the box they already come with file serving, media serving, database and web servers.

One slight problem, though, is that the boot-up process is not dissimilar to that of a live CD. This is great in that it makes the system highly robust and it boots to a known state each time. The problem is that, short of rewriting the firmware, you can't introduce things into the boot process. What I don't want to do is have to re-run a load of scripts to configure the server how I want it after a power failure or forced reboot.

The boys over on the QNAP forums are really on the case and one of the chaps has created a nice little framework script which hooks into the boot process and allows the execution of a series of scripts. You can see his work here.

After installing this workaround you can add scripts to the scripts folder and take control of your server.

One of the things I wanted to do was add items to my cron list and this process is explained below.

1. SSH into your QNAP box

2. Install the custom scripts files at http://www.qnap.box.cx/ as per the directions there.

3. cd to your scripts directory inside custom and make a file called joblist.txt in vi (vi is the only editor you have on the QNAP drive).

# vi joblist.txt

Once in vi, make your list of cron jobs using the standard cron syntax.

Mine was the following:

25 1 * * * /share/backup/script.sh

This will run a backup script I had written at 1:25am every day. You can add as many or as few entries as you want. Save your document and exit from vi.
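For reference, the five fields before the command are minute, hour, day of month, month and day of week, so a joblist.txt with a couple of entries might look like this (the second, weekly_cleanup.sh line is purely a made-up example):

# minute hour day-of-month month day-of-week command
25 1 * * * /share/backup/script.sh
0 3 * * 0 /share/backup/weekly_cleanup.sh

That made-up second line would run a cleanup script at 3am every Sunday.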

4. Make your script that will fire on start up. I called mine cron_update.sh

# vi cron_update.sh

In there put the following code:

#!/bin/sh
# this script appends a job list to the existing crontab
echo "Reconfigure CRON list:"
cronpath=/share/MD0_DATA/custom/scripts
# list the existing crontab and put it in a temp file
crontab -l > $cronpath/cron_jobs.txt
# append the items we want to the master cron jobs list
cat $cronpath/joblist.txt >> $cronpath/cron_jobs.txt
# replace the existing crontab with the new one
crontab $cronpath/cron_jobs.txt


Save and quit out of vi.

You'll notice I've used a variable in here to specify where to find the files. This is because the autorunmaster script runs from a folder higher up, so we need to be explicit about where to find things.
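One other thing worth checking: because autorunmaster.sh calls cron_update.sh by its path, the script needs to be executable. Assuming the usual chmod is available on the box (it is a standard part of these Linux-based units), something like this should do it:

# chmod +x /share/MD0_DATA/custom/scripts/cron_update.sh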

5. Go back up a directory to your custom folder. In there edit your autorunmaster.sh file with vi.

# vi autorunmaster.sh

At the end of the file append:

/share/MD0_DATA/custom/scripts/cron_update.sh

Then save and close the file.

Now when you reboot you should have your newly added cron jobs appended to the crontab without removing all the old ones.
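If you want to confirm it has worked after a reboot, listing the active crontab should show your entries at the bottom:

# crontab -l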

Wednesday 21 November 2007

Why was data being passed on a disc and what was EDS's advice?

Readers in the UK will be aware of a Data Protection train crash that we have been watching unfold in front of us over the last few days. It turns out that 25 million records from a database managed by HMRC have been lost in the post because they were sent on a couple of discs using unrecorded mail.

There has been much speculation about which minister to blame and who in the cabinet (including the Prime Minister) should lose their job, but one thing that is mostly missing from the discussion is the notion of data security.

In the UK we have the Data Protection Act - policies enshrined in law to which I am constantly referring when talking to my clients. A typical day for me usually includes quoting something from the DPA at least once, not least because a client wants to harvest user data and use it for something that is outside the bounds of what is technically legal.

I've done a lot of work for government and I have to say that, in my experience, they have terrible technical practices. Gone are the days of locked-down machines with no floppy drives and only CD-ROMs. In are mass-market units from Dell with the latest in CD/DVD-RW (because they are cheap and mass produced), along with USB connectors that people can hot-plug a pen drive into and download whatever they like. The current government has a woeful record on technology projects, mostly because they don't understand it and they contract suppliers who talk a good presentation rather than deliver an effective solution.

According to the DPA
"Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data."
This is why our PM said procedures weren't followed and he is bang on the money there. This relaxed attitude to data, particularly sensitive data, has been demonstrated in this debacle. If the data was going to be put on disc why wasn't it fully encrypted?

Indeed, why wasn't there a secure online facility for user data to be interrogated without recourse to physical copies to begin with?

In addition, the data was supposed to have been "desensitised" before sending - a quaint term meaning the removal of things like bank details, exact personal data and full address information. EDS wanted to charge money to do this; the department didn't want to pay, so the lot was sent.

EDS are complicit in this as much as the people from HMRC are. How hard is it to type "Select name, age, postcode from person where...." into the database instead of "Select * from person where..."? Or else just remove the sensitive columns on output. It would have taken me a few minutes, so it can't have taken an experienced EDS engineer that long.
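Purely as an illustration - I obviously have no idea what the real schema or requirements looked like - the difference amounts to something like this:

-- the lazy, dangerous extract: every column, bank details and all
Select * from person;

-- a "desensitised" extract: only the columns actually needed
Select name, age, postcode from person;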

EDS shouldn't have been charging for that sort of difference - but it sounds more complex so it was an opportunity to get some more cash in - probably.

Further, EDS should have been saying "We advise you that the data you are requesting is excessive for the purposes you are going to use it for, so we'll give you a more secure subset". That would have rammed home the implications of what the staff at HMRC were asking for.

In my history of working with government I have come across this sort of situation many times before. It is well known that government contractors overcharge, shaking the fruit out of the infinitely laden money tree whenever they can. Our E-Minister is supposed to deal with this sort of thing, but in practice he's a politician who knows as much about IT as my mum. The only way to resolve this problem is for wholesale changes to occur within government (locking down machines) and for stiffer penalties to be imposed for breaches of the DPA.

We now have a situation where 25 million adults in the UK are worried that their personal details are going to be used in some sort of mass identity fraud.

My view is pragmatic in that the CDs are probably lying in the corner of a TNT sorting office somewhere - but they could equally be in some gangster's tech lab being processed, and that is the point of all this security.

Monday 19 November 2007

Fuzzy logic could book more flights

I've talked about fuzzy logic for use by the retail sector in the past, and the project I'm involved in there is maturing nicely. This week it has really hit home how, as software engineers, we need to grasp the nettle and move a lot of service-based software toward fuzzy systems for usability reasons.

Nearly everyone these days has booked a flight online, and when it came to booking a holiday to Australia this winter, the first thing I did was fire up a browser and head to Expedia and Travelocity.

If I was planning to fly on specific dates I would be well catered for and I could get a list of prices and book a flight in a few easy steps.

I wasn't planning on flying on a specific date though. I work for myself so can take time off whenever I want in a general sense. Really what I wanted was the cheapest flight from London to Sydney in December.

After typing a few different dates in manually I did the sensible thing and called a human travel agent, who was very helpful. Unfortunately, as helpful as she was, she only had access to the same systems I did, so she couldn't tell me the information I needed. Mentioning this to friends got the usual "you can't do that" response. Can't do it?! I'm the customer; I can book when I want.

Most airlines operate through the SABRE booking network, which is basically a massive database of flights from point to point with availability and prices per leg. It sits behind a nice, mature API which makes it easy to program against, and that's where the developers leave it.

But as a customer this doesn't fulfil my requirements, and this is where engineers need to spend more time thinking fuzzy.

In these days of multi-processor machines and multi-threaded OSes it is not that difficult to build offline agents that could go and find this information out for a customer and then email it back to them. Indeed, I wouldn't mind registering to use this sort of service, so the company gets my personal details and can market to me.

The agent wouldn't even need to respond with all the availability. It could just give me the cheapest 10 or 20 flights, all flights from a specific operator, or those routing through Hong Kong as a stopover, for example. It also doesn't need to be fast. A deprioritised thread could take a day to gather this sort of information, and if I'm being that vague then time is hardly an issue.

If someone from the travel industry reads this, please ask your techies to build this feature. If you are a venture capitalist then give me a call and we can revolutionise the online travel sector!

The web has brought us an always-on, on-demand, service-based method of interacting with our information, but the casualty of this has been flexibility. The days of fuzzification are soon to be upon us, and coupled with automated agents some amazing new systems will become available that will give us back our flexibility.

Thursday 8 November 2007

Why can't I have a $100 laptop?

Don't you hate it when you can't get something you'd really like?

I've been following the OLPC project more or less since its inception. When I first heard about it I was mostly interested in how they were going to pull off building a laptop for only $100 per unit.

After realising they were going to do it I was interested in how useful the machine would actually be (it has no hard drive so it can't be that great right?).

After seeing that it ran Linux, was designed to be wireless from the start, could run on mains power or be wound up by hand, and was built to be durable in harsh environments, I was mostly interested in how I could lay my hands on one (or two even).

My disappointment was immense when the OLPC guys decided not to offer them for sale, and then when they u-turned and started the G1G1 initiative (Give One Get One) I had a momentary blip of joy until they said it would only be available in North America.

Why they've not rolled this out to Europe is beyond my comprehension - I don't even care if I don't have a £ key - I can always map it to a keystroke anyway. And I'd even be happy to Give 2 Get 1 if shipping was the issue.

The other thing that amazes me is that, given the connectivity of these laptops, Western nations aren't falling over themselves to get them for schools - even if they had to pay a higher rate along the lines of the G1G1 programme, it would still be cheaper than putting Dell machines into all the schools.

Monday 5 November 2007

Bye bye OpenMoko

Google announced today that they would be partnering with a load of other companies including Samsung, Motorola and LG to produce a new phone "software stack". For those of us in the technology game this basically means Google plans to release a mobile phone operating system to rival those of Microsoft, Symbian and the various Linux flavours out there already.

What I find most annoying about this is that Google has for years now feasted upon the fruits of the open source community, using many of its projects to enable additional features and indeed to make its core search facilities work. While it may be argued that the Summer of Code gives back to that community, there is a sense that rather than sponsoring an existing project like OpenMoko (a Linux-based, open source version of what Google has announced) they've decided to go out on their own and start from scratch.

Given Google's tremendous resources it won't be long before we see the platform hit the market.

Within the commercial market there is already Maemo (Nokia's Internet Tablet platform, which they actually open sourced) and Qtopia, a commercial package available on the Greenphone, which ships as a development kit and is mostly open source too.

My guess as to why Google didn't run with any of these options is that there are already thriving communities surrounding them, and working with those existing communities would make it difficult for the Google techies to throw their weight around.

Hey ho. As developers we already have a nightmare supporting various versions of Symbian, MS Windows Mobile and BlackBerry, as well as a smaller (but vocal) number of Maemo users, and we are now having to think about Apple's iPhone, so adding a "Google Phone OS" isn't that much more work.

For me, having had a mobile phone for the better part of 15 years and a data-capable phone for nearly 10, I've watched OSes come and go, killer apps talked about every 6 months and the market slowly mature, and the only two things ever to take off properly on a mobile have been SMS and now e-mail.

I've got a Nokia E65 and it is the best phone I've ever owned. Why? Because the web browser works seamlessly on standard web sites and the email is easy to use, even without a full keyboard. Oh, and it doesn't crash, unlike most of the rest.

In my opinion, Google spending all this time and money is absolute folly, but then they have virtually limitless cash reserves and many thousands of staff across the world whom they have to keep busy doing something - they may as well be making a phone OS as anything else.

Who knows, this might end speculation that we are about to have a Google OS on our desktops next year as well.

Saturday 3 November 2007

CSS Structure - what a mess

James posted a message on my blog some weeks ago and it's only now that the penny has dropped in my mind about what we need in order to deal with the issue of structure in CSS - the problem is that we have none. As James points out, you end up with a flat mess in which, with all the best will in the world, definitions are hard to find.

I've ranted before about the annoyances of CSS - particularly the lack of variables or constant definitions without recourse to server-side scripting, and the fact that the W3C CSS working group is not well represented by techies - especially as CSS is nearly a language in its own right, in the same way Regular Expressions are.

As the web-building fraternity finally weans itself off Dreamweaver and table-based design and adopts a more semantic, HTML-lite way of building sites, the CSS files are getting bigger and bigger all the time.

At the moment, to get a degree of specificity one has to redeclare selectors:

div#header div.nav ul {}
div#header div#logo img {}


for example.

Many CSS zealots would say "but you can get rid of div#header altogether", and I can in this instance, but what happens if my div#logo doesn't appear inside div#header on a particular page? And it's certainly not uncommon to have navigation in a header as well as in a sidebar.

As can be seen, in order to get specificity we increase verbosity. Anyone who is fully converted to CSS design will tell you this; it's the casual "div stackers", who just declare a new class for every element in the document, who ruin the HTML.

My solution then, W3C, if you're listening, is this: cascade in the style sheet, cascade in the CSS file.

In many programming languages there is a keyword to take you down to the level of an object whose properties you are going to manipulate in one go - for example "with" in VB. Thus I could say:

With myobject
    .property1 = x
    .property2 = y
End With


The CSS equivalent would be:


div#header {
    div#logo img { css }
    div#nav {
        css
        ul { css }
    }
}


This gives you the specificity required, removes the redundancy and creates a cascade-like structure in the document, which would also make it much easier to debug what is going on.
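To make the comparison concrete, that nested sketch is the same set of rules that today has to be written out flat, repeating the outer selector on every line:

div#header div#logo img { css }
div#header div#nav { css }
div#header div#nav ul { css }

Three rules, with div#header typed three times, against a single enclosing block in the nested version.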

Structural CSS, along with variables, would make a massive contribution to CSS as a language and could revolutionise the way we use it on the web.

Friday 2 November 2007

FAH goes number 1 but we could do better

Folding@home (FAH) has taken the Guinness World Record for the most powerful distributed computing network, with a top speed of over 1 petaflop (a thousand trillion calculations per second).

This is a remarkable achievement and shows the immense power that can be brought to bear by spare computing capacity used in a distributed network. The key here, though, is massive parallelism, which means the various nodes in the network (your PC or PS3) are all doing different jobs at the same time and are at various points through those jobs. This is what made FAH and the old title holder SETI@home (a search for extraterrestrial life) so scalable.

Individual computers on the network download work units from the central repository, process them individually and then resubmit them back to the central core for post processing.

This is in contrast to, say, Japan's Earth Simulator, a massive supercomputer capable of running huge simulations with ridiculous numbers of variables and calculations very quickly, but where everything is interdependent. Likewise, the ultimate aim of the BLUE project from IBM and the US Department of Energy is to simulate all the forces and atoms of a nuclear explosion, to model what is happening to the USA's ageing atomic weapons stockpile now that live tests are no longer allowed.

This doesn't take anything away from their achievement; however, it does go to show just how much wasted processing capacity there is lying around on the network.

The FAH project ramped up from 250 teraflops (a teraflop being a trillion calculations per second) to just over a petaflop with the introduction of 670,000 PS3 owners supplying their hardware, up from the 200,000 PC users who got it to 250 teraflops. Given that there are over 6 million PS3s in the wild, this represents about 10% of the total PS3 userbase - a quick calculation indicates that PS3 owners alone, should they all connect up to the internet, could provide about 7.5 petaflops of processing power... and this is before we take into account PCs, Xboxes and Nintendo Wiis.

What this illustrates to me is that many of these projects are limited by their publicity and how "glamorous" they are. Taking nothing away from the geekiness of searching for ET or the importance of seeing how protein folding will affect drug development in the future, a more elegant solution would be an open framework that users subscribe to, which could then be used by anyone who wants to create a distributed processing application.

For the end user it would be seamless, and for the multitude of public projects requiring raw processing cycles it would give them the opportunity to reach far more machines than their marketing budgets would otherwise provide for. Even private companies could pay to rent processing time, thus investing funds back into the project for ongoing development and optimisation.