- Time to Tame the Apache Menagerie (Dec 8, 2009)
- Day sets up shop in Boston (where tech firms go to be ... (Nov 23, 2009)
- RFI as rich asset (Nov 17, 2009)
- IBM, Lucene, and the future of search (Nov 11, 2009)
- Solr heads for an even sunnier future (Oct 28, 2009)
- Usability still improving -- improvement still needed (Oct 19, 2009)
- Terracotta offers bolt-on distributed caching (Oct 8, 2009)
- Where did all the HTML editors go? (Sep 19, 2009)
- New Course on Web Development Platforms (Sep 15, 2009)
- Thoughts on the Future of Content Management (Aug 31, 2009)
- Recommind productizes its categorization engine (Aug 18, 2009)
- Thinking beyond the RFP (Aug 3, 2009)
- Day reports sunny results for 1H2009 (Jul 30, 2009)
- Are we reaching the limits of UI buildout? (Jul 28, 2009)
- Interest in Lucene continues to accelerate (Jul 14, 2009)
- In defense of silos (Jul 9, 2009)
- Clickability shows how not to write a white paper (Jul 1, 2009)
- The Coming Acronym Crisis (Jun 25, 2009)
- Vignette bets big on beta-SaaS (Jun 18, 2009)
- In DAM, Flashy does not always mean Flex (Jun 11, 2009)
- At Henry Stewart DAM Symposium: A Grey New World (Jun 2, 2009)
- Open Text buys Vignette: Investment or impulse? (May 6, 2009)
- Adobe: an elephant in the DAM room? (May 4, 2009)
- Open Text goes to the (eye) candy store (Apr 10, 2009)
- We live in interesting DAM times (Apr 8, 2009)
- Are you investing in technology, or people? (Apr 6, 2009)
- It's time for seat-based software licensing to end (Mar 25, 2009)
- OASIS blesses UIMA - What does it mean? (Mar 20, 2009)
- DAM vendor Ancept under new ownership (Mar 6, 2009)
- A reality checklist for vendors (Feb 26, 2009)
- Day releases not-so-sunny financial results (Feb 25, 2009)
- Software vendors need to understand how the web really works (Feb 17, 2009)
- Thoughts on Google Monoculture and the Cloud (Jan 31, 2009)
- IBM, Microsoft, and the patent mess - how to protect yourself (Jan 31, 2009)
- Day tries pay-as-you-go licensing (Jan 28, 2009)
- Startup offers commercial support for Lucene (Jan 26, 2009)
- What next for Interwoven? (Jan 23, 2009)
- Alfresco unveils a major upgrade (Jan 21, 2009)
- Vignette Village 2009 cancelled (Jan 14, 2009)
- Green IT versus blue sky (Jan 12, 2009)
The year in blogs: Yours Truly @ CMS Watch
Manic High
I can tell I am going through a manic episode because I am talking and typing faster than usual. My mind is racing and I can't get the words out fast enough. I have excess energy and I feel like running a marathon! I would much rather have a manic episode than a low or depressed episode. When I am depressed I don't want to do anything, and that includes cleaning my house.
So I am going to go clean house and write some stuff before I lose all of this extra energy. :)
When depression really starts?
Today is a Better Day
It was nice outside too, so the dogs were able to stay out most of the day, which was a relief to me. I also wrote two articles for eHow and got one accepted for Associated Content. Demand Studios wants a rewrite, of course, so I have to work on that tomorrow. All in all it was a great day!
Tomorrow, I get paid from some of the programs I am in, so I am going to do a bit of Christmas shopping online. I am going to buy my mom's, dad's, and husband's gifts online. Hopefully, it will be nice out so the dogs can stay out tomorrow too.
I hope this good feeling lasts a bit longer. I think my energy level came from the diet pills I am taking. I started taking Slimquick to help me lose a little weight and I think they are working finally. I hope so because they weren't cheap! :)
Remembering the VIC-20
Does anyone else remember the VIC-20? Am I dreaming? Did this $299 consumer appliance (the first "personal computer" to ship a million units) really transform people's lives? Or just mine?
I wonder if William Shatner remembers being in this ad?
Commodore's computer-in-a-keyboard, you may recall, came with a grand total of 5 KB of RAM (enough to run the operating system and leave 3583 bytes for you, the discerning consumer, to play with). Fear not, though: RAM was expandable up to 40 KB with an add-on memory cartridge.
Does anyone else recall logging onto CompuServe with a 300-baud modem, using the VIC-20 wired to a TV as a monitor? Or am I showing my age?
On second thought, don't answer that.
One of those days
I knew what I was getting myself into when I took all these dogs in, but geez, they can sure drive me nuts sometimes. Imagine living in a small trailer with a husband who likes to sit there and pick at you for fun, and seven dogs to take care of. Five dogs sleep in the house at night until we get the fence up, and I am still waiting on my husband to get off his butt and put it up. It is a nightmare right now.
On top of all of that, I got my denial letter from Social Security the other day. I was planning on getting another, bigger trailer brought in here to add on to this one so we can have more room. Now that I am not getting the money I thought I was going to get, no bigger trailer.
Christmas is right around the corner and I have been working my butt off online to buy Christmas gifts. I will only be able to buy everyone one thing this year. It is really no different from last year and the year before.
I just need a break big time. I need someone to lend me some money so I can get a bigger trailer brought in here. Not only is it too small, but we are heating with one small heater in the living room, one small heater in the bedroom for the puppies, and the oven. I really hate Winter. I wish it was Spring or Summer all year round. That is why I would love to move to Florida someday. I wouldn't miss the snow or cold weather one bit since I have been living here in Ohio in the cold all my life.
I pray to God every day and night to give me one small break. This doesn't help my depression one bit. Sometimes it feels like everything happens at once and everything is falling down around me all the time.
Back to fighting with Social Security again. I even had a lawyer this time and still didn't win. I am so tired of fighting them but I will not give up. In the meantime, I will be writing articles until my eyes bug out.
SSI woes!
I suffer from social anxiety disorder and Bipolar disorder. My husband got SSI for being Bipolar, and it only took him seven months. I have been fighting them for 4 years now. My brother-in-law got it for having a seizure disorder and it only took him two months. Neither one of them had to go to a hearing and they didn't have to get a lawyer. The lawyer I had was supposed to be one of the best in this area. When I went to the hearing, I had to do almost all of the talking. She was a half hour late and she mostly just sat there. When I appeal this time, I am getting a new lawyer.
I also suffer from depression and this time of year it is worse. The days are shorter and it is cold and snowy and rainy most of the time. We don't have a car and even if we did we couldn't go anywhere when the roads are bad. I really hate Winter and as much as I like Christmas, it looks like we won't have much of one because I don't make much money online. I make enough to get by. I will just have to work a little harder, as usual.
Anyway, I am a little depressed today because of getting denied and decided I would write it down and maybe feel a little better. I hope I get it next time.
NoSQL Required Reading
If you're new to NoSQL, you'll want to do a bit of background reading. I'll keep this quick and limit my recommendations to just the essentials:
1. The Amazon Dynamo paper is classic. Almost everyone in the NoSQL world has read this paper.
2. Google's Bigtable paper. Again, very widely read.
3. Werner Vogels's "Eventually Consistent" (originally published in ACM Queue) is absolutely the one article you should read if you're not clear on the rationale behind "eventual consistency."
4. Brewer's CAP Theorem (a foundational bit of scalability theory) is well-explained here. Also see Brewer's original slides from his famous July 2000 PODC keynote.
5. The slideshows from the June 11, 2009 NoSQL meetup in SFO bring to mind adjectives like classic, influential, seminal, pivotal, memorable. Ignore these decks at your peril.
6. SQL Databases Don't Scale is short, basic, and to-the-point. Essential background info if you're not already a battle-scarred DBA with scalability wounds.
7. For a tabular overview of major distributed databases and how they compare with each other, see NoSQL Ecosystem by Jonathan Ellis. A similar effort is the Quick Reference to Alternative data storages page. Ellis's post is noteworthy for its clueful, concise, helpful narrative (in addition to the tables). The Quick Reference page is mainly tables -- but the tables are more complete than Ellis's.
Other Essential Resources
http://nosql-databases.org/ -- This site bills itself as "Your Ultimate Guide to the Non-Relational Universe!", and also self-assuredly calls itself "the biggest nosql link archiv in the web." It's worth knowing about, certainly.
IMHO, all fully conformant NoSQL geeks MUST follow @nosqlupdate on Twitter.
Conformant geeks SHOULD follow @al3xandru (creator of the excellent MyNoSQL blog and NoSQL Week in Review). NoSQL Week in Review is new. I'm hoping it will be updated regularly. It's excellent.
You MAY want to read recent blog posts by Ricky Ho that aptly summarize key aspects of distributed data-store technology. Two noteworthy examples: Query Processing for NoSQL Databases, and his widely read NoSQL Design Patterns post.
That SHOULD be enough to get you started. ;)
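If you want the flavor of eventual consistency before diving into Vogels's article, here's a toy sketch of my own. (Deliberately crude: real Dynamo uses vector clocks; this uses last-writer-wins timestamps. Every name in it is invented for illustration.)

```javascript
// Two replicas of a key-value store accept writes independently and
// exchange their contents later. Conflicts resolve by highest timestamp
// (last-writer-wins). The point: a reader can see stale data until the
// replicas sync -- but they do, eventually, converge.
function makeReplica() {
  return {
    data: {},
    put: function (key, value, ts) {
      var cur = this.data[key];
      if (!cur || ts >= cur.ts) { this.data[key] = { value: value, ts: ts }; }
    },
    get: function (key) {
      var cur = this.data[key];
      return cur && cur.value;
    }
  };
}

function sync(a, b) {
  Object.keys(a.data).forEach(function (k) { b.put(k, a.data[k].value, a.data[k].ts); });
  Object.keys(b.data).forEach(function (k) { a.put(k, b.data[k].value, b.data[k].ts); });
}

var r1 = makeReplica(), r2 = makeReplica();
r1.put('color', 'red', 1);      // the write lands on replica 1 only
console.log(r2.get('color'));   // undefined -- a stale read
sync(r1, r2);
console.log(r2.get('color'));   // red -- eventually consistent
```

That window of staleness between the two reads is precisely what the CAP papers above are arguing about.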
Hadoop and Solr popularity continue to scale well
Red lines: Solr
A quick check of Google Trends shows that Apache Solr (the search server based on Lucene) and Hadoop (the open-source implementation of MapReduce) are popular query terms -- and becoming more popular by the day. (For links to the news stories labelled with flags 'A', 'B', 'C', etc., go to this Google Trends page.)
Likewise: Job trend data from Indeed.com leaves little doubt that Hadoop and Solr skills are increasingly in demand:
Bottom line? If you're a developer, enriching your Hadoop and/or Lucene+Solr skills can only be considered a good investment.
Will HTML5 be SQL-free?
There are several interesting aspects to the story. One is that the name Microsoft doesn't come up at all. Instead, Apple figures rather prominently in the Times story. In fact, the Times's depiction of Google, Apple, and W3C deciding the fate of the post-2.0 Web evokes images of the Big Three debating Europe's postwar reorganization at the Yalta Conference. One gets the (fanciful) impression that Microsoft's future is, to some extent, being decided without anyone from Redmond being present. Of course, that's not quite true. ;)
Another interesting aspect of the Times story is that it talks about HTML5 wrapping the various technologies that will (ostensibly, soon) make Gears superfluous, when technically speaking, many of the functionalities being attributed to HTML5 in the Times story are, in fact, not part of the HTML5 specification at all. They are part of various other WebApps Working Group specs.
Be that as it may, the decision facing the browser-makers at this point is what kind of offline storage to use for browser-mediated web apps. Specifically, will the underlying store support SQL, or not?
This is (trust me) a Huge Hairy Issue -- HHI(tm) -- and don't let the Times or anybody else tell you otherwise: It's far from being settled yet.
HTML5 talks about SQL quite openly. And it appears Opera, Safari, and (soon) Chrome are implementing WebDB, which is a SQL database in the spirit of the (emerging) Web SQL Database spec. But that's not to say WebDB is a traditional SQL database. The implementations are built on SQLite, which is another beast entirely.
Know well, though, not everyone wants SQLite -- or SQL, for that matter. In fact, Microsoft's Adrian Bateman has stated that Redmond probably will not go that route. In a WebApp WG teleconference, Bateman said:
"Microsoft's position is that WebSimpleDB is what we'd like to see ... we don't think we'll reasonably be able to ship an interoperable version of WebDB ... trying to arrive at an interoperable version of SQL will be too hard."
WebSimpleDB, also known as the Nikunj proposal (in deference to the author, Nikunj R. Mehta, of Oracle Corporation), proposes a key-value store of the NoSQL variety. And interestingly enough, this approach is getting serious consideration not only from Microsoft but from Mozilla as well. (In the aforementioned teleconference, Mozilla's Jonas Sicking said: "We've talked to a lot of developers, the feedback we got is that we really don't want SQL...")
It's too early to know how it will all play out. About the only thing that's certain at this point is that Google has (thankfully) decided it's more important to stay on board with mainstream industry standards than to keep pushing proprietary approaches to web-app infrastructure, even if those standards are (in some cases) still quite fluid and ill-formed. One hopes Microsoft will learn this lesson too. Otherwise? Yalta will decide.
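To make the difference in programming models concrete, here's a hedged sketch. The two shim objects below are stand-ins I invented purely for illustration (real browser code would go through openDatabase or the eventual WebSimpleDB API); the point is only the contrast between speaking a SQL dialect to the store and doing plain puts and gets:

```javascript
// Style 1: WebDB / Web SQL Database -- you hand the store SQL strings.
// Interoperability means every browser agreeing on a SQL dialect.
var webDbStyle = {
  rows: [],
  executeSql: function (sql, args) {
    // A real implementation parses the SQL; this shim just records inserts.
    if (/^INSERT/i.test(sql)) { this.rows.push(args); }
    return this.rows;
  }
};
webDbStyle.executeSql(
  'INSERT INTO bookmarks (url, title) VALUES (?, ?)',
  ['http://example.com/', 'Example']);

// Style 2: WebSimpleDB (the Nikunj proposal) -- a key-value store of
// the NoSQL variety. No dialect to standardize; just puts and gets.
var simpleDbStyle = {
  store: {},
  put: function (key, value) { this.store[key] = value; },
  get: function (key) { return this.store[key]; }
};
simpleDbStyle.put('http://example.com/', { title: 'Example' });

console.log(webDbStyle.rows.length);                         // 1
console.log(simpleDbStyle.get('http://example.com/').title); // Example
```

Microsoft's "interoperable version of SQL will be too hard" objection is aimed squarely at Style 1: the SQL string, not the API surface, is the hard thing to standardize.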
Where Google's power goes
Ever wonder where all the electrical power ends up being used in a Google data center? This is the approximate breakout according to a recent book by Google engineers Luiz André Barroso and Urs Hölzle.
Unexpected relationship between hard-drive life and temperature
Today, I was reading Failure Trends in a Large Disk Drive Population [PDF], a February 2007 paper by Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso of Google, containing lots of great data on hard-drive failures and the difficulty of predicting same. The above graph depicts one of the more interesting findings, which is that the effect of operating temperature on disk reliability appears to vary with disk age, such that younger drives tend to be more susceptible to low-temperature failures, whereas older drives tend to be more susceptible to failure at elevated temperatures. Results are grouped by age of disk at failure, then broken out into subgroups (histogram bars) based on their operating temperature. So for example, among disks that failed at 3 years of age, the Annualized Failure Rate (AFR) was about 15% for those disks that had had operating temps of 45 deg. C or more, versus a fail rate of 5% for those that had seen temps of less than 30 deg. C.
Many people have assumed that high temps are bad for disks. And indeed maybe they are bad for 3-year-old disks, but disks that fail at younger ages tend to be much more traumatized by cold than by heat. Pinheiro et al. give additional data for this, and it's pretty convincing. For example, if you have a look at Fig. 4 of the paper, you'll see a bathtub curve, showing that extremes of temperature are deleterious to disk life expectancy. Ironically, the bathtub curve reaches its lowest point at around 38 deg. C -- very close to human body temperature.
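In case the AFR metric itself is unfamiliar: it annualizes observed failures over the population's accumulated operating time. A quick sketch with invented numbers (not the paper's data):

```javascript
// Annualized Failure Rate: failures observed, divided by the total
// drive-years of operation accumulated by the population.
// The numbers below are made up for illustration only.
function annualizedFailureRate(failures, driveCount, hoursEach) {
  var driveYears = (driveCount * hoursEach) / (24 * 365);
  return failures / driveYears;
}

// 1000 drives each run for one full year; 50 of them fail: AFR = 5%.
console.log(annualizedFailureRate(50, 1000, 24 * 365)); // 0.05
```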
Image editing in Firefox via Jetpack and Pixlr
Jetpack Menu API Tutorial from Aza Raskin on Vimeo.
I have to admit this is pretty cool from a couple of standpoints. First, an easy Menu API is something Jetpack has needed from the beginning, and this API strikes me as 100% spot-on: logical, intuitive, powerful. Secondly, Pixlr itself is just kick-ass. (That's my nomination for understatement of the year.) And the measly 14 lines of integration code needed to get Pixlr operating on a web image from a right-mouse menu command is (dare I say) one of the nicer parlor tricks I've seen this year. In The Year of the Parlor Trick, that's saying something.
Hadoop for Bioinformatics
Protein Alignment - Paul Brown from Cloudera on Vimeo.
About 15 minutes into this video, there's an interesting 3D visualization of a running Hadoop job, showing processor nodes as cubes in a spinning pyramid: green nodes are working normally; a node turns black and falls down to the bottom, signalling a failed job on that processor. I thought it was an interesting visualization. But I also found the presentation interesting overall, since I studied molecular biology in grad school and have an interest in bioinformatics. Beyond that, I have an interest, lately, in all things related to scalability. (Let that be a hint of things to come in future blog posts!)
SalsaDev
Stop Searching: Find! from salsadev on Vimeo.
The killer UI experience here is:
- Highlight an arbitrary piece of content on a Web page. (Select some text in your browser window.)
- Let go of the mouse.
- A panel appears automagically, containing contextually appropriate webfinds.
This ought to give anyone in the Search business an awful lot to think about.
Web as Persuasion Platform
I was surfing Vegard Sandvold's excellent blog, reading his 3 Quick Design Patterns for Better Faceted Search (well worth a look if you're in the business of designing or implementing web apps of any kind), when I came across the above slideshow. I found it thought-provoking. We're all, in one sense or another, in the persuasion business. The best ideas are always actionable. Why not make it easy for people to act?
Introducing the AJAX Solr library
Talked to the Lucid Imagineers a short while ago. Lots of neat stuff going on with respect to Solr 1.4. It seems Matthias Epheser's SolrJS (a JavaScript library for creating user interfaces to Apache Solr) has been forked and reincarnated as AJAX Solr. The SolrJS library (originally a Google Summer of Code project) had dependencies on jQuery. To their credit, the Solr team decided that being chained to someone's choice of a single rather large AJAX framework might not be such a good thing in every user's eyes. AJAX Solr, by contrast, is JavaScript framework-agnostic, thus can be used in conjunction with jQuery, MooTools, Prototype, Dojo, or any other framework that implements AJAX helper objects. The programmer who uses AJAX Solr only has to define a Manager object that extends the AbstractManager object, implementing an executeRequest() method. A jQuery-compatible Manager looks like the following (code available at managers/Manager.jquery.js):
AjaxSolr.Manager = AjaxSolr.AbstractManager.extend({
  canAddWidget: function (widget) {
    return widget.target === undefined ||
      jQuery(widget.target) && jQuery(widget.target).length;
  },
  /**
   * @see http://wiki.apache.org/solr/SolJSON#JSON_specific_parameters
   */
  executeRequest: function (queryObj) {
    var queryString = this.buildQueryString(queryObj);
    // For debugging purposes
    this.queryStringCache = queryString;
    var me = this;
    if (this.passthruUrl) {
      jQuery.post(this.passthruUrl + '?callback=?',
        { query: queryString },
        this.jsonCallback(), 'json');
    }
    else {
      jQuery.getJSON(this.solrUrl +
        '/select?' + queryString +
        '&wt=json&json.nl=map&json.wrf=?&jsoncallback=?',
        {}, this.jsonCallback());
    }
  }
});
The role-based favicon, and why Novell patented it
How many of these favicons can you identify? (From left to right: Gmail, Google Calendar, FilesAnywhere, Twitter, Y-Combinator, Reddit, Yahoo!, Picasa, Blogger)
Last week (excuse me a second while I tighten the straps on my tomato-proof jumpsuit) I was granted a patent (U.S. Patent No. 7,594,193, "Visual indication of user role in an address bar") on something that I whimsically call the rolicon.
In plain English, a rolicon is a context-sensitive favicon (favorites icon) indicating your current security role in a web app, in the context of the URL you're currently visiting. It is meant to display in the address bar of the browser. Its appearance would be application-specific and would vary, as I say, according to your security-role status. In other words, if you logged into the site in question using an admin password, a certain type of icon would appear, whereas if you logged in with an OpenID URI, a different icon would appear in the address bar; and if you logged in anonymously, yet a different icon would be used, etc.
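If you want the gist of that in code, here's a toy sketch of the lookup. (The role names and icon paths are placeholders of my own invention; nothing here is taken from the patent text.)

```javascript
// Toy rolicon lookup: map the user's current security role to an
// address-bar icon, falling back to the anonymous icon for any role
// we don't recognize. All paths are invented placeholders.
function roliconFor(role) {
  var icons = {
    admin:     'icons/admin.ico',
    openid:    'icons/openid.ico',
    anonymous: 'icons/anon.ico'
  };
  return icons[role] || icons.anonymous;
}

console.log(roliconFor('admin'));  // icons/admin.ico
console.log(roliconFor('guest'));  // icons/anon.ico
```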
Now the inside story on how and why I decided to apply for this patent.
First understand that the intellectual property rights aren't mine. If you look at the patent you'll see that the Assignee is listed as Novell, Inc. That's because I did the work as a Novell employee.
Okay, but why do this patent? The answer is simpler than you think (and will brand me as a whore in some people's eyes). I did it for the money. Novell has a liberal bonus program for employees who contribute patent ideas. We're not talking a few hundred bucks. We're talking contribute ten patents, put a child through one year of college.
I have two kids, by the way. One is in college, using my patent bonuses to buy pepperoni pizzas as we speak.
Now to the question of Why this particular patent.
Novell has two primary businesses: Operating systems, and identity management. On the OS side, Novell owns SUSE Linux, one of the top three Linux distributions in the world in terms of adoption at the enterprise level. This puts Novell in competition with Microsoft. That competition is taken very seriously at Novell (and at Microsoft, by the way). Perhaps it should be called coopetition at this point. You may recall that in 2006, Novell and Microsoft entered into an agreement (a highly lucrative one for Novell: $240 million) involving improvement of the interoperability of SUSE Linux with Microsoft Windows, cross-promotion of both products, and mutual indemnification of each company and their customers on the use of patented intellectual property.
Novell continues to take an aggressive stance on IP, however, and would just as soon keep ownership of desktop, browser, and OS innovations out of the hands of Redmond.
As it happens, I was on Novell's Inventions Committee, and I can tell you that a lot of attention was given, when I was there, to innovations involving desktop UIs as well as UI ideas that might pertain to security, access control, roles, trust, or other identity-management sorts of things.
One day, I was researching recent Microsoft patent applications and I noticed that Microsoft had applied for a patent on the little padlock icon that appears in IE's address bar when you visit a site using SSL. You've seen it:
I was outraged. How dare they patent such a simple thing?
I did more research and realized that favicons and browser adornments of various kinds figured into a number of patents. It wasn't just Microsoft.
Coming up with the idea of a role-based favicon (and a flyout icon menu so you can select a different role if you don't want to use your current one) was pretty easy, and I was surprised no one had yet patented it. (Most good ideas -- have you noticed? -- are already patented.) It seemed obvious to me that Microsoft would eventually patent the rolicon idea if we (Novell) didn't. So I applied for the patent. The paperwork went to the U.S. Patent and Trademark Office on February 6, 2007. The patent was granted September 22, 2009.
Would I ever have patented something like this on my own, had I not worked for Novell? No. Do I think it's a good patent for Novell to have? Yes. Am I sorry I got paid a nice bonus for coming up with what many people, I'm sure, would call a fairly lame piece of technology? Crap no.
Do I think patents of this kind (or any kind) are good or right, in general? Hey. Today may be Sunday, but I'm no theologian. I don't take sides in the patent jihad. The patent system is what it is. Let it be.
A fix for the dreaded iTunes -9812 error
A few weeks ago, sometime in early September, I tried to go to the iTunes store and found myself locked out of my iTunes account. Anything I tried to do that involved a transaction of any kind resulted in an alert dialog that said: "iTunes could not connect to the iTunes Store. An unknown error occurred (-9812)." That's it. No diagnostic information, no tips, no links, no help whatsoever. A less useful dialog box, I cannot begin to imagine.
Apple's site was of no use whatsoever. The troubleshooting advice I found there was incredibly lame. And of course, the -9812 error code means exactly what the dialog says it means: Unknown Error.
I figured there must be someone, on one of the forums, who would have found the answer to this problem. I did a lot of searching (and wading through a lot of lame "did you try this? did you try that?" non-answers), to no avail.
Finally, on September 18, Mike P. Ryan posted what (for me) turned out to be the solution, on discussions.apple.com.
The problem? A corrupt or missing trusted root certificate for iTunes. How or why this got messed up on my Vista machine, I don't know, but the same thing has clearly happened to boatloads of people, judging from the uproar on the discussion boards. The cure is to download Microsoft's latest trusted root certificate update from here: http://www.microsoft.com/downloads/details.aspx?FamilyID=f814ec0e-ee7e-435e-99f8-20b44d4531b0&displaylang=en. Follow the wizard instructions carefully, because you need to download two executables, not just one.
Note that the fix works for Win XP as well as Vista.
Mike, if you're reading this, thanks!
Jet-powered Beetle
Once more, it's Saturday morning and I find myself catching up on really important reading, stuff I've been caching all week in hopes of getting back to Real Soon Now. At the top of the list? This excellent post by Ron Patrick describing his jet-powered VW Beetle, which was featured on the David Letterman Show (above) on September 9, 2007. Patrick's web page has lots of photos and goes into detail about the design, motivations, and installation details behind the use of a General Electric T58 turboshaft engine (meant for Navy helicopters) in a street-legal Beetle. The engine in question develops 1350 horsepower in its original helicopter application, but note that that's in a turboshaft configuration. Patrick is using it as a free turbine (jet thrust only, no mechanical drive). If he were to couple it mechanically to the drive train of the Beetle, the torque would probably pretzel the car's chassis faster than you could say Holy Halon, Batman, where's the fire extinguisher?
Is this a great country, or what?
The newline legacy
But that's a hardware problem. ;^)
As a programmer, I think the legacy annoyance I most love to hate is the newline.
The fact that the computing world never settled on an industry-standard definition of what a newline is strikes me as a bit disconcerting, given how ubiquitous newlines are. But it's way too late to change things. There's too much legacy code out there, on OSes that aren't going to change how they treat newlines. The only OS that ever changed its treatment of newlines, as far as I know, is MacOS, which up to System 9 considered a newline to be ASCII 13 (0x0D), also known as a carriage return (CR). It's now the linefeed (ASCII 10, 0x0A), of course, as it is in most UNIX-based systems.
It always bothered me that DOS and Windows adhered to the double-character newline idiom: 0x0D0A (CR+LF). To me it always seemed that one character or token (not a doublet) should be all that's needed to signify end-of-line, and since UNIX and Linux use LF, it makes sense (to me) to just go with that. But no. Gates and company went with CR+LF.
Turns out it's not Gates's fault, of course. The use of CR+LF as a newline stems from the early use of Teletype machines as terminals. With TTY devices, achieving a "new line" on a printout required two different operations: one signal to move the print head back to the start position, and another signal to cause the tractor-feed wheel to step to the next position in its rotation, bringing the paper up a line. Thus CR, then LF.
The fact that we're still emulating that set of signals in modern software is kind of funny. But that's how legacy stuff tends to be. Funny in a sad sort of way.
In any event, here's how the different operating systems expect to see newlines represented:
CR+LF (0x0D0A):
DOS, OS/2, Microsoft Windows, CP/M, MP/M, most early non-Unix, non-IBM OSes
LF (0x0A):
Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, others
CR (0x0D):
Commodore machines, Apple II family, Mac OS up to version 9 and OS-9
NEL (0x15):
EBCDIC systems—mainly IBM mainframe systems, including z/OS (OS/390) and i5/OS (OS/400)
The closest thing there is to a codification of newline standards is the Unicode interpretation of newlines. Of course, it's a very liberal interpretation, to enable reversible transcoding of legacy files across OSes. The Unicode standard defines the following characters that conforming applications should recognize as line terminators:
LF: Line Feed, U+000A
CR: Carriage Return, U+000D
CR+LF: CR followed by LF, U+000D followed by U+000A
NEL: Next Line, U+0085
FF: Form Feed, U+000C
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
There's also an interesting discussion of newlines in the ECMA 262 [PDF] specification. See especially the discussion on page 22 of the difference in how Java and JavaScript treat Unicode escape sequences in comments. (For true geeks only.)
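For code that has to ingest text from arbitrary platforms, the practical upshot of all this is: normalize on input. A quick sketch that collapses every Unicode-recognized terminator to LF (note that CR+LF must be matched first, so it becomes one newline rather than two):

```javascript
// Normalize all Unicode-recognized line terminators to a single LF.
// Order matters: match CR+LF before bare CR, or "\r\n" becomes "\n\n".
function normalizeNewlines(text) {
  return text.replace(/\r\n|\r|\u0085|\u000C|\u2028|\u2029/g, '\n');
}

console.log(JSON.stringify(normalizeNewlines('dos\r\nmac\rnel\u0085unix\n')));
// "dos\nmac\nnel\nunix\n"
```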
Many happy returns!
The single biggest usability quagmire in computing
As you probably know, the QWERTY layout, conceived by James Densmore and patented by Christopher Sholes in 1878, was specifically designed to make it difficult for people to type fast on early typewriters. In other words, it was purposely designed and implemented as a usability antipattern. Fast typing caused jamming of mechanical typewriter keys, which were (in Densmore's time) returned to their original "rest" position by weights, not springs. We continue to live with the QWERTY legacy-layout today even though it is well accepted that other keyboard layouts (for English, at any rate) are much more usable.
The best-known alternative layout for Latin-based alphabets is the Dvorak keyboard, which dates to the 1930s. The U.S. Navy did a study in World War Two that found that typing speed was 74 percent faster for Dvorak than for QWERTY, and accuracy better by 68 percent. Other studies (both by private industry and government) have tended to confirm this general result, although there's a considerable cult movement (given impetus by a 1990 article in the Journal of Law and Economics) claiming Dvorak usability to be nothing more than urban legend. (See further discussion here.)
The studies of Dvorak typing accuracy have produced some interesting results. It's instructive to compare the most-mistyped English words for QWERTY users versus Dvorak users.
The mere fact that your fingers travel dramatically less interkey distance when using Dvorak layout means less wrist, finger, and arm movement; thus Dvorak presents the potential for reduced risk of muscle fatigue and injury. This alone would seem to argue for more widespread adoption.
Interestingly, variants of Dvorak are available for Swedish (Sworak), Greek, and other languages. Also, there's a single-handed-typing version of Dvorak, to help with accessibility.
So, but. Let's assume for sake of argument that Dvorak is demonstrably better in some way (speed, accuracy, accessibility, risk of wrist injury) than QWERTY. Why are we still using QWERTY?
It seems an influential 1956 General Services Administration study by Earle Strong, involving ten experienced government typists, concluded that Dvorak retraining of QWERTY typists was cost-ineffective. This study apparently was instrumental in sinking Dvorak's prospects, not so much because people put stock in its results as because of the government's role as a market-mover. The practical fact of the matter is that the U.S. Government is one of the largest keyboard purchasers in the world, and if a large customer convinces manufacturers to settle on a particular device design, it becomes a de facto standard for the rest of the industry, whether that design is good or not. (Today that sort of reasoning is less compelling than in the 1960s, but it's still a factor in market dynamics.)
It turns out to be fairly easy to configure a Vista or Windows XP machine such that you can toggle between QWERTY and Dvorak with Alt-Shift, the way some people do with English and Russian layouts. Basically, to enable Dvorak, you just go to Control Panel, open the Regional and Language Options app, choose the Keyboards and Languages tab, then click the Change Keyboards button, and in the Text Services dialog, click the Add button. When you finally get to the Add Input Language dialog (see below), you can go to your language and locale, flip open the picker, and see if Dvorak is one of the listed options. In U.S. English, it is. (Click the screen shot to enlarge it.)
If you have tried the Dvorak layout yourself, I'd be interested in hearing about your experiences, so please leave a comment.
In the meantime, I hope to give the Dvorak layout a try myself this weekend, to see how it feels. In all honesty, I doubt I'll stay with it long enough to get back up to my QWERTY typing speed. But then again, if it improves my accuracy, I'll have to consider staying with it a while, because frankly my accuracy these days sucks.
Garbage collection 2.0 vs. Web 3.0
I continue to think about garbage collection a lot, not only as a career move but in the context of browser performance, enterprise-app scaleup, realtime computing, and virtual-machine design. Certainly we're all affected by it in terms of browser behavior. Memory leakage has been an ongoing concern in Firefox, for example, and the Mozilla team has done a lot of great work to stem the leakage. Much of that work centers, of course, on improving garbage collection.
One thing that makes browser memory-leak troubleshooting such a thorny issue is that different browser subsystem modules have their own particular issues. So for example, the JavaScript engine will have its own issues, the windowing system will have its issues, and so on. What makes the situation even trickier is that third-party extensions interact in various ways with the browser and each other. And then there are the monster plug-ins for Acrobat Reader, Flash, Shockwave, Java, Quicktime, and so on, many of which simply leak memory and blow up on their own, without added help from Firefox. ;)
A lot's been written about GC in Java. And Java 6 is supposed to be much less leakage-prone than Java 5. But Flash is a bit of a mystery.
The memory manager was apparently rewritten for Flash Player 8, and enhanced again for 9. (I don't know what they did for 10.) At a high level, the Flash Player's GC is very Java-like: a nondeterministic mark-and-sweep system. What the exact algorithms are, though, I don't know. How they differ for different kinds of Flex, Flash, AIR, and/or Shockwave runtime environments, on different operating systems, I don't know.
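For anyone who hasn't met the technique itself, here's a minimal mark-and-sweep sketch in JavaScript. This is purely illustrative, with invented object shapes, and has nothing to do with Adobe's actual implementation; the point is simply that reachability from a set of roots decides what survives.

```javascript
// A toy mark-and-sweep collector. Objects carry their outgoing
// references in a "refs" array; anything unreachable from the
// roots gets swept. (Illustrative only -- not Adobe's code.)
function markAndSweep(roots, heap) {
  const marked = new Set();
  function mark(obj) {             // mark phase: flag reachable objects
    if (obj == null || marked.has(obj)) return;
    marked.add(obj);
    (obj.refs || []).forEach(mark);
  }
  roots.forEach(mark);
  // Sweep phase: keep only what was marked; the rest is reclaimed.
  return heap.filter(obj => marked.has(obj));
}

const a = { name: "a", refs: [] };
const b = { name: "b", refs: [a] };          // b points to a
const orphan = { name: "orphan", refs: [] }; // nothing points here
const survivors = markAndSweep([b], [a, b, orphan]);
console.log(survivors.map(o => o.name).join(",")); // "a,b"
```

The "nondeterministic" part isn't in the algorithm; it's in when the runtime decides to run a cycle like this at all.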
I do know a couple of quirky things. One is that in an AIR application, the System.gc() method is only enabled in content running in the AIR Debug Launcher (ADL) or in content in the application security sandbox. Also, as with Java, a lot of people wrongly believe that calling System.gc() is an infallible way to force garbage collection to happen.
In AIR, System.gc() only does a mark or a sweep on any given object, but not both in the same call. You might think that this means that if you simply call System.gc() twice in a row, it'll force a collection by causing both a mark and a sweep. Right? Not so fast. There are two different kinds of pointers in the VM at runtime: those in the bytecode and those in the bowels of the VM. Which kind did you create? You'll only sweep the bytecode ones.
How the details of memory management differ in AIR, Flash, and Flex is a bit of a mystery (to me). But they do differ. The Flex framework apparently makes different assumptions about garbage lifecycles vis-à-vis a pure Flash app. The use-cases for Flex versus Flash are, of course, quite different and have no doubt influenced the GC approach. Flash comes from a tradition of short-lived sprite-based apps that the user looks at briefly, then dismisses. Obviously you can use a very tactical approach to GC in that specific case. But if you've got an app (Flex based) that is long-running and not constantly slamming animation frames to the video buffer, you need a more strategic approach to GC. (When I say "you," I'm talking about the folks who are tasked with designing the Adobe VM's memory management logic, not the application developer.) A Flex-based DAM or CMS client made for enterprise customers won't necessarily benefit from a memory management system designed for sprite animations in a sidebar ad.
By now every developer who cares about memleaks in AS3 knows not to use anonymous functions inside event handlers. Callbacks should have a name (so they can be removed and GC'd) and weak references should be used. However, weak references won't totally save the day here. In AS3, asynchronous objects register themselves with the Flash player when they run. If one of those objects (a Timer, Loader, File, or DB transaction) continues to be referenced by the player, it stays alive, and there's nothing you can do to collect it.
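The anonymous-callback trap exists in plain JavaScript too, which makes for an easy demo using the standard EventTarget API (not AS3): an anonymous listener can never be detached, because no reference to the function object survives the call that registered it, while a named callback can be removed and then collected.

```javascript
const target = new EventTarget();
let calls = 0;

// Anti-pattern: this listener is irremovable -- the arrow function
// passed to removeEventListener below is a *different* object.
target.addEventListener("tick", () => { calls++; });
target.removeEventListener("tick", () => { calls++; }); // silent no-op

// Pattern: keep a named reference so the listener can be detached
// (and, once nothing else holds it, garbage-collected).
function onTick() { calls++; }
target.addEventListener("tick", onTick);
target.removeEventListener("tick", onTick); // actually removes it

target.dispatchEvent(new Event("tick"));
console.log(calls); // 1 -- only the irremovable anonymous listener fired
```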
There's also the issue of Object Memory versus Rendering Memory. The bulk of all memory used by the Flash player goes toward rendering. And that's the part you have the least control over. A stage in Flash can grow to 100 MB fairly easily, but if you try to destroy it you might only reclaim 40 MB. I have no idea how much of this can be attributed to AS3-C++ object entanglement versus deep-VM mayhem (or some other gnarly issue).
Overall, I think GC is (regardless of technology) something that benefits from openness and community involvement. In other words, this is an area where "proprietary" serves no one. The code needs to be open-source and the community needs to be involved in figuring out solutions to deep memory management issues. Apps can't simply be allowed to detonate unpredictably, for no apparent reason (or no easily-troubleshot reason), in a Web 3.0 world, or in enterprise.
Bottom line? Solving memory-management problems (at the framework and VM level) is critical to the future success of something like AIR or Flex. It's much too important to be left to an Adobe.
Twitter's new terms of service: Give us all rights to your words
According to the Sept 10 post on the Twitter Blog (which tries to explain Twitter's new Terms of Service in plain English):
Twitter is allowed to "use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute" your tweets because that's what we do. However, they are your tweets and they belong to you.

Pray tell me, in what possible sense does something belong to me if I've given every worthwhile right of usage over to somebody else?
My First Content Management Application
Now it seems a new meme is making the rounds: Pie started a snowball rolling with his blog post My First Content Management Application, which begat similar posts by Jon Marks, Johnny Gee, Lee Dallas, and Cheryl McKinnon, all telling how they got started in the content-management biz. I can't help but chime in at this point, meme-ho that I am.
Back in the paleolithic predawn of the Internet, before there was a Web, there was FidoNet. Instead of Web sites, there were electronic bulletin board systems, and in 1988, I was the sysop of a BBS powered by the freeware Opus system. Opus essentially put me in the content management business, although no one called it that at the time, of course.
Opus was extremely popular not only because it supported the bandwidth-efficient ZModem protocol but because it was a highly configurable system, thanks to a one-off scripting language that let you exercise godlike control over every imaginable system behavior. The Opus scripting language was my first introduction to any kind of programming.
In those days, bandwidth was dear (modems ran at 300 and 1200 baud) and you took great pains to compress files before sending them over the wire. The most popular compression software at the time was SEA's ARC suite, the code for which went open-source in 1986. ARC seemed adequately fast (it would process files at around 6 Kbytes per second on a reasonably fast PC, which is to say one with an 8 MHz processor) until a guy named Phil Katz came along with an ARC-compatible program that ran six to eight times faster. In a matter of a year or so, almost every BBS switched from supporting ARC to supporting PKARC.
SEA sued Phil Katz for copyright violation (Katz had violated the terms of the open-source license) and a major legal fracas ensued. BBS operators, unsure of their legal exposure, didn't know whether to stay with PKARC or go back to the much slower ARC (and risk losing visitors). Being young and foolishly optimistic, I decided to write my own compression and archiving software for the use of my BBS customers. I decided it would be a good thing, too, if it was faster than PKARC. Of course, I would have to learn C first.
Thus began the adventure that put me on the path to where I am today, free-ranging in the CMS jungle as an analyst. I'll save the details of that adventure for another time. Suffice it to say, I did learn C, I did write a compression program, and it was faster (though less efficient) than Phil Katz's routines. In fact, I won a bakeoff that led to my code being licensed by Traveling Software for use in their then-popular Laplink connectivity product. Katz lost the bakeoff. (He also lost the lawsuit with SEA.) But he eventually did all right for himself. Perhaps you've heard of Pkzip?
So to answer a question no one asked (except Pie), my first "content" application was in fact a compression and archiving program that I wrote in 1988 to support users of an Opus BBS. That's what started me down the path of learning C, then Java, JavaScript, HTML, XML, and all manner of W3Cruft leading to the purple haze I walk around in today.
Is Yak Shaving Driving You Nuts?
One of the better definitions I've seen so far for yak shaving is the following:
Yak shaving is a programmer's slang term for the distance between a task's start and completion and the tangential tasks between you and the solution. If you ever wanted to mail a letter, but couldn't find a stamp, and had to drive your car to get the stamp, but also needed to refill the tank with gas, which then let you get to the post office where you could buy a stamp to mail your letter -- then you've done some yak shaving. (Zed Shaw, "You Used Ruby to Write WHAT?!", CIO, 1 March 2008)

Shaw's explanation of finding a stamp to mail a letter is a little quaint (who mails letters any more?) and begs for more pertinent examples. I think most of us could easily come up with quite a few. Right away I'm thinking YSEE as a synonym for J2EE. Some others:
- Creating a Mozilla extension
- Hello World in OSGi
- Building a Flex app
- Doing a clean install of OpenCms on a virgin machine (i.e., a machine that doesn't already have JDK, Tomcat, MySQL) -- not difficult, just a lot of yak shaving
- Getting almost any kind of enterprise software configured and running
- Installing a nontrivial application on Linux (and having to resolve dependencies)
What would you add to the above list?
Augmented Reality, Nokia Style
In case you thought the "virtual reality goggles" idea was strictly 1990s sci-fi, guess what? People are still working on it, and Nokia is leading the charge to the commercial finish line. In Nokia's case, the primary goal is not to develop goggles, although clearly they've put some thought into it. The primary goal is to bring augmented reality technology to the cell phone.
"Augmented reality" can be thought of as a highly annotated model of real-world milieux, containing rich-media annotations, text annotations, and other kinds of embedded goodness. This sort of thing is seen by Nokia and others as a major value-add for cell phone customers. But there are other commercial possibilities as well. To grok the fullness, take a look at the following slideshow.
If you're lucky enough to be in or near Palo Alto next Wednesday (September 9), SDForum will be hosting a talk, “Augmenting Both Reality and Revenue: Connecting Mobile, Sensors, Location and Layers”, by Clark Dodsworth of Osage Associates and Maribeth Back of FX Palo Alto Labs. (Registration begins at 6:30 and is $15 for non-SDForum members. Details here.) This talk will give a non-Nokia view of the subject that should be quite interesting. Unfortunately, the real me won't be able to attend. And the virtual me isn't ready.
Google as Skinner Box
But what's interesting is, I do think the Google "fast and lean" design motif (whether it was consciously designed or not) has had a profound influence on people's usability expectations. It sets the bar in a number of ways (see below) and anyone who designs interfaces should take heed, because people are now literally conditioned to expect certain things from a UI.
When I say conditioned, I mean it in the true behaviorist sense. I think the argument can (should) be made that Google's landing page represents a kind of virtual Skinner box. And yes, we are the rats.
The similarities to a Skinner box experiment are striking. The mechanism is quick and easy to operate. The feedback is immediate. You are either rewarded or not. Iterate.
I make a trip to the box about 15 times a day, and hit the lever an average of three times per visit. I am well conditioned. Are you?
I submit that the many people in enterprise who use Google intensively are very thoroughly conditioned to expect certain things from a UI, as a result of operant conditioning.
- The feedback cycle should be short. You should be able to do a task quickly and get immediate feedback on whether it succeeded. Actual success is less important than being told quickly whether you succeeded or not.
- It should be quick and easy to repeat an operation.
- Controls should be very few in number and located high on the screen (right in your face).
- Hitting Enter should be sufficient to get a pleasure reward.
- Everything is self-documenting.
- The UI is flat: No drill points.
- Everything is a link. (Except the main action controls: text field and button.)
The Google operant conditioning cycle is the new unit of interaction (not so new, now, of course). It's the behavioral pattern your users have the most familiarity with, and it's burned into their nervous systems by now. Ignore this fact at your own peril, if you're a UI designer.
Counting the number of DOM nodes in a Web page
I decided to hack together a piece of JavaScript that would count DOM elements. But I decided doing a complete DOM tree traversal (recursive or not) would be too slow and klutzy a thing to do in JavaScript. I just wanted a quick estimate of the number of nodes. I wanted something that would execute in, say, 100 milliseconds. Give or take a blink.
So here's the bitch-ugly one-liner I came up with. It relies on E4X (XML extensions for ECMAScript, ECMA-357), hence requires an ECMA-357-compliant browser, which Firefox is. You can paste the following line of code into the Firefox address bar and hit Enter (or make it into a bookmarklet).
javascript:alert(XML((new XMLSerializer()).
serializeToString(document.documentElement)
)..*.length());
Okay, that's the ugly-contest winner. Let's parse it into something prettier.
The code serializes the DOM starting at the document element (usually the HTML node) of the current page, then feeds the resulting string into the XML() constructor of E4X. We can use dot-dot-asterisk syntax to fetch a list of all descendent elements. The length() method -- and yes, in E4X it is a method, not a property -- tells us how many elements are in the list.
I know, I know, the E4X node tree is not a DOM and the two don't map one-to-one. But still, this gives a pretty good first approximation of the number of elements in a web page, and it runs lightning-fast since the real work (the "hard stuff") happens in compiled C++.
The code shown here counts only XML elements, not attributes. To get an attribute count, substitute "..@*" for "..*" in the code above.
Again, the Document Object Model has a lot more different node types than just elements and attributes. This is not a total node-count (although I do wish someone would post JavaScript code for that).
Last night when I ran the code against the Google home page (English/U.S.), I got an element count of 145 and an attribute count of 166. When I ran the code on the Google News page, I got 5777 elements and 5004 attributes. (Please post a comment below if you find web pages with huge numbers of nodes. Give stats and URLs, please!)
That's all the time I had last night for playing around with this stuff; just time to write a half dozen lousy lines of JavaScript. Maybe someone can post some variations on this theme using XPath? Maybe you know a slick way to do this with jQuery? Leave a comment or a link. I'd love to see what you come up with.
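In that spirit, here's one variation I'll toss out myself: the total node count the post wishes for, done as a plain recursive walk instead of E4X. It counts every child node it can see (elements, text, comments) plus attributes; you'd call it as countNodes(document.documentElement) in a browser console. A rough sketch only; exotic DOM node types may still slip through.

```javascript
// Recursively count a node, its attributes, and all its descendants.
// Works on anything that exposes childNodes (and, for elements, attributes).
function countNodes(node) {
  let count = 1;                        // the node itself
  if (node.attributes) {                // elements carry attribute nodes
    count += node.attributes.length;
  }
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    count += countNodes(children[i]);   // descend into each child
  }
  return count;
}
```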
The Future of Content Management
My main observation is that metadata management is really what we mean by content management today. Content is the payload; it's what gets consumed. Metadata determines how the payload is managed. It's what makes the content manageable.
As I point out in the CMS Watch blog post, content, today, is not what we used to think of as content ten or fifteen years ago. Content used to mean document. Then for a while it meant HTML, or the artifacts destined to make up a web page. Now it means whatever it means. Content can be anything. Which is good, because now we don't have to argue over what content is.
One thing almost everyone I talk to agrees on is that content is becoming rich and unruly. It is becoming less structured, more diverse as to composition and mimetype, and ultimately less manageable. Twenty years ago you didn't have such a thing as a PDF file with embedded Flash. Now you do. Ten years ago the Word (.doc) format contained no XML. Now it does. Composite files are everywhere. Ephemeral (consume-once) content is everywhere. Audio and video files are everywhere. That's a lot to manage.
I've been telling anyone who'll listen that if you want to manage content, or design software systems that do, you have to think of content entirely abstractly. Content can be anything. For management purposes, you shouldn't have to know in advance what the content is; your system should be capable of managing any kind of content. It should be able to let you find content, search content, version it, access-control it, workflow it, etc., without knowing or caring that the content is structured, unstructured, flat, hierarchical, text, binary, animal, vegetable, or mineral.
Since content can be anything, it has to be managed through descriptors (metadata). This is an extremely important concept. Ten years ago, you could code the detailed knowledge of how to handle HTML and other web formats directly into a CMS. Today that would be foolish. Even in a Web CMS, the core code should know nothing about HTML. Detailed knowledge about a content type exists in applications and modules living several abstraction layers above the core code. The system itself needs to be mimetype-agnostic.
A file's metadata is the shim between the core CMS and the applications that consume the content. It's the content's interface to the outside world. The metadata describing a piece of content is analogous to the WSDL describing a Web Service. It says "Here's where I am, here's what you can do with me, here's what you need to know about accessing me."
Not to put too fine a point on it, but: Everything you need to know in order to manage a piece of content is, or should be, in its metadata.
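To make that concrete, here's a toy metadata record in JavaScript. Every field name is invented for illustration; the point is that a core system can route, secure, and version the item from descriptors like these without ever parsing the payload.

```javascript
// A hypothetical mimetype-agnostic metadata record. The CMS core
// touches only fields like these; the payload itself stays opaque.
const item = {
  id: "a1b2c3",
  mimetype: "application/pdf",       // an opaque label, never parsed by the core
  location: "repo://assets/a1b2c3",  // "here's where I am"
  version: 7,                        // version it
  acl: { read: ["staff"], write: ["editors"] }, // access-control it
  workflow: { state: "in-review" }   // workflow it
};

// "Here's what you can do with me": decisions made from metadata alone.
function canRead(record, role) {
  return record.acl.read.includes(role);
}
console.log(canRead(item, "staff"));  // true
console.log(canRead(item, "guests")); // false
```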
In a nutshell: The Future of Content Management is about metadata. But also, the Present of Content Management is about metadata. The future, make no mistake, is here already.