Google uses more electricity than most countries on earth

Google is notoriously secretive about its data centers (their locations, their layouts, how much electricity they use, even how many of them there are), saying only that the company has a stated goal of being carbon-neutral. It's believed that Google has at least three dozen dedicated data centers (although studies of IP addresses suggest there may be many more than that). Most, if not all, of the Google data centers draw power from hydroelectric or nuclear plants.

Google says that a typical search query uses an amount of energy equivalent to the release of 0.2 grams of CO2. If that's true, and if Google handles a billion queries a day, that's equivalent to a net release of 200 tons of CO2 per day. But remember, Google's data centers are not powered by coal-driven generators (they use hydro or nuclear power instead), so in essence a Google search costs nothing, in terms of carbon dioxide.
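A quick sanity check on that arithmetic, using the post's own figures (0.2 grams per query, a billion queries a day):

```javascript
// Back-of-envelope check: 0.2 g of CO2 per query, 1 billion queries/day.
const gramsPerQuery = 0.2;
const queriesPerDay = 1e9;

// Convert grams to metric tons (1 metric ton = 1e6 grams).
const metricTonsPerDay = (gramsPerQuery * queriesPerDay) / 1e6;

console.log(metricTonsPerDay); // 200 metric tons of CO2 per day
```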

In terms of electricity, it's a different story. We're talking huge amounts of electrical power. Some estimates put Google data center power usage at 50 megawatts per data center. Sustained over a 365-day year, that's 438 million kilowatt-hours of energy used, per data center. For 36 data centers, we're talking a grand total of around 15.8 billion kWh per year. That's roughly twice the amount of electricity consumed by all U.S. government data centers put together [ref].
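These estimates are easy to verify with back-of-envelope arithmetic. The figures below take the 50-megawatts-per-data-center estimate and a 365-day (8,760-hour) year:

```javascript
// All inputs are the post's estimates, not measured values.
const mwPerDataCenter = 50;      // estimated sustained draw, in megawatts
const hoursPerYear = 24 * 365;   // 8,760 hours
const dataCenters = 36;

// 1 MW sustained for 1 hour = 1,000 kWh.
const kwhPerDataCenter = mwPerDataCenter * 1000 * hoursPerYear;
const totalKwh = kwhPerDataCenter * dataCenters;

console.log(kwhPerDataCenter); // 438000000  (438 million kWh per data center)
console.log(totalKwh);         // 15768000000 (~15.8 billion kWh for 36 centers)
```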

Just to put it in perspective, this means Google consumes more electricity than most countries on earth. If Google were a nation, it would rank somewhere around No. 75 of 215 countries. (For electricity usage by country, see the excellent chart here, based on statistics from the CIA Factbook.)

Java performance resources

Seeing as how I've written a post or two before on the subject of performance optimization, I should perhaps point out that Sun Microsystems now has a resource page devoted to Java performance (tuning info, troubleshooting advice, white papers, best practices articles, etc.) at http://java.sun.com/docs/performance/.

Also, the canonical advice on tuning Java for performance is still the 2005 white paper found at http://java.sun.com/performance/reference/whitepapers/tuning.html. Sun says this white paper is "a living document." But it hasn't changed since 2005, so you be the judge as to whether it can still fog a mirror.

isNaN( Dunbar ) == true

There seems to be a lot of blogging these days about how many Twitter followers (and/or followees) is "too many." Inevitably someone will mention Dunbar's number.

Wikipedia, paraphrasing Gladwell, says that the Dunbar number "is a theoretical cognitive limit to the number of people with whom one can maintain stable social relationships. These are relationships in which an individual knows who each person is, and how each person relates to every other person."

"No precise value has been proposed for Dunbar's number," the person who wrote this Wikipedia entry points out, "but a commonly cited approximation is 150."

Approximation? Precise value?

The value 150 comes from a combination of observations in primatology (specifically, observations of grooming behavior in apes) and comparative anthropology (of a retrospective sort; i.e., looking at human organizational patterns of the far distant past).

Bottom line: This is a social metric based on ape habits and the tribal activities of ancient peoples. It's hard to imagine a less appropriate starting point for talking about the new modalities of social interaction made possible by technology that is still emerging. Do the grooming habits of non-human primates really apply to Twitter users? That's a pretty big disconnect for me.

I make no assumptions about how many social contacts a modern human can have. Over the past 150 years, a lot of effort has gone into the invention of new technologies that extend the social reach of individuals. The automobile. The telegraph. The telephone. The Internet.

Dunbar hypothesized that language itself may have been invented as a cheap way to maintain large numbers of social relationships (a cheap substitute for one-to-one physical contact).

Is there a limit on how many friends a person can have? What does it mean, in the online world, to maintain a friendship? Some friendships (in the offline world) are low-maintenance, while others are high-maintenance. I have friends from school and/or prior lives that (in a few cases) I've reconnected with after decades of no contact, and guess what? We're still friends.

Think of how many times you've found yourself in an airport waiting for a connecting flight, and you strike up a conversation with a total stranger, and go home with that person's contact info after "hitting it off." Does that not count as a friend relationship?

We're in new territory with the Social Web. Technology is connecting people in new ways. Debates about "how many online friends are too many" (especially when they invoke concepts from primatology) seem pedantic and parochial. "Dunbar's number" is not some fundamental constant of nature; it's not Planck's constant. It's a theoretical construct from sociology. Let's not give it more stature than it deserves.

In fact, I say let's call it what it is -- NaN (not a number) -- and move on.
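Or, to put the punchline in actual JavaScript:

```javascript
// Treat Dunbar's "number" as what it is: NaN.
const Dunbar = NaN; // "no precise value has been proposed"

console.log(isNaN(Dunbar));     // true
console.log(Dunbar === Dunbar); // false -- NaN isn't even equal to itself
```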

Code reuse is overrated

Good coders code, great ones reuse. How can anyone disagree with that?

I don't disagree. The principle of not reinventing something after you've invented it is so old and so obvious (and so obviously useful) that no one would seriously dispute it.

What's very much worth disputing, though, is the value of "code-reuse percentage" as a metric, and the degree to which code reusability actually brings about any economies in the software business. I would argue that the economies are largely nonexistent, because of the generally high cost of achieving reusability in the first place -- and a failure to depreciate investments in reusability over time. If you're reusing code in your product that was written in 1997 (to support a 1997 file format, say) and that code is only still in the product for legacy reasons, should that really count as reuse? Shouldn't "reuse" be weighted according to whether the code actually gets executed or not?

Should you build out on bad code (flabby code; spaghetti code; stuff that may contain unreachable or deprecated methods, etc.) and count it as reusability? Or go back and do it right? If you build out on bad code, you've achieved reusability. If you go back and clean up the code, you've killed your reuse metrics, but you may well score a long-term ROI win.

There are so many problems with "reuse percentage" as a metric that I won't litigate the case fully here but instead ask you to refer to the paper by Lim, the 2005 blog by Dennis Forbes, and the 2007 blog by Carl Lewis (for starters).

Will Tracz (an early advocate of reuse, ironically) pointed out in "Software Reuse Myths Revisited" that reusable code costs around 60% more to develop than code not designed with reuse in mind. That estimate (derived in 1994) is probably off by a factor of three or four (maybe ten, with Java). But it's moot, in any case, given that the cost of producing code is, in reality, a comparatively small part of the overall cost of producing and marketing commercial software. And that's what I'm really saying here: the potential for cost savings is not a proper motivation for reuse. There is no significant cost savings. It costs more to develop reusable code, and the payoffs are mitigated by long-term maintenance costs associated with a larger code base.

Someone will inevitably argue that although you may end up with more classes and interfaces if you design for reusability, the code will ultimately be more readable. I dispute that. The code becomes more complex generally and it's not necessarily true that it becomes more readable. Does JMenu really need to have 433 methods? Why? It got that way because someone (lazily) decided to use inheritance as a code-reuse mechanism, instead of designing JMenu to have just what it needs.

You could argue, "Well, so what? The ancestor classes are already written, they never have to be written again, why not reuse them?" There are so many fallacies with that argument, it's hard to know where to begin. JMenu is at the bottom of a 7-classes-deep inheritance chain. The odds that nothing in that chain will ever be rewritten in the future are small. Touching the code in that chain entails risk (a breakage risk for subclasses); this is the kind of thing that keeps half of Bangalore in business, doing regression tests. At runtime, you're carrying around the baggage of 400-odd methods you don't need. The footprint of your software (on disk and in memory) is bigger, performance is affected, garbage collection is affected.

What I'm suggesting is not that you should rewrite JMenu. What I'm saying is that if you're Sun, and you're going to write something like Swing (from a clean sheet of paper), do it with common sense in mind rather than taking an "inherit-the-world" approach to reusability.
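To illustrate the distinction, here's a sketch in JavaScript rather than Java; the class names are made up for illustration and are not Swing's actual API:

```javascript
// "Inherit-the-world" style: each level drags in everything above it.
class Widget { paint() {} reshape() {} /* ...dozens more methods... */ }
class Box extends Widget { add() {} remove() {} /* ...dozens more... */ }
class DeepMenu extends Box { open() { return "open"; } }
// DeepMenu now exposes every ancestor method, needed or not, and any
// change to Widget or Box is a breakage risk for it.

// Composition: the menu owns only the behavior it actually uses.
class Menu {
  constructor(renderer) { this.renderer = renderer; }
  open() { return this.renderer.draw("menu"); }
}

const menu = new Menu({ draw: (what) => "drew " + what });
console.log(menu.open()); // "drew menu"
```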

Rest assured, when I write code, I try (out of sheer laziness) to make as many classes and methods reusable as makes sense (and no more). And I guess that's the point. Sometimes it doesn't make sense to go out of your way to write highly reusable code. Sometimes it's more important to have something small and streamlined that works now, that's purpose-built and does what it does well. If you can do that, fine. If you can't, for some reason, that's fine too, but do what's appropriate to the situation.

That does not mean you abandon good programming practices. It doesn't mean you write poorly structured code. It means you write only as much code as you need, and resist the temptation to overfactor. Unfortunately, the latter can be quite hard, especially if you're steeped in the Java arts.

There's a place in this world for silverware, and there's a place for plastic spoons. And yes, you can recycle plastic spoons, but for gosh sakes, silverware is expensive. Let's not accumulate it needlessly.

Software Quality: A Survey of the State of the Art

I happened to come upon an excellent slide show by Capers Jones called Software Quality in 2008: A Survey of the State of the Art. It contains a number of eyebrow-raisers.
  • "Software is blamed for more major business problems than any other man-made product."
  • "Poor software quality has become one of the most expensive topics in human history."
  • 20% of defects can trace their origins to faulty requirements.
  • Studies by Mitre, TRW, and Nippon Electric have found that 60% of shipping defects can be traced to design time.
  • Coding errors account for 25 to 40% of shipping bugs, far less than design-time errors.
  • Quality-excellence investments have an ROI of more than $15 per $1 spent.
The Jones deck is a quick read and well worth your time. You'll be thinking about it long after the five minutes you spent looking at it are over.

Stupid JavaScript Tricks: LackOfSomethingToDoException

function getVitalInfoFromUser( ) {

  var up = "LackOfSomethingToDoException";

  // if user Cancels, exit ungracefully
  
  var userInput = prompt ( "Enter something:", "[here]" );

  if ( null == userInput || userInput.length == 0 )
     throw up; // puke and die

}

Top 15K English words

There are plenty of lists of commonly used English words out on the Web, but it turns out most such lists are limited to the 1000 or 2000 most-used words in whatever-corpus-was-sampled, and it's actually surprisingly hard to find a free list of, say, 10K or 20K words, sorted by frequency of usage. I did manage to find such a list, though (containing 15000 words) at AudienceDialog.net.

The explanation behind the genesis of the list is interesting:

While writing our page on Global English we discovered that the best vocabulary for students of English to aim at is around 15,000 words. With a vocabulary of that size, you should have a sustainable knowledge of English. That means when you find a word you don't know, you can usually work out its meaning from the context of the sentence. In a document of average difficulty, there will be fewer than 2 words in every 100 that you do not know.

The technical approach behind compiling the list is explained here. It's always interesting to see what kind of methodology someone uses when compiling lists of this sort, because the results are dependent on so many factors. There's no one right way to determine "the most frequently occurring words in the English language" (the whole idea is a bit absurd if you think about it) and in fact no two lists of this kind are ever the same. But that doesn't at all limit the utility of such lists for the purposes for which they're usually used, fortunately.
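At its core, any such methodology boils down to tokenizing a corpus and counting. Here's a toy sketch of that core (nothing like the full methodology at the link, which also has to handle corpus selection, inflected forms, and so on):

```javascript
// Toy word-frequency counter: lowercase the text, pull out word tokens,
// tally them, and sort by descending count.
function wordFrequencies(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(wordFrequencies("the cat and the hat").slice(0, 2));
// [ [ 'the', 2 ], [ 'cat', 1 ] ]
```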

I wonder if anyone keeps a list of the most frequently occurring syllables?

And by the way: If you want to see a truly great interactive longtail graphic on this subject, I urge you to check out http://www.wordcount.org/main.php. It's astonishing.