Doing AJAX from Acrobat: Tips, Tricks, and Travails

Praise ye mighty gods on Mt. Olympus: It is, in fact, possible to do AJAX from Acrobat. That's the good news. The rest of the news is (how shall we say?) not entirely salutary, and certainly not well documented. But it's pretty interesting nonetheless.

While it's certainly good news that you can do AJAX from Acrobat, Adobe (for whatever reason) has chosen not to follow the well-accepted idiom (in the Web world) of allowing AJAX code to run in the context of a web document. In other words, you can't just put your AJAX code in a PDF (as a field script in a form, say), then serve the PDF and expect to phone home to the server while the user is interacting with the PDF document. Instead, Adobe requires that you put your AJAX calls in a folder-level script, which is to say a static file that lives on your hard drive in a special subpath under your /Acrobat install path. This is roughly the equivalent of Firefox requiring that all AJAX be done in the context of a Greasemonkey script, say, or in the context of Jetpack. Hardly convenient.

The magic comes in a method called Net.HTTP.request(), which is part of the Acrobat JavaScript API. (You'll find it documented on page 548 of the JavaScript for Acrobat API Reference, April 2007 edition.) Due to security restrictions (supposedly), this method cannot be used in PDF forms, nor in a "document context," nor even in the JS console. It must specifically be used in a folder script.

If you look in your local Acrobat install hierarchy, you'll find a folder under /Acrobat called /Javascripts. What you need to do is create an ordinary text file, put your code inside it, and save that file (with a .js extension) in your /Javascripts folder. Acrobat will then load that file (and execute its contents) at program-launch time.

If you're paying attention, you'll notice right away that this means developing AJAX scripts for Acrobat is potentially rather tedious in that you have to restart Acrobat every time you want to test a change in a script.

Something else you're going to notice when you actually get around to testing scripts is that Acrobat pukes (gives a security error) if you don't explicitly tell Acrobat to trust the particular document that's open while you're running the script. This makes relatively little sense to me; after all, if it's a folder script (running outside the document context), why do I have to have a document open at all, and why do I now have to designate that doc as trusted? As we say in aviation, Whiskey Tango Foxtrot.

Whatever. Jumping through the hoops is easy enough to do in practice: To specify the doc as trusted, go to the Edit menu and choose Preferences (or just hit Control-K). In the dialog that appears, choose Security (Enhanced) from the list on the left, then click the Add File button and navigate to the document in question. Once you do this, you can run the AJAX code in your folder-level script.

But wait. How do you run the script? What's the user gesture for triggering a folder script? The answer is, you need to include code in the script that puts a new (custom) menu command on the File menu. The user can select that command to run the script.

Without further head-scratching, let me just show you some code that works:

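// Fetch cURL with an HTTP GET and show the response body in an alert.
ajax = function(cURL) {
    var params =
    {
            cVerb: "GET",
            cURL: cURL,
            oHandler:
            {
                    // Called asynchronously when the server responds.
                    response: function(msg, uri, e, h) {
                            // msg is the response body as a stream object
                            var string = SOAP.stringFromStream( msg );
                            app.alert( string );
                    }
            }
    };

    Net.HTTP.request(params);
};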

app.addMenuItem({ cName: "AJAX", cParent: "File",
    cExec: 'ajax( "http://localhost/mypage");',
    cEnable: "event.rc = (event.target != null);",
    nPos: 0
});

Read the code from the bottom up. The app.addMenuItem() call at the bottom adds a new menu command, "AJAX", to Acrobat's File menu. When the command fires, it executes the code in cExec. For now, you can ignore the code in cEnable, which simply tests if a document is open. (The AJAX menu command will dim if there's no open PDF doc.)

Before going further, let's take note of the fact that the magical Net.HTTP.request() method needs one parameter: a parameter-block object. The parameter block, in turn, needs to have, at a bare minimum, a cURL property (containing a URL string pointing to the server resource you're trying to hit) and a cVerb property (containing one of 'GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', or 'HEAD', or one of the allowed WebDAV verbs, or 'MKCALENDAR'). Optionally, the request block can also have a property called oHandler that will have its response() method called -- asynchronously, of course -- when the server is ready to respond.

So the basic notion is: Craft a param block, hand it to the Net.HTTP.request() method, and let params.oHandler.response() get a callback.
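
As a minimal sketch of that flow, the helper below builds a parameter block and checks the verb before anything is sent. The name makeRequestParams and the short verb table are illustrative assumptions (they are not part of the Acrobat API, and the table leaves out the WebDAV verbs); the Net.HTTP.request() call itself is shown only as a comment, since it works only from a folder-level script.

```javascript
// Hypothetical helper (not part of the Acrobat API): builds the parameter
// block that Net.HTTP.request() expects.
var BASIC_VERBS = { GET: true, POST: true, PUT: true,
                    DELETE: true, OPTIONS: true, HEAD: true };

function makeRequestParams(verb, url, onResponse) {
    if (!BASIC_VERBS.hasOwnProperty(verb)) {
        throw new Error("Unsupported verb in this sketch: " + verb);
    }
    var params = { cVerb: verb, cURL: url };
    // oHandler is optional; attach it only when a callback was supplied.
    if (typeof onResponse === "function") {
        params.oHandler = { response: onResponse };
    }
    return params;
}

// In a folder-level script you might then write:
// Net.HTTP.request(makeRequestParams("GET", "http://localhost/mypage", myCallback));
```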

So far, so good. But what should you do inside response()? Well, when response() is called, it's called with four arguments. The first is the response body as a stream object (more about which in a minute). The second is the request URL you used to get the response. The third is an exception object. The fourth is an array of response headers. This is all (sparsely) documented in Adobe's JavaScript for Acrobat API Reference.

What's not so well documented by Adobe is what the heck you need to do in order to read a stream object. I'll spare you the suspense: It turns out the stream object is a string containing hex-encoded response data. The easiest way to decode it is to call SOAP.stringFromStream() on the stream, as illustrated above.
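
To make the idea of hex encoding concrete, here is a rough sketch of the decoding in plain JavaScript. It assumes the input is a string of two-digit hex codes, one per single-byte character; the function name hexToString is made up for illustration. In Acrobat itself, SOAP.stringFromStream() remains the simpler route.

```javascript
// Illustrative only: decode a string of two-digit hex codes into text.
// Assumes one byte per character (no multi-byte encodings).
function hexToString(hex) {
    var out = "";
    for (var i = 0; i + 1 < hex.length; i += 2) {
        out += String.fromCharCode(parseInt(hex.substr(i, 2), 16));
    }
    return out;
}
```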

There's more -- lots more -- to doing AJAX from Acrobat (I haven't yet touched on authentication, for example, or WebDAV, or even how to do POST instead of GET), but these are the basics. If you end up doing something interesting with AcroJAX, be sure to add a comment below. And if you want to know how to do Acrobat AJAX against an Apache Sling repository, watch my blog space at dev.day.com. I'll be writing about that soon.

In defense of PDF

Can you identify the living fossil?

There seems to be a growing perception, not unlike the controversies surrounding Flash, that Adobe's Portable Document Format is (in the modern world of the Web) a legacy format, something of a living fossil, a technological coelacanth that refuses to become extinct. Some would go so far as to say PDF doesn't belong on the Web. Some would be wrong, however.

My current employer, Day Software, has (how shall I say this in politically correct form?) a strong prejudice in favor of HTML as the one true and proper Web format for documents. This is reflected in the fact that all of Day's product documentation (like just about everything else Day produces, information-wise) is available online as HTML. There are those at Day, I'm sure, who would like to see PDF disappear from Web sites, if not from planet earth. HTML is a bit of a religion around Day.

And yet, when I joined Day as an employee (two weeks ago), not one document in the half-inch-thick stack of new-employee paperwork that I was asked to fill out was based on an HTML form. Every single document was based on a PDF.

So what is this walking fossil called PDF and why is it still so pervasive?

In the beginning, there was PostScript. Far and away the most successful Adobe technology ever created, PostScript was the first commercially successful page-description language based on vector graphics. It's a Turing-complete language with subroutines, looping, branching, and all the rest. Amazingly, it continues to inhabit printers worldwide. (You probably rely on a PostScript driver to get your printer to work.) It's a brilliant bit of technology, describing, as it does, fonts and shapes and whole pages in resolution-independent terms, in a plain-text (non-binary) interpreted language. Write once, rasterize anywhere.

Since fonts themselves can be described in PostScript, and since PostScript is just text, you'd think PS would be the ideal self-contained document format. Alas, it is not. It's far too verbose, and laborious to render onscreen. (This is what killed Display PostScript.) Still, its inherent portability made PostScript a compelling basis for a document format. So Adobe went on a quest to make PostScript smaller and more amenable to quick screen rendering. They reduced the number of operators (and made their names smaller), and cut out subroutines, and eliminated loops, and did a bunch of other things designed to make PostScript small and screen-friendly, yet without sacrificing portability. The result was PDF.

The first generation of PDF was ASCII-based, and if you looked inside it you basically saw thinly disguised PostScript commands, with loops unrolled and major page elements described as objects. References to objects were maintained in offset tables. Fonts could be embedded, or not. It was still a somewhat verbose format, but at least it could be interpreted and rendered quickly, onscreen or to a printer. (The translation from PDF to PostScript is extremely straightforward.) Adobe came up with a free Reader program and put PDF files out there for anyone who wanted to give them a try. Lo and behold, the format took off.

Why? Why does the world need something like PDF? The (sad, to some) answer is that there is still a need in the world for electronic documents that mimic paper documents. There are industries (such as insurance) in which the physical size and placement of certain pieces of text is regulated by law, and many forms have to fit on a certain size piece of paper when printed out -- there's zero tolerance for text autowrap variations. Form 1040 from the IRS has to look like Form 1040, every time. (Imagine the pandemonium at the IRS if every tax form that arrived in the mail looked different because it was printed out on a different printer at a different resolution, with text wrapping every which way, all manner of font substitutions, etc.) Like it or not, certain documents have to look a certain way every time, without fail. This is where PDF shines.

Is PDF right for every occasion? No. It's not. No more than HTML is.

Is PDF going to become obsolete? Not any time soon. Not any more than PostScript.

Can/should PDF coexist with markup languages in the world of the Web? I think the answer is yes. Adobe has done a good job of making online PDF forms (for example) REST-friendly and user-friendly. Having to load Reader (or a Reader plug-in for your browser) is a bit of a hassle, but you do get a lot of bang for the buck. The advantages of PDF tend to balance out the disadvantages -- for certain users, in certain situations.

And that's the key. It's not about religion -- it's not about whose document format is inherently better or worse. It's about diversity and choice: choosing the right tool for the job and letting the user (or customer) choose what's right for her. This is the part that the HTML zealots don't get. Some customers want PDF. Some users demand to have documents in a format that looks nice onscreen and prints out nicely (and predictably) on a printer. By not providing those users with that choice, we're (in essence) forcing a technology decision on people. We're forcing our religion on non-converts. And historically, that's always been a dangerous thing to do.

A workaround for Acrobat JavaScript's lack of a Selection API

Acrobat has a mind-destroyingly rich JavaScript API (with hundreds upon hundreds of methods and properties, on dozens of object types), but one thing it sorely lacks is a Selection object. Bluntly put, you can't write a line of code that fetches user-selected text on a page. Which sucks massively, because in any modern browser I can do the equivalent of

document.getSelection( )

to get the text of the user's selection (if any) on the current page. In Acrobat, alas, there's no such thing (using JavaScript, at least). If you want to write a plug-in to do the requisite magic using Acrobat's famously labyrinthine C++ API, be my guest (I'll see you at Christmas). But it seems like overkill (doesn't it?) to have to write a C++ plug-in to do the work of one line of JavaScript.

Fortunately, there's a workaround. I nearly fell off my barstool when I happened onto it.

It turns out (follow me now) you can get the exact location on a page of an annotation (such as a Highlight annotation) using the Acrobat JavaScript API; and you can programmagically get the exact location on a PDF page of any arbitrary word on that page. I thought about those two facts for a while, and then a light bulb (~40 watt Energy Saver) went off in my head: If you're willing to use the Highlight annotation tool to make selections, and if you are willing to endure the indignity of iterating over every word on a page to compare each word's location to the known location of the Highlight annotation, you can discover exactly which words a user has highlighted on a page. It's a bit of a roundabout approach (and wins no awards for elegance), but it works.

The first thing you have to know is that Acrobat lets you do

getAnnots( page )

to obtain a list of Annotation objects (if any exist) on the specified page. (Just specify the zero-based page number: 0 for page one of the document, etc.)

The second thing you have to know is that every Annotation has a gazillion properties, one of which is the list of quads for the annotation. A quad simply contains the annotation's location in rotated page space. The key intuition here is that every annot has a bounding box (think of it as a rectangle) with an upper left corner, an upper right corner, and so on (duh). Each corner has x,y coordinates in page space (double duh). Since there are four corners and two coords for each, we're talking about eight floating-point numbers per quadrilateral (i.e., per quad).
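
To make the eight-number layout concrete, here is a small sketch that reduces a quad to a plain bounding box. It assumes only that the eight numbers are four x,y pairs; it makes no assumption about which corner comes first. The name quadToBox and the sample numbers are invented for illustration.

```javascript
// Hypothetical helper: reduce a quad (four x,y pairs) to its bounding box.
function quadToBox(quad) {
    var xs = [quad[0], quad[2], quad[4], quad[6]];
    var ys = [quad[1], quad[3], quad[5], quad[7]];
    return {
        left:   Math.min.apply(null, xs),
        right:  Math.max.apply(null, xs),
        bottom: Math.min.apply(null, ys),
        top:    Math.max.apply(null, ys)
    };
}

// Made-up quad for a word roughly an inch from the left edge of the page:
var sampleBox = quadToBox([72, 700, 120, 700, 72, 688, 120, 688]);
```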

Okay. Cache that thought.

Now consider that in the land of PDF, every word on a page also has a bounding box. And Adobe kindly provides us with a JS method for obtaining the bounding-box coordinates of the Nth word on a page:

getPageNthWordQuads( page, N )

Therefore it's possible to build a function that accepts, as arguments, an Annotation and a page number, and returns an array of all words on that page that intersect the quad space of the annot in question. Such a function looks like this:

function getHighlightedWords( annot, pagenumber ) {
       var annotQuads = annot.quads[0]; // first quad of the annotation
       var highlightedWords = [];
       var numWords = getPageNumWords( pagenumber );
       // test every word on the page
       for (var i = 0; i < numWords; i++) {
               var q = getPageNthWordQuads( pagenumber, i )[0];
               // same y for the first corner, and x extent inside the annot?
               if ( q[1] == annotQuads[1] &&
                    q[0] >= annotQuads[0] &&
                    q[6] <= annotQuads[6] ) {
                       highlightedWords.push( getPageNthWord( pagenumber, i ) );
               }
       }
       return highlightedWords;
}


// Test the function:
// Note that this test assumes there is at least one
// annotation on the current page:
page = this.pageNum; // current page
firstAnnot = getAnnots( page )[0];
words = getHighlightedWords( firstAnnot, page );

We can safely compare quad coords for exact equality thanks to the fact that when Acrobat puts a Highlight annot on a page, it sets the annot's quad location to (exactly) the word's location. There's no "off by .0000001" type of situation to worry about.

Something to be aware of is that functions that return quad lists actually return an array of quads, not a single quad; you're usually interested in item zero of the array. (And recall that a quad is, itself, an array -- of eight numbers.)
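
Since a quad list can hold more than one quad, one possible variation is to compare a word against every quad in the list instead of only item zero. The sketch below reuses the same comparison as getHighlightedWords(); the name wordInAnyQuad is hypothetical, and the idea that a multi-line highlight carries one quad per line is an assumption that has not been tested in Acrobat.

```javascript
// Hypothetical variation: true if the word's quad falls inside any quad
// in the annotation's quad list (same test as getHighlightedWords).
function wordInAnyQuad(wordQuad, annotQuads) {
    for (var i = 0; i < annotQuads.length; i++) {
        var a = annotQuads[i];
        if (wordQuad[1] == a[1] &&
            wordQuad[0] >= a[0] &&
            wordQuad[6] <= a[6]) {
            return true;
        }
    }
    return false;
}
```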

I tested the above function in Acrobat 9 using Highlight annotations and (voila!) it seems to work.

Now if Adobe will get busy and add a proper Selection object to its JavaScript API, I can turn off the 40-watt bulbs and hop back on my barstool, and get some real work done.

Designers, please don't make this UI mistake

While we're on the subject of UI faux pas, let me mention another type of blunder I'm running into more and more these days. See if you can figure out from the following screen shot what I'm talking about. (Click the graphic to see a larger version of it.)

Is it just me, or does anyone else see the illogical nature of having a menu command that leads to a submenu command of exactly the same name? Save As > Save As makes no sense to me. Does it make sense to you?

In the above case, the perpetrator is Adobe. I entered bug 2642169 in the appropriate bug-tracking system; hopefully it will get fixed. There are other problems with the above menus. Some of the commands need to have an ellipsis ("...") after them because they lead to dialogs rather than having an immediate effect. But that's minor (IMHO) compared to Save As > Save As.

Here is yet another instance of the Same > Same problem:

The perpetrator in this case is Microsoft. Somehow (and I don't remember what I did, exactly), I got to this weird menu situation in Vista's Explorer while doing a desktop operation. Obviously, New > New makes no sense.

I could cite a third example, but I think you get the point.

Don't have menu commands that go to submenu commands of the same exact name. It's illogical and it looks funny. It's not what nature intended.

Cancel does not mean Done

I keep running into situations in user interfaces for well-known products where the Cancel button is misused. A typical scenario involves a configuration dialog, one that perhaps has several tabs. Because there are so many configuration options, you might spend several minutes on this one dialog, checking checkboxes and entering string values. When you're done, you look for a Done button -- and find none. Is there a Finish button, perhaps? No. An "Accept These Settings" button? No. Just a Cancel button.

Do you see what's wrong with this picture? Cancel does not mean Done. It does not mean "I'm finished now, so accept my changes and save my work and dismiss this dialog."

Let's be clear: Cancel means (or should mean) one thing only: Let me back out of the current operation as if nothing had happened.

UI designers, take note. Please. Don't make me use "Cancel" to get out of a dialog after I know I've just made important changes to the state of the program.

Resizing a PDF document from A4 to Letter with JavaScript

You'd think (wouldn't you?) that it would be easy to change the native printing size of a PDF document from A4 to U.S. Letter, or vice versa. I'm not talking about simply printing the document out with "Shrink to Printable Area" enabled in the Scaling part of the Print dialog. (There are several problems with this.) I'm talking about actually changing the native format of the document itself from A4 to Letter or vice versa. I'm talking about changing the actual page size. It turns out there is no easy way to do this in Acrobat Professional.

At first, I thought I could use the Crop dialog to resize the pages of my A4 document, growing the page by 8 or 9 points on the left and right sides (to make the page 8.5 inches wide) and shrinking the page by 25 points at top and bottom (to make the page 11 inches tall). But to do it, you need to enter a negative margin size for left and right margins. Acrobat won't let you do that.

The trouble with "Shrink to Printable Area," incidentally, is that it simply downsizes the A4 page by about 6% (retaining the A4 aspect ratio), pinning the page(s) to the top of the print area, which causes the page to have a large amount of white space at the bottom. (In other words, the page is no longer centered vertically.) Call me fussy, but this won't do. I want the page centered, to my liking, on a Letter-size area, with no downsizing at all. (The A4 document in question already has plenty of margin space; it doesn't need more. The pages simply need to be cropped and centered.)

JavaScript to the rescue. It turns out Acrobat Professional's JavaScript API exposes a couple of helpful methods that allow me to do just what I need to do. The getPageBox() method will tell you what the native size of the doc's pages is, in points (72 to the inch). It turns out an A4 page is 842 points tall by 595 wide. A Letter-sized page is 792 by 612.
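
As a quick sanity check on those numbers, the sketch below converts a page box to inches at 72 points per inch. It assumes the box is an array of four numbers in which items 0 and 2 are x coordinates and items 1 and 3 are y coordinates (the same indexing used for the rect later in this post); the name boxSizeInInches is made up for illustration.

```javascript
var POINTS_PER_INCH = 72;

// Hypothetical helper: width and height of a page box, in inches.
// Assumes rect[0] and rect[2] are x values, rect[1] and rect[3] are y values.
function boxSizeInInches(rect) {
    return {
        width:  Math.abs(rect[2] - rect[0]) / POINTS_PER_INCH,
        height: Math.abs(rect[1] - rect[3]) / POINTS_PER_INCH
    };
}
```

For a Letter-sized box of [0, 792, 612, 0] this gives 8.5 by 11.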

The method that gets the job done is called setPageBoxes(). For consistency, it should probably have been named setPageBox(), but Adobe decided setPageBoxes() was more descriptive. Either that or someone just wanted to be perverse.

In any event, setPageBoxes() has four parameters: The first is the box type value (a String), which is one of Crop, Media, Art, Bleed, or Trim. The second parameter is the (zero-based) page number at which to start cropping. The third param is the ending page number. The final parameter is an array of four numbers representing the new left, top, right, and bottom coordinates of the (final cropped) page, in rotated user space.

The magic lines of code that worked for me are as follows:

var rect = [ -9,812,603,20];
this.setPageBoxes("Crop",0,276,rect);

I typed those lines in the JavaScript console in Acrobat Professional, highlighted them with the mouse, and pressed Control-Enter to execute the code. Very quickly, Acrobat cropped all 277 pages of my A4 document.

I played around a little to get the top and bottom margins where I wanted them (because the doc's original margins are a little too generous at the top and too skimpy at the bottom of the page). Note, however, that rect[1] minus rect[3] equals 792 and rect[2] minus rect[0] equals 612, which means the final page size is 8.5 by 11. (Recall that there are 72 points in an inch.)

Note, by the way, that you can use negative numbers in the rect array. This means you can expand a page on the lefthand side, using JavaScript, whereas (as I mentioned earlier) you cannot do this in Acrobat's Crop UI, where negative numbers are not allowed. Very very handy.
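
If you'd rather compute the rect than find it by trial and error, the arithmetic can be sketched as below. The function centers a new page size on the old one and optionally nudges it up or down. The name centeredCropRect and the shiftUp parameter are hypothetical, and the result is in the order the arithmetic above implies (left, top, right, bottom). Whether setPageBoxes() is happy with half-point values is something I haven't checked, so round them if in doubt.

```javascript
// Hypothetical helper: rect for a newWidth x newHeight page centered on a
// pageWidth x pageHeight page (all in points). shiftUp moves the crop
// window up (positive) or down (negative) by that many points.
function centeredCropRect(pageWidth, pageHeight, newWidth, newHeight, shiftUp) {
    var left = (pageWidth - newWidth) / 2;
    var bottom = (pageHeight - newHeight) / 2 + (shiftUp || 0);
    return [left, bottom + newHeight, left + newWidth, bottom];
}

// A4 (595 x 842) to Letter (612 x 792), nudged down 5 points:
var letterRect = centeredCropRect(595, 842, 612, 792, -5);
```

For the A4 document above, this gives [-8.5, 812, 603.5, 20], which is within half a point of the rect I arrived at by hand.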

So there you have it. If ever you need to resize an A4 document to Letter size (or vice versa), now you know how. You just need a copy of Acrobat Professional.

Chrome annoyances: Console does not support multiline code entry

I'm dismayed (and shocked) to learn that the Chrome JavaScript console still doesn't support the manual entry of multiple lines of code; instead, you have to copy and paste multiline code into the console to work around the problem, which frankly sucks pretty badly.

If I want to type 3 lines of code into the console, like so:

> function subtract( a,b ) {
return a - b;
}

. . . I simply cannot. The instant I hit Enter after typing "function subtract( a,b ) {", that line executes -- with a Syntax Error (unexpected end of input). I can't type the whole function, on multiple lines, then execute it.

In Adobe Acrobat Pro (which has a JavaScript console -- press Control-J to see it), Adobe solved this problem by letting Enter take you to the next line (as you'd expect) and letting Control-Enter execute all lines of code. Firebug's console has the same behavior.

I would like to see Chrome implement the same behavior as it exists in the Firebug and Acrobat consoles. The Enter key should let you type on multiple lines. Control-Enter should execute code.

Alas, Chrome doesn't work that way. The one-line-at-a-time behavior has been in Chrome since the beginning (and is still there in 5.0.375.70 beta). Issue 35487 was raised in February, addressing the problem. Let's hope it gets fixed soon. As a developer, I find it to be a blocker: i.e., it's a top-priority bug, not just an annoyance. Immediate attention required.
