Humane Interfaces

There's more to User Interface (UI) design than just aesthetics and fluffy feelings; there's actually hard theoretical work turning a typically guru-istic field into hard engineering(1). The goal is to start from the basics of human psychology and information theory and derive a quantitative means for analyzing and designing UIs. And perhaps nobody has achieved so much in this field as Jef Raskin(2), popularly known as the inventor of the Macintosh. His central characteristic, however, seemed to be his inability to accept the status quo.

In his later years, Raskin believed that we had probably reached a dead end in the development of interaction paradigms for text, and envisioned a profound change in the entire model of computing: from an application-centered model to a content-operator model. That means eliminating the artificial differences between Operating Systems (OSs) and applications – in very practical terms, to add a new capability to the system you just leverage the already existing commands, rather than writing a specialized application with its own specialized command structure. The application-centered model of computation is most often characterized by inconsistencies between operations and by the need to rebuild many functions that already exist in other applications. In the new world of the content-operator model there are only two kinds of objects: content and operations on that content, so you can use any command on any content, at any time. There's no fiddling with the tool; no bloatware.
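
To make the contrast a bit more concrete, here's a minimal, purely hypothetical sketch in Python (my own illustration, not anything Raskin specified): a single system-wide registry of operations that can be applied to any selected content, instead of each application reimplementing its own private copy of the same functions.

    # Hypothetical sketch of the content-operator model: operations live in one
    # system-wide registry and can be applied to any selected content, instead
    # of being locked inside separate applications.
    OPERATIONS = {}

    def operation(name):
        """Register a function as a system-wide command."""
        def register(func):
            OPERATIONS[name] = func
            return func
        return register

    @operation("count-words")
    def count_words(content: str) -> str:
        return str(len(content.split()))

    @operation("uppercase")
    def uppercase(content: str) -> str:
        return content.upper()

    def apply(command: str, content: str) -> str:
        """Any command, on any content, at any time."""
        return OPERATIONS[command](content)

    # The same two commands work on a selection from an e-mail, a spreadsheet
    # cell or a web page -- here simply on a plain string:
    print(apply("count-words", "the quick brown fox"))  # 4
    print(apply("uppercase", "the quick brown fox"))    # THE QUICK BROWN FOX

The point is purely structural: the vocabulary of commands lives outside any particular application.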

But, hey, did you really mention … "commands" in the paragraph above? Well, yeah. Why not? Here's what you will most probably witness in the next, say, 10 or at most 20 years, provided you're not a hermit: the next generation of UIs – or, if you prefer, the next UI breakthroughs – will be a kind of return to the "fundamentals". Paradoxically, this quite likely technological path tends to make computers more transparent to the user, so you don't have to think as much about using the computer while … you know, while actually using it.

As a matter of fact, Graphical User Interfaces (GUIs) were a big step forward in terms of making computers … well, (I couldn't help myself) usable by people who are not part of that all-important group of "nerdy little bowtied fissiparous creatures" (now seriously, no offense here(3)). Even today, there are some brilliant developers producing good open-source & free software who, when confronted with people asking for easier UIs, insist on (angrily) saying something like "All these whiners should either RTFM and learn how to use the command line, or they should shut up and go away". Underlying this simple phrase there's a serious attitude based on 3 common mistakes:

  1. they're assuming that improving the UI means exactly making it more like Microsoft's;
  2. they're assuming that improving the UI is something only novices will benefit from;
  3. they're assuming that non-technical users are inherently dumb.

Overall, they're completely overestimating the importance of their software in people's lives. Non-technical users just don't have the time or the inclination to learn the implementation details of their software; they probably contribute to society in some other way – e.g., by trying to correct the pronounced lordosis of that same software developer. Finally, the quote-unquote "the status quo is good enough" is not an attitude that has ever led to progress.

In fact, there are lots of lessons we have learned from both the good old Command-Line Interfaces (CLIs) and GUIs.

With CLIs you can get a lot more done with just a few keystrokes, thanks to features such as very short command names, tab-based autocompletion, a command history that lets you easily repeat or modify earlier commands, and their highly customizable nature. This makes the CLI a very efficient interface in a quantitative, information-theoretic sense. Furthermore, CLIs are not just a set of commands; they're an entire, very expressive computer language (e.g., Bash is Turing-complete): pipes, stdin/stdout redirections, backticks, environment variables et al. form the grammar, while the executables form the vocabulary. Every written command line is a little one-time program you can make reusable with shell scripting (a small illustration follows the list of misfeatures below). OK, we have to accept that that's beautiful, but if we consider a wider audience there are some serious misfeatures:

  1. CLIs are not inherently discoverable, since there's no guidance given to the first-time user;
  2. from the point of view of the non-technical user, those same very short command names that save software developers so much time are nothing more than cryptic, unnatural, unfamiliar names that have to be learned by rote;
  3. by the same token, the myriad command options that make for an almost unbelievably wide range of control over the computer are hard to remember, and
  4. it's also unbelievably easy to make mistakes (and there's no undo!)
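
Here's the small illustration promised above: a hypothetical Python rendering of the throwaway one-liner "ls | grep log | wc -l", assuming a Unix-like system with ls, grep and wc on the PATH. Each executable is a vocabulary word; the pipe is the grammar that glues them into a tiny one-time program.

    import subprocess

    # The one-liner "ls | grep log | wc -l" rebuilt explicitly: count how many
    # entries in the current directory have "log" in their names.
    ls = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
    grep = subprocess.Popen(["grep", "log"], stdin=ls.stdout, stdout=subprocess.PIPE)
    wc = subprocess.Popen(["wc", "-l"], stdin=grep.stdout, stdout=subprocess.PIPE)
    ls.stdout.close()    # let ls receive SIGPIPE if grep exits early
    grep.stdout.close()  # same for grep
    print(wc.communicate()[0].decode().strip())

Wrap those few lines in a script and the one-time program becomes a reusable command of its own.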

(OK, some of you already know or will discover that a significant part of the open-source community has made a huge effort to make the internal functioning of Unix-based systems more transparent to the user, but note that this often means large steps toward the common desktop metaphor – i.e., there's little to no real UI innovation.)

Now to the GUIs; there are some valuable lessons here too. Experience is showing that the combination of tagging and searching is sufficient for navigating vast amounts of content(4), but the visual/spatial representation and organization of GUIs is unbeatable when the task at hand depends on optimal and exhaustive navigation over smaller amounts of content. However, in the anxious rush to abandon CLIs for GUIs we totally left behind the versatility of language, with all its immense ("infinite"?) descriptive power (e.g., for capturing abstractions) that pictures lack. The current overuse of icons doesn't necessarily make a program easier to learn; it does, however, make it completely language-agnostic(5).

For some people the next generation of UIs is right before our eyes: search command languages – i.e., search engines (which retrieve information) converted into answer machines (which retrieve knowledge), controlled by a modern version of the CLI. Think of the address bar in a web browser, or Google Search. Googling something is almost always faster than wading through your web browser's bookmarks menu. Consider also the quick-add feature of Google Calendar, which forgoes the clunky and time-consuming forms that make you think the way the designer thought: you just type an event's information. However, these instances are not correctly classifiable as command-line languages. They are more like form interfaces, because generally they can execute only one command: go to this URL, add this event, search the web for this word or expression, etc. (However, some often unknown advancements are also in place here:

  1. Google Search actually performs some calculations, defines words, recognizes US addresses and telephone numbers, and suggests completions along with a clue about the number of results for each anticipated search input;
  2. In the Opera web browser you can go to the address bar and type something like "imdb 'horton'" and directly get the search results from inside the IMDB database.

This kind of intelligent parsing and matching of inputs is a clear and undeniable improvement in form interfaces.) The underlying control languages are still more ad hoc than systematic, and the language forms are still spotty and idiosyncratic, but there are already some virtues: they are somewhat tolerant to variations(6) (robustness) and exhibit slight touches of natural-language flexibility. And if you use an illegal command format, the answer engine can retreat to the status of a search engine, often returning pages that are of direct relevance – i.e., there's no need for strict adherence to syntax and form. Gmail has no hierarchical folder structure for storage, but a search-line interaction that's good enough, provided you attach labels (the natural-language equivalent of folders) and make use of keywords and whatever else is available and necessary for organization and information retrieval.
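
As a rough illustration of what that "intelligent parsing" amounts to, here's a hypothetical Python sketch of a keyword-prefixed input line in the spirit of Opera's search shortcuts: if the first word matches a known keyword, the rest of the line becomes that command's argument; otherwise the whole input falls back to a plain web search. (The keyword table and URLs are my own placeholders, not Opera's actual configuration.)

    from urllib.parse import quote_plus

    # Hypothetical keyword table in the spirit of Opera's search shortcuts.
    KEYWORDS = {
        "imdb": "https://www.imdb.com/find?q={}",
        "wiki": "https://en.wikipedia.org/wiki/Special:Search?search={}",
    }
    FALLBACK = "https://www.google.com/search?q={}"  # unknown input? just search for it

    def resolve(line: str) -> str:
        """Turn a typed line into a URL: a keyword command if recognized, a search otherwise."""
        head, _, rest = line.strip().partition(" ")
        if head.lower() in KEYWORDS and rest:
            return KEYWORDS[head.lower()].format(quote_plus(rest))
        return FALLBACK.format(quote_plus(line.strip()))

    print(resolve("imdb horton"))        # goes straight to an IMDb search
    print(resolve("humane interfaces"))  # no keyword -> ordinary web search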

For the desktop itself, there's a bunch of find-n-launch applications out there, so you no longer always have to think the way the computer does. With Microsoft Vista's Desktop Search, you can search folders and save dynamically changing results. Mac OS X's Quicksilver is a traditional find-n-launch application which also provides a very efficient way to email a selected file, skip a song in iTunes, append some text to a file somewhere, etc. Just to name a few more: Google Desktop Search, JPSoft's 4NT and "Take Command", AppRocket and GNOME Deskbar. Different programs, different capabilities. Some require holding down one key while you type the verb "open" – a clear loss of efficiency, since you need 4 extra keystrokes and can't type with just one finger anymore.
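
The core trick behind most of these launchers is some form of abbreviation matching: you type a few letters, and every indexed item whose name contains those letters in order is offered, with the shorter (more specific) names first. A minimal, hypothetical Python sketch – the ranking rule is mine, not Quicksilver's actual scoring:

    def matches(abbrev: str, name: str) -> bool:
        """True if the letters of `abbrev` appear in `name`, in that order."""
        pos = 0
        for ch in abbrev.lower():
            pos = name.lower().find(ch, pos) + 1
            if pos == 0:  # character not found after the previous match
                return False
        return True

    def find_n_launch(abbrev: str, items: list[str]) -> list[str]:
        """Return candidate items, shortest names first."""
        return sorted((item for item in items if matches(abbrev, item)), key=len)

    APPS = ["Firefox", "File Manager", "iTunes", "Text Editor", "Terminal"]
    print(find_n_launch("fir", APPS))  # ['Firefox', 'File Manager']
    print(find_n_launch("trm", APPS))  # ['Terminal']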

Jef Raskin suggested the concept of humane interfaces as a more explanatory term than usable interfaces. (See his book "The Humane Interface: New Directions for Designing Interactive Systems.") Here's a current general definition: "An interface is humane if it's responsive to human needs and considerate of human frailties." OK, it seems a better term, at least in the sense that "human needs/frailties" are concepts accurately catalogued by more than 50 years of Cognitive Psychology. However, what are the desired levels of responsiveness and considerateness? An alternative, simpler definition is "interfaces optimized for the human brain and anatomy," in many different aspects: efficiency, comfort, pleasure of experience. That sounds good, as some of the abovementioned find-n-launch applications, requiring a significant amount of finger gymnastics, appear to be optimized for monkeys, whose feet are much more flexible than human hands.

The ideal, humane UI combines the efficiency and expressiveness of conventional CLIs with the ease of learning of (some) GUIs. Here's a short list of some of the extremely challenging general features the ideal humane interface must have, according to Aza Raskin (a minimal sketch follows the list):

  1. it must autocomplete a partial word with a keystroke, while giving you suggestions about other commands it "thinks" you might be looking for (e.g., according to your working context or to the type of data you've selected), giving you clues about what you can type next and what the current would-be command will do if executed, helping you understand what ranges of arguments to a command are valid and what they mean, remembering suggestions you've chosen in the past and popping them up the next time you give the same input, and giving you a sensible way to resolve eventual input ambiguities;
  2. it must handle commands with multiple arguments and various data types, while letting you chain commands together, while allowing you to compose complex commands out of small parts, while allowing you to save this complex command with a simple name for future use;
  3. it must provide a simple and efficient way to create and share commands with interested people.
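
Here's the minimal sketch mentioned above – a hypothetical Python toy covering slivers of features 1 and 2 (prefix completion over a command registry, plus chaining commands and saving the composite under a simple name); everything here is my own illustration, not Aza Raskin's design:

    # A tiny command registry: names map to functions over the selected content.
    COMMANDS = {
        "uppercase": str.upper,
        "reverse":   lambda s: s[::-1],
        "trim":      str.strip,
    }

    def suggest(prefix: str) -> list[str]:
        """A sliver of feature 1: complete a partial command name."""
        return [name for name in COMMANDS if name.startswith(prefix)]

    def chain(*names):
        """A sliver of feature 2: compose small commands into a bigger one."""
        def composite(text: str) -> str:
            for name in names:
                text = COMMANDS[name](text)
            return text
        return composite

    # ...and save the composite under a simple name for future use.
    COMMANDS["shout"] = chain("trim", "uppercase")

    print(suggest("re"))                  # ['reverse']
    print(COMMANDS["shout"]("  hello "))  # HELLO

Feature 3 would then amount to little more than serializing and exchanging that name-to-chain mapping.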

I've found a different, more interesting vision of the future, ideal, humane UI that's worth describing: a context-sensitive, implicit command-based interface driven by collected user data. In fact, the vision encompasses a socialized text-based interface where the computer uses statistics to find the best-matching actions other users have performed before. Actually, data mining on human-contributed data can be much more powerful than today's Artificial Intelligence systems trying to be humane in terms of understanding user behavior, syntax forgiveness and so on. The envisioned implementation involves ubiquitous desktop search boxes, where users already frequently formulate a task in terms of a text search. Eventually, they will receive as a result step-by-step instructions that compose a new "interface": now the user just types the query into a DoIt-box instead of the search boxes. Unfortunately, there's a huge set of requirements here:

  1. context information about what the user's currently doing, and
  2. a public database which records users' interactions with their computers.

Some standard protocol will be required for polling applications' states. There are obvious and serious privacy issues involved in building the aforementioned database, but a less comprehensive approach could build it from specialized help forums – i.e., the forums' current "explain it to me" mode would be replaced by "do it for me once". Common queries (the FAQ analogue) will not burden the forums, as they will yield successful results first. Even though queries are automatically filtered by the ranking system, some sort of trust system will also be needed to properly keep spam queries under control. Parameters inherent to non-digital tasks should be ranked by frequency of usage. Here's a truly syntax-free implementation: there's no use of verbs in the imperative form and no need to learn commands – i.e., you just have to guess which words other users have used to describe the task at hand.
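
A crude, hypothetical Python sketch of the DoIt-box idea, with everything hard (context polling, ranking, trust, privacy) waved away: recorded step-by-step "recipes" contributed by other users are matched against the typed task description by simple word overlap.

    # Hypothetical public database of recorded interactions: a task description
    # as other users phrased it, plus the steps they actually performed.
    RECIPES = [
        ("rotate a photo 90 degrees", ["open the photo", "click Rotate", "save"]),
        ("make an email signature",   ["open Settings", "edit Signature", "save"]),
        ("burn pictures to a cd",     ["insert a blank CD", "drag the pictures", "click Burn"]),
    ]

    def do_it(query: str):
        """Return the recipe whose description shares the most words with the query."""
        words = set(query.lower().split())
        return max(RECIPES, key=lambda recipe: len(words & set(recipe[0].split())))

    description, steps = do_it("how do I rotate this photo")
    print(description)  # 'rotate a photo 90 degrees'
    print(steps)        # the step-by-step instructions that become the new "interface"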

Footnotes:

(1) There are 3 basic tools in a UI designer's "toolkit" (a small numeric sketch follows the list):

  1. the GOMS model analysis, created in the early 80's, for predicting how long it will take a user to accomplish a task with an arbitrary interface;
  2. Fitts's law, "discovered" in the 50's, helps predict how long it will take a user to reach a target, based on the object's size and distance (most common use in UI design: predicting how long it will take the average user to move the cursor to an on-screen button or menu), and
  3. information theory, the most general of the 3, is actually a full-blown mathematical theory and gives you an absolute rating of how good an interface is (compared with the theoretically best interface).
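
For instance, Fitts's law is usually written as T = a + b·log2(D/W + 1), where D is the distance to the target and W is its width; the logarithmic term is the target's "index of difficulty" in bits, which is exactly where information theory sneaks in. A toy Python calculation – the a and b constants below are illustrative placeholders, not measured values:

    from math import log2

    def fitts_time(distance: float, width: float, a: float = 0.1, b: float = 0.15) -> float:
        """Predicted pointing time in seconds: T = a + b * log2(D/W + 1).

        a and b are device- and user-dependent constants; the defaults here
        are only illustrative placeholders, not measured values.
        """
        index_of_difficulty = log2(distance / width + 1)  # in bits
        return a + b * index_of_difficulty

    # A small on-screen button far away is "worth" more bits, hence slower to hit:
    print(round(fitts_time(distance=800, width=20), 3))   # 0.904
    print(round(fitts_time(distance=800, width=200), 3))  # 0.448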

(2) Over his life, Jef Raskin collected a remarkable set of achievements: he was a Professor of Art & Photography and a competent bicycle racer, orchestral soloist and model airplane designer; he was also a published mathematician.
(3) Even though this is the 2nd time I've used this infamous expression in my write-ups.
(4) I.e., avoid the indiscriminate use of hierarchies; imagine yourself visually navigating through 1GB of stored email, 700 folders, 6,000 files and 4,500 digital photographs.
(5) The range of options a text interface gives us effortlessly is huge: take just 5 alphanumeric characters (something some people can type in roughly 1 second) and you can choose one out of tens of millions of possible sequences (36^5 = 60,466,176 with case-insensitive letters and digits).
(6) E.g., when faced with spelling inaccuracies, the system can correct the spelling errors or at least suggest variants; often, synonyms and related terms are used.
