PLUMB

A Cultural and Arts Blogazine

I Am Not An Algorithm: Text Engines and the Future of Writing

May 8, 2011 by mattbriggs

Electric Pencil is 35 Years Old

When I open a document in a word processor, there are two features so familiar that I don’t think about them anymore. The first is the page: the document sits on a white background. In some word processors the metaphor of a piece of paper is delineated by a bounding box with the proportions of a sheet of typewriter paper, with a drop shadow as if the paper were lifted slightly off a table. I have used a typewriter. When I open a word processor, my major interaction with it is to type. I type. The words appear on the screen. They unfurl from my mind, appearing as a row of words accumulating on the virtual page. Unlike the noisy clatter of a typewriter on a desk, with a hammer marking each letter, the letters on the word processor flow smoothly. In the early days there was some lag between my typing and what appeared on the screen. I would type a burst and then pause, and the burst would flow onto the screen, coming from the other major feature of the word processor: the cursor. The cursor blinks. At one time it was a colored box. Now it is a thin line that blinks and marks the boundary between what I have written onto the screen and what is still within my mind. The principal aim of the word processor is to capture that flow of text and store it in a document (a collection of virtual pages).

This paradigm was effective in the early days of the word processor. In fact, just as the accounting spreadsheet was an effective user interface for a task previously done with paper and ink, the page was such an effective conceptual metaphor for creating and storing the written word that it lent itself to several waves of “killer apps.” Lotus 1-2-3 (using the spreadsheet metaphor) was one of the first killer apps in the early days of the personal computer. But the word processor was just as essential. Both applications have become so entrenched in the perception of what a personal computer can do that it is nearly impossible to consider a desktop computer, or any device that offers the features of a desktop computer, that cannot run a word processor or spreadsheet software.

The cursor was always the mark of possibility in the word processor. The word processor simulates a typewriter, but the blinking cursor differentiates it from one. In fact, that blinking indicates the computer has at least one capability impossible in the typewriter: the machine’s awareness of time. Each blink suggests a process looping in the background, capable of performing computations.

Less obvious is the role of the computer as a telecommunications device. The early desktop was invariably a connected device, a dumb terminal connected to a single mainframe computer. With the advent of the microprocessor, computers could be housed in a device the size of a desktop typewriter. These desktop computers were not often connected to each other. They sat on their desks, untethered to data lines. To communicate, you had to print something, fold it up, put it in an envelope, and mail it. A modem was a separate device, a bit esoteric in the 1980s. It required a cradle for the telephone receiver. Phones were typically analog devices. Even the phone ring was created by the agitation of a metal bell. Now the phone is a tiny computer, completely digital, and it is nearly impossible to think of a computer without also thinking about the World Wide Web and the teeming and explosive access to all of the people and devices found there.

In this connected world, no one mails anything. No one really even faxes anything. Even email is a slightly fusty technology compared to the flow of text found in Twitter streams and status updates. Text pours from a billion cursors into the Web and is captured in a vast network of databases, entered not just from keyboards but from the tiny touch screens on our phones and from automated voice transcriptions.

But the simple word processor remains a relic in our operating systems. Microsoft Word has bolted some concessions to the wired world into it — you can right-click and look up a word in Wikipedia — but the essential metaphor, a sheet of paper, harkens back to typewriters, correction tape, and postage stamps.

The old word processor should be thrown out. Do not even consider it. We don’t read on paper and haven’t been using it to write for nearly a generation. What would the new word processor be like? And how might this new word processor change our conception of composition?

Text Engines: Machines Can Read and Write

While the word processor of the future would have a great deal of flexibility in the presentation and control of its user interface for text input, this is trivial compared to the four main capabilities that would make it a radically different tool from the current hoary model employed by Microsoft Word.

1. Encapsulating composition into meta-tagged units

When writing, the author would compose texts in “chunks” of information. Encapsulating these chunks would give the tool the ability to manipulate text beyond the units of character, letter, and word found in present-day word processors. The tool would process text at the following levels: letter, word, phrase, sentence, paragraph, passage, chapter, book, volume, and library.

A few basic applications of this would be:

  • Ability to alter the representation of the text from just a linear view (of one word after another) to more dynamic views such as mind map, outline, and cluster. For example, you could interact with your text as virtual post-its on a virtual wall.
  • Ability to reuse text within the same document, so that when you update a frequently used passage the change is reflected in each occurrence in the document. This would support, for instance, a modular documentation practice.
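
The reuse idea can be sketched in a few lines. This is only an illustration under stated assumptions, not a design for the tool: the `Chunk` and `Document` classes, their field names, and the sample text are all hypothetical. The point it shows is that a document holds references to chunks rather than copies, so one edit reaches every occurrence, and the same chunks can be rendered as a linear text or as an outline.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A unit of text plus metadata. All field names here are hypothetical."""
    chunk_id: str
    level: str                      # e.g. "sentence", "paragraph", "passage"
    text: str
    tags: set = field(default_factory=set)


class Document:
    """A document stores chunk ids in order, not copies of the text."""

    def __init__(self, store):
        self.store = store          # shared dict: chunk_id -> Chunk
        self.order = []             # the same id may appear more than once

    def append(self, chunk_id):
        self.order.append(chunk_id)

    def render(self):
        # Linear view: resolve each id against the shared store at render time.
        return "\n\n".join(self.store[cid].text for cid in self.order)

    def outline(self):
        # Outline view: one line per chunk, showing its level and tags.
        return [
            f"{self.store[cid].level}: {sorted(self.store[cid].tags)}"
            for cid in self.order
        ]


store = {
    "warn": Chunk("warn", "paragraph", "Back up your files first.", {"warning"}),
    "step": Chunk("step", "paragraph", "Run the installer.", {"procedure"}),
}
doc = Document(store)
for cid in ("warn", "step", "warn"):
    doc.append(cid)

# Editing the chunk once changes every place it is reused.
store["warn"].text = "Back up your files before you begin."
print(doc.render())
```

Storing ids instead of text is what makes the modular documentation practice possible; the cost is that every view has to look the text up again when it renders.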

2. Natural Language Processing and Text Mining.

The tool would parse your writing and be able to place it within a machine-based understanding of language. Natural language processing is a field that draws on information technology and linguistics and is concerned with the interactions between computers and human language. There are several main strains in how these processing techniques work, but essentially they allow machines to build algorithms, informed by models, that predict and assess language. The basic difference between this processing model and the classic word processor shows up in spell check. Word checks whether a string, a sequence of letters, occurs in its dictionary. If it does, then it is “spelled correctly.” Every writer has encountered spelling errors as a result of this assumption, particularly with near-homophones such as accept/except. A tool using natural language processing would place the string into a language model and, in the context of this model, determine whether the word was spelled correctly or not. But this is just one use of the model. The model would be constantly refined based on user input, not just from your activity but from all users of the product. The tool would also constantly harvest text from the Web, parse it, and refine and improve the accuracy of its model in relation to your use of language. In a sense, like a cook’s pan, the tool would get seasoned through use.

A few basic applications of a model would be:

  • Automatically change the sentence tense or point of view.
  • Automatically identify changes in diction.
  • Automatically check grammar relative to models classified in discourse groups such as “technical writing,” “classic American fiction,” “the Brontes,” or any set of texts.
  • Create brief abstracts and representations of larger texts. For instance, the tool could identify key concepts or figures in the piece (a process called entity extraction).
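
The difference between a dictionary lookup and a check made in context can be shown with a toy example. This is a minimal sketch, assuming a tiny hand-written corpus and a simple bigram count in place of a real language model; the names `CORPUS`, `CONFUSABLE`, `context_score`, and `suggest` are invented for illustration.

```python
from collections import Counter

# Tiny stand-in corpus; a real tool would train on a far larger body of text.
CORPUS = (
    "please accept my apology . i accept the offer . "
    "everyone except me went home . all except one were sold ."
)
# Hypothetical table of word pairs that writers commonly confuse.
CONFUSABLE = {"accept": "except", "except": "accept"}

tokens = CORPUS.split()
VOCAB = set(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))


def in_dictionary(word):
    """Classic check: does this string occur in the word list?"""
    return word in VOCAB


def context_score(prev_word, word, next_word):
    """Count how often the word was seen next to these neighbors."""
    return bigrams[(prev_word, word)] + bigrams[(word, next_word)]


def suggest(sentence):
    """Flag confusable words whose alternative fits the context better."""
    words = ["."] + sentence.lower().split() + ["."]
    flagged = []
    for i in range(1, len(words) - 1):
        alt = CONFUSABLE.get(words[i])
        if alt is None:
            continue
        current = context_score(words[i - 1], words[i], words[i + 1])
        other = context_score(words[i - 1], alt, words[i + 1])
        if other > current:
            flagged.append((words[i], alt))
    return flagged


print(in_dictionary("except"))        # the lexical check passes
print(suggest("i except the offer"))  # the contextual check objects
```

The lexical check accepts “except” because the string is in the word list. Only the comparison against neighboring words notices that “accept” is the likelier choice in that position.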

3. Natural Language Generation and Text Production.

Reversing the direction of the parser would allow the tool to become a writer. It could generate text from its models. The most primitive model of this is the surrealist game The Exquisite Corpse, in which a basic model of a sentence (in this case subject, verb, object) is expanded into six slots: 1) adjective 2) noun 3) adverb 4) verb 5) adjective 6) noun. Filling the slots creates a sentence. The game is named after the first sentence produced in this manner: The exquisite corpse will drink the new wine. Now expand on this idea to include the ability to 1) randomly (or systematically) choose words within a model discourse, 2) choose a sentence syntax that fits within a model discourse, and 3) automatically generate text from the model. You then have a tool able to spin out not only unique sentences but unique documents that resemble (in terms of the machine model) love letters, articles, novels, and epics.

A few basic applications of this model would be:

  • Randomly generate a text based on parameters you set.
  • Randomly generate a text based on a machine-trained model such as “technical writing,” “classic American fiction,” “the Brontes,” or any set of texts.
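
The Exquisite Corpse mechanism is simple enough to write down directly. The following is a rough sketch, assuming hand-picked word lists in place of a trained “model discourse”; `LEXICON`, `TEMPLATE`, and `exquisite_corpse` are names made up for this example.

```python
import random

# Hypothetical word lists standing in for a "model discourse".
LEXICON = {
    "adjective": ["exquisite", "new", "silent", "electric"],
    "noun": ["corpse", "wine", "typewriter", "cursor"],
    "adverb": ["slowly", "gladly", "never"],
    "verb": ["drinks", "reads", "erases"],
}
# The six-slot sentence model described above.
TEMPLATE = ["adjective", "noun", "adverb", "verb", "adjective", "noun"]


def exquisite_corpse(rng):
    """Fill each slot of the template with a random word of that type."""
    words = [rng.choice(LEXICON[slot]) for slot in TEMPLATE]
    # Articles are added before the two noun phrases so the result reads
    # as a sentence; they are not part of the six-slot model.
    return (
        f"The {words[0]} {words[1]} {words[2]} {words[3]} "
        f"the {words[4]} {words[5]}."
    )


print(exquisite_corpse(random.Random(2011)))
```

Swapping the word lists for vocabulary drawn from a particular body of text, and the single template for a set of sentence patterns found in it, is the expansion the paragraph above describes.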

4. Text Mixing.

Combining both a machine understanding of your text and the ability to generate text, you could re-mix your text. This would be analogous to applying a filter in Photoshop or an audio program.

A few basic applications of this model would be:

  • “Auto-tune” your writing so that slips in diction, sentence tense, and point of view are “normalized.” Or you could write something in one style and then shift it to another style, say “technical writing,” and the tool would shift your work into that style (thereby flattening individual idiosyncrasies). From a professional standpoint, this process could support multiple writers turning out nearly identical copy.
  • You could apply filters that would result in particular effects, as in Photoshop. For instance, you could make your language more or less sexist, in the style of James Joyce, in the style of David Foster Wallace, or what have you.
  • You could perform an automatic cut-up in the style of William Burroughs with another text or a randomly generated text.
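
Of the filters listed, the cut-up is the easiest to mechanize. Here is one possible sketch, assuming that a “piece” is a fixed-length run of words (a simplification of scissors on paper); the function name `cut_up` and its parameters are invented for the example.

```python
import random


def cut_up(text_a, text_b, piece_len=4, seed=None):
    """Cut both texts into short runs of words, shuffle the runs, rejoin."""
    rng = random.Random(seed)
    pieces = []
    for text in (text_a, text_b):
        words = text.split()
        # Slice the text into consecutive pieces of piece_len words each.
        pieces += [
            words[i:i + piece_len] for i in range(0, len(words), piece_len)
        ]
    rng.shuffle(pieces)
    return " ".join(word for piece in pieces for word in piece)


mine = "the cursor blinks on the white page while i type"
other = "text pours from a billion cursors into the web"
print(cut_up(mine, other, seed=7))
```

Every word of both sources survives; only the order changes. A style filter would be a much harder problem, since it has to change the words themselves.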

Collaborating with the Machine

Text entry may or may not be from a keyboard. Text output may or may not be to paper. Instead we have a machine capable of ingesting language input, performing computations and mechanical analysis of that language, and storing it for retrieval in a variety of contexts. In addition, this machine allows for the operator to connect to other users, to online libraries, and to programmers who may constantly increase the functionality of the program.

When you are writing a sentence (or line of poetry), the tool would provide an interface focused on the composition of that individual sentence. The sentence would be written in the middle of a space that could look up words, assess the sound qualities of each word, diagram your sentence, and measure the poetic meter of your line.

A paragraph would benefit from its own interface in terms of assessing diction, syntax, sentence length and variation, and other features of a paragraph composition.

Text is authored in segments and chunks that are described with metadata. Rather than a continuous document, the tool would process what you have written in terms of words, controlled phrases, sentences, paragraphs, and passages. This seems like a small difference. Imagine, however, working on a novel in which you could shift between a view of your novel as an unbroken stream of text, a view of your novel as an outline, and a view of your novel as a mind map. If you shuffled around the mind map or outline and then expanded back to the text view, you would be looking at a different novel.

Each chunk of text would contain metadata descriptions such as a short summary or any tags or labels you would like to assign.

Organized in chunks, your work could be viewed as a pile of index cards, post-it notes, or blocks, and arranged as you would arrange any of those. You could place each chunk on a virtual board and then string the chunks together like beads on a necklace. You could also dynamically organize the chunks of text based on the metadata. For instance, say you wanted each chunk of text organized against a list of key terms (such as an index); the tool could then shuffle the chunks of text in relation to each term.
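
Organizing chunks against a list of key terms amounts to building a small index. The sketch below is one way to picture it, assuming chunks are plain strings keyed by an id and that a term matches only as a whole lowercase word; `index_chunks` and the sample chunks are hypothetical.

```python
import re
from collections import defaultdict


def index_chunks(chunks, key_terms):
    """Map each key term to the ids of the chunks that mention it."""
    index = defaultdict(list)
    for chunk_id, text in chunks.items():
        # Reduce the chunk to a set of lowercase words for exact matching.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for term in key_terms:
            if term in words:
                index[term].append(chunk_id)
    return dict(index)


chunks = {
    "c1": "The cursor blinks.",
    "c2": "A typewriter has no cursor.",
    "c3": "Paper and ink.",
}
print(index_chunks(chunks, ["cursor", "typewriter", "modem"]))
```

Terms that appear in no chunk, such as “modem” here, are simply left out of the index.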

These chunks then become nodes in a mind map, or conversely a mind map could produce a collection of document chunks. These chunks could also become levels in an outline. And of course individual chunks could be reused where needed in your document.

The tool would capture and measure your daily writing so that you could track your writing speed either in words per unit of time or in words per project. You could review reports not only on how much you have written but also on what you have written about, in terms of abstracts, subjects, and frequent words and phrases.
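
A report like this needs very little machinery. As a rough sketch, assuming the tool knows how many minutes a session lasted and treats any run of letters as a word, it could look like the following; `session_report` and its fields are illustrative names only.

```python
import re
from collections import Counter


def session_report(text, minutes, top_n=3):
    """Summarize one writing session: volume, speed, and frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "words": len(words),
        "words_per_minute": len(words) / minutes,
        "frequent": Counter(words).most_common(top_n),
    }


report = session_report("the cursor blinks the cursor waits the page", 2)
print(report)
```

A useful version would drop common words such as “the” before counting, so that the frequent-word list says something about subject matter.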

Behind the blinking cursor, the tool would parse your sentences. While it would not understand a sentence in the way that a human brain does, it would understand the syntax of the sentence (whether it was well-formed or malformed). The sentence syntax would exist not only in the context of the piece you were composing but also in relation to model documents, the ongoing writing of other writers, and the flow of text on the Web. Dictionaries would exist not as a pure lexical lookup (does this string occur in the dictionary?) but as a set of questions: does this word conform to the syntactic limits on a word in this position, does it carry the intended connotation, does it fit the domain of discourse for the document, and so on.

After you had written a piece, you could open a panel that would walk you through your work with a number of natural-language-based, machine-reader measures such as originality. How unique is each sentence, paragraph, passage, and the overall piece? The tool would compare each unit against Google, for example, and against the library of text used as training models. It would mark recurring phrases that were either unique to the piece or collectively shared (such as clichés). Furthermore, the machine could assess seemingly qualitative measures such as “interestingness,” the appeal of passages to female or male readers, and the conformance of passages to trained models. For instance, you could feed it every story accepted by The New Yorker in the last year and check how well your story compares to the machine model of The New Yorker story.
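
One crude way to put a number on originality is to count how many of a sentence’s word sequences already appear in a reference library. The sketch below assumes three-word sequences and a reference library small enough to hold in memory; `ngrams` and `originality` are invented names, and this is a stand-in for, not a description of, how a search engine or a trained model would do it.

```python
def ngrams(text, n=3):
    """Return the set of n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def originality(sentence, reference_texts, n=3):
    """Fraction of the sentence's n-grams not found in any reference text."""
    own = ngrams(sentence, n)
    if not own:
        return 1.0  # too short to compare; treated as original
    seen = set().union(*(ngrams(t, n) for t in reference_texts))
    return len(own - seen) / len(own)


library = ["it was a dark and stormy night"]
for line in ("it was a dark and stormy night",
             "it was a dark blue cursor",
             "the cursor blinks on the page"):
    print(line, originality(line, library))
```

A score of 0.0 means every three-word run has been seen before, which is one working definition of a cliché; a score of 1.0 means none has.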

The tool would be able to perform the standard set of lexical checks against a dictionary (word list) but also use semantic models trained using natural language processing and text mining strategies to essentially understand and check your copy not only against particular style books but against work produced by human editors in relation to a style book.

Thus not only could the tool mark if you are using a part of speech in an odd way, but it would be able to assess if you are writing in a way that is consistent with a particular type of discourse.

The Future is Vapor

Such a tool raises the question of what is “your own work.” I don’t think technology is neutral. Word processors and blog software have both altered writing. I suspect tools using natural language processing and crowd-sourced text generation will tap into generative processes that can create uncanny simulations of writing, and will likely result in more sophisticated readers and a reassessment of the underlying human elements of creative work. This also places the idea of progress, I think, more in the kind of progressive step rooted in technology (this has always been the case: the alphabet, printing, photography, and film have all informed language) than in pure growth of technique within the fixed medium of language.

Software will likely evolve more rapidly and produce writing that is as inventive and/or challenging (in certain ways) as writing produced directly by humans. Software is also a form of writing, but it is a writing that can contain its own capacity to produce more writing. Reportedly Narrative Science has already produced software that “outperforms” human sports writers. I suspect “high-concept” writers, that is, modernists such as James Joyce, Gertrude Stein, or Italo Calvino, may be easier to turn into algorithms than even Jacqueline Susann. Likewise, consider the success of the mechanism behind the exquisite corpse, and add to it artificial intelligence, natural language processing, advances in text and knowledge mining using word vectors, and the availability of a massive library of richly tagged digital text. A text processor/generator should then be able to create not only Borges’s “Library of Babel” but also find in that library additional “as yet unwritten” works by James Joyce or the complete works of Sappho. Tasks such as turning Jane Austen into Bram Stoker would be as simple as doing a find and replace. If I were a sports writer right about now, I would think about becoming something else. This would seem a kind of threat, since we equate language with human thought and clearly these machines are not thinking but spinning elaborate simulations. But really it makes me wonder, at the outer edge of this, what unique cognitive functions I bring to a piece of writing that could not be created by algorithms. In reading an Exquisite Corpse sentence, the human thought in that sentence is not in the generation of the sentence but in the reading of it. I think one of the many ironies in thinking about Borges’s “Library of Babel” is that all of this knowledge is there and yet it is meaningless without human cognition.

A further step beyond this would then be simulated readers. What is the valuable output of reading a book for instance? Say we create a library of virtual texts being read by virtual readers, what could they produce for humans?

In terms of robots (GoogleBot is such a reader), terms such as “uniqueness” and “interestingness” already belong to the domain of text mining and machine reading. I can easily see literary magazines using a database back end and web front end to collect and machine-read submissions. The tool could be trained with examples of stories that have been accepted and rejected; submissions that are close to rejected submissions would then get nixed before a human looked at them, and submissions with a high ranking would get pushed to the top of the human reader’s queue. And then an arms race begins. The same vendor who made this useful submission-sorting tool could sell a tool to writers to help them “prepare” (aka “write”) their submissions so that they get high rankings in the system, get in front of human readers, and are more likely to be accepted. As annoying as such an “advancement” sounds, it would actually be useful all the way around and in some ways would improve the writing of submitters, but it might also dramatically curtail the possibility of radical innovation outside of the system unless the editors trained the system to use other models. Such a word processor would create an arms race between generative computer models creating synthetic text and, suddenly, a new definition of “anti-synthetic text,” whatever that is.
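
The submission sorter can be pictured as a small text classifier. What follows is a simplified sketch, assuming a word-count model with add-one smoothing (a stripped-down naive Bayes that ignores class priors) and a handful of invented training examples; `train`, `acceptance_score`, and `triage` are hypothetical names, and a real system would need far more data and care.

```python
import math
from collections import Counter


def train(examples):
    """examples: (text, label) pairs, label is 'accepted' or 'rejected'."""
    counts = {"accepted": Counter(), "rejected": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def acceptance_score(counts, text):
    """Log-odds that the text resembles accepted rather than rejected work."""
    vocab = set(counts["accepted"]) | set(counts["rejected"])
    totals = {
        label: sum(c.values()) + len(vocab) for label, c in counts.items()
    }
    score = 0.0
    for word in text.lower().split():
        # Add-one smoothing keeps unseen words from producing log(0).
        p_acc = (counts["accepted"][word] + 1) / totals["accepted"]
        p_rej = (counts["rejected"][word] + 1) / totals["rejected"]
        score += math.log(p_acc / p_rej)
    return score


def triage(counts, submissions):
    """Order the human reader's queue from highest to lowest score."""
    return sorted(
        submissions,
        key=lambda s: acceptance_score(counts, s),
        reverse=True,
    )


# Invented training data, for illustration only.
examples = [
    ("quiet lake winter light", "accepted"),
    ("river winter silence", "accepted"),
    ("dragon sword battle", "rejected"),
    ("zombie battle explosion", "rejected"),
]
model = train(examples)
print(triage(model, ["dragon battle", "winter lake"]))
```

The sketch also makes the arms race concrete: a writer who can see which words raise the score can tune a submission toward them, and the model will go on rewarding whatever already resembles past acceptances.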

Posted in Culture, Writing | Tagged text engine, text mining, word processor
