Tag: humanities

Coh-Metrix

Has anyone here experimented with this tool (http://cohmetrix.memphis.edu/cohmetrixpr/)? It is described as follows:

Coh-Metrix is a computational tool that produces indices of the linguistic and discourse representations of a text. These values can be used in many different ways to investigate the cohesion of the explicit text and the coherence of the mental representation of the text. Our definition of cohesion consists of characteristics of the explicit text that play some role in helping the reader mentally connect ideas in the text (Graesser, McNamara, & Louwerse, 2003). The definition of coherence is the subject of much debate. Theoretically, the coherence of a text is defined by the interaction between linguistic representations and knowledge representations. When we put the spotlight on the text, however, coherence can be defined as characteristics of the text (i.e., aspects of cohesion) that are likely to contribute to the coherence of the mental representation. Coh-Metrix provides indices of such cohesion characteristics. http://141.225.213.52/CohMetrixWeb2/HelpFile2.htm

The tool has recently been used to analyse (surprise, surprise) the language of the candidates in the US Presidential election (http://wordwatchers.wordpress.com/). It would be particularly interesting if this had been tried on more demanding text or with more demanding questions.

Linguistic Inquiry and Word Count (LIWC)

LIWC (http://liwc.net/liwcdescription.php) seems at first glance to be methodologically much simpler. As far as I can tell from a quick reading, it computes scores based on occurrences of target words pre-defined to belong to different affective categories, plus scores based on counts of sentence length and the like. It depends centrally on a dictionary of 4500 words:

The LIWC2007 Dictionary is the heart of the text analysis strategy. The default LIWC2007 Dictionary is composed of almost 4,500 words and word stems. Each word or word stem defines one or more word categories or subdictionaries. For example, the word cried is part of five word categories: sadness, negative emotion, overall affect, verb, and past tense verb. Hence, if it is found in the target text, each of these five subdictionary scale scores will be incremented. As in this example, many of the LIWC2007 categories are arranged hierarchically. All anger words, by definition, will be categorized as negative emotion and overall emotion words. Note too that word stems can be captured by the LIWC2007 system. For example, the LIWC2007 Dictionary includes the stem hungr* which allows for any target word that matches the first five letters to be counted as an ingestion word (including hungry, hungrier, hungriest). The asterisk, then, denotes the acceptance of all letters, hyphens, or numbers following its appearance.
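To make the counting scheme in that description concrete, here is a minimal sketch in Python. It is not LIWC itself and does not use the real LIWC2007 dictionary: the two entries below ("cried" with its five categories, and the stem "hungr*" for ingestion) are taken from the quoted description, and the function names, the tokenizing rule, and everything else are assumptions made for illustration.

```python
import re
from collections import Counter

# Toy dictionary, NOT the real LIWC2007 dictionary. The two entries are
# the examples given in the quoted description; nothing else is known here.
TOY_DICTIONARY = {
    "cried": ["sadness", "negative emotion", "overall affect",
              "verb", "past tense verb"],
    "hungr*": ["ingestion"],  # trailing * means "any continuation"
}

def categories_for(word, dictionary):
    """Return the categories of every dictionary entry matching `word`."""
    found = []
    for entry, categories in dictionary.items():
        if entry.endswith("*"):
            # Stem entry: match on the letters before the asterisk.
            if word.startswith(entry[:-1]):
                found.extend(categories)
        elif word == entry:
            found.extend(categories)
    return found

def score(text, dictionary=TOY_DICTIONARY):
    """Count category hits in `text`; return (counts, total word count)."""
    # Assumed tokenizer: lowercase runs of letters, digits, ' and -.
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    counts = Counter()
    for word in words:
        counts.update(categories_for(word, dictionary))
    return counts, len(words)

counts, total = score("She cried because she was hungry, hungrier than ever.")
print(counts["sadness"], counts["ingestion"], total)
```

A single occurrence of "cried" increments all five of its categories, and both "hungry" and "hungrier" are caught by the one stem entry. Dividing each count by the total word count would give the kind of proportional score such tools typically report, though that step is my assumption rather than something stated in the description.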

Not being up-to-date with research in this area (psycholinguistics?), I don’t know how this tool compares with the affective research via text analysis that has been going on for decades. Perhaps someone here can say. How reliable is such research?

PhiloLine announced on HDG

2008-12-04

work

This is from a recent posting in the Humanist Discussion Group:

We are pleased to announce the alpha release of PhiloLine, an extension to PhiloLogic designed to identify similar passages in relatively large collections of documents. PhiloLine is based on a simple implementation of a sequence alignment algorithm, a generalized technique used in bioinformatics and other disciplines. This implementation performs an all-to-all comparison of a set of documents loaded in PhiloLogic and generates results which can be linked to and from the database. PhiloLine is an experimental implementation of our more generalized PAIR (Pairwise Alignment for Intertextual Relations) implementation which functions without PhiloLogic bindings to be released in Winter 2009.

Source code, documentation and release notes, links to relevant papers, and a slide show discussing sequence alignment in digital humanities are available at Google Code (http://code.google.com/p/text-pair/).

PhiloLine, like PhiloLogic and PhiloMine, is an open source system. Please feel free to contact us at the address listed on the site with your comments, complaints, bug reports (yes, there will be bugs), suggestions and, always most gratefully accepted, code.
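For readers who have not met sequence alignment before, here is a rough sketch of what an all-to-all comparison could look like. This is not PhiloLine's code, and the announcement does not say which alignment algorithm it uses; I have swapped in a textbook local alignment (Smith-Waterman) over word tokens, and the scoring values, function names, and sample documents are all assumptions for illustration.

```python
from itertools import combinations

def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman over two word lists.

    Returns (best score, words matched along the best local alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    shared = []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            if a[i - 1] == b[j - 1]:
                shared.append(a[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, shared[::-1]

def all_to_all(docs, min_score=4):
    """Compare every pair of documents; keep pairs scoring >= min_score."""
    hits = []
    for (name_a, text_a), (name_b, text_b) in combinations(docs.items(), 2):
        score, shared = local_alignment(text_a.lower().split(),
                                        text_b.lower().split())
        if score >= min_score:
            hits.append((name_a, name_b, score, " ".join(shared)))
    return hits

# Hypothetical three-document collection.
docs = {
    "d1": "The quick brown fox jumps",
    "d2": "A quick brown fox sleeps",
    "d3": "Nothing in common here",
}
print(all_to_all(docs))
```

Each pairwise comparison fills a table whose size is the product of the two document lengths, so a naive all-to-all pass like this one gets expensive quickly. Presumably that cost is part of why a dedicated implementation is worth announcing.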

The Expense of Field Research

2008-06-19

work

This has been a great week. Two exceptional interviews with two exceptional individuals. On the way back from one of those interviews today, I got on I-10 in Crowley and realized I would probably be a little short on gas for the entire trip back to Lafayette.

So I popped off at Rayne to fill up. The signs announcing $4 a gallon didn’t really make an impression, but the $50 readout on the pump did. Whoa. Suddenly the fact that I am a field researcher became a clear expense. The difference between me and my fellow humanists is that not only do they never need to leave their campus offices or their home studies, but they also don’t have to pump $50 worth of additional gas into their cars or trucks once, or sometimes twice, a week.

And, while I’m thinking about it, that doesn’t include money spent on batteries, hard drives, tapes, and other supplies let alone the money spent on equipment itself: camera, recorder, microphone. Why, why do this? Wouldn’t it be easier to work with existing data, with existing texts? Yes, yes it would. But I think it’s part of my job as a folklorist to add to the archeological record, to bring more people into history, to make data. If that means my job moves more slowly, so be it. But it really would be nice if somehow one got credit for such work. If Project Bamboo’s efforts could somehow lead to my colleagues occasionally recognizing that, it would have done at least this particular field researcher an immense favor.

D2A: Direct to Archive

2008-06-15

work

It’s interesting how not only one’s discipline but also one’s practice within it so sharply shapes one’s view of methods and technologies both near and far. Reading the Project Bamboo proposal, for example, prompted a field researcher like me to respond that the library is not … [quotation here].

As I noted in my 4/6 presentation in Chicago, I don’t want to marginalize the library. I want to re-center it as a working repository to which many contribute as well as upon which many draw. All of this means that I see the library, or archive (I will use the two interchangeably), as a collaborator in my research process. One way it can do that is to help warranty the safety of my data. How can it do that?

By acting as my backup? That’s right. The archive needs data to exist, and field researchers need a safe place for their data. The great advantage of the digital age is that copying data is easy and inexpensive — all things considered. Just as importantly, in the digital age I can give the library my data and still have it for myself. In fact, by giving a copy away early and often I guarantee I will have it for myself.

This is something I am calling Direct to Archive, or D2A for short. One of the greatest chores in going through a collection of recordings, be they images or audio or video or even pages of field notes, is properly sorting, labeling, outlining, and indexing them. It is joyous when you discover something, but in between those moments of joy are long trawls through a variety of materials. (The trawling of course is what sets up the discovery: it’s only by flipping through photo after photo of an artifact that suddenly, to one’s conscious mind but not really suddenly, a pattern emerges.) You do it when you’re logging materials as you gather them, and then there’s the later effort to do something similar when you turn over a box of materials to an archive.

But why wait? Why not simply make the two motions the same? Coders call it DRY, short for “Don’t Repeat Yourself.” The application here is simple. A lot of field researchers are already using some form of digital asset management software (DAM for short). For me, it’s an application like Adobe’s Lightroom, which I use to organize my images. So right away a couple of important caveats here:

I don’t have any good DAM software for audio or for video. (There’s something for a future Project Bamboo team.)
This software only organizes my digital images and the relatively small percentage of film images — slide and print — that I have had the time or wherewithal to digitize.

My current process when I get back from fieldwork is to take the memory cards out of my camera and/or my camera bag and put them in my card reader. I fire up Lightroom to import the images into my library — Lightroom’s own term. That library sits on an external hard drive, but I have the option, which I use, of simultaneously backing up images to another volume. In the image below, you should be able to make out that backing up to another volume, in this case called “StJerome”, can occur even as I am uploading images onto my main volume.

Lightroom

Why not make that other volume a hard drive sitting in an archive vault somewhere? My current DSL connection probably wouldn’t support it, but it will some day. (I could come close now if I were willing to pay AT&T an exorbitant amount of money every month, but I’m not.) Another alternative would be something like the Flickr export plug-in that someone has already made for Lightroom. Why not a similar plug-in for an archive? All the images in my library not only have all the usual EXIF information, which one day will have GPS already built-in, but I have gone through the trouble of adding a fair number of tags:

Louisiana
Boat
Crawfish Boat
Gerard Olinger

All my images? Yes. Why not? I have nothing to hide, nothing to lose, by making all my images available. Any system could easily make it possible for a researcher uploading his data to later manage it, setting terms and conditions for usage. One easily imagined term is that no materials would be available to the public for two years, three years, five years, or until a certain date. Et cetera. In the detail below, you can see that “Flickr” is one possible export. If I can export that easily to my Flickr account, surely I should be able to export to an archive. (Here’s a complete view of the export window in Lightroom.)

Lightroom export detail

Such a system would have multiple advantages:

A researcher would have a reliable back-up.
Such a system backing one up would also encourage researchers to be more thorough-going in their logging — let’s admit that it helps to have an audience and that might take the edge off a task too easily put off for later.
Archives would be in a collaborative relationship with researchers from the very beginning of a research project, making it possible not only for archivists and librarians to have a fuller understanding of the research process but also for researchers to have a better understanding of data management. Equally compelling is the opportunity both parties would have in potentially developing new ideas or seeing new things in extant materials. (The old saw that many hands make light work applies here.)
Finally, archives could guarantee their own development, nurturing collections even in their formation. (Please note that I’m not concerned about how this might bias data collection. I have faith in the process over the long term.)
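The deposit-with-terms idea above (tags travel with the file, and the researcher sets a release date) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: no archive offers this interface that I know of, the field names and the file name are hypothetical, and the bytes stand in for a real image. The tags are the ones listed earlier.

```python
import hashlib
from datetime import date

def manifest_entry(filename, data, tags, embargo_until=None):
    """Describe one deposited file: checksum, tags, optional embargo date."""
    return {
        "filename": filename,
        # The checksum lets researcher and archive confirm both copies match.
        "sha256": hashlib.sha256(data).hexdigest(),
        "tags": list(tags),
        "embargo_until": embargo_until.isoformat() if embargo_until else None,
    }

def is_public(entry, today):
    """Public when there is no embargo or the embargo date has been reached."""
    if entry["embargo_until"] is None:
        return True
    return today >= date.fromisoformat(entry["embargo_until"])

# Hypothetical deposit: placeholder bytes, a two-year embargo.
entry = manifest_entry(
    "crawfish_boat_0001.jpg",
    b"placeholder image bytes",
    ["Louisiana", "Boat", "Crawfish Boat", "Gerard Olinger"],
    embargo_until=date(2010, 6, 15),
)
print(is_public(entry, date(2009, 1, 1)))   # still under embargo
print(is_public(entry, date(2010, 6, 15)))  # embargo date reached
```

The point of the sketch is that the logging a researcher already does (the tags) and the terms a researcher would want (the date) fit in one small record that can be written at import time, which is the DRY argument in miniature.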

Our Ontological Future

2008-05-30

work

Recently, the National Institute of Standards and Technology hosted a conference to establish a word/logic bank for thinking machines. Out of that conference came an agreement:

Information scientists announced an agreement last month on a “concept bank” programmers could use to build thinking machines that reason about complex problems at the frontiers of knowledge from advanced manufacturing to biomedicine.

The agreement by ontologists — experts in word meanings and in using appropriate words to build actionable machine commands — outlined the critical functions of such a bank. It was reached at a two-day Ontology Summit held during NIST’s Interoperability Week in Gaithersburg, Md. The decision to create a unique Internet facility called the Open Ontology Repository (OOR) culminated more than three months of Internet discussion. (Quote taken from Science Blog report. The OOR proposal is here.)

When I was an undergraduate in college, I was both an English and Philosophy major. (I know, what hope for me, eh?) When I studied philosophy in the 1980s, before the rise of artificial intelligence (AI), ontology meant only one thing: the study of existence to determine what entities (we called them phenomena) were present, into what categories (or types) those entities fell, and the relationships between entities.

With the rise of AI, there has been a need to re-use ontology with a different vector: ontology can also be a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. So far as I know, Tom Gruber and his colleagues were the first to re-use ontology in this way. Their use is not as far from the philosophical usage as they might believe: their goal is to establish a set of concept definitions expressly for knowledge sharing and re-use. To my mind, such a project isn’t that far from what philosophers were doing, especially within the phenomenological tradition. Their goal, at least in my reading of Heidegger and Bachelard and others, was a kind of concise mapping of the universe as humans understood it in order to understand the very principles of human understanding. (Levi-Strauss’ structuralism operated in much the same manner. Again, to my mind, which may now be proving itself divergent and/or errant.)

The point, for Gruber et al., is that one commits to an ontology — their term is in fact “ontological commitment” — in order to create agents that can then engage in knowledge sharing.

There were several levels (layers, dimensions) to what Project Bamboo participants aspired to, but one was definitely at the deep infrastructural level of meta-data. One of the groups in which I participated was tasked with the job of teasing out the notion of foraging which was something that the larger group perceived as being a commonality among humanities practitioners. We go out. We search for data. Faced with the forest of data to be found in libraries, which really do feel like being lost in the woods sometimes (in a good way), and on-line, we forage. Sometimes we find what we wanted. Sometimes we find not the berries we were looking for, but a root which is even better. That is the nature of foraging.

All of us, however, yearn for better breadcrumbs through our proverbial forests. Better search devices would seem to be a key to better, more efficient searches — though one does have to wonder if efficiency, within the humanities paradigm, doesn’t also lead to impoverishment. Building better searches would seem to be founded on not only the data being accessible, which is one of the shiniest promises of the digital age, but also the data being searchable.

Often what we want to know about something isn’t contained within the object itself. Take, for instance, a digital audio file of a performance by Varise Conner of his “Lake Arthur Stomp.” Nothing in the file itself will tell you the name of the tune — there are no words, no refrain. Nothing will tell you who originated the song or who is playing it now, unless you can recognize the tune and/or the style of its performance. Or that it’s a melody in the Cajun repertoire. Or that its author was of Irish descent. Or that he lived in Vermilion parish. Or that he was also a sawyer. None of this is the data itself. It’s all meta-data and it turns out that meta-data is sometimes more important than the data itself, especially when it comes to finding the data.

The problem is committing to a meta-data set. Some of you live in places where there is a university library, which usually adheres to the Library of Congress call system, and a local public library, many of which still use the Dewey Decimal system. It’s not as easy as switching gears from letters to numbers, from PS to 800. The ordering of entities and their groupings are different. Philosophy, for example, occupies a different place and is near different things in the two systems. And that’s just to catalog — in order to house and then to find — books and other printed materials. What do you do with other kinds of objects? Especially objects that will never actually be housed in a physical facility? (We begin to border on an infinite regression here, since what we are dealing with is housing data about objects which, it turns out, is really meta-data itself. Oh. My.)

But that is the grail that humanists seek, because we really would like to get as much of the human universe into a form that is searchable and accessible. Why is that important? Well, precisely because so much human activity still lies outside the scope of libraries and archives. And that not only includes the majority of humanity on this planet, but the majority of the lives of even the hyper-connected. One could easily argue that any assessment of what humans are based on what is currently available is really only a small part of the story. And we’re looking to tell a big story. I am the father of a toddler, after all, and so the idea of putting things away in a place where you can later find them is central to my existence.
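The Varise Conner example can be put in code form to show why the meta-data does the finding. This is a bare sketch, not a proposal for a schema: the class, the field names, and the file name are hypothetical, and the values are only the facts mentioned above.

```python
from dataclasses import dataclass, field

@dataclass
class Recording:
    audio_file: str                               # the data itself
    metadata: dict = field(default_factory=dict)  # everything about the data

def search(recordings, **criteria):
    """Return recordings whose metadata matches every given key/value pair."""
    return [
        r for r in recordings
        if all(r.metadata.get(key) == value for key, value in criteria.items())
    ]

recordings = [
    Recording(
        "conner_lake_arthur_stomp.wav",  # hypothetical file name
        {
            "title": "Lake Arthur Stomp",
            "performer": "Varise Conner",
            "repertoire": "Cajun",
            "parish": "Vermilion",
            "occupation": "sawyer",
        },
    ),
]

print([r.audio_file for r in search(recordings, parish="Vermilion")])
print(search(recordings, repertoire="Zydeco"))
```

Notice that the search never opens the audio file. It also only works for the field names this one record happens to use, which is the commitment problem in small: a second archive that called the same field "county" or "place" would be invisible to this query.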

My Bio for Humanities Computing

2008-05-12

work

In order to join the Humanities Computing mailing list, you have to apply. One part of the application requires that you compose a short biography about yourself with your interest in humanities computing as the focus. Here’s what I wrote:

I am a folklorist whose primary field of interest is human ingenuity. While I have published on linguistic/literary topics, my primary interest is in material culture. My interest in computing has two dimensions: I am interested in technology itself as a manifestation of techné and because it helps me solve problems, both through its application as well as in grappling with it as a craft in and of itself. (I should also admit that I am the son of a mother and father who were themselves gadget freaks and firmly believed that technology, as the manifestation of progress itself, was capable of solving almost any problem. I inherited, I confess, some of their optimism.)

I am currently at work on a book about boats that go on land and water here in south Louisiana. These are clearly technological creations, and computing offers me two things: (1) a better way to describe the archeological record I am creating — through the use of CAD and 3D modeling software — and (2) it gives me some opportunity to make machines of my own — I am currently teaching myself how to script in Ruby and I run my personal website on Rails. I had no formal education in computer science or in programming, and so this is a logic that is fairly foreign to me. Frankly, it makes my head hurt on a regular basis. But in making my head hurt, I am — I hope — training myself to think in new ways, to see new things in what I already know, and learning to communicate complex relationships in another language, in much the same way that I am trying to convert the complex relationships contained within these metal machines into words.

I have for some time been thinking about computers and networks as the new platform not only for study but also for communication, and I have done a fair amount of experimentation in that direction. (There will be more on my website, http://johnlaudun.org/, shortly, but I am slowly rebuilding it and that rebuilding will be delayed by the Project Bamboo meeting later this week.)

I have experimented with using computing as a platform for teaching. Please see the current version of the Louisiana Survey of Folk Culture at http://code.google.com/p/louisianasurvey for the first survey. My idea there — I’m not claiming it was that grand or that well done — was that having students who were taking their first, and typically their only, folklore class write long, synthetic essays was an exercise in frustration for both them and me. Better to involve them in some larger project where their steps were straightforward but the edifice within which they worked provided a path toward synthesis. Out of that, we began a wiki that allowed students to index discrete items — like jokes, anecdotes, dites — by genre, teller, location, use, etc. … goodness, this got long. Sorry.

In the Era of the Meta-Platform, Content Is King

2008-04-20

work

The Story Everyone Tells

What we are in the midst of, as everyone already knows, is the computer displacing all other means of communication and distribution of content. Already the race for IP TV, as it is sometimes called, is on. IP stands for “internet protocol” and it acknowledges that television programming will no longer be delivered in analog waves but in digital bits. Almost all consumers already are getting their television pictures in bits; they just don’t know it. Cable companies may charge more for “digital” packages but the fact of the matter is that even basic cable programming is already largely distributed digitally and then converted to analog at the node as opposed to in your house. If you have cable, take a close look at your television picture some time during a smoke-filled or dark scene: you will discover a range of digital artifacts that reveal that you aren’t getting a full analog signal but merely a digital signal resolved “up” to something like analog. The switch to digital equipment allows cable companies to push four digital channels in the same bandwidth as one analog channel. To do so, they have to degrade the picture.

And while radio stations still exist — and they will continue to exist in the scenario I unfold here, but in a lesser, or at least different, capacity — I think everyone would agree that you see few people listening to radios outside of certain contexts. The radio still plays in the car and in shops and other work situations, but when you see people listening to music individually, they are usually carrying some sort of MP3 player. Most new cars now come either with auxiliary ports or iPod docks built into their radios.

While we’re in the car, let’s not forget the GPS aids in there, along with the DVD players. When I first wrote the proposal for the LCVC project four years ago, I predicted we would see a convergence of those two devices, and I think that’s already begun to happen. It won’t be long before you’ll not only get live traffic updates added to your in-car navigation / entertainment device, but other forms of content streamed to it as well. That device will be simply another computer, one that is location aware — as the iPhone 3G already is.

All these devices will, of course, function by being “on the net.” That means they will be delivering on the great promise that the internet made — at least that was the intent of the people who worked to create the network and so far it has largely held to be true — and that is to put the means of production into a vastly larger number of people’s hands than ever before. The communications industries have their array of catch words for this phenomenon, “one-to-one advertising” or “niche marketing” or “focused marketing” or even “the long tail”, but they are also just as worried about what is, in effect, P2P, because the reality is: they don’t own the network. They can’t, and that boggles and maddens them.

Now, I should take a moment to say what I mean by “communications industries.” For me, these are the usual suspects of radio and television stations and cable channels, the movie studios (which produce content in the sense of financing it but don’t really make anything themselves) and the record labels. It also includes all the support industries that have sprung up to service the financial conduit that makes these providers go: advertising, marketing, etc. Universities train workers for these sectors and lump their job-training programs under the banner of “communications” — sometimes breaking out a department of “mass communications” — and thus it’s just easier for me to call the whole lot “communications industries.”

Now, having defined some terms, let’s get back to the convergence, because it really changes a lot.

(It doesn’t change “everything” as some technologists are fond of proclaiming, and thus I can now further qualify the qualification I made above about how I am not calling for the demise of the communications industries, and the university departments, or other personnel, that serve them.)

Perhaps the best way to describe what is happening is to look no further than the RIAA’s current campaign to make web-based radio stations pay more than broadcast radio stations for the distribution of content. (The figure I have heard bandied about is something like double, but I’ll leave that to others to argue over.) The recording industry’s argument for why conventional radio stations are getting a better deal is that they drive album sales where network radio stations drive only sales of singles. Immediately, a previous era, and its methods and business models, is revealed quite clearly.

In making that claim the recording industry conveniently leaves out a larger history that would recall a rather long period during which radio stations sold “singles” in the form of fast-playing 45s that contained both the single that made it onto the air as well as the once-infamous “B side” cut which everyone hoped would prove the depth of a musical performer and lead to album sales. The album era thus had the down-side of having these singles, which had to be priced less but of which you could sell a lot, but it had the upside that only the record companies could produce them. The audio cassette era was short-lived, but it too had its upside — largely the demise of the single — and its downside — the ability of consumers to make their own tapes.

The “mix tape” survived into the compact disc (CD) era, but the single did not, which meant that the recording industry could sharpen its focus on its business model, which was to keep generating what were often “one-hit wonders” that nonetheless sold albums holding only one song worth listening to.

(This experience was recently made fresh for me. Because we live in a small house, Yung-Hsing and I have been making a concentrated effort to “purge” stuff we don’t use or don’t need. We recently came across two boxes tucked up in a closet full of audio cassettes, and as I went through them, I realized how much money I had spent for what really came down to only one to three songs per tape that were worth remembering — quite literally, I could only remember one to three songs per tape.)

So the recording industry built albums with a hit or two on them, and the radio stations played those hits which were, in turn, only available through purchase of the album. It was the rare station that played albums, not necessarily because there really was so little on albums but because in order to maximize the total amount of product displayed, they could only play a song per album. They had to maximize the number of albums because they were, after all, a broadcast medium, which meant they had to appeal to the maximum number of individuals who might be listening during any given time period. It’s one thing to listen to three minutes of a song or musician you don’t like, another to know you have to wait half an hour or an hour to hear something different.

And so audio content got packaged into three-minute chunks. That isn’t necessarily how long all songs or tunes want to be, but that’s how long they are. As a folklorist and as someone who attends live music events, I know — as do you — that the three-minute song in no way reflects the nature of a jam session nor does it reflect the classic dance tunes that usually have at least an A and B part and can thus be played for as long as people seem to be willing to dance or the musicians are interested in going. The three-minute single has been here for so long that we are all now convinced that that is how long we want to dance.

Much the same goes for television, with important differences of course. Like radio, television sought to parse out its content in order to maximize viewers. Like radio, it did so in part by limiting the length of its content so that it could rotate enough material so as to minimize alienating individual viewers — thus the half-hour comedy and one-hour drama were born. It also had to schedule its programming so that viewers could find it. Since viewers could not choose when to watch a program, it had to come on at a fixed time, which led to programs filling one of three time allotments: half hour, one hour, two hours. (So far as I know, it wasn’t until the mini-series came along in the 1970s that the two-hour ceiling was broken.)

Well, what does all this have to do with the content? (And in turn with the humanities?)

Everything. Perhaps most importantly, in the broadcast era, given the nature of the distribution channels and the businesses that evolved to feed those channels, content was structured for the time allotted. (Movies shown on television now carry the warning upfront that they have been edited not only for various content concerns but also “to run in the allotted time.”)

In the post-broadcast, post-channel era, programming doesn’t need to conform to broadcast schedules, or to air play schemes. Instead, programming can conform to the content. As I said in the talk I gave for the Project Bamboo workshop up in Chicago, an audio program or video program can now be as long or as short as it needs to be to deliver its content. I listen to a variety of podcasts. It’s clear which ones are created by old-school broadcasters and which are generated by new-school content creators. A program like “This Week in Tech” always clocks in right at an hour. A program like “The Ruby on Rails Podcast” is 27:51 one week, 40:55 the next, and 1:02:13 another. It is, in short, as long as it needs to be and no longer. There’s no need to fill for air time and there’s no need to cut things short either.

Just as important, this new network (the network of connected computers) doesn’t care what form my content comes in. It can deliver audio and video, but it was delivering texts for far longer and it does just as well at delivering images. And it can deliver all those formats in any mix I as a content creator want or need or I as a content consumer desire. One photo not enough? I can create a slideshow on Flickr or any of a dozen sites. Slideshow pacing too fast? As a consumer, I can decide the pace at which I want to proceed.

Let’s call it “fit.” In this new network era, the era of the meta-platform as I termed it above, form is fitted to content. Content creators are free not only to choose what form works best to deliver a given content but they are equally free to size the form appropriately for the scope of the content. It’s not just audio and video, the staples of radio and television, that are going to enjoy this new paradigm, but think about texts! I think about it in terms of scholarly publishing in the humanities, where the 7,500- to 10,000-word article has been the mainstay of academic productivity. Yes, there is some argument to be made that the expression of certain arguments or ideas requires a given amount of room (words), but it is no less a factor that journals are built around publishing x number of articles averaging y number of words.

It frees up a lot.

And it frees up content creators to engage their readers in more ways. There will always be the simple disgorgement of content, but there is also the increasing number of collaborations that are taking place between artists and their audiences. Blog posts and the attached comments are a writer-reader revolution in this regard. Readers not only respond, but their response becomes part of the content for future readers!

All this is already changing the way I teach, and will continue to do so for some time to come. It, too, is freeing, especially in terms of what it means for the classroom. For certain kinds of content, like lectures, which deliver a fixed and mostly static content, I can create it in advance and post it to a course website. Students need only view the material in advance of a class activity — a discussion, a workshop, or something else that I am now free to invent. More importantly, they can stop and start the lecture according to their needs as note-takers or learners. They can watch a lecture again, if they need to. That frees up class time for other forms of collaboration between teacher and student in order not only to maximize the student’s education but the teacher’s own as well.

Other forms of collaboration are also possible. This past spring I began something I am calling, for the time being and for lack of anything catchier, “The Louisiana Survey.” I asked my Louisiana folklore class to go out and record any kind of stories they had heard that met the requirements for being folklore. Instead of telling them that Louisiana folklore consisted of X genres with Y topics, I allowed them to figure it out for themselves, and in the process they actually began to map out not what had been the case, as the printed collections do, but what is the case. So, yes, folklorists know the folktale’s time has come and gone, but it turns out that it isn’t just the personal anecdote and the joke that have taken over in terms of what we talk about when we talk to each other but the history as well. I might have been able to guess this after a year or more in the field, but I would never quite be sure that the fact that I was a university professor wasn’t skewing the results. With twenty-some-odd college students out talking with their friends and family, there is a much better sense not only that the data reflects a broader reality but also that, with so much of it, we can actually begin to quantify the dimensions of that reality.

All of that material, by the way, is publicly available as a Google Code project. It’s at code.google.com/p/louisianasurvey. And it’s in the form of a wiki. My students not only did the research, but they also did the publishing and are now authors of documentation available to the world at large.

A Condensation of Our Results So Far

Reality is multi-modal.
The content of any representation of a reality should be in the mode that best suits the function, and purpose, of the representation.
The networked era frees content creators to pursue the mode of their choosing.

A New Future for the Humanities

Wasn’t that the title of what was the original Star Wars? (Later, we discovered it was Episode 4, which wasn’t so bad when it was the prelude to Episodes 5 and 6 but not so good when it was the results of Episodes 1 and 2.)

So how do the humanities fit into all of this? Why does any of this matter to me as a humanist? (Sure, the technologist in me is also interested, but that’s for another time.)

It’s really pretty simple. I think the humanities are in the best position to train content creators, and have in some ways long been about doing just that, and I don’t simply mean in the way that UL currently has it configured as training workers for the “digital media” industries. Framing it in terms of “digital media” and in terms of “industries” reveals that the folks who wrote the university’s vision just aren’t thinking about how much the reality of content creation and consumption may very well already be changing.

What we need to be doing is not creating “digital media workers” but creating content specialists who are fully aware of the new platform and its many possibilities.

But I’m not sure I really like the term “aware” because that seems an awful lot like the current computing requirement of our students. They seem aware that computers can do more things than check e-mail and surf the web, but they have no real sense of how any of it works. If they have exposure to production using computers, then it’s mostly to use word processors as somewhat glorified typewriters to mash out term papers in the usual fashion.

Don’t get me wrong. I’m a big believer in texts and textual production. I love turning out words (this text is currently at 3000 words, after all), but it troubles me when I discuss hypertext markup and students have no idea what I’m talking about. Hypertext markup. It doesn’t get more rudimentary than that, and yet our students, juniors and seniors, have no idea what it looks like, let alone how to produce it.

Which is another way to say that my vision for the humanities becoming the home of the new content specialists is predicated on universities making it a larger goal that students have a firm grasp of the foundations of the networked world in which they live. They are going to need to know at least the core concepts and practices behind acronyms like TCP/IP, HTML, and SQL, as well as the ideas behind DRM and the technologies that carry it, like HDMI.

And it goes without saying that I think humanities faculties are going to have to get themselves versed in these things as well. All of these things are involved in the production, distribution, and consumption of ideas, and they are also at the heart of emergent intellectual property regimes, which will probably get worse before they get better.

What will humanities faculty be doing in their classrooms? Well, they will probably be discussing many of the same things they always have. Human nature remains as diverse and as consistent as ever. Extraordinary representations of it will continue to seize our imaginations, and examination of those representations will continue to be one way that we encourage students to examine all representations as well as to create their own representations.

That’s the central engine that has always driven the humanities.

But we are in, or, for universities like UL which are not quite there yet, emerging into a universe where the kinds of texts that we examine and produce are far more varied. More importantly, some texts which were beyond us because of the technologies involved — e.g., film — are now within our reach not only as consumers but also as producers.

We will, in short, need to be versed in a lot more media. We will also, however, be free to find those forms, as well as those contents, with which we are most proficient. Content best determines form, but that is in constant dialogue with the producer who will have his/her preferences and proclivities. (Some of us write better than we play music or frame images or edit videos.)

And, I hope, humanities faculty will encourage, if not require, their students to discover for themselves their preferences and their proclivities. And now, with form freed, our preferences and proclivities for content can be freed as well. Thoreau once urged his readers to “gnaw your own bone.” I like the way Annie Dillard puts it: each of us has within us that one thing we were put on this earth to give life to. As a folklorist, I am open to the fact that for some this will be a crawfish boat or side plow. For others this will be a particular dish or a quilt. And for others it will be a story or a way of telling jokes. That is, folk culture has long had this openness to diversity, to everyone within a community finding their own excellence and pursuing it.

It looks like we might be on the verge of being able to do this on a larger scale, and that’s terribly exciting.

I’ll leave for another time the program I think humanists need to pursue to get there, but I think I’ve begun to paint the bigger picture as I see it for you. I’ll be discussing some of this when I go to Chicago for the Project Bamboo workshop in a few weeks.

What Platforms Do

2007-10-04

work

In everyday usage, the word platform usually means a surface raised above ground level in order to accomplish some task. One such task arises at public events, where a platform lets performers or speakers be seen and heard quite literally above the audience or crowd, and hence the ideas upon which a group founds itself are sometimes called a platform. It made sense, then, to extend the metaphor even further within the realm of computing to mean “the standards that set the parameters for what a system can and cannot do.” The notion generally refers to the microprocessor at the heart of the computer’s hardware or the operating system which orchestrates interactions between the user and the hardware. In essence, a platform is an agreed-upon set of conventions. Within a computing platform, there are agreed-upon ways to interact with the kernel. There is, in fact, a term for this: API or “application programming interface.”

In less technological realms, political parties establish platforms, and those who wish to run for public office agree to abide by an agreed upon set of tenets or ideals when allying themselves with one party or another. Using platform in this fashion reveals that we live in a world of platforms — the irony of the “platform shooter” within the video game world should not be lost here — if we imagine that platforms are spaces within which we agree to live by a certain set of rules.

Television and cinema are one such platform: though there is room for negotiation on quantity and quality, everyone agrees that images and sound are central to publishing on that platform. If you want to work within that industry, then it’s incumbent on you to learn the basics of good shots — sound, lighting, composition — and how to edit those shots into a montage that conveys your idea. There are further refinements of the platform, depending upon what you want to achieve. If your goal is to produce the next great sitcom or next great Discovery documentary, then you will need to understand and abide by the conventions established within those forms. Please note that none of this precludes innovating within a platform, across platforms, or developing new platforms.

Arguably, the personal computer and the internet have become not only a new platform but also one that can deliver other, older media platforms like television and radio. The innovation it has spawned as a result of not only absorbing those older platforms but also shaking up the conventions within which they operated can easily be seen in the rise of the “podcast.” Formerly, radio programming was bound by clocks, because it went out as a live broadcast or stream and listeners could not tune in later to catch the same program. This meant a listener had to know when a program began and ended. It became the convention to start programs either on the hour or the half hour, in order to make it easier for listeners to remember when a program aired. That meant a program had to be in increments of half hours, at the very least.

But suppose one didn’t have half an hour of content? Too bad (for the listener, that is): expand it to fit the space. But it turns out that a lot can get done in ten minutes, and the ten-minute podcast is a rather common length. With the rise of personal computers and the internet, immensely powerful forms of data aggregation and analysis as well as communication of syntheses in topic-appropriate media became something within reach of individuals and not the exclusive domain of institutions and industries.

I originally wrote the above as the prelude to a syllabus for a course on computing in the humanities. I concluded the syllabus with:

The goal of this course is to introduce participants to the basic elements of the computing platform: the creation of texts/data and their manipulation in order to arrive at new insights, interpretations, and knowledge.

And then I offered up the following units:

The Command Line. The humanistic user of any platform should have a reflexive understanding of the very basics of its operations. In the case of the computer, the place to start is the command line. In this unit, we will learn how to: log into the shell, understand and navigate directory hierarchies, create and edit texts using an editor (e.g., Nano, vi, emacs).

Working with Texts. After creating and editing texts, we need to be able to search quickly through them for things they have in common in order to discern larger patterns. In this unit, we will: manipulate texts using grep, sed, and other shell programs including the use of options and pipes to control results; work with regular expressions.

From Texts to Data. It doesn’t take long before the average writer or scholar has not only a wide variety of texts but also a great number of them. Keeping such a large number of texts organized and being able to call up relevant results when needed is best achieved by committing information to a database. There are a variety of database options, free ones even, but the emergent standard for basic tasks is MySQL.

Of Packets and Ports. The nature of communication. How is information exchanged?

The Structure of Things. There is a confusing array of markup languages (MLs) out there. The two most common are HTML and XML, but their approaches to describing data and documents are different enough that we need to spend some time talking about different needs and approaches. In this unit we will explore the differences between document structure (HTML) and information structure (XML).

Outputs. None of our research and productivity means much if the data we have collected, the information we have developed, and the knowledge we have created aren’t made accessible and/or public.
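
To make the middle units a little more concrete, here is a minimal sketch of how “Working with Texts” and “From Texts to Data” might fit together. It is written in Python rather than with the shell tools and MySQL the syllabus names: the `grep` function below only imitates what the real grep does with a regular expression, and an in-memory SQLite database stands in for MySQL. The sample texts, the function names `grep` and `classify`, and the keyword patterns are all invented for illustration and are not taken from the course or from the Louisiana Survey.

```python
import re
import sqlite3

# Hypothetical sample texts standing in for a folder of collected stories.
texts = {
    "note1.txt": "The storyteller told a joke about a crawfish boat.",
    "note2.txt": "She recalled the history of the parish.",
    "note3.txt": "Another joke, this one about a side plow.",
}


def grep(pattern, documents):
    """Return (name, line) pairs for every line matching the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, body in sorted(documents.items()):
        for line in body.splitlines():
            if regex.search(line):
                hits.append((name, line))
    return hits


def classify(body):
    """Tag a text with a genre using crude keyword patterns (illustrative only)."""
    if re.search(r"\bjokes?\b", body, re.IGNORECASE):
        return "joke"
    if re.search(r"\bhistor(y|ies)\b", body, re.IGNORECASE):
        return "history"
    return "other"


# Working with texts: search for a pattern across all of them.
joke_hits = grep(r"\bjokes?\b", texts)

# From texts to data: commit the texts to a database (SQLite here, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE texts (name TEXT PRIMARY KEY, body TEXT, genre TEXT)")
rows = [(name, body, classify(body)) for name, body in sorted(texts.items())]
conn.executemany("INSERT INTO texts VALUES (?, ?, ?)", rows)

# Call up results when needed: how many texts of each genre do we have?
counts = conn.execute(
    "SELECT genre, COUNT(*) FROM texts GROUP BY genre ORDER BY genre"
).fetchall()

for name, line in joke_hits:
    print(name, line)
print(counts)
```

The point of the sketch is the sequence rather than the particular tools: a pattern search turns a pile of texts into a set of hits, and a database turns those texts into something that can be counted and queried later.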
