Monday, December 3, 2007

Where we are at...

I am putting off thinking about what my access tool will be until the ALA Mid-Winter Meeting, which is here in Philadelphia and will allow me to be totally overwhelmed by the vendor exhibits. My director and I have talked briefly, however, and I am on track to acquire my software sometime between April and June.

In the meantime, I get to play with Special Collections. I'm taking Dr. Skinner's advice to heart -- that you need to know your collection before you choose the tool -- so I'm revamping the Special Collections website and photographing the collections to learn what we have, meaning both the physical objects and their bibliographic control. What diverse, fun stuff!

Here are a couple of highlights:
This is an illuminated Bible from the 14th century.

This is an artist's book about Ho Chi Minh, hand-printed on rice paper and housed in a handmade case.

And then there are kitchen matches from the premiere of Rambo III in France.

This quickly shows the range of materials I will be working with. We have the written word, preserved carefully in our Bible collection, and then there is the Vietnam War collection, intentionally a collection of artistic expressions that includes both fine art and ephemeral novelties. My challenge here is to come up with a methodology for exposing these materials in a thought-out, cohesive manner. Hopefully by the time I'm ready to choose my software, I'll be ready to tackle the digital needs of the collections.

Thursday, November 29, 2007

Stewardship of Digital Assets @ Palinet (Post 2)

Alright -- on to Day 2 of my notes from the Stewardship of Digital Assets workshop!

Dr. Katherine Skinner from Emory University and the MetaArchive spoke for three-quarters of the day. Her interest in digital preservation grew out of her doctoral research on emergent fields. While digital preservation is a few steps beyond just stumbling in the dark at this point, it is still an emergent field, and we should "get used to the discomfort" that goes along with being on the bleeding edge.

In her definition, Skinner describes digital preservation as the "management and maintenance of digital information over a long period of time." How long, she says, is unknown at this point, but it's surely longer than many of our access systems will be around -- currently about 3 years per system. This is why adopting recognized standards is important: for interoperability between systems and between collaborating institutions.

Skinner's largest project of the last few years has been developing the MetaArchive, which in turn grew out of the Lots Of Copies Keep Stuff Safe (LOCKSS) framework created by Stanford. The idea behind LOCKSS and the MetaArchive is that a minimum of 6 copies of information are stored across a large geographic area (possibly even across several continents). Automated checks continually verify the accuracy and completeness of the data: if anything happens to a single file in one of the repositories, the other 5 copies are checked to make sure their data is complete, and the bad or corrupt file in the 6th location is replaced. Spreading information across a great geographic area could help restore information in the case of a major disaster. One example Skinner gave: NPR content destroyed by Hurricane Katrina was restored from the mirrored content kept at Emory University.
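
To make the mechanics concrete, here is a minimal sketch in Python of the majority-vote checksum repair that this kind of system performs. The file paths are hypothetical, and the real LOCKSS software does its polling and repair over a network protocol, so take this as an illustration of the principle only:

    import hashlib
    from collections import Counter
    from pathlib import Path

    def checksum(path):
        """Compute a SHA-256 digest of a file's contents."""
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    def audit(replicas):
        """Compare digests across replica copies of one file; treat the
        majority digest as authoritative and repair any copy that disagrees."""
        digests = {path: checksum(path) for path in replicas}
        majority, _count = Counter(digests.values()).most_common(1)[0]
        good = next(p for p, d in digests.items() if d == majority)
        for path, digest in digests.items():
            if digest != majority:
                # Replace the corrupt copy with the content of a good one.
                Path(path).write_bytes(Path(good).read_bytes())
                print(f"repaired {path} from {good}")

    # Hypothetical layout: the same file held at six geographically
    # separate sites, mounted locally for the sake of the example.
    audit([f"/mnt/site{n}/collection/page001.tif" for n in range(1, 7)])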

The biggest "AHA!" that came out of the day for me was that if you choose your access tool too early in the digital library design process, your collection ends up bound by the tool. By examining and fully understanding your collection, you can choose a tool that allows for curation of the materials that complements the care the analog materials receive. The selection process should be: identify the materials, see what information is available (e.g., a cataloging record), and then choose the metadata. Only when this process is complete should the tool be selected. I will have more thoughts on how this will change my approach to developing my repository in a later post.

The presentation was great, and the instructors knowledgeable and friendly. They fully realize that, as the field is new, no one truly knows what they are doing for the long haul, and the more we can help each other out, the more successful digital preservation programs will be.

The internet is running out of space!

"Consumer demand for bandwidth could see the internet running out of capacity as early as 2010, a new study warns."

http://news.bbc.co.uk/2/hi/technology/7103426.stm

I can't stop thinking about this study, which reports that the internet could run out of bandwidth by 2010, and wonder how we -- internet users, content creators, and/or librarians -- could help solve this problem. Digital information is "invisible," or so it seems... You upload to the internet and it's physically away from you, yet accessible as need be. But how much stuff on the internet is junk? I know that I personally have all the images and pages from my old website still hanging out on an FTP site somewhere -- and those images were probably unnecessarily large, as that was before I knew anything about correct imaging for projects and webpages. If everyone just cleaned out their Flickr accounts, or deleted old webpages, could we get another year or two out of the internet at its current capacity? By realizing that data truly are physical objects, perhaps we would feel a greater sense of responsibility for their upkeep and care.

It's physics, kind of like string theory, I know, but as I sit here right now I am surrounded by the internet. I can't see it or feel it, but it's here: physical, voluminous, and completely disorganized. A Tech Director at a former job of mine always wanted to take a filing cabinet and cram it full of paper in disarray, with materials falling out of it and jammed into the drawers. "This is what the server actually looks like" would be the message. I am sure the same analogy applies to the internet.

Tuesday, November 27, 2007

Stewardship of Digital Assets @ Palinet

On November 14th and 15th I had the opportunity to attend the first in a series of workshops on digital preservation. Developed by the Northeast Document Conservation Center, "Stewardship of Digital Assets" took digital projects to the next level -- no longer were we talking about just digitizing materials; we are beginning to look at sustaining the collections.

It's taken me about 2 weeks to get around to writing about this workshop, and I don't think I would have been able to tackle it any sooner. What a dense presentation! The faculty's experience was diverse and far-reaching -- the National Archives (NARA), the Research Libraries Group (RLG), the Lots Of Copies Keep Stuff Safe (LOCKSS) framework developed at Stanford, the powerhouse Online Computer Library Center (OCLC), the Pennsylvania Library Network (Palinet), the Institute of Museum and Library Services (IMLS), and the National Information Standards Organization (NISO), and that's just to name a few of the distinguished organizations these four people have worked for, developing standards and managing digital content. Liz Bischoff, Tom Clareson, Robin Dale, and Dr. Katherine Skinner were kind enough to share with the forty attendees the results of the rich work they have been doing in encouragement of collaboration.

Overall, the librarians and archivists in attendance are newbies to the field; 1/3 of participants have yet to begin digitizing anything. That is a welcome number to hear, as it means my institution is not falling behind the herd. What happened to many early digitization projects, anyway, is that objects were not created at a high enough resolution to warrant digital preservation, and some projects may have to rescan objects to meet new standards. The prevailing wind for digital projects these days is to scan BIG (and I really mean BIG -- as big as your institution can afford to maintain), i.e., full size at 300-600 dpi. Save that BIG scan as an unwieldy TIFF, and then make smaller derivative JPEGs from that file for use copies.
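
For what it's worth, here is a rough sketch of that master-and-derivatives step in Python using the Pillow imaging library (the file names are my own invention):

    from PIL import Image  # the Pillow imaging library

    # Open the big archival master -- the full-size, 300-600 dpi TIFF.
    master = Image.open("bible_page001_master.tif")

    # Cut smaller derivative JPEGs from the master for everyday use copies.
    for name, size in [("large", 1200), ("medium", 600), ("thumb", 150)]:
        copy = master.copy()
        copy.thumbnail((size, size))   # resizes in place, keeps aspect ratio
        copy.convert("RGB").save(f"bible_page001_{name}.jpg", quality=85)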

But I digress -- this workshop was not aimed at the creation of these files; rather, it addressed what you do with the files now that you have them. Digitizing costs lots of money, and improper care of files can lead to obsolescence or corruption of materials, rendering all your hard work wasted.

The workshop began with a lesson in assessment, which simply means that if you can't articulate your institution's needs, then you can't apply standards. Not everything can be preserved all the time, and what gets preserved should be a conscious choice, not a passive decision made out of laziness. Whether analog or digital, a preservation program costs money to maintain, and lots of it -- why save junk? If a file is in a lesser format at a lower quality level, then the institution should make less of a commitment to its preservation.

A fundamental shift is happening in the digitization world -- digital projects are no longer finite, but are instead regarded as a cradle-to-cradle process. To ensure the integrity and authenticity of a digital document over time, digital objects need a sense of curation present -- one that conceptually guarantees that the information comes out the same way it went into the system. This can be accomplished through preservation metadata that tracks the lifecycle of the object.
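
As a toy illustration of what lifecycle tracking means, here is roughly what such a record might capture, sketched in Python. A real program would use a recognized standard like PREMIS rather than the ad hoc structure and names I have made up here:

    import hashlib, json
    from datetime import datetime, timezone

    def event(action, agent, detail=""):
        """One entry in the object's lifecycle history."""
        return {"action": action, "agent": agent, "detail": detail,
                "timestamp": datetime.now(timezone.utc).isoformat()}

    record = {
        "identifier": "spc:0042",   # hypothetical object ID
        # In practice you would hash the file's actual bytes at ingest.
        "fixity": hashlib.sha256(b"...the file's bytes...").hexdigest(),
        "events": [
            event("ingest", "jdoe", "scanned at 600 dpi, saved as TIFF"),
            event("fixity-check", "audit-script", "checksum verified"),
        ],
    }

    # Every later action appends an event, so the object's whole history --
    # the evidence that it comes out the way it went in -- travels with it.
    record["events"].append(event("migration", "jdoe", "TIFF to JPEG 2000"))
    print(json.dumps(record, indent=2))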

Reshaping digital projects into digital programs clarifies who is in charge of which tasks, and where and how the information is stored. Many organizations contain disparate silos of information across campus. Let IT run the servers and back everything up; let the digital repository manage access. That way the organization knows clearly where all the information is, how it is being managed, and by whom. Informed individuals make educated decisions, and can also identify potential risks. For example, in just the last few years recommendations have moved away from storage on CD/DVD toward spinning disk, hard drive, and tape. Why? CDs aren't scalable, are hard to manage, and have been found to fail -- 15% of all information checked on CD at Emory University had failed, and that may be 15% of ALL digital information. If you can get boxes of CDs out of people's individual offices and centralized in one preservation office, risk management becomes all the easier.
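
Centralizing the media also makes the kind of check that surfaced those failure numbers routine: record a checksum for every file at ingest, then periodically recompute and compare. A minimal sketch in Python, with a made-up manifest format:

    import csv, hashlib
    from pathlib import Path

    def sha256(path):
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    # manifest.csv holds "path,digest" rows recorded when each file was ingested.
    failures = 0
    with open("manifest.csv", newline="") as f:
        for path, digest in csv.reader(f):
            if not Path(path).exists() or sha256(path) != digest:
                failures += 1
                print(f"FAILED: {path}")
    print(f"{failures} files failed the audit")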

Grants at this point will not give monies for long-term management of objects -- but the language of many grants includes a commitment to the long-term access and preservation of the objects. As institutions complete the digitization project and the grant money runs out, they may find the best way to manage preservation is to repurpose time and leverage the skills of existing staff. The teams created through this process should be evenly weighted between curatorial and tech staff -- curators choose content, techies digitize.

Preservation must be stabilized -- choose a recognized standard and stick with it, don't unnecessarily migrate information, secure funding, and document EVERYTHING.

Yikes, that's already a lot of notes -- I think this will be best broken up into a series of posts.

Friday, November 16, 2007

Sarah Thomas at PMA

On Wednesday evening, I attended the Sarah Thomas library lecture at the Philadelphia Museum of Art, "First Sell the First Folio." The title referred to how, many years ago, the Bodleian Library at Oxford deaccessioned a First Folio of Shakespeare's plays, not seeing the potential value of the first edition. Thomas used this analogy to argue why the physical existence of libraries is still relevant, and how dismissing libraries as obsolete in the digital age can be as erroneous as a decision to get rid of a first edition of a potentially future-rare title.

Thomas made the argument that we can't digitize everything, especially materials whose analog existence doesn't already have strict bibliographic control -- even if such an item were digitized, there would be no access information. Digitization also means making choices about what to keep and allow access to, and if we destroy materials after they are digitized, we may be losing the future's equivalent of the First Folio. Likewise, if libraries exist only digitally, then our choices about what to digitize are as weighty as what to purchase, preserve, or destroy -- if patrons cannot access the materials, it is the same as choosing not to have them in the collection.

She referred to the traditional round reading room as the "center of the universe where one would consult the oracle." As the internet has replaced the library as the information hub, the library must shift from being a box of books to a suite of services. To remain relevant, the library must connect the information it stores to what is being created outside the library.

In a nice change from a text-based lecture, Thomas filled her talk with photographs of library architecture and rare books. Her words were infused with a sweet reverence for what she referred to as the "power of the artifact." The lecture ended by bringing Anne d'Harnoncourt, director of the Philadelphia Museum of Art, to tears in lauding the newly redesigned library in the newly opened Perelman Building of the PMA. A touching and soothing experience, in praise of libraries of the past, and in argument, gentle and hopeful, for the libraries of the future.

Monday, October 29, 2007

Non-use of ContentDM and other sites

One software package we are looking at is ContentDM, which is great for accessioning and displaying digital assets. It is clear, easy to use, and can pretty much work right out of the box. I have yet to come across any librarian who really dislikes the software. The one comment I have heard, however, is that no one really uses the sites built with it. So why go through the effort of digitization if no one really uses the result? Digitization is anything but cheap -- between licenses and staff, a digitization project can easily cost tens of thousands of dollars.

Cornell, too, published a report evaluating the non-use of DSpace at their university. The whole report can be found at http://www.dlib.org/dlib/march07/davis/03davis.html. Part of the reason for non-use is that each discipline already has mechanisms in place for publishing materials, so DSpace is an addition that operates outside the sphere of traditional avenues.

Librarians are dreaming up these wonderful digital repositories, and companies are creating awesome software packages to host them, but is it all just Library Science laboratory work? How can digital initiatives be made more relevant to, and integrated into, the academic sphere? Blackboard and other class content management and presentation systems are thriving because students and faculty have to go to them for class materials. It almost makes sense to grow a repository attached to Blackboard, as people are already there.

I see now why ArtStor has taken off so well in the academic art realm -- faculty can direct their students to the repository, add to the image collection, sort and create their presentations, and pretty much base their classes out of it. I don't know if ContentDM can offer similar flexibility, or if the collections we are speaking of digitizing have as much relevance to coursework as images of art do for art history classes.

But it could.

And with careful planning, maybe it will?

Wednesday, October 10, 2007

Indiana University, My Heroes!

This summer I attended the Visual Resources Assoc.'s Summer Educational Institute for Visual Resources and Image Management (or something along those lines)...

That's how I found out about the Digital Library Program at IU. I just wanted to give them some props, as they do a great job of sharing information about how and why they built their collections to look and function as they do. For example, the Cushman Photo collection (which is in the links of this page) includes a history of the project, including proposals, tech specs, and rationales. I think that's very awesome -- helping others help themselves through knowledge sharing. More info can be found at: http://www.dlib.indiana.edu/

A Decision, I believe...

So Fedora may have gotten the best of me. I don't think I want to pursue an open source solution by my lonesome as my one mind-blowing project. I really wanted to, in a way, to make myself feel smart -- now I'm feeling like it would be an overwhelming decision. So... we may go with ContentDM for images.

My hesitations with ContentDM or any proprietary software:

1) Lack of ability to customize and to integrate into the library's website
2) Cost of updates / Inability to stay up with updates, if cost prohibitive
3) Creating a "cookie cutter" repository

I will be learning more about ContentDM next month at Palinet. My hypothesis is:
ContentDM will prove to be much more intuitive and easier to work with out of the box than an open source system. ContentDM will be less customizable and harder to integrate into the library's website, but that may be worth the sacrifice, given how little labor the program will require to set up. ContentDM is making strides in its ability to handle text-based assets, but will continue to be outshined by DSpace and other institutional academic repositories.

I want to keep my hand in open source work to some degree, so maybe we will look at implementing DSpace or EPrints for an academic repository of research and writing. The open source community is hard at work deploying cutting-edge technologies and programming to make collections more accessible -- I think it would be a good experience to do some of the original programming myself, and not rely completely on tech support and help lines to get things done...

Thursday, October 4, 2007

Archival Certification

Yesterday I found out that I passed the examination to become a Certified Archivist, which means I am now a member of the Academy of Certified Archivists. I find preservation to be a very important component of digital repository building, and I'm excited to contribute to the greater good being done for digital archiving.

Oh yeah, and now I can add the initials CA after my name if I wish - That's pretty cool.

I'll add more info about certification and archiving as it relates to digital asset management as this blog develops.

Wednesday, October 3, 2007

Note to self

"Dual resolution images" is somewhat of a misnomer if you are going to use the small image as a thumbnail for a collection. I should make sure to figure out at what pixel width x height, and at what dpi, the images are displayed for optimum output...
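
My working assumption (worth verifying) is that on screen only the pixel dimensions matter, and the dpi stamped in the file only affects print size. A quick Python sketch of the arithmetic I need to pin down:

    def pixels(width_in, height_in, dpi):
        """Pixel dimensions of a scan: physical size times scanning dpi."""
        return round(width_in * dpi), round(height_in * dpi)

    # e.g. an 8 x 10 inch photograph scanned at 300 dpi:
    print(pixels(8, 10, 300))   # -> (2400, 3000)

    # On screen, only pixel dimensions matter: shrunk to 150 pixels wide,
    # the image displays the same no matter what dpi the file header claims.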

Tuesday, October 2, 2007

Standardizing vocabulary in Dublin Core

For most systems, Dublin Core (DC) is the lowest common denominator metadata. Fedora automatically creates DC metadata that can be edited. The program verifies that it is properly constructed DC; otherwise it will not accept the changes. When creating a basic collection based on the University of Hull's documentation, the DC values that need to be present are DC:TITLE, DC:IDENTIFIER (which is generated automatically as the unique identifier, or PID, as it is referred to in Fedora), and DC:DESCRIPTION. What I am going to need to tackle very soon is standardizing which entries are acceptable in each field and noting that in a reference guide -- because I know I can't remember what I just entered for the previous item, let alone for previous collections. Also, for a description, what should be the standard for conveying what an object is? Is it necessary to say that something is a photograph, or will that be inferred? Is there added value in describing something as a "Photograph of Bibles in a Custom Wood Case" versus just saying "Bibles in a Custom Wood Case"?
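
As a sketch of the reference guide I have in mind, expressed in Python: a little validator that enforces the required fields above plus a controlled vocabulary for chosen fields. The dc:type list is a placeholder of my own, not Hull's or Fedora's:

    # Required Dublin Core fields, per the Hull-based setup described above,
    # plus a controlled list of acceptable values for selected fields.
    REQUIRED = {"dc:title", "dc:identifier", "dc:description"}
    CONTROLLED = {"dc:type": {"Text", "Image", "PhysicalObject"}}

    def validate(record):
        """Reject records missing required fields or using off-list values."""
        missing = REQUIRED - record.keys()
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        for field, allowed in CONTROLLED.items():
            value = record.get(field)
            if value is not None and value not in allowed:
                raise ValueError(f"{field}={value!r} not in {allowed}")

    validate({
        "dc:title": "Bibles in a Custom Wood Case",
        "dc:identifier": "spc:0001",   # stand-in for the Fedora-generated PID
        "dc:description": "Bibles in a custom wood case",
        "dc:type": "Image",            # carries the "this is a photograph" fact
    })

One possible answer to the photograph question: let a field like dc:type carry "Image" so the description doesn't have to repeat it.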

Hmmn...

A new hat...

I have been working for the past couple of weeks with the Flexible Extensible Digital Object Repository Architecture (Fedora), which was created by Cornell and UVA and can be downloaded from here: http://www.fedora-commons.org/ It is open source software for housing institutional collections. I've been banging my head trying to set up collections and test out this software, but thanks to the University of Hull's documentation of developing their RepoMMan project, I have found many helpful tutorials. I can't help but smile at the project's name, too, given that my husband and I watched the very campy 80s movie "Repo Man" just a couple of months back -- could it be more than a coincidence, with old school punk rockers at the helm? http://www.hull.ac.uk/esig/repomman/

Right now I've been working on creating implicit and explicit collections. That's what they're really called, which resulted in the collection that I made of my puppy's pictures becoming displayed as "Bichon Frise pictures - Explicit." That will be changed before I run the beta evaluation...

I am not sure if we are going to go with Fedora -- it is very coding-intensive, and from where I'm sitting right now, I am the only staffer working on this project. My plan is to work with Fedora for one month; then I get to go to Palinet and learn about ContentDM. I will take each as a representative of open source vs. turnkey DAM and decide which direction I would like to go in immediately for beta testing.

At bare minimum, I am getting the opportunity to 1) visit many online institutional repositories and see what I like and what they are using, and 2) get to intimately understand Dublin Core, which will most likely be applicable to all future digital asset management work I do. I am quickly seeing the need to develop standards for controlling what values are acceptable in each Dublin Core field, but that is one step beyond where I'm at right now...