Alright -- Onto Day 2 of my notes from the Stewardship of Digital Assets workshop!
Dr. Katherine Skinner from Emory University and the MetaArchive spoke for three quarters of the day. Her interest in digital preservation arose from her doctoral research on emergent fields. While digital preservation is a few steps beyond just stumbling in the dark at this point, it is still an emergent field, and we should "get used to the discomfort" that goes along with being on the bleeding edge.
In her definition, Skinner describes digital preservation as the "management and maintenance of digital information over a long period of time." How long, she says, is unknown at this point, but it's surely longer than many of our access systems will be around -- currently running at about 3 years per system. This is why adopting recognized standards is important: for interoperability between systems and between collaborating institutions.
Skinner's largest project of the last few years has been developing the MetaArchive, which in turn grew out of the Lots of Copies Keeps Stuff Safe (LOCKSS) framework created by Stanford. The idea behind LOCKSS and the MetaArchive is that a minimum of 6 copies of the information is stored across a large geographic space (possibly even across several continents). Automated checks continually verify the accuracy and completeness of the data. If anything happens to a single file in one of the repositories, the other 5 check to make sure their data is complete and replace the bad or corrupt file in the 6th location. Spreading information across a great geographic area could also help restore information in the case of a major disaster. One example Skinner gave: NPR content destroyed by Hurricane Katrina was restored from the mirrored content kept at Emory University.
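The verify-and-repair mechanics are easy to sketch. Below is a minimal, hypothetical illustration of that majority-vote idea in Python -- my own toy version, not MetaArchive's or LOCKSS's actual polling protocol, and the mount points are made up:

```python
import hashlib
from collections import Counter
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large masters don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit_replicas(replicas: list[Path]) -> None:
    """Compare copies of one file held at several sites and repair any
    copy that disagrees with the majority."""
    digests = {path: sha256(path) for path in replicas}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("No clear majority -- flag for human review.")
    good = next(p for p, d in digests.items() if d == majority_digest)
    for path, digest in digests.items():
        if digest != majority_digest:
            path.write_bytes(good.read_bytes())  # restore the corrupt copy
            print(f"Repaired {path} from {good}")

# Hypothetical mount points for the six geographically dispersed copies:
audit_replicas([Path(f"/mnt/site{n}/collection/item0001.tif") for n in range(1, 7)])
```

Real LOCKSS polling is far more defensive than this (it has to assume a peer could be malicious), but the principle is the same: keep enough independent copies that the good majority can always outvote a corrupt one.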
The biggest "AHA!" that came out of the day for me was that if you choose your access tool too early in the digital library design process, your collection ends up bound by the tool. By examining and fully understanding your collection, you can choose a tool that allows for curation of the materials that complements the care received by the analog materials. The selection process should be: identify the materials, see what information is already available (i.e., a cataloging record), and then choose the metadata. Only when this process is complete should the tool be selected. I will have more thoughts on how this will change my approach to developing my repository in a later post.
The presentation was great, and the instructors knowledgeable and friendly. They fully realize that, as the field is new, no one truly knows what they are doing for the long haul, and the more we can help each other out, the more successful our digital preservation programs will be.
Thursday, November 29, 2007
The internet is running out of space!
"Consumer demand for bandwidth could see the internet running out of capacity as early as 2010, a new study warns."
http://news.bbc.co.uk/2/hi/technology/7103426.stm
I can't stop thinking about this study, which reports that the internet could run out of bandwidth by 2010, and wonder how we -- internet users, content creators, and/or librarians -- could help solve this problem. Digital information is "invisible," or so it seems... You upload to the internet and it's physically away from you -- yet accessible as need be. But how much stuff on the internet is junk? I know that I personally have all the images and pages from my old website still hanging out on an FTP site somewhere -- and those images were probably unnecessarily large, as that was before I knew anything about correct imaging for projects and webpages. If everyone just cleaned out their Flickr accounts, or deleted old webpages, could we then get another year or two out of the internet at its current capacity? By realizing that data truly are physical objects, perhaps we would feel a greater sense of responsibility for their upkeep and care.
It's physics, it's kind of string theory, I know, but as I sit here right now I am surrounded by the internet. I can't see it or feel it, but it's here: physical, voluminous, and completely disorganized. A Tech Director at a former job of mine always wanted to take a filing cabinet and cram it full of paper in disarray, with materials falling out of it and jammed into the drawers. "This is what the server actually looks like," would be the message. I am sure the same analogy would apply to the internet.
Labels: BBC, digital information, digital preservation, internet, report, storage
Tuesday, November 27, 2007
Stewardship of Digital Assets @ Palinet
On November 14th and 15th I had the opportunity to attend the first in a series of workshops on digital preservation. Developed by the Northeast Document Conservation Center (NEDCC), "Stewardship of Digital Assets" took digital projects to the next level -- no longer were we just talking about digitizing materials; we are beginning to look at sustaining the collections.
It's taken me about 2 weeks to get around to writing about this workshop, and I don't think I would have been able to tackle it any sooner. What a dense presentation! The faculty's experience was diverse and far-reaching -- working with the National Archives (NARA), the Research Libraries Group (RLG), the Lots of Copies Keeps Stuff Safe (LOCKSS) framework developed at Stanford, the powerhouse Online Computer Library Center (OCLC), the Pennsylvania Library Network (Palinet), the Institute of Museum and Library Services (IMLS), and the National Information Standards Organization (NISO), and that's just to name a few of the distinguished organizations these four people have worked for, developing standards and managing digital content. Liz Bischoff, Tom Clareson, Robin Dale, and Dr. Katherine Skinner were kind enough to share with the forty attendees the results of the rich work they have been doing to encourage collaboration.
Overall, the librarians and archivists in attendance were newbies to the field, as 1/3 of participants have yet to begin digitizing anything. That is a welcome number to hear -- it means my institution is not falling behind the herd. What I have found happened to many early digitization projects, anyway, is that objects were not created at a high enough resolution to warrant digital preservation, and some projects may have to rescan objects to meet new standards. The prevailing wind for digital projects these days is to scan BIG (and I really mean BIG -- as big as your institution can afford to maintain), i.e., full size at 300-600 dpi. Save that BIG scan as an unwieldy TIFF, and then make smaller derivative JPEGs from that file for use copies.
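As a rough illustration of that master-and-derivatives workflow, here is a sketch using the Pillow imaging library in Python. The derivative sizes, quality setting, and file names are my own assumptions, not workshop recommendations:

```python
from pathlib import Path
from PIL import Image  # Pillow imaging library

DERIVATIVE_WIDTHS = {"access": 1500, "thumb": 200}  # assumed pixel widths

def make_derivatives(master: Path, out_dir: Path) -> None:
    """Leave the big archival TIFF untouched; write smaller JPEG use copies."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with Image.open(master) as img:
        for name, width in DERIVATIVE_WIDTHS.items():
            copy = img.copy()
            # thumbnail() shrinks in place while preserving the aspect ratio
            copy.thumbnail((width, img.height))
            copy.convert("RGB").save(out_dir / f"{master.stem}_{name}.jpg",
                                     "JPEG", quality=85)

make_derivatives(Path("scan_0001.tif"), Path("derivatives"))
```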
But I digress -- this workshop was not aimed at the creation of these files; rather, it addressed what you do with the files now that you have them. Digitizing costs lots of money, and improper care of files can lead to obsolescence or corruption of materials, rendering all your hard work wasted.
The workshop began with a lesson in assessment, simply meaning that if you can't articulate your institution's needs, then you can't apply standards. Not everything can be preserved all the time, and what gets preserved should be a conscious choice, not a passive decision made out of laziness. Whether analog or digital, a preservation program costs money to maintain, and lots of it -- why save junk? If the file is in a lesser format at a lower quality level, then the institution should make less of a commitment to its preservation.
A fundamental shift is happening in the digitization world -- no longer are digital projects finite; they are instead regarded as a cradle-to-cradle process. To ensure the integrity and authenticity of a digital document over time, digital objects need to have a sense of curation present -- one that conceptually guarantees that the information comes out the same way it went into the system. This can be accomplished through preservation metadata that tracks the lifecycle of the object.
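In practice, that lifecycle tracking often takes the form of PREMIS-style event records attached to each object. A toy sketch of the idea in Python -- the field names loosely echo PREMIS but are my own simplification, and the file names are hypothetical:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_event(obj: Path, event_type: str, agent: str, log: Path) -> None:
    """Append one lifecycle event (ingestion, fixity check, migration, ...)
    so the object's whole history stays auditable."""
    event = {
        "object": obj.name,
        "eventType": event_type,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        # Fixity recorded at event time: later events can prove the bits
        # coming out match the bits that went in.
        "sha256": hashlib.sha256(obj.read_bytes()).hexdigest(),
    }
    with open(log, "a") as f:
        f.write(json.dumps(event) + "\n")

record_event(Path("scan_0001.tif"), "ingestion", "repository-manager",
             Path("events.jsonl"))
```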
Reshaping digital projects into digital programs clarifies who is in charge of which tasks, and where and how the information is stored. Many organizations contain disparate silos of information across campus. Let IT run the servers and back everything up. Let the digital repository manage access. That way the organization knows clearly where all the information is, how it is being managed, and by whom. Informed individuals make educated decisions and can also identify potential risks. For example, in just the last few years recommendations have moved away from storage on CD/DVD to spinning disk, hard drive, and tape. Why? CDs aren't scalable, are hard to manage, and have been found to fail -- 15% of the information stored on CD at Emory University failed when checked, and that may have been 15% of ALL its digital information. If you can get boxes of CDs out of people's individual offices and centralized in one preservation office, risk management becomes all the easier.
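Centralizing those scattered CDs is also the moment to record fixity for everything ingested, so a later audit can surface failures like Emory's. A minimal sketch of a manifest-and-audit routine (the directory layout and paths are hypothetical):

```python
import hashlib
import json
from pathlib import Path

def build_manifest(root: Path, manifest: Path) -> None:
    """Checksum every file pulled off removable media into central storage."""
    entries = {str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
               for p in sorted(root.rglob("*")) if p.is_file()}
    manifest.write_text(json.dumps(entries, indent=2))

def audit_manifest(root: Path, manifest: Path) -> None:
    """Re-checksum everything and report anything that rotted since ingest."""
    entries = json.loads(manifest.read_text())
    failed = [rel for rel, digest in entries.items()
              if hashlib.sha256((root / rel).read_bytes()).hexdigest() != digest]
    print(f"{len(failed)} of {len(entries)} files failed fixity: {failed}")

ingest = Path("/preservation/ingested_cds")      # hypothetical central store
manifest = Path("/preservation/manifest.json")   # kept outside the store
build_manifest(ingest, manifest)
audit_manifest(ingest, manifest)
```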
Grants at this point will not provide money for the long-term management of objects -- but the language of many grants includes a commitment to the long-term access and preservation of those objects. As institutions complete the digitization project and the grant money runs out, they may find the best way to manage preservation is to repurpose time and leverage the skills of existing staff. The teams created through this process should be evenly weighted between curatorial and tech staff -- curators choose content, techies digitize.
Preservation must be stabilized -- choose a recognized standard and stick with it, don't migrate information unnecessarily, secure funding, and document EVERYTHING.
Yikes, that's already a lot of notes -- I think this will be best broken up into a series of posts.
Friday, November 16, 2007
Sarah Thomas at PMA
On Wednesday evening, I attended the Sarah Thomas library lecture at the Philadelphia Museum of Art, "First Sell the First Folio." The title of the lecture referred to how, many years ago, the Bodleian Library at Oxford deaccessioned a First Folio of Shakespeare's plays, not seeing the potential value of the first edition. Thomas used this analogy to make a leap to why the physical existence of libraries is still relevant, and how dismissing libraries as obsolete in the digital age can seem as erroneous as a decision to get rid of a first edition of a potentially rare future title.
Thomas made the argument that we can't digitize everything, especially material whose analog existence doesn't already have strict bibliographic control. Even if such material were digitized, there would be no access information. Digitization also means we would need to make choices about what to keep and allow access to -- and if we destroy materials after they are digitized, we may be losing the future's equivalent of the First Folio. Likewise, if libraries only exist digitally, then our choices about what to digitize are as weighted as what to purchase, preserve, or destroy -- if patrons cannot access the materials, it is the same as choosing not to have them in the collection.
She referred to the traditional round reading room as the "center of the universe where one would consult the oracle." As libraries have been replaced by the internet as the information hub, the library must shift from being a box of books to a suite of services. To remain relevant, the library must connect the information it stores to what is being created outside the library.
In a nice change from the typical text-based lecture, Thomas filled hers with photographs of library architecture and rare books. Her words carried a sweet reverence for what she referred to as the "power of the artifact." The lecture ended by bringing Anne d'Harnoncourt, director of the Philadelphia Museum of Art, to tears with praise for the newly redesigned library in the newly opened Perelman Building of the PMA. A touching and soothing experience, in praise of libraries of the past, and in argument, gentle and hopeful, for the libraries of the future.
Labels: architecture of libraries, lecture, Oxford, PMA, Sarah Thomas