All posts by A Piece of History

Completing the Project

For our project and our contract, we selected four of the many scrapbooks held by the University of Mary Washington Special Collections. My assigned job was originally to transcribe the written portions of the scrapbooks and to use OCR software to transcribe typed documents within them. The biggest change between the contract and what I actually did is the omission of the OCR program. This was mainly because when I did try running OCR, I could not get it to capture the information the way I wanted; instead, it would grab portions of the text out of order. Also, when I tried to break the text up into very small pieces that would copy what I wanted, the output (due to the age of the documents) was filled with errors that I had to correct. This was not so much because the program was faulty; it just did the best it could with what it could decipher from old and faded ink.

That said, I decided that I would just hand transcribe the documents. To do this, I had to download the documents and transcribe them into a Word document. Once Ellen and Laura-Michal were able to get our website up and running, I then had to copy and paste the transcriptions onto the pages for individual items. For the first few scrapbooks, I had no problem getting my transcriptions done on time. I must admit that later in the semester it did become a little more difficult to get them done, but I was still able to finish them in a reasonable amount of time. In periods between scrapbooks, I was able to help Ellen with inputting metadata for one of the scrapbooks after she and Laura-Michal had technical issues getting our website put up. Outside of that, my only job was the transcriptions, and that job did take me through the end of the semester. Based on my assigned task, I would argue that, although my timing was a little late on some, I did effectively complete my part of the project as designated by the contract.

As a whole, I think our group did a very good job executing and completing our assignment. Although we were not able to work on additional scrapbooks as we originally wanted to, we did very well getting the scrapbooks that we did work on done on time. It is unfortunate that we were not able to use Omeka to its fullest potential, but now at least we know that to use Omeka we should first take a computer science class. Although our site might not make the most use of Omeka, I would argue that we did meet the main requirements of our contract.

Working on Omeka

Well, I am starting to put my transcriptions onto our Omeka site. I have to say, it’s great to see the start of an end result. I’ve completely uploaded the transcriptions for our first scrapbook (with the exception of page 7 due to a tiny technological hiccup) and will start on the second scrapbook soon. In other news, I’ve made great headway into the fourth scrapbook, despite my horrible timing this last week or so. I have finished the written transcription and should be done with the typed transcription soon. As for the third scrapbook, while the written transcription is uploaded, I have gotten assistance from Alex for the typed transcription and will post that just as soon as he can get it to me and I can look it over. Our site is finally starting to come together and so is all of my tedious transcription work, which is good to see. I can’t wait to see what the final product will look like.

Heading on from Home Economics

Well, the home economics scrapbooks were fun, but I’m ready to move on to the Young Republicans scrapbooks. I have completely finished and uploaded the second scrapbook (yay!) and am gearing up to start on the first YR book. At a quick glance, this one might be a little more challenging, with multiple entries, some of which are splashed onto the page with no obvious rhyme or reason. But what’s a history class without a little challenge? I’m glad to be done with half of the scrapbooks and am looking forward to starting the second half.

Resume

So here is the link to my resume page. I just added it as a page to my main website, along with another page for my portfolio. Because I unfortunately don’t have too much experience as of yet, my pages are both somewhat bland. This was mostly by choice. I wanted to be able to have pages that I could build on in the future and tidy up when I did get more experience and got out into the job hunting world.

Digital Footprints and Modern Education

The three articles I selected varied in their topics. First, I read Will Richardson’s “Footprints in the Digital Age,” in which he makes the case that, in a world of Google and digital identities, the education of children should change so that they fully understand how to use digital media and forums in a meaningful way. He points out that networking, whether through websites, blogs, or something as simple as a Facebook page, has been revolutionized in the digital age and that everyone, even children, should be taught to understand that fact. The second article, though relatively small in comparison, was Seth Godin’s “Personal branding in the age of Google.” Despite its short length, it packs a hefty lesson. Godin tells the story of a woman searching for a housekeeper through Craigslist. Upon getting three resumes, she looked the applicants up online, only to find that her searches revealed (and this is what I’m assuming was her reaction) a drunk whose main hobby is binge drinking; a lazy artist who doesn’t really want to work and will only do so until they can sell some of their work; and an ex-convict who had been arrested for shoplifting. The third source I looked at was “Digital Tattoo.” This source was a website, but it was well made and is broken down into sections of help guides and advice based on what you’re looking to learn about. Covering anything from protection against scammers and bullies on the web to information about publishing work online, it’s really an interesting guide for anyone looking to start their digital footprint. So, out of all of these articles, I have learned many tips and lessons, but the five major ones are:

1) Learn, understand, and take advantage of the wide range of networking options afforded by the digital world.

2) Nothing goes unnoticed and nothing is forgotten; if you put up a video of yourself chugging five beers in a row at your cousin’s wedding 10 years ago when you were 16, chances are a potential employer will find it.

3) When building a website, organization is key and the simpler the layout, the better. This one’s a bit different, but I really did like the layout of the “Digital Tattoo” website.

4) Even if you think something you put on the internet, either good or bad, is insignificant, it doesn’t take much for someone to find it and make it significant. Whether it is a desire to help those in need or a phrase that is offensive, what you put online has the chance to be enlarged far beyond what you expect.

5) The internet can be filled with wonderful information and can be the key to great networks and friendships, but it can also be dangerous. With scammers looking to infect your computer and cyberbullies becoming the bane of parents’ existence, it is important to know how to spot these more dangerous components and either avoid them, or stop them.

Overall, the articles were interesting to read and taught me a lot about just how important a digital footprint is.

OCR: Helpful Tool or Student’s Worst Nightmare?

For my part of the scrapbook project, I have been assigned transcription and running OCR on documents to make them searchable and (to use a lovely term of my own invention) copy/paste-able. Now, the concept of OCR in a nutshell is for the program to scan through a PDF, find copyable information, and allow the user to then copy and paste that information, thus making my job as transcriber easier. Or so I thought. I have spent maybe half an hour with this technology and I am already frustrated. I understand the concept of getting it to work just fine; it’s the outcome that is getting on my nerves, regrettably. Now, no technology is perfect and it never will be. However, I have come to the sad conclusion that I could transcribe the documents by hand faster than I can copy and paste the information and go through to fix all the errors. Now, my frustration does not come without reason, and I am going to put my issues out there in the hopes that some other poor soul is also working with OCR and can help me out.

Problem #1: Many of my documents are actually newspaper articles that the makers of the Pi Delta Gamma scrapbook placed in it to mark important milestones of the year. The problem is that many of these newspaper articles use the classic column format, breaking the story up into two or three narrow columns rather than a straight paragraph format. On the side of OCR, the problem is that you cannot pick and choose which parts of the document you highlight. The program, once you start highlighting a word, will block off a chunk of the document. However, it cannot recognize this column format and spans the box all the way across, segmenting and dividing three columns’ worth of information.

Problem #2: When I do successfully transplant information in the tiny portions I’m limited to because of the above problem, I think at most it is able to squeeze out an “at” or a “the” somewhere that doesn’t have a mistake. Beyond that, it for some reason scrambles letters within a word, scrambles letters between words, inserts symbols that I’m not 100% sure how the program even came up with, or just misses words entirely.

There are a couple of little problems besides these two, but these are the bulk of my frustrations. If anyone has any tips they would like to share, please do (seriously, please do). And if anyone has already posted information, please let me know; I scanned through the class page and didn’t see anything.
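
The column problem described above (text read straight across instead of down each column) can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used for this project: it assumes an OCR tool can report each word with its position on the page, and the `WordBox` layout, the function name `read_in_column_order`, and the sample words are all hypothetical. It also assumes the columns are of equal width.

```python
from typing import List, Tuple

# A word as an OCR tool might report it: (text, left_x, top_y).
# This tuple layout is an assumption made for illustration only.
WordBox = Tuple[str, float, float]


def read_in_column_order(words: List[WordBox], page_width: float, num_columns: int) -> str:
    """Return the words column by column instead of straight across the page."""
    column_width = page_width / num_columns
    columns: List[List[WordBox]] = [[] for _ in range(num_columns)]
    for word in words:
        _, x, _ = word
        # Assign the word to a column by its horizontal position,
        # clamping so stray coordinates still land in a valid column.
        index = max(0, min(int(x // column_width), num_columns - 1))
        columns[index].append(word)
    ordered: List[str] = []
    for column in columns:
        # Within a column, read top to bottom, then left to right.
        column.sort(key=lambda w: (w[2], w[1]))
        ordered.extend(text for text, _, _ in column)
    return " ".join(ordered)


# Hypothetical two-column clipping: reading straight across would give
# "The dance club was held Friday." with the two columns interleaved.
sample = [
    ("The", 10, 0), ("dance", 110, 0),
    ("club", 10, 20), ("was", 110, 20),
    ("held", 10, 40), ("Friday.", 110, 40),
]
result = read_in_column_order(sample, page_width=200, num_columns=2)
print(result)
```

The sketch only fixes reading order. It does nothing about the scrambled letters in Problem #2, which come from the faded ink rather than from the layout.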

My First Transcription

Our project has finally gotten on its way. I have just finished transcribing our first scrapbook, for Pi Delta Gamma, and I have to say it was very interesting being able to read all that was written throughout it. All that I have left to do is run OCR on the scrapbook so that it can be searchable, and I plan on doing that in the next few days. Hopefully by the time I run OCR on this first scrapbook I’ll be in enough of a groove that it won’t be difficult at all. We’ll see how everything goes though.

Class Meeting – Feb 20

Today in class we met and discussed some of the technical aspects of our project; we also took time to discuss where we are in our schedule and what is due next. We discussed more specifically how I will be carrying out my transcriptions, which I am going to start today. Ellen and Laura-Michal went to DTLT yesterday to talk about Omeka as well as any plug-ins that may be needed and relayed that information to the group. In discussing this information, we learned that we need to gather more information about OCR, as it requires a PDF as opposed to JPG files in order to work; we are going to look into Omeka plug-ins that will essentially fix this problem. We discussed our 10-minute progress presentation and have decided that we will be building the presentation on Tuesday. We discussed the deadlines for next week and what we each will be doing between now and then. Overall, it was a productive meeting and we are happy with our current progress.

The Wonders of Wikipedia

When I look at Wikipedia, I honestly can’t help thinking of every past teacher of mine from middle school onward telling us that Wikipedia can’t be trusted and so can’t be used as a resource. While, like any good student, I still shudder at the idea of using Wikipedia as a useful and legitimate source, it has been quite intriguing learning how Wikipedia works and why it is (slightly) more reliable than many students or teachers believe. While there is and always will be the threat of some teenager looking for something to do scrambling onto the first Wikipedia page he sees and throwing in ridiculously false information just for the fun of it, I think what the administrators do to control the quality of information is a force to be reckoned with. While I’ve always known that anyone with Internet access and a keyboard could wreak havoc on the site, I can say that I never knew what administrators, random readers, and scholars do to try to combat these Internet trolls. Looking at the history pages, while confusing to the inexperienced reader, gives a great understanding of just how many times a page is edited in the course of its existence. One of the pages I chose to look at was George B. McClellan; I figured his role as a major historical figure would make him a target for trolls and a spotlight for those trying to defend against them. Not surprisingly, I was right. There have literally been hundreds of edits to the page since its creation in 2005. On the surface, the history page is gibberish to me, but it seems obvious that although much of the early history would be attributed to the gradual creation of the page, a good deal of it is surely the work of admins trying to protect the integrity of the page.

Although in my opinion Wikipedia should never be considered a completely credible source, I do admire the efforts of the many administrators and readers who try to make the information available as accurate as possible. Looking at the vast amount of history that can go into one page, it is obvious that there are people out there who do want to combat the inaccuracies and make pages that (theoretically) could be used by a student or anyone else looking for information. Personally, I think their efforts at the very least make Wikipedia a good place to start if you’re just looking around for introductory information on a subject. Who knows, maybe there could come a day when Wikipedia could be considered as reliable as the Encyclopedia Britannica.

It’s a Start

Over the past week, my group has been finishing up our contract and planning when each part of the project should be finished. We’ve decided that my job will be to transcribe the sections of the scrapbooks which are handwritten and to run OCR on them to make the content searchable. We have staggered our deadlines so that I will have transcription on the scrapbooks completed a week or two after they have been scanned into a PDF file. For our advertisement and audience, since we are working with the UMW Special Collections, we are planning to work with them and possibly have some of our work showcased in the new Convergence Center when it opens, though that plan is still in its “maybe” stage. At this point, it seems like our target audience will be campus alumni or anyone searching for information about the campus in the past. That being said, we are going to try to make the website as interactive and interesting as possible while still providing enough background information about the school and the country in the 1960s that anyone looking for concrete information for a project or paper can still use our website. I am looking forward to working up close with these scrapbooks and seeing what becomes of our project.