Update-Book 2 Scanned and Uploaded

I just uploaded the rest of the second book today (Young Republicans 1967), and got over half of the third book (Home Ec 1964-67) scanned. I have also uploaded his data on the second book to Dropbox. We should be done with book 3 soon after we get back from break. This puts us right on schedule with our milestones, if not slightly ahead.

OCR: Helpful Tool or Student’s Worst Nightmare?

For my part of the scrapbook project, I have been assigned transcription and running OCR on documents to make them searchable and (to use a lovely term of my own invention) copy/paste-able. Now, the concept of OCR in a nutshell is for the program to scan through a PDF, find copyable information, and allow the user to then copy and paste that information, thus making my job as transcriber easier. Or so I thought. I have spent maybe half an hour with this technology and I am already frustrated. I understand how to get it to work just fine; it’s the outcome that is getting on my nerves. Now, no technology is perfect and it never will be. However, I have come to the sad conclusion that I could transcribe the documents by hand faster than it takes me to copy and paste the information and go through to fix all the errors.

Now, my frustration does not come without reason, and I am going to put my issues out there in the hopes that some other poor soul is also working with OCR and can help me out.

Problem #1: Many of my documents are actually newspaper articles that the makers of the Pi Delta Gamma scrapbook placed in it to mark important milestones of the year. The problem is that many of these newspaper articles use the classic column format, breaking the story up into two or three narrow columns rather than a straight paragraph format. On the OCR side, the problem is that you cannot pick and choose which parts of the document you highlight. Once you start highlighting a word, the program blocks off a chunk of the document. It cannot recognize the column format, so it stretches the box all the way across the page, cutting through three columns’ worth of information.

Problem #2: When I do successfully transplant information in the tiny portions I’m limited to because of the above problem, I think at most it is able to squeeze out an “at” or a “the” somewhere that doesn’t have a mistake. Beyond that, it for some reason scrambles letters within a word, scrambles letters between words, inserts symbols that I’m not 100% sure how the program even came up with, or just misses words entirely.

There are a couple of little problems besides these two, but these are the bulk of my frustrations. If anyone has any tips they would like to share, please do (seriously, please do). And if anyone has already posted information, please let me know; I scanned through the class page and didn’t see anything.
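For anyone stuck on the same column problem, here is a minimal sketch of one possible workaround. It assumes your OCR tool can export each recognized line of text together with its position on the page, which is an assumption and not something the highlight tool described above does. The function name, the coordinates, and the sample clipping are all invented for illustration.

```python
from typing import List, Tuple

# Each OCR line is (left_x, top_y, text). These values are hypothetical,
# the kind of position data some OCR tools can export with the text.
Line = Tuple[int, int, str]


def read_by_columns(lines: List[Line], gap: int = 50) -> List[str]:
    """Group lines into columns by their left edge, then read each
    column top to bottom instead of straight across the page."""
    columns: List[List[Line]] = []
    for line in sorted(lines, key=lambda item: item[0]):
        # A jump of more than `gap` in the left edge starts a new column.
        if columns and line[0] - columns[-1][-1][0] <= gap:
            columns[-1].append(line)
        else:
            columns.append([line])

    ordered: List[str] = []
    for column in columns:
        for _, _, text in sorted(column, key=lambda item: item[1]):
            ordered.append(text)
    return ordered


# Invented two-column clipping, listed in the scrambled across-the-page order.
clipping = [
    (10, 0, "The club held"),
    (210, 0, "Members voted on"),
    (12, 20, "its spring banquet."),
    (208, 20, "new officers."),
]
print(read_by_columns(clipping))
```

The `gap` value is a guess that would need adjusting to the scan: it has to be larger than the small wobble in a column's left edge but smaller than the distance between columns.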

Making OCR Do Your Transcriptions (Or At Least Parts of Them)

For our project we put Jess in charge of transcribing the scrapbook data, which is made up of both handwritten information and print materials such as newspaper clippings. We don’t want to make Jess re-type the already typed material, so she’s going to use OCR to grab that text and incorporate it into her manual transcriptions. Some of the typed stuff is really short and probably not worth using this process, but a few pages are almost entirely printed, making this a good option. Here’s how to make Adobe do some work for you.

 

Convert your image into a PDF. In our case we converted a copy of the image, since we still need the image to upload later.

Select the “Tools” button from the upper right corner of the viewing window.

Select the “Recognize Text” button from the Tools drop-down menu, then the “In This File” button.

A window will open asking about sample size and language. These settings should be fine in their default positions, so just click the OK button.

The program will run, and then you need to save the file again to make sure you capture this info.

In the Recognize Text menu, select the “Find First Suspect” button and from the window that pops up select the text sample offered to you.

This will allow you to edit the printed text in the PDF file. You can’t see this data normally, because it’s hidden, but here you can make changes. The main purpose of this tool is to let you correct any errors that the OCR program made in reading the text, but here you want to copy the whole text and simply paste it into a Word document or wherever you want to use it.

Check for errors, because the OCR program isn’t infallible, but for the most part, the text should be pretty accurate.
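The error check in that last step can be partly automated once the text has been pasted out of the PDF. Below is a minimal sketch, assuming plain pasted text. The function name and the rule for what counts as “suspect” are illustrative assumptions; this is not how Adobe’s “Find First Suspect” works, and it will miss errors that happen to look like ordinary words.

```python
import re
from typing import List

# A word passes if it is plain letters (optionally joined by an apostrophe
# or hyphen) or plain digits. Anything else is flagged for a human to check.
CLEAN_WORD = re.compile(r"(?:[A-Za-z]+(?:['’-][A-Za-z]+)*|\d+)")


def find_suspects(text: str) -> List[str]:
    """Return words that mix in digits or symbols, a common sign of OCR errors."""
    suspects = []
    for token in text.split():
        # Drop punctuation around the word before checking it.
        word = token.strip(".,;:!?\"'()“”‘’")
        if word and not CLEAN_WORD.fullmatch(word):
            suspects.append(word)
    return suspects


# Invented sample sentence with two typical OCR slips.
sample = "The c1ub met on Tuesd@y, and the officers were elected."
print(find_suspects(sample))
```

A list like this only points at likely trouble spots; the pasted text still needs a read-through against the original page.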

Then and Now Update

This week our group met during our regular class time to discuss our website theme, our individual assignments, and our ten-minute group presentation on Thursday.

Luckily, thanks to Jess’s awesome selection skills, we have found a theme!  A major plus with it is that it’s FREE and adjusts for all devices (iPad, iPhone, tablets, etc.) so people won’t have issues accessing it based on what they’re accessing it from.  We went ahead and installed it on our site, looked at how it interacted with the templates we have already set up, and talked about what minor changes we wanted to make to it.  Another big discussion was about our main photo that appears at the top of the page throughout our site.  We want it to be one of the more popular locations on campus (e.g., Lee Hall, Ball Circle, Campus Walk, Monroe Hall), and we would like to make it a “blended” photo because it is the photo that people see when they first visit our page and we want to make the page interesting and appealing.

Not fantastic quality, but we found our shot for Seacobeck! Now to just get the window open and screen moved...

Next we reviewed what everyone’s individual assignments are–mainly photo distribution–and discussed any questions or concerns we had.  We realized that we didn’t have a relatively old “then” photo for Seacobeck (oops!), so we went on the Digital Archives page, located one, and decided to look for the angle we needed to capture Seacobeck in the same manner it was captured in the “then” photo.  Since the photo was clearly taken from Monroe (or relatively close to it), we headed upstairs to look for a window to take our photograph from.  Well, we started with the fourth floor–major fail.  We didn’t know they didn’t have windows up there.  So we headed down to the third floor, met several nice professors who were happy to allow us to look out of their office windows for a good shot, and ultimately found our perfect location: the geology lab on the third floor.

After finding our location, we headed back to the classroom to go over our game plan for our ten-minute presentation on Friday.  We assigned a section of the presentation to each person and talked about what we want to make sure we cover.

Overall, we had a pretty successful meeting! I’m looking forward to what is to come.

Meet and Greet (Week 7 Update)

Earlier today, Jack, Leah, Candice, and I met in lieu of our class to discuss our progress so far and to put together our first weekly progress report.  We are pleased to say that we had no trouble at all figuring out what to discuss!  I mean, we did in the sense of exactly what to include and 10 minutes isn’t actually very long, but we had no problem with information, since we have accomplished so much already!  Our progress report might be a little different from the other groups since we’re focusing more on research and narratives than digitization, but we will still be incorporating some digitizing into our final website and project.

We also decided on some pretty FANTABULOUS WordPress themes!  Sorry for the crazy expression, but the themes we found for both our site and the overarching Century America site are pretty awesome!  Both are free (which is a plus) and extremely customizable, which is exactly what we want.  Hopefully it’ll look even more incredible once we add in all of our information and sources.

We also checked today to see where we were with regard to the milestones we had set for ourselves – the good news is, we are right on task!  We have messed around with our site, as well as decided on a theme, and have finalized the information that we are going to include and what sources need to be digitized (with permission) for the final site.  Hopefully we will be able to get permission soon from both the CRHC and the Library of Virginia since our goal is to have all images and documents digitized the week after spring break.

All in all, I would say we had a brilliant brainstorming/organizing/work session today, and are moving along swimmingly with what we need accomplished.  Our next huge goals are to digitize sources and to begin writing blurbs for the individual pages we are responsible for, since we will need plenty of time to edit these tricky wordings.  Hopefully all will go well and we can keep going strong following spring break!

Belated Wikipedia Post

I have no idea why this post decided it didn’t want to get published, but I blame Willard Hall’s internet. Anyway, sorry it’s late.

I love Wikipedia. I use it as a resource whenever I am not sure what something is or if I want a quick introduction to a subject before exploring it more thoroughly. I especially like to look at their references for sources for a paper or project. It was really interesting to go to several pages and look at the history. The page I looked at was Henry VIII, which was once a featured page on Wikipedia because of its accuracy. What I loved most about the history was all of the people removing statements with the comment “Ancestry.com does not count as a reliable resource.” Other comments reminded posters that they needed to cite scholarly sources, such as books and articles. We talked about how Wikipedia has an idealistic hope that it will be self-checked, but I think there is some truth to that hope rather than just idealism. People are interested in keeping Wikipedia honest and as truthful as possible. I do disagree that Wikipedia is objective. No matter how you word something, there will be an interpretation.

Even though I am fond of Wikipedia and like that it is free and accessible to a lot of people, I still would not use it as a credible source. It should be read with caution and should only be the starting place for further research.

Group Meeting Today: Feb 25th

Today, my group met and discussed our project. First, we did some emailing for Friday so that we can go scan some objects at the museum. We are also hoping to get some interviews done on Friday. After we did that, we talked about our presentation and started working on it. During our conversation, Victoria got the Sketchfab plug-in to work on our site and uploaded some of the scans so that we could demonstrate some of the work we have been doing.

We also had to work on getting the files from the James Monroe Museum so we could have the starting point information for our research. Hopefully Jarod will email those to us soon. If not, we can at least do some research about Monroe and the time frame, but we would like to have more specific information on the objects themselves.

Group Meeting 2-25-14

In group today we went over the basic layout of our presentation. We also touched upon uploading our pictures for the presentation. We have worked through the initial problems of uploading pictures to Google. We have a vision of where we want the presentation to go, and we discussed the updates on OCR (Jess and Laura). We also discussed Alex’s progress on the bibliography. It was a very productive meeting, and the group’s progress seems to be right on track.

The Greatness that is Wikipedia

One of my favorite daily activities is actually to search through various historical Wikipedia pages and browse the content history. It is amazing to watch how a page truly transforms over time and matures into a great resource for fast information. I easily use Wikipedia ten times a day; whether it be to look up historical information or what movies Morgan Freeman is currently working on, Wikipedia is a great source for such things. Though many educators will bar you from using Wikipedia, do not doubt its great powers. It is the ultimate starting point for any project. You can see how much or how little there is on a topic. The less there is, the more original your topic probably will be. Additionally, looking at the history and discussions on a Wikipedia page will allow you to see how the page came about, and who is changing it and why, on a daily, monthly, or yearly basis.

It will be interesting in the future to see a senior ten or twenty years from now do their thesis on how Wikipedia changed the way we gathered, collected, and shared information. We must remember that Wikipedia is not a commercial or government project; it is run by the public. People have come together for the greater good to collect and share free information for any eyes to gaze upon and learn from. When you come upon a discussion page on Wikipedia you can see the collective effort of several individuals trying to make a page more accurate and usable for viewers. They collaborate every day to find the best and most reliable resources for each page. This collaboration shows how Wikipedia depicts a culture of open information and sharing. Though the occasional troll comes along and ruins a page for a few minutes, there is always someone who responds and fixes the information. These discussion and history pages show how we view information in our culture; it is vital and important to everyday life. Anytime you go on a Wikipedia binge like I always do, I suggest taking a look at the discussions and history to see how that page came to life and how it transformed over time to provide you with the most adequate information.