Visualize History

How it all went down.

Getting data from the internet

I’ve been telling all my friends about my recent issues with gathering data in an appropriate way.  Each and every one is shocked.  They all know how to google some fact and they also know the internet knows everything there is to know about anything.  I try to explain the problem but have some trouble.  So I’ll do it in writing instead.

The basic problem is that the internet is too powerful.  You can express whatever you want, but, more importantly for us, you can express it however you want.  You can lay things out differently, use different fonts, change the order of things.  And that doesn’t even include animations, Flash, images and crazy tricks that I barely even understand.

I’ll give you an example from my new data sources.  There are two sources I’m using that both reference the assassination of Abraham Lincoln.  I will paste the code.  Don’t worry if you don’t understand.  Do notice how obviously different the two are from each other.

Compare:
http://valley.vcdh.virginia.edu/reference/timelines/timeline1865.html

<tr bgcolor="#cccccc">
    <td><b>04-14-1865</b></td>
    <td align="left"></td>
    <td align="left"></td>
    <td align="left"></td>
    <td align="left"></td>
    <td align="left">Lincoln assassinated.  Assasination attempt against Seward &amp; his son. The Stars &amp; Stripes raised over Ft. Sumter.</td>
</tr>

with:

http://www.historyplace.com/civilwar/index.html

<center>

  <p><a name="shot" _base_href="http://www.historyplace.com/civilwar/"></a><b><font color="#0000a0" size="+1">Lincoln Shot</font></b>
  </p>
</center>

<p><b><font color="#ff0000">April 14, 1865</font></b> - The Stars and Stripes
is ceremoniously raised over Fort Sumter. That night, Lincoln and his wife
Mary see the play "Our American Cousin" at Ford's Theater. At
10:13 p.m., during the third act of the play, John Wilkes Booth shoots
the president in the head. Doctors attend to the president in the theater
then move him to a house across the street. He never regains consciousness.
</p>

There are many differences, some complicated and some less so, but it’s obvious to even the most untrained eye that these two pages will require two different programs.  In fact, what I’ve found is that a project like this one, which requires so much detail in its information, cannot use a general-purpose algorithm to understand the vast information available on the internet.
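To make "two different programs" a bit more concrete, here is a rough sketch in Python.  It is only an illustration, not the code used for this project: the function names, the `MONTHS` list and the shortened sample strings are all made up for the example, and it assumes each page keeps exactly the layout shown above.  The point is that each site needs its own parser, even though both end up producing the same kind of result (a date and a description).

```python
import re
from datetime import date
from html import unescape

# Shortened copies of the two snippets above (hypothetical test inputs).
VALLEY_ROW = """<tr bgcolor="#cccccc">
    <td><b>04-14-1865</b></td>
    <td align="left"></td>
    <td align="left">Lincoln assassinated.  Assasination attempt against Seward &amp; his son.</td>
</tr>"""

HISTORYPLACE_PARA = """<p><b><font color="#ff0000">April 14, 1865</font></b> - The Stars and Stripes
is ceremoniously raised over Fort Sumter.
</p>"""

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def strip_tags(fragment):
    """Drop HTML tags, decode entities like &amp;, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(unescape(text).split())


def parse_valley_row(row_html):
    """Site 1: a table row. Date is in the first cell (MM-DD-YYYY), text in the last."""
    cells = re.findall(r"<td[^>]*>(.*?)</td>", row_html, flags=re.S)
    month, day, year = (int(part) for part in strip_tags(cells[0]).split("-"))
    return date(year, month, day), strip_tags(cells[-1])


def parse_historyplace_paragraph(p_html):
    """Site 2: a paragraph. Date is in a <font> tag ("Month D, YYYY"), text follows a dash."""
    match = re.search(r"<font[^>]*>(.*?)</font></b>\s*-\s*(.*?)</p>", p_html, flags=re.S)
    name, day, year = re.match(r"(\w+) (\d{1,2}), (\d{4})", strip_tags(match.group(1))).groups()
    return date(int(year), MONTHS.index(name) + 1, int(day)), strip_tags(match.group(2))


when_a, what_a = parse_valley_row(VALLEY_ROW)
when_b, what_b = parse_historyplace_paragraph(HISTORYPLACE_PARA)
print(when_a, "|", what_a)
print(when_b, "|", what_b)
```

Neither function would survive being pointed at the other site, or at the same site after a redesign, which is exactly the problem.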

So, you might ask, what am I doing?  Well, I’m adding the data.  One.  Site.  At.  A.  Time.
