Archive for the 'Uncategorized' Category
A New Chapter
Today I did a dumb thing. I accidentally deleted the blog. Completely. I have backups but they are in an awkward format so I may or may not be able to restore the old posts.
Luckily for me, though, I am writing this post to mark a new chapter in the site. I have been finished with school, and therefore the technical requirements, for quite some time now. I have been on a long trip, during which I had plenty of time to reflect on the project, especially on where I want to take it. I have decided that I will be unable to stop myself from working on Visualize History, no matter how hard I try.
Therefore, I have put the site as it stood up for public display at http://proto.visualizehistory.com/?v=alpha. Feel free to visit, play around, or do whatever you want. I won’t be changing this version at all. I consider it to be a highly successful proof of concept - though I may be biased - but nothing more than a proof of concept. I wanted to explain my vision through code, and I think the above version satisfies that goal.
That being said, I have a clear and deep vision of where the project should go. I am starting a job in September so the amount of time I can devote to this passion of mine may be strictly limited. Nonetheless, it is my intention to continue work so long as it does not interfere with my professional responsibilities or my social activities.
One thing that I learned from the project is the value of user input. Shortly I will put up a place where you can give input on the site, both publicly and privately. When I get a moment to do that, I will.
If you would like to contact me in the meantime, I think the best way would be through Facebook, to prevent spam. So: http://www.facebook.com/profile.php?id=304561.
Historical Dates and Date Comparison
Dates present a complex problem for an application like Visualize History. I did not expect them to be this complicated, and now find myself struggling to implement historical dates as they should be.
The problem arises because of the way computers think of dates. To a computer, a date is an exact moment, for instance, 4:22 PM EST on Sunday, April 20. 4:22 EST is not the same as 4:23 EST, nor is it the same as April 20th.
Humans almost never use dates in this way. When I say "what are you doing for July 4th?" I don’t mean to ask what are you doing on July 4th at midnight, but rather, the entire day. Similarly, when I say the Great Depression happened in the 30s, I don’t mean any specific moment during the 30s but rather all 10 years.
The problem manifests itself most obviously when dates are being compared. Imagine a Visualize History session in which you are examining the signing of the Declaration of Independence. The date for that event, as we all know, is July 4, 1776. The computer stores this date as midnight on the 4th. So, following the computer’s way of storing dates, the session might show the signing of the declaration, which it thinks happened at midnight, as happening before John Hancock placed his John Hancock on the bottom of the declaration. Of course no reasonable person would even follow that, much less believe it.
The first solution that comes to mind is, presumably, to specify the date of the Declaration more precisely, perhaps including the exact time of signing. There are two problems with this solution. The first is practical, namely that data is rarely saved so precisely. Most people know the day when the Declaration was signed, but rarely the time, especially not down to seconds. Secondly, many events are not precise. The Great Depression cannot be said to have ended at an exact moment, down to the time of day, yet we can be fairly certain that some events happened during the Depression and some did not.
The next solution that should come to mind is to store historical events as having a start and an end, so that when deciding whether an event happened during the Depression, you could ask whether it happened after the start and before the end of the Depression. This solution is the correct one, in abstract terms; it represents the way humans use dates to discuss history. Unfortunately, it does not lend itself to an easy or efficient implementation. Most computer languages have built-in Date libraries, but none that I know of treat Dates in this way.
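To make the start-and-end idea concrete, here is a minimal sketch in Python. The class name `HistoricalEvent`, the method `happened_during`, and the boundary dates are all assumptions chosen for illustration; they are not taken from Visualize History.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HistoricalEvent:
    """Hypothetical event type: every event spans [start, end], inclusive."""
    name: str
    start: date
    end: date

    def happened_during(self, other: "HistoricalEvent") -> bool:
        # True when this event lies entirely inside the other one.
        return other.start <= self.start and self.end <= other.end


# The boundaries below are illustrative assumptions, not authoritative dates.
depression = HistoricalEvent("Great Depression", date(1929, 1, 1), date(1939, 12, 31))
example_1933 = HistoricalEvent("Example event in 1933", date(1933, 3, 1), date(1933, 3, 31))
signing = HistoricalEvent("Declaration signed", date(1776, 7, 4), date(1776, 7, 4))

print(example_1933.happened_during(depression))  # True
print(signing.happened_during(depression))       # False
```

A single-day event is just an interval whose start and end are the same day, which is why the signing never needs a time of day.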
My proposed solution is based in part on MySQL and in part on KML files from Google, which both specify a precision along with their dates. For example, I might say that a date is July 4th, 1776, and that for comparison purposes, the year, month and day (but not hour, minute, or second) are significant. Comparing such Date objects would take some mildly complex logic, but with some thought could be made to work as humans think of dates.
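One way the precision idea could look, sketched in Python under some assumptions: the names `FuzzyDate` and `compare` are hypothetical, precision is simply the count of leading fields that are significant, and the time of day given for the signature is made up purely to have something to compare against.

```python
from dataclasses import dataclass

# Fields from coarsest to finest. The ordering is what makes precision a count.
FIELDS = ("year", "month", "day", "hour", "minute", "second")


@dataclass(frozen=True)
class FuzzyDate:
    """A date plus the number of leading fields that are significant."""
    year: int
    month: int = 1
    day: int = 1
    hour: int = 0
    minute: int = 0
    second: int = 0
    precision: int = 3  # 3 means year, month and day are significant

    def significant(self, n: int) -> tuple:
        # The first n fields, coarsest first, as a comparable tuple.
        return tuple(getattr(self, f) for f in FIELDS[:n])


def compare(a: FuzzyDate, b: FuzzyDate) -> int:
    """Return -1, 0 or 1 using only the fields significant to both dates.

    0 means "not distinguishable at this precision", not "identical".
    """
    n = min(a.precision, b.precision)
    left, right = a.significant(n), b.significant(n)
    return (left > right) - (left < right)


declaration = FuzzyDate(1776, 7, 4, precision=3)            # known to the day
hancock_signs = FuzzyDate(1776, 7, 4, 14, 30, precision=5)  # time is made up
depression = FuzzyDate(1930, precision=1)                   # known to the year

print(compare(declaration, hancock_signs))  # 0: neither is "before" the other
print(compare(declaration, depression))     # -1
```

This sketch stops at the year as its coarsest unit. A date like 10,000 BC, where only the millennium matters, would need precision levels coarser than a year.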
If you are not convinced, take an extreme example. If you study evolution (or biblical history, if you prefer), you often encounter dates like 10,000 BC. It is implied that 10,000 BC is not distinct from 10,001 BC or 9,999 BC but is to be compared with 1,000 BC or 100 BC. So 5,000 years is the difference between dates that interests us. Yet now, we see the difference between September 10, 2001 and September 12, 2001 as incomprehensibly large. So it’s all relative, I suppose.
JavaScript complicates the process even more. Its Date object stores a date as the number of milliseconds since the Epoch, January 1, 1970, so every date is an exact instant with no notion of precision. Early dates are awkward as well: the Date constructor treats years 0 through 99 as 1900 through 1999, and dates before the common era have to be written as zero or negative years. A lot of interesting history happened before 1000 AD, so a different system needs to be used.
I’m not sure that I have time to implement this date scheme as I hope to, but I wanted to write it out to illuminate the issue.
Design is beautiful, and challenging
For the last few days I’ve been working on the site design for Visualize History. I think I’m done, but only because I have no more ideas. I am horrible at design. By design I mean graphic design, including fonts, colors, background images, and layout, and how all those things interact with each other. I don’t know why and I’m usually frustrated to no end but the way my brain is wired does not compute design well at all.
It usually frustrates me even more because I have an intuitive feel for user interface design. I can picture how a user will interact with my programs but not how they will look at the site.
For example, I just added a feature which shows the events for different topics in different colors. The feature was difficult to add but trivial in concept and blatantly obvious to me. It required adding a new library and some code changes across 3 files. It took me about 25 minutes including testing for bugs. I spent the last two days trying to redesign the layout of the site. The same 25 minutes got me nowhere, and in the 2 days I spent on it, I managed to add two images and change the font of my header.
I suppose that is why a programmer is not a programmer but a UI designer or an artist or a software engineer.
Getting data from the internet
I’ve been telling all my friends about my recent issues with gathering data in an appropriate way. Each and every one is shocked. They all know how to google some fact and they also know the internet knows everything there is to know about anything. I try to explain the problem but have some trouble. So I’ll do it in writing instead.
The basic problem is that the internet is too powerful. You can express whatever you want, but, more importantly for us, you can express it however you want. You can lay things out differently, use different fonts, change the order of things. That doesn’t even include animations, Flash, images and crazy tricks that I barely even understand.
I’ll give you an example from my new data sources. There are two sources I’m using that both reference the assassination of Abraham Lincoln. I will paste the code. Don’t worry if you don’t understand. Do notice how obviously different the two are from each other.
Compare:
http://valley.vcdh.virginia.edu/reference/timelines/timeline1865.html
<tr bgcolor="#cccccc">
<td><b>04-14-1865</b></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left">Lincoln assassinated. Assasination attempt against Seward & his son. The Stars & Stripes raised over Ft. Sumter.</td>
</tr>
with:
http://www.historyplace.com/civilwar/index.html
<center>

<p><a name="shot" _base_href="http://www.historyplace.com/civilwar/"></a><b><font color="#0000a0" size="+1">Lincoln Shot</font></b>
</p>
</center>

<p><b><font color="#ff0000">April 14, 1865</font></b> - The Stars and Stripes
is ceremoniously raised over Fort Sumter. That night, Lincoln and his wife
Mary see the play "Our American Cousin" at Ford's Theater. At
10:13 p.m., during the third act of the play, John Wilkes Booth shoots
the president in the head. Doctors attend to the president in the theater
then move him to a house across the street. He never regains consciousness.
</p>
There are many complicated and less complicated differences, but it’s obvious to even the most untrained eye that these two pages will require two different programs. In fact, what I’ve found is that a project like this one, one that requires so much detail in its information, cannot use a general-purpose algorithm to understand the vast information available on the internet.
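To show what "two different programs" means in practice, here is a rough Python sketch with one small parser per layout. The function names are hypothetical, the two HTML strings are shortened versions of the excerpts above, and regular expressions are used only because the snippets are tiny; a real scraper would more likely use a proper HTML parser.

```python
import re
from datetime import datetime

# Shortened versions of the two excerpts, kept inline so the sketch is self-contained.
VALLEY_ROW = """<tr bgcolor="#cccccc">
<td><b>04-14-1865</b></td>
<td align="left"></td>
<td align="left">Lincoln assassinated.</td>
</tr>"""

HISTORYPLACE_PARA = """<p><b><font color="#ff0000">April 14, 1865</font></b> - The Stars and Stripes
is ceremoniously raised over Fort Sumter.
</p>"""


def parse_valley(row):
    """Table layout: date in the bold first cell, text in the last filled cell."""
    when = re.search(r"<b>(\d{2}-\d{2}-\d{4})</b>", row).group(1)
    cells = re.findall(r"<td[^>]*>(.*?)</td>", row, re.S)
    text = [c for c in cells if c and "<b>" not in c][-1]
    return datetime.strptime(when, "%m-%d-%Y").date(), text.strip()


def parse_historyplace(para):
    """Paragraph layout: date inside a <font> tag, then ' - ', then the text."""
    m = re.search(r"<font[^>]*>([^<]+)</font></b>\s*-\s*(.*?)</p>", para, re.S)
    when = datetime.strptime(m.group(1), "%B %d, %Y").date()
    text = " ".join(m.group(2).split())  # collapse the hard line breaks
    return when, text


print(parse_valley(VALLEY_ROW))
print(parse_historyplace(HISTORYPLACE_PARA))
```

Both functions end up with the same date, but nothing in either one can be reused for the other page: the date format, the tag that holds it, and the place where the description lives are all different.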
So, you might ask, what am I doing? Well, I’m adding the data. One. Site. At. A. Time.
With Renewed Purpose
I’ve always said that there is nothing like a shower, a shave and a good night’s rest. Any of those three is great, and I’ve had all three in the last 12 hours. Pretty impressive, I know.
As a result, I am ready to face the project again, with my inspiration renewed. I’ve been working on my sources, which, like everything else, you can find on the wiki. I’m trying to focus my search on data that fits my project. Here’s an outline of the data I’m thinking about right now.
The Valley of the Shadow is an exhibit produced by the University of Virginia that looks at two nearby towns, one in Virginia (and therefore the Confederacy) and one in Pennsylvania (and therefore in the Union). It tracks events from the perspective of each side, and gets into a fair bit of detail for that very specific region.
Civil War Timeline is a general timeline of the war from TheHistoryPlace.com that covers general facts like battles, truces, and governmental actions. It would be difficult to present the Civil War without "The Facts" from a place like this.
American Presidents is a dataset prepared through the Simile Project at MIT and provides detailed information on the 43 Presidents of the United States, including religious and political affiliations, birth and death times and places, and term lengths. It is always helpful to keep in mind who was running the country during these big events.
The last source I’m seriously considering is the U.S. Census, which provides a whole mess of historical data. My advisor first suggested it to me, and now I’ve had a chance to look at the data a little bit. I’m not sure it exactly fits my model, and it seems to me more like a GIS dataset. I can’t imagine how I would display percentages and especially changing percentages in my current setup.
So that’s the plan. I’ll start working on it today, probably, and I’m sure that list will get updated a lot.
Stumblin’, Bumblin’, Fumblin’
Prepare to come about. Ready about. Hard a-lee.
I met with my advisor today and she and I had one of the most productive meetings of the project so far. Now that was probably because I was so lost (in stays if we’re sticking with the metaphor) yesterday that I wrote down a long list of items in two columns and said “pick one”. She did, and we talked about where the project is and where it should be.
The Good News: Where the project is
As I mentioned a couple of days ago, I’ve made great progress recently. I’ve added external data with examples, completely revamped the object model and in general made everything better. You can see the Presidents lined up by their hometowns and at the same time see which states existed when. You can view the U.S. Holocaust Museum’s kml file of the concentration camps, seeing all the data that the Museum added. It is nice to be able to see that in action.
The Bad News: Where the project is going
As you recall, my goal in this project is to solve this problem of visualizing historical data in the abstract. Having decided that one or several sources of data would be necessary to achieve such a goal, my advisor and I sought out such sites. But alas, none were found. What to do?
Ok I’m done talking like that. We decided it would be best to find one area and one time period and digitize many sources for that time period. The time period I’ve chosen is 1850-1890 (you: “Wow this guy really likes the Civil War”). I hope to look at a number of sources and find a nice way to parse them all without having to store them all. Right now I’ll start with historical census data and a few other sites that I’ve found and we’ll see where that takes me.
So I’m a little bummed that I seem to have failed in this particular piece of the project. I’m glad to have a well-defined goal again but I was really hoping I could do it all. I suppose this is why software companies exist.
Much Progress (aka Look What I Can Do)
I’ve finally started to get some content up on the site and I have to say it’s a great feeling. I’ve always been able to picture what the site should look like, and while it’s not even close, I’m pretty amazed at how cool it is to see it coming together.
Now I’m sure this is frustrating for you, since, as you know, you cannot see the site coming together. Still, you should take my word for it.
If you don’t, let me tell you that I’ve added two types of data: data uploaded by hand, and, more importantly, third-party kml files. As some of you surely know, kml is the file type for Google Earth. Part of the new specification allows for dates to be associated with places, and in that way kml can be converted to historical data fairly easily in concept. For example:
- These United States: http://code.google.com/apis/kml/documentation/us_states.kml
- Concentration Camps: http://www.ushmm.org/googleearth/camps.kml
- Law and Order: CSI: http://myweb.students.wwu.edu/~kennym2/EGEO_451/lab1/locations_3.kml
Those files and many others can now be displayed in Visualize History, with the Time Slider, polygons, blurbs, mouse effects, links and more. They are all hosted on someone else’s site, relieving Visualize History from the burden of storing and updating the historical data.
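For the curious, here is a rough idea of how dated places can be pulled out of a kml file, sketched in Python with the standard library. The sample document is made up (it is not one of the files listed above), the function name `dated_placemarks` is hypothetical, and the sketch assumes the KML 2.2 namespace and its `TimeSpan` and `TimeStamp` elements.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML_NS}

# A made-up document: one place with a span, one with a single date, one undated.
SAMPLE_KML = f"""<kml xmlns="{KML_NS}">
  <Document>
    <Placemark>
      <name>Example region</name>
      <TimeSpan><begin>1861</begin><end>1865</end></TimeSpan>
    </Placemark>
    <Placemark>
      <name>Example town</name>
      <TimeStamp><when>1865-04</when></TimeStamp>
    </Placemark>
    <Placemark>
      <name>Undated place</name>
    </Placemark>
  </Document>
</kml>"""


def dated_placemarks(kml_text):
    """Return a list of dicts for every Placemark that carries a date."""
    root = ET.fromstring(kml_text)
    events = []
    for pm in root.iter(f"{{{KML_NS}}}Placemark"):
        name = pm.findtext("kml:name", default="", namespaces=NS)
        begin = pm.findtext("kml:TimeSpan/kml:begin", namespaces=NS)
        end = pm.findtext("kml:TimeSpan/kml:end", namespaces=NS)
        when = pm.findtext("kml:TimeStamp/kml:when", namespaces=NS)
        if when is not None:
            begin = end = when  # a single moment is a span of zero length
        if begin is None and end is None:
            continue  # no date information, so nothing for a timeline
        events.append({"name": name, "begin": begin, "end": end})
    return events


for event in dated_placemarks(SAMPLE_KML):
    print(event)
```

The dates are left as strings on purpose. KML lets an author write just a year, a year and month, or a full date, so the string itself carries the precision.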
A lot of work needs to be done before I will open up the site, but it is nice to make quantifiable progress.
"What a nifty way of doing things!" says I
I’ve been committing a fairly substantial mistake on Visualize History, and while I’d usually be the last one to admit any mistakes I make, I’ll admit to making this one since I think it may be fairly common among programmers and, in addition, real human beings.
The problem stems from finding out a new and cutting edge way of attacking some problem, in our case AJAX. “What a nifty way of doing things!” says I, and now every problem needs an asynchronous call. Now you may not care about the technical advantages or disadvantages of AJAX, nor should you. The big point is that for 7 or 8 years I’ve been doing things one way, and now everyone seems to want to do things a new way, and so do I. But the old way has worked pretty well, and, well, you and I both know not to fix things if they ain’t broke.
I remembered that Joel Spolsky wrote about something similar and in trying to find it, found this instead:
The MSDN Magazine Camp is always trying to convince you to use new and complicated external technology like COM+, MSMQ, MSDE, Microsoft Office, Internet Explorer and its components, MSXML, DirectX (the very latest version, please), Windows Media Player, and Sharepoint… Sharepoint! which nobody has; a veritable panoply of external dependencies each one of which is going to be a huge headache when you ship your application to a paying customer and it doesn’t work right. The technical name for this is DLL Hell. It works here: why doesn’t it work there?
I’ve read Joel’s article, and I agree with it. It resonates with me, and yet I still fall for these tricks all the time. I fall for them when there is no company pushing on me, whether by convention or by advertising. I fall for them all the time, and I don’t know why.
The problem is closely related to one discussed by Raymond Chen, which he calls “solving one problem by creating a bigger problem.” It’s not exactly the same but for some reason I do both, and I think for a similar reason. Once I understand a concept very well, or at least like a concept a lot, I find that I can jam any problem into that concept, no matter how dissimilar they may seem. So if that means installing cygwin on my PC so that I don’t have to learn Windows command line syntax, then so be it.
So this post is to remind you (and by that I mean me) not to force new techniques on old problems where they don’t really belong, and to treat a new technique as just another technique. There’s more than one way to do it, but that doesn’t mean try all the ways, or even just one way all the time.
Sorry, I hid the site
I’m on break right now and I’ve been working and re-working the site. Things are going up, down and all-around so I made a possibly saddening decision to hide the site from the general public. I think people that have been arriving at the site recently have done so accidentally and I think I’ll wait until there is at least some actual data available before I let you see things.
What is version control?
(This post is the first in a mini-series titled “Should I make my code open source?”)
For me, the biggest reason to open up Visualize History is certainly version control. Don’t get scared if you don’t know what that is, because you really do.
Imagine that you are a senior in college writing your senior thesis. You’ve collected your research and you write your first draft, which you turn in right before Thanksgiving break. Over break, you manage to forget about the draft completely, and, when you return to school, your advisor says, “I’d like you to take it in a new direction.” So you delete pages 4 through 19 and rewrite them completely. You save the file, and have deleted the old version. Then you hand it back in, and the professor says “oh, wow, the old paper was so much better”. You panic. But if you are using version control, all you need to do is check for old versions. You can browse the changes, combine the new and old versions, and copy and paste from both files as if they were different files.
Some of you may be thinking, “Well, luckily I’m not a dumb college kid who throws away his senior thesis anymore. I would have saved a copy.” So, my second example comes from the law firm of someone I know quite well, who for this post we’ll call Alan. Alan tells me that at his firm, they write and rewrite drafts, sometimes making 30 or 40 versions of one file. “Don’t worry”, he tells me, “we have a system. When I change a file, I add a number to the end of it. Today I just finished ‘brief about something boring.v23.doc’.” Version control has an answer for these people too. Not only does it automatically add the version numbers for you (“Wait is this draft version 23 or did Dave finish 23 and I’m on 24?”), but it understands who you are. That means if your assistant is correcting your spelling while you are adding an appendix on things your client can’t get caught doing, the version control system can merge those changes, automatically. It can highlight lines that have changes, and it remembers every change ever made, and who made it. Not only that, you can control who is allowed to see or to change a file, and you can undo all changes made by one person. So when Bill quits out of frustration over the ridiculous naming system, but before doing so, inserts random words in every brief he has access to, you can laugh at crazy Bill and fix his spiteful nature in about 8 clicks.
There isn’t a software company that I know of that doesn’t use version control, which we often call source control in the business. But there is no reason to limit it to code. I have, for example, several Word documents and PDFs on my site in the documentation section. I would not be surprised if Microsoft’s next versions of Office included very obvious source control manipulations. And you should all use it.
For Visualize History, I use the open-source industry standard, which is called Subversion. There is an extension for Windows called Tortoise SVN which I like a lot. I think Tortoise SVN does a great job of making an otherwise overly-technical idea into part of the average user’s normal experience. The tool, though, is not the important part to me. I think the idea that I may want to look at an old version of my documents, and that someone else has made that easy and free is a pretty cool idea.