August 20, 2008

Genealogical Record Keeping Systems

Filed under: Genealogy — Doug @ 11:02 pm

Thinking about the user interface for my genealogy program, I thought it would be good to take a look at some of the existing record keeping solutions, as that is where I’ll be starting.  The record keeping systems allow you to store evidence, but do not include lineage-linked views.  In other words, you can store census records, birth certificates, etc., but you won’t be able to display a pedigree from them.

Here are some very quick overviews of what I’ve seen after playing with each one for a few minutes.


Clooz is a package I’ve mentioned before, as it was the one package I had used briefly (years ago).  There is a list of object types (census, people, buildings, sources, research log) down the left side and a list view on the right side.  There is a centralized list of people, and adding a census entry consists of adding a census record and then linking folks into the census record.  All data entry is done in dialogs, making it easy to get a handful of dialogs open at the same time (census record dialog, link people dialog, add person dialog).

Clooz 2.1 Screen Shot

The Clooz home page describes the package as follows:

Clooz 2.1 is a database for systematically organizing and storing all of the clues to your ancestry that you have been collecting over the years. This is not another genealogy program. It is an electronic filing cabinet that assists you with search and retrieval of important facts that you have found during the ancestor hunt.

It uses an Access database behind the scenes, and has some integration with Legacy, but that only works with version 6, and I’ve upgraded to version 7.  There is a free download on their site, limited to 29 days or 15 launches.


Custodian is a program I hadn’t seen before.  I found it somewhat by accident in an article that contrasts it with Clooz.  Similar to Clooz, it has a list of object types down the left, but it is an MDI application, so when you go to add new items, a child window pops up, which is a big list view with buttons down the side.  Most data entry is done right in the list view, until you get to something like a name, which requires editing in a dialog.

Custodian 3 Screen Shot

The package is very colorful, using lots of backgrounds and shading, which I found made it hard to read at times.  The data is stored in password-protected Access database files, under Program Files, of all places.  There is a free download on their site so you can try it before you buy.  The trial limits data entry to ten records per section.


I found a link to Bygones on Cyndi’s List, which has a section for these types of packages.  It has an interesting look, in that it appears to be a piece of paper.  I found that made it a bit hard to know where to enter data.  It is written using FileMaker Pro, but I don’t know if the look is typical.

Bygones 0.9d Screen Shot

Their home page has a handful of slide show tutorials, which I probably need to watch, as I wasn’t quite sure how to use the package.  I did watch the first half of the introductory slide show, and it looked interesting.

It is a free download as a self-extracting zip, with no installer.


GenScribe is a Mac program that appears to be closer to what I had envisioned, in that it displays a nice representation of a census record.  The opening screen is a list of buttons for various operations.  It has a list of work to be done at a specific venue, source records, and index records.

GenScribe Census Screen

I’ll have to fire up the Mac I have on loan from work and experiment with how they do the data entry.

There is a free trial download, and the full product only costs $12.


I’m not sure if I’m really qualified to make any summary statements after just a few minutes of playing around, but I’m going to do so anyway.

My impression is that these packages, in general, suffer from the same problem as lineage-linked packages, in that they don’t adequately differentiate between evidence and conclusions; they just do it from the other end of the spectrum.  For example, in Clooz, you link individuals in your database to populate specific census records.  Well, what if the person named “John Doe” in the census isn’t really the same “John Doe” you have in your database?  This can be seen below, as the “Person” dialog has a list of the census records to which the person has been linked.

Clooz Person Dialog

My goal is to enter census data, birth certificates, and the like nearly verbatim so that the integrity of the census record is preserved, even if the person isn’t really part of my ancestry.  I should be able to link and unlink evidence and conclusions without “touching” the evidence at all.
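As a rough illustration of that goal (not code from any of the packages above), here is a minimal Python sketch.  The class names `EvidencePerson` and `Conclusion` are hypothetical, and the sample values are made up.  The evidence record is frozen once entered, and linking or unlinking only changes the conclusion side.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class EvidencePerson:
    """One person exactly as written in a source; never edited after entry."""
    source: str
    name: str
    age: str


@dataclass
class Conclusion:
    """A person in the tree, holding links to evidence rather than copies of it."""
    label: str
    links: set = field(default_factory=set)

    def link(self, evidence: EvidencePerson) -> None:
        self.links.add(evidence)

    def unlink(self, evidence: EvidencePerson) -> None:
        # Removing the link leaves the evidence record itself untouched.
        self.links.discard(evidence)


# Hypothetical sample data, for illustration only.
census_john = EvidencePerson("1880 census, sample household", "John Doe", "45")
ancestor = Conclusion("John Doe (possible ancestor)")

ancestor.link(census_john)
ancestor.unlink(census_john)   # changed my mind; the census entry is unaffected

try:
    census_john.name = "Jon Doe"   # attempts to edit evidence are rejected
except FrozenInstanceError:
    pass
```

The design choice here is simply that the link lives on the conclusion, so a wrong guess about “John Doe” can be undone without rewriting anything that came from the census.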

Will I really be able to pull it off, and put together something that works better than these packages?  Probably not, but I hope to at least learn a lot along the way.

Trying to move forward on my genealogy app

Filed under: Genealogy, Software — Doug @ 7:42 am

I’ve been a bit stuck on the genealogy project, for a couple of reasons.  First, I’ve been spending a fair amount of time watching the Olympics, which hasn’t left much time for coding fun.  Second, I’ve only been playing a bit with a domain model, and domain driven design just doesn’t quite feel right for this project.  It’s probably my lack of experience with the paradigm, but it seems more applicable to complex business models (lots of interactions and state changes) than it does to a single-user data repository.  I don’t regret spending time with it, as I’ve learned a lot, but I think I need to try a new tack.

Yesterday, I attended the Minneapolis Silverlight User Group meeting, which reinforced the fact that WPF is very cool, and I should probably try to use it for this project.  I played around a bit last night with Family.Show, which is a very cool, glitzy app, but it doesn’t even allow entry of sources.  (It’s a WPF reference app, and in that it succeeds very well, but it’s a long way from a full-blown genealogy app, and they readily admit that.)

My new approach is going to be to get something working, include plenty of unit tests, refactor mercilessly, and try to do the simplest thing.  I’m curious to see where that will take me.  The first thing I’m going to implement is the ability to store 1880 US Federal census records.  I know I want to use SQLite, so I’ll be using that out of the gate.  I’ll likely wind up using NHibernate as well, but I’m going to hold off adding it until I get something working (I’ve never used it, and I’ve got enough new things to learn).
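To make the “simplest thing” concrete, here is one possible starting point, sketched in Python with the standard `sqlite3` module rather than the WPF/.NET stack discussed above.  The table names, column names, and helper functions are assumptions for illustration, not a settled schema, and the columns cover only a few of the 1880 census fields.  Values are stored as text so they can be kept as written (an infant’s age recorded as a fraction of a year, for example).

```python
import sqlite3

# Hypothetical schema: one row per household, one row per person listed in it.
SCHEMA = """
CREATE TABLE census_household (
    id          INTEGER PRIMARY KEY,
    census_year INTEGER NOT NULL,
    state       TEXT NOT NULL,
    county      TEXT NOT NULL,
    page        TEXT
);
CREATE TABLE census_person (
    id           INTEGER PRIMARY KEY,
    household_id INTEGER NOT NULL REFERENCES census_household(id),
    line_number  INTEGER NOT NULL,
    name         TEXT NOT NULL,
    age          TEXT,
    relationship TEXT,
    occupation   TEXT,
    birthplace   TEXT
);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_household(conn, state, county, page, people):
    """Store one 1880 household; `people` is a list of dicts as transcribed."""
    cur = conn.execute(
        "INSERT INTO census_household (census_year, state, county, page) "
        "VALUES (1880, ?, ?, ?)",
        (state, county, page),
    )
    household_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO census_person "
        "(household_id, line_number, name, age, relationship, occupation, birthplace) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (household_id, line, p["name"], p.get("age"), p.get("relationship"),
             p.get("occupation"), p.get("birthplace"))
            for line, p in enumerate(people, start=1)
        ],
    )
    conn.commit()
    return household_id
```

A call might look like `add_household(conn, "Iowa", "Sample County", "12", [{"name": "John Doe", "age": "45"}])`, where the place and person are made up.  Hitting the database directly like this is exactly the kind of code that would later need to be pulled behind an interface to make it testable.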

I’ve only begun to explore WPF, so I’m sure the first incarnation will be ugly, smelly code, but hopefully I can refactor it into something decent.  In the very first iteration, I’m not even going to try to maintain a clean separation of concerns; I’m just going to hit the database directly, which will make writing unit tests nearly impossible.  That will be one of the first things I’ll need to fix.

Initially, the app may look and feel a bit like a WPF version of Clooz, but before I get too far, I’ll want to start hooking the evidence to the conclusions to fully realize the design I laid out in my previous post.

August 9, 2008

Genealogy data model from 30,000 feet

Filed under: Genealogy — Doug @ 2:18 pm

Here is an overview of the genealogy model that I’m proposing.  There are still many details to be worked out, but this is a good place to start the discussion.

Genealogy data model from 30,000 feet

The administration section contains all the stuff that relates more to the process or the program, and less to the genealogy itself.  Examples:

  • Searches – records of searches done on genealogical sources.  For example: on 23-Aug-2006, the 1850 census for Champaign County, Ohio, was searched for Smith and Sheue surnames.  No extracts were found.
  • Tasks – a to-do list.
  • Revision History – records of changes to the data (imports, modifications, etc.)
  • Surety Schemes – part of the GenTech Data Model.  It defines ways to classify the “quality” of a source/extract.

The “other family trees” section needs a better name, but the gist is that you should be allowed to import other family trees (via GEDCOM, for example) and link them into your tree as a “guide”.  What do I mean by “guide”?  Well, many programs treat a GEDCOM file as hard evidence, and import it into the conclusions section.  The problem is that you have no idea whether the person that constructed the tree followed sound research techniques, or just threw a bunch of names into a hat.  The information contained in them can still be useful as clues to build up your tree, so the model allows them to be imported and linked into the conclusions to help guide your searches.  They will not carry the same weight as evidence, however.

That said, the tool should allow import of a high-quality GEDCOM file (or other format) and apply the data to evidence.  There are problems with the GEDCOM format, however, that make this fraught with peril.  (More about that in an upcoming post.)

The evidence section contains lists of sources and the data extracted from those sources.  Examples:

  • Source – a document that contains information relevant to the family history in some way.  This might be a census, land deed, newspaper clipping, letter, interview transcript, etc.  The GenTech Data Model provides the ability to build a hierarchy of sources, which would be useful for things like a census, where one would start at the Federal level, followed by States, then Counties, etc.
  • Image – a copy of a source page – scanned document, microfilm image, screen shot of an online database, etc.  It would probably also be handy to store text documents, but I’m not sure where those fit into the model as yet.
  • Extract – The data extracted from a specific source page (or set of pages).  For example, a census record includes many households, and some households may span more than one page.  An extract would contain all the information about one household, broken down into facts.  (Much more about this in an upcoming post.)
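The three evidence items above could be sketched roughly as follows.  This is only an illustration in Python: the class and field names (`Source`, `Extract`, `Fact`, `path`) are assumptions, not part of the GenTech Data Model, and images are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """A source that may sit under a broader source (census > state > county)."""
    title: str
    parent: Optional["Source"] = None

    def path(self) -> str:
        """Walk up the hierarchy and return the titles from the top down."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.title)
            node = node.parent
        return " > ".join(reversed(parts))


@dataclass
class Fact:
    """One piece of information as written, e.g. a birthplace for one name."""
    subject: str
    kind: str
    value: str


@dataclass
class Extract:
    """Everything taken from one household, possibly spanning several pages."""
    source: Source
    pages: List[str]
    facts: List[Fact] = field(default_factory=list)


federal = Source("1850 US Federal Census")
ohio = Source("Ohio", parent=federal)
champaign = Source("Champaign County", parent=ohio)

# Hypothetical household; page numbers and values are made up.
household = Extract(champaign, pages=["14", "15"])
household.facts.append(Fact("John Smith", "birthplace", "Ohio"))
```

Here `champaign.path()` gives the full chain from the Federal level down to the county, which is the kind of lookup the source hierarchy is meant to support.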

The conclusions section contains assertions made by the researcher about the evidence, and allows another researcher to analyze their reasoning.  This is done by creating a Persona and linking it to individuals in evidence records (or individuals in other family trees).  For example, you could create a persona and tie them to the “John Smith” that appears in the 1850 Ohio Census, and then tie them to the “John Smith” that appears in the 1880 Iowa Census, along with the reasoning for doing so.  This essentially asserts that the 1850 and 1880 individuals are one and the same.  As more instances of this individual are found, they would be tied to this same persona.

The GenTech Data Model does this by creating multiple personas and layering them on top of each other, built up by assertions.  In my mind, it would work just as well to create one persona, and tie them to multiple individuals in the evidence.  This will make it easier to do things like display a pedigree or family group sheet.  In the GenTech scheme, unraveling the layering of assertions is expensive (from a computing standpoint).  It was also unclear to me how they expected to handle a revision where a lower-level assertion was removed (a reinterpretation of the evidence), as this would affect all the assertions layered on top of it.

What is the difference between evidence and conclusions?  Evidence is not subject to interpretation, unlike conclusions.  Well, that’s not entirely true, as extracting the evidence from a source relies on interpretation of the handwriting.  The interpretations at the conclusion level are much broader, however.  For example, how can one conclude that someone in the 1880 census is really the same person as someone in the 1850 census?  That sort of conclusion is difficult, and the reasoning behind it needs to be accessible for later researchers (or the same researcher a few years down the road).

One other thing that I’ll mention to wrap up this post.  Not only is it possible in the conclusions to state that a persona is the same person as someone in the evidence, but it is also possible to state that someone is not the same person as someone in the evidence.  For example, we could tie a persona to our 1850 individual, and then note that this persona is not the same as the individual in the 1880 census.  This “negative” information is just as valuable as the “positive” match information.
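A small Python sketch of a single persona carrying both kinds of statement might look like this.  The names `Persona`, `Assertion`, and `Verdict`, the evidence identifiers, and the reasoning strings are all hypothetical; the point is only that each link records a verdict and the reasoning behind it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Verdict(Enum):
    SAME = "same person"
    NOT_SAME = "not the same person"


@dataclass(frozen=True)
class Assertion:
    """One researcher judgment about one individual in the evidence."""
    evidence_id: str
    verdict: Verdict
    reasoning: str


@dataclass
class Persona:
    """One persona tied directly to many evidence individuals (no layering)."""
    name: str
    assertions: List[Assertion] = field(default_factory=list)

    def assert_link(self, evidence_id: str, verdict: Verdict, reasoning: str) -> None:
        # A new judgment about the same evidence replaces the earlier one.
        self.assertions = [a for a in self.assertions if a.evidence_id != evidence_id]
        self.assertions.append(Assertion(evidence_id, verdict, reasoning))

    def evidence(self, verdict: Verdict = Verdict.SAME) -> List[str]:
        """List the evidence individuals carrying the given verdict."""
        return [a.evidence_id for a in self.assertions if a.verdict is verdict]


john = Persona("John Smith")
john.assert_link("1850-ohio-john-smith", Verdict.SAME, "made-up reasoning: head of household, age fits")
john.assert_link("1880-iowa-john-smith", Verdict.NOT_SAME, "made-up reasoning: age is off by twenty years")
```

Because every assertion hangs off the one persona, reinterpreting the evidence means replacing a single assertion rather than unwinding a stack of layered personas.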

August 7, 2008

Yet another genealogy tool?

Filed under: Genealogy — Doug @ 3:05 am

Yes, the project I’m currently poking away at is yet another genealogy tool.  There are a boatload of them already on the market (free, open source, commercial), so why does the world need another one?

Most “family tree” tools are just that – they help you create a pretty family tree.  I’ve used many of them, and the one thing they seem to lack is a nice way of dealing with all the research that may or may not be related to the family tree, or even worse, conflicts with the data in the family tree.

Don’t get me wrong.  There are some very well done tools out there: Legacy, The Master Genealogist, Family Tree Maker, plus many more.  They don’t quite do what I want them to do, however.  (Or at least, not in the way I’d like to be able to do it.)

Let’s say, for example, that you’ve found your ancestor in the 1850 census and the 1880 census.  The 1850 census shows a birthplace of Ohio, but the 1880 census shows a birthplace of Pennsylvania.  Which one is correct?  Most family tree programs force you to enter one or the other (TMG being a notable exception).  What if one of those individuals isn’t in fact your ancestor after all?  How do you document which one is correct, but still retain the data for the person that is not your ancestor?  Most allow you to shoehorn in the data, but it isn’t always prominent – it gets buried down in notes or other fields away from the main screens.

At the other end of the spectrum is another nifty tool called Clooz.  This gives you the ability to store all your genealogical information (census records, birth certificates, etc.) and quickly and easily search through them.  The part that it lacks, however, is a way to tie together these disparate facts into a family history.

I’ve made a couple of aborted attempts to create my own genealogy tool based on the GenTech data model (GDM).  The GDM works very hard to separate evidence, conclusions, and administration within the model.  The lack of clear separation is one of the things that frustrates me with the models of many existing tools.  The way that the GDM models conclusions uses a concept called assertions, which can be layered on top of each other.  It is a very powerful concept, but I could never figure out a way to extract a family tree from the model.  That led to at least two projects being aborted.  This time around, I’m not going to try to follow the GDM.  I’ll use it as a guide, but I’ll go my own way as needs dictate.

There are a number of new technologies and tools that I think give this new incarnation a shot at being successful.  These include new development methodologies, such as Behavior Driven Development (BDD), which is an amalgam of Test Driven Development (TDD) and Domain Driven Design (DDD); new tools, such as NHibernate, iBATIS, SQLite, and Windows Presentation Foundation (WPF); and new patterns/models, such as Aspect-oriented programming (AOP), Inversion of Control/Dependency Injection, and DataModel-View-ViewModel.

This project will give me the opportunity to explore and play with all those tools and technologies.  Right now, I’m reading Jimmy Nilsson’s book Applying Domain-Driven Design and Patterns: With Examples in C# and .NET, while trying to work my way through Charles Petzold’s book Applications = Code + Markup: A Guide to the Microsoft Windows Presentation Foundation (I learned Windows 2.0 from his first Windows book – yeah, I’ve been writing code for a while).

I hope to start playing with some code before too much longer, and I’ll blog about it here.
