29 March 2013

The Life of PIE

Around 3600 BCE (give or take a millennium or two), humans of an area that may have included India and Eastern Europe (give or take a few thousand miles in any direction) spoke what linguists call Proto-Indo-European (PIE). As people spread out, conquered or were conquered, and assimilated the indigenous people around them, they began to live too far away or too isolated from other related tribes. Their new environment was different from their now-distant cousins' world. Because of those environmental differences, their language gained new words, invented slang, and deleted words no longer needed. I know what you're thinking... "Come on, lady! You said you were doing a DNA series!" Well, I assure you, this is a great tie-in. Sit back and enjoy the ride.

A (very) small representation of the "family tree" of Proto-Indo-European languages

Proto-Indo-European is essentially the mother-tongue of hundreds of languages. PIE was the parent of Proto-Albanian, Proto-Armenian, Proto-Anatolian, Proto-Balto-Slavic, Proto-Celtic, Proto-Germanic, Proto-Greek, Proto-Indo-Iranian, Proto-Italic and Proto-Tocharian. To be sure, some researchers add more "children", some add fewer. Sadly, Proto-Anatolian and Proto-Tocharian have no living languages, having been replaced by the language of conquering peoples thousands of years ago. Proto-Albanian and Proto-Armenian mothered modern Albanian and Armenian, respectively (no siblings for these children!). All the others added several grandchildren (and a few great great great great grandchildren) to PIE's family tree. So what does this have to do with genetics? While a true scientific correlation between genes and language is still controversial, it's hard to argue that a rough pattern doesn't exist. As man emerged from a singular origin point, he adapted to new environments. He became hairier in cold climates and gained more sweat glands in humid ones. He developed disease immunities and food allergies. Not every mutation is good, but any mutation that makes it to the next generation is a "winner".

I love talking languages almost as much as I like talking genealogy (sometimes it's a dead heat). One thing I enjoy using when talking about language is my first name, Starr. There is no language that I know of that doesn't have a word for "star". So it's a perfect way to illustrate the connectivity of language. And I can use that to illustrate some of the concepts of genetic genealogy that you need in order to set expectations and choose the right tests for your research. You'll note in the pictures above and below that some words are in red. Each is the word for "star" in the neighboring language. If the large tree above is too small (click to enlarge), Proto-Indo-European has the word H'ster. As you follow the branches, you'll see similarities follow through to almost every connected language. Because of accent, slang or some other mutation due to unique environments, each language changes the word just a little bit. But PIE still lives inside.


I put Persian and Urdu in their alphabet so you could see the spelling similarities and the other three in Roman characters to show a similarity of pronunciation.

Naturally, no one speaks PIE now. In fact, PIE existed (if it did exist) long before the written word, and we speak offshoots that have small similarities to PIE. So how do we prove PIE? Well, linguists noticed that Spanish, Italian, French and Romanian (and many others) all had similar words. This branch was easily connected to Latin (the language of Roman conquerors who tore through most of Europe all the way to England), because scholars and churches still used Latin. The gradual change from the mother-tongue to what are called the Romance Languages was documented by their written records. Knowing when a document was created, researchers were able to identify when a spelling or total word change (mutation) happened and connect it to an earlier form of the language until they reached the purer form of Vulgar Latin (there is a so-called Church Latin that is a bit more formal). Scientists took other languages and studied their words, grammars and dates of first recorded use to help group the languages together and link them to similar but now dead languages. Parent languages were determined by having the same or similar words as all their resulting languages, but missing the differences of invented words. If 3 languages have the same or similar word for bicycle, but different words for car, then their parent language had no word for car, but a similar word for bicycle. As I point out in the photo above, Sanskrit uses the word Tara for "star". Vedic Sanskrit must have a similar word, because Sanskrit's sister, Prakrit, has children that use the same sounds in their words (Sitara and Takara). Also note that Persian is Urdu's 2nd cousin once removed, but their spelling for the word star is very similar. To be that close, linguists argue, Proto-Indo-European (or an intervening now dead language) had to have had a similar word and alphabet. By tracing when the differences pop up, linguists can get a rough guess of when a population diverged from its siblings.
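
For the code-minded, here is a minimal sketch of that bicycle-versus-car reasoning in Python. Everything in it is an assumption for illustration: the three languages and their words are made up, the function names are mine, and the 0.6 spelling-similarity cutoff is an arbitrary choice rather than anything linguists actually use (real comparison relies on sound correspondences, not raw spelling).

```python
from difflib import SequenceMatcher

# Hypothetical daughter languages with made-up words (illustration only).
DAUGHTERS = {
    "Language A": {"bicycle": "velo", "car": "auto"},
    "Language B": {"bicycle": "vela", "car": "kocsi"},
    "Language C": {"bicycle": "velu", "car": "bil"},
}


def similar(a, b, cutoff=0.6):
    """Crude spelling similarity; the cutoff is an assumed value."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff


def infer_parent_vocabulary(daughters):
    """Return concepts whose words look alike in every daughter language."""
    languages = sorted(daughters)
    concepts = set.intersection(*(set(daughters[lang]) for lang in languages))
    inherited = {}
    for concept in sorted(concepts):
        forms = [daughters[lang][concept] for lang in languages]
        # Compare every form against the first one. This is a shortcut:
        # a real study would compare sounds, not spellings.
        if all(similar(forms[0], form) for form in forms[1:]):
            inherited[concept] = forms
    return inherited


print(infer_parent_vocabulary(DAUGHTERS))
```

With this toy data, "bicycle" survives as a word the parent language probably had, while "car" is dropped because the three daughters disagree on it.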

While language "mutation" doesn't follow gene mutation exactly, the way geneticists determine parent genes is very similar to how linguists determine parent languages. When a gene strand, let's call it Ted, mutates, scientists mark the mutation. So after generations, four people decide to test their DNA. Judy has TedAB, Frank has TedABC, Mikki has TedABD, and Bernardo has TedAE. Obviously, Judy, Frank and Mikki are related more closely since they have the A and B mutations. Frank's family tree is documented to, say, Italy. Mikki is from Russia and Judy has documented Indian heritage for at least 10 generations. So scientists reason mutation C is an Italian mutation and D is a Russian mutation. At some point in the past (probably earlier than Mikki's family has documented), Frank and Mikki's ancestors were in India. Bernardo's family is less connected because he doesn't have the B mutation. At some point in ancient unknown history, Bernardo's family left the main group with the TedA mutation prior to the B mutation. Using historical documents and as many living test subjects as one can, scientists build an algorithm that guesses the most likely migration pattern of the Ted gene. Bernardo's family is connected to a large population of E mutations who still live in China. A scientist's best guess would be that the original TedA group split up with some going to India and some to China. But where did A come from? With more tests, scientists find a group of West Africans who have TedF. No A. An isolated tribe in South Africa is discovered and tested. They have TedFG. Aha! So their family originated from somewhere in West Africa. But where is the original Ted gene? As far as anyone could tell, since no one has found an unmutated Ted gene, the origins have to be somewhere between North and West Africa. That's a lot of mileage to cover.
The more people who are tested, and the more of them who have good documentation of their own family's migration, the clearer scientists can make the picture of when and how Ted began to mutate.
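
If you like to see the logic spelled out, here is a rough Python sketch of the Ted example. The four testers and their mutations come straight from the story above; the function names are my own invention, and counting shared mutations is a deliberate simplification of what real labs do.

```python
from itertools import combinations

# The four testers from the Ted example and the mutations each one carries.
TESTERS = {
    "Judy": {"A", "B"},
    "Frank": {"A", "B", "C"},
    "Mikki": {"A", "B", "D"},
    "Bernardo": {"A", "E"},
}


def shared_counts(testers):
    """Count the mutations each pair of testers has in common."""
    return {
        (first, second): len(testers[first] & testers[second])
        for first, second in combinations(testers, 2)
    }


def carriers(testers, mutation):
    """List everyone who carries a given mutation, alphabetically."""
    return sorted(name for name, found in testers.items() if mutation in found)


for pair, count in shared_counts(TESTERS).items():
    print(pair, count)

print("Carries B:", carriers(TESTERS, "B"))
```

Judy, Frank and Mikki each share two mutations with one another, while Bernardo shares only one (A) with anybody, which is why his family is placed on an earlier split.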

Note how similar all those "sibling" languages are. (And not too far from their "cousin" Greek.)
When Richard III was recently discovered, scientists used two documented living relatives of Richard's sister and tested their mitochondrial DNA. Mitochondrial DNA is passed directly from mother to child with no interference from the father's side (with few exceptions). Their mutations are specific. One can watch the family tree of mtDNA grow and see where the changes were made. The more mutative markers that match between two people's mtDNA, the more closely related their direct maternal line. It's still a bit of a best guess, but because of matches in the mtDNA (and other physical proofs), scientists feel confident enough to declare they have found Richard III. When you have your own DNA tested, whether it be the ethnicity (autosomal) test, Y chromosome, or mitochondrial DNA, you'll be matched against living people who have taken the test. You'll be connected to people who have the same mutations. The more mutations in common, the closer you're connected to that person. To find out if you are related to someone who is deceased (whether it be your unknown great grandfather or Charlemagne), you'll be matched to living people who have a documented proof for their connection. If there is no hard proof that a living person is connected to the deceased person in question, you will not be able to prove a connection yourself.


Compare this to their Latin and Greek "cousins". Also note, Irish and Scots Gaelic have another name for "star" that is Seren or close to it. I wanted to show a variant that also has a similar "cousin".


Now, chances are you're reading this blog in English. Are you from England? Are your parents? If I went through your tree for 10 generations, would I find only English ancestry (1022 directly related people all from England)? I'm guessing not. English is of Germanic origin. This is better seen in Old English rather than Modern English. Why? Because English has had influence from several languages since its beginning. We no longer use "thee" and "thou", which interestingly were originally spelled with a letter (thorn, þ) that looked very similar to a y in some handwriting. When printers dropped that letter, they often replaced it with y (and that's how "the" became the "ye" of "Ye Olde Shoppe"; "you" itself comes from a separate pronoun). We borrow from other modern languages for "taco", "kimono", and "aloha". We have thousands of borrowed or improved words from Latin, because it was the language of scholars and conquerors for so long. Prior to today's post, you may not have realised that English, Hindi and Albanian are cousins. But if you heard all three, you may have noticed similar sounds, words, or alphabets. Many people can mistakenly believe they are understanding a foreign language, because of these similarities that transcend written history.
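
A quick aside on that ancestor count, since the arithmetic is easy to check. This little Python sketch assumes a perfectly "clean" tree in which no ancestor appears twice (no cousin marriages), which is rarely true in real life; the function name is just mine.

```python
def total_ancestors(generations_back):
    """Add up parents, grandparents, and so on for the given number of generations."""
    # Generation 1 back is 2 parents, generation 2 is 4 grandparents, etc.
    return sum(2 ** n for n in range(1, generations_back + 1))


print(total_ancestors(9))   # nine generations of ancestors
print(total_ancestors(10))  # ten generations of ancestors
```

The 1022 figure matches nine generations of ancestors, which is what you get if you count yourself as generation one of the ten. Counting ten full generations above you would double it to 2046.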

In genetics, mutations come in several forms. A gene can be deleted (goodbye to "thee" and "panchymagogue"). Or a gene can be mistranscribed in only one spot (what's called an SNP or single nucleotide polymorphism). This would be like the confusion of there/their or your/you're in English. A gene can be inverted (copied upside down). Similar to my new pet peeve: people who literally use literally wrong. I've already mentioned borrowed words. In genetics, that would be translocation or insertion. Every human gets 23 chromosomes from mom and 23 from dad. One pair is the sex chromosomes of XX or XY. The other 22 pairs are called autosomal and they provide most of our genetic makeup. In a chromosome pair, sometimes a gene will be transferred from mom's gene to dad's gene (or vice versa). Sometimes they'll swap genes. What this means is that when a cell is divided to make the egg or sperm, the half that becomes the egg/sperm may have more or less of mom or dad in it because of which chromosome makes it to the new cell. The child made from this combination isn't an exact 25% of each grandparent and can be missing an ethnicity marker or have more than documentation would allow. You'll notice in the photo above the different words in Irish and Scots Gaelic that don't really seem to fit. There are many words for "star" in many languages and Irish Gaelic also uses "seren" like Welsh does. And because of the influence of English (due to Germanic invaders and modern prevalence), Welsh also accepts "star" in conversation. Irish and Scots share a different word that suggests an indigenous tribe predated the Celts and was replaced (the now defunct father-language translocated its native word into the Celtic mother-language). And the English word "star" has been inserted into the Celtic languages.
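
To see why a child is rarely an exact 25% of each grandparent, here is a toy Python simulation of one parent's contribution. It is a sketch under loud assumptions that are mine, not biology's: all 22 autosomes are treated as equal in length, each one gets exactly one swap point, and the function name is made up. Real recombination is messier.

```python
import random


def grandparent_shares(rng, chromosomes=22):
    """Simulate the autosomal half a child receives from one parent.

    Returns the fraction of the child's autosomal DNA that traces to each
    of that parent's two parents. The two fractions always total 0.5.
    """
    from_first = 0.0
    for _ in range(chromosomes):
        swap_point = rng.random()              # where the single swap falls
        starts_with_first = rng.random() < 0.5  # which grandparent's copy leads
        from_first += swap_point if starts_with_first else 1.0 - swap_point
    # Average over the chromosomes, then halve: one parent supplies
    # only half of the child's autosomes.
    share_first = from_first / chromosomes / 2
    return share_first, 0.5 - share_first


rng = random.Random(2013)  # fixed seed so the run is repeatable
for child in range(3):
    grandfather, grandmother = grandparent_shares(rng)
    print(f"child {child + 1}: {grandfather:.1%} and {grandmother:.1%}")
```

Each simulated child gets a slightly different split between the two grandparents, and only by coincidence would it land on exactly 25% apiece.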


These are so close to each other, they might as well be the "John Smith" of language!

Ideally, both linguists and geneticists want to find isolated groups of people to test. The more remote and insular, the better. Because of the large influence of Latin, it shows up in English despite English's Germanic origins. The above enlargement of the Proto-Balto-Slavic branch of PIE's family tree shows what scientists really hate to find. Proto-Balto-Slavic covers Central to Eastern Europe and a great deal of Asia. Where did it start? How long ago did it break away from PIE? Is Macedonian really a cousin of Croatian, or are they siblings? Where's the conclusive proof? There is none really. It's all a best guess. Researchers look at when languages were first documented. They identify the earliest known ethnic identity that is specifically different from its neighbor. And then they guess. (Seriously.) Anyone who has studied the history of the countries involved here knows that the borders changed more frequently than Taylor Swift's boyfriends. So where does one draw the line? Where does Lithuania end and Poland begin?

In genetics, this problem runs rampant. Anyone dealing with high Scandinavian or missing German ethnic markers knows what I mean. The Euro-Asiatic land mass was large and relatively accessible. People were coming and going and conquering and being conquered all of the time. One group would win today only to lose tomorrow. So testing for their specific location markers is difficult. A mutation may have originated in Scandinavia, but the Vikings ran rampant all over the place (more than once). And we all know about Genghis Khan! While the labs get more tests and refine the algorithms used to decide where Central Europe turns into Eastern Europe, we just have to be patient. As for Native American markers, many people want to know the specific tribe they belong to. That isn't possible, because so many tribes traded and intermarried, leaving no definitive mutations that point to one tribe over another.

Albanian and Armenian have changed a bit from their beginnings, but have no siblings. All Proto-Anatolian and Proto-Tocharian languages are now extinct, but notice how close they are to others like Welsh!
The photo above shows two still living languages (Albanian and Armenian) that have no siblings. Their respective origin tribes remained isolated and insular to the point that they remained purer to the original PIE. Before you start pointing it out, Yll may have at one time been Hyll and before that something closer to H'ster. Note also the two dead branches of Anatolian (which had the documented language of Hittite among others before going extinct) and Tocharian. Both of them share similarities to other branches of the tree that were geographically isolated from them. Linguists argue that the only way that is possible is if they have a common ancestor. Researchers hope to one day connect PIE to its sibling languages from around the world into a higher Proto language and go higher still. Few people give any credence to studies that try right now as there is so much we don't know that it's all guesswork. It'd be the genealogical equivalent of connecting your grandmother to Adam and Eve.

Geneticists are also trying to find our origins, but have the same problems. Humans move and mate a lot. It's difficult to determine if the markers in a group's DNA are exclusive to them or a larger group. It's near impossible to be definitive on whether it proves a connection to nearby neighbors or indicates a deeper, older connection to a long dead group. All reasonable research shows a common origin point of Africa and a general migration pattern from there, but genealogists want something more definitive. And it's just not there yet. It really is more guess than science right now. Should we give up on it? Not at all! The more tests taken by more people, the more information we have to narrow down the results. The science, just like humans, is in constant evolution.

And I just realised I also illustrated why no one can define an origin point for a surname either. Dang, I'm good.
-Hoshi

21 March 2013

Before We Begin, Where Do We Start?

In high school, a teacher once said, "Children know everything, because they don't know what they don't know." That's very true. An infant is unaware of the existence of anything outside of its experience. If he can't see it, it ceases to be. That is why babies love Peek-a-boo! A baby is watching you leave and reenter existence and he finds that fascinating. As soon as he is able to recognise that "out of sight doesn't mean out of reality", the game becomes dull and useless. A toddler "knows" that Earth is populated by people who pay attention to him and ends at the edge of town (or however far his parents have taken him). When he goes to school, he learns the world is a huge place with people he may never meet and places he may never go (and yet they exist all the same). As the child advances from elementary to high school and on to college, he is introduced to a progressively larger world. He learns a great deal, but the biggest epiphany is, "There is more that you'll never know than you'll ever know."

This is a recurring theme in many spheres of humanity, so naturally I have a genealogical corollary. When we are children, our "world" is our immediate family. Our mother, father, siblings, grandparents, aunts, uncles, cousins.... whatever relations we see frequently. As far as we "know", everyone has the same family structure. As we grow, we meet new relations and learn how they connect to us. We learn generational differences (uncle vs. great uncle), family "sides" (maternal vs. paternal), and how different our family dynamic is from anyone else's. If we are bitten by the genealogy bug (hello, Dear Reader) or just really anal, we learn the difference in 1st and 3rd cousins and what the heck "removed" means.

We also evolve what facts we "know". As a child, I "knew" my mom, dad, siblings and cousins were born in the U.S. If I had been asked where my family was from, I would have said "America". I was about 9 when I started to help my dad with our family research. I learned that great grandma Brown was Scottish and grandma Gibson's parents were Native American. My dad, like most adults, believed children can only handle so many facts at once (and simple ones at that), so he was vague with the details of where our family was from. He knew enough to say we were from lots of places, but not very specific. So at this stage, if asked about my family, I would have said, "We're basically from everywhere." I could be specific as far as Scottish or Native, but I didn't really know more. By high school, my interest in family history was flourishing just as dad's was waning. I now "knew" from the research of many family members that great grandma Brown's parents were Russian, grandma Gibson's parents were 100% Native American, great grandpa Brown was English, and great great grandma Householder was also 100% Native American. The rest of the family was German, Irish and possibly Spanish or French. For a good long while, that was what I "knew".

It can often be a jarring experience for the child-turning-adult when their world grows larger and they learn all the things they don't know. Anyone who has moved to a drastically different environment than what they grew up with (college, first-time work experience, new state/country) can certainly attest to the suddenness and disorientation of "not knowing". Many can suffer a crisis if what they "know" is totally divergent from the new reality. This happens in genealogy all the time, so it's best to prepare yourself for it. When I started my own serious adult research, my father thought I could handle the truth about my grandfather's paternity (basically that it was anyone's guess). I also uncovered greater details about truths I had previously "known". Great grandma Brown was Scottish born. Her parents weren't Russian, but Lithuanian and possibly Jewish. My grandmother had a previous marriage and two aunts and an uncle were the fruits of that union (not really a secret, just not discussed when I was a child). I uncovered murders, betrayals, and scandals galore. And more than a few "nonpaternity events" were shaken out of the tree! (A "nonpaternity event" is when one proves through documentation or genetic testing that a child's known father is not their father.) If I hadn't been prepared through a solid family foundation and years of training in objective research, many of these new facts could've shaken my inner core of self-worth. I could have ended up either having a crisis of identity (if my great great grandfather was a slaver, what does that make me?) or denying the blatant truth (the documents must be lying; someone made a mistake). I would have blunted my own research in order to stick with what I "knew" rather than adapt to the ever-growing world my family really lived in.

Not that long ago there was no Internet. In those days, genealogists had to travel to repositories and relatives to gather information. Or they had to write to them and wait for a response. If possible, they might call and get the information a little more quickly, but when compared with how fast the Internet is, these were the days of slow progress. This genealogical "world" was very small. One could spend a lifetime just gathering the information for one line of a branch of the family. What we could "know" was limited by distance, time and cost. Then the Internet was invented and with it came message boards. Now we could expand our world to include people with similar research interests and converse with them almost in real time. As businesses started to realise the benefit of providing genealogical services online, our world was once again expanded. Now records and people are available almost immediately. The costs have drastically reduced as well, allowing us to gather more records than our budget previously could handle. Genealogical DNA testing isn't really new, but it is recent and still in the early stages of usefulness. Now our own biology is helping us to bridge the documentation gaps and confirm what we "know". Every day our skills grow as we learn something new about our family or genealogy in general. Every day our world is a little bit bigger and we know more about all that we don't know. There's a bit of a learning curve, but everyone seems to follow the same general process whether they consciously know it or not. While I consider this my first post in a DNA series, it's really a post about expanding your mind to accept that there's more that you'll never know than you'll ever know.

Step 1- Forget What You Know
Do you know who your parents are? For the average person, I just wrote the silliest sentence in history. But do you really? If the evidence that you know your parents is that two people who you called "mom" and "dad" raised you, you could have a nasty shock waiting for you. Every day someone experiences a nonpaternity event that refutes what they knew. Every day, a person starts their research despite the wailing and gnashing of teeth of some family member only to find out that said family didn't want to tell them that they or their parents (or grandparents) are adopted. While some can take it in stride, many are ill-equipped and insist the records or DNA must be lying (because we all know grandma didn't, right?).

So start from the very beginning and question it all. This isn't the time to grill your mother Dateline-style, but to find the paper trail. (Where's your birth certificate, Mother? If that is your real name.) Interview relatives about your childhood (and theirs). Ask for birth certificates, announcements, photos, video, etc. I live by the motto of "if I know it, I can show it." Any "fact" that I don't have a record for is preceded by "allegedly", "supposedly", or some other qualifier of "as far as I know". One proof isn't enough; the more independent sources that confirm a fact, the better. Knowing something isn't black and white in the genealogy world. There are many shades of grey in between. Despite all the advances in the DNA options available to genealogists, DNA doesn't replace documentation. If you have reliable documentation starting with yourself and going back to your earliest known ancestor, the DNA can help to guide your next steps. Without documentation, the first contradictory genetic profile will confuse and most likely upset you. Let's be honest, no one's family tree is unbroken. Somewhere in the past is an unknown father (or two, or twelve). Proving what you know now before taking a DNA test (or two, or twelve) can help you to better interpret the results.

Step 2- Know Thyself
It's a common occurrence in genealogy for a person to have a crisis of identity when they learn something new about their ancestors. The one I see the most is the "if my ancestor was a slave owner, how does that change who I am?" In reality, it doesn't. So your great grandmother was a gun-toting, cigar-smoking, man-hating, politically conservative yet religiously liberal woman. You are who you are and knowing or not knowing those things about her doesn't change you in the least. It's very important that you accept this basic concept before taking a DNA test. DNA isn't 50 years old, or 100 years, or even 1000. It's older than old. Millions of years of evolution scrambling and combining and changing and rearranging. You may "know" your family lines can all go back to 1700s Germany, but your DNA knows that your family goes farther back than that. I plan this as a series, so I'll get into the DNA test types and ethnicity and all that your DNA "knows" as far as we "know", but right now I just need you to accept that DNA will go back farther than you can possibly document. Know that whatever you find in records or genetics doesn't change who you have always been; it just adds to you.

Step 3- Follow, Don't Lead
As I said, some people take the new facts so hard that they deny their veracity. An ethnicity test could show that your DNA isn't as Native American as you previously thought. Sometimes our documents and our DNA can contradict. Well, they contradict as far as we "know". But again, since DNA goes farther back than recent history and reliable documentation, what we perceive as a contradiction could simply be a clue to a deeper history. If we aren't prepared to accept the evidence as it is, we will try to shape it to the world that we "know". I "know" that my great grandmother's parents were Lithuanian because documents say so (censuses and death records). I recently found birth records for 5 of their 7 children that state their marriage was in Poland. Now, this could mean that the other documents are misleading and that they are really Polish. Or it could mean they lived close to the border of Poland at the time they married and used the closest or most convenient officiant. DNA could confirm Eastern European descent (still not going to differentiate between Polish or Lithuanian and I'll explain in the next post why) or it could throw me for a loop and say that they were Turkish, Jewish, Sub-Saharan African or something else entirely! My great grandmother was born in Scotland, so she's still Scottish. Her parents (if I can confirm their birthplaces) will still be Lithuanian. All the DNA will tell me is the deeper truths. It won't change who we are or were, only add.

Step 4- Flow With the Know
Sometimes I like to pretend I'm a lawyer trying to prove a case in court. I gather the facts via solid and reliable documentation. Most of the time, I am the judge and jury as well, but sometimes the family at large or the genealogical community plays the part of judge. The better my documents, the stronger my case. If I've tried to lead my evidence to the facts I want to believe in, it'll be obvious and my case will be dismissed. In the end, I'm trying to prove what I know "beyond a reasonable doubt." There's always more I don't know, and some day a descendant may uncover it. Until then, I know what I know until I know better.

My next posts will go into as much detail as I can about DNA testing for genealogy purposes. I'll cover each type of test separately and give you some links to a few places you can try testing. Thanks to a new occupational endeavor, I can finally start taking the tests (or beg my brother to do so) and will periodically update you with reviews of specific tests and my personal experience with them. Even if it is a long while before my own experiences start rolling in, I am sure that you'll have enough pros and cons from the upcoming DNA series to choose the test(s) most beneficial to you with enough understanding of how they work to set your expectations to the appropriate level. This week is about setting the expectation that what we know is always evolving, the world is always growing and family is always family. I'll give you time to let that sink in, because I also want you to evaluate what you currently know. How good is your current documentation? How reliable are your sources? How deep has your research been for each individual? Do you look only for census records or do you include newspapers, church records, military documents, etc.? Are you prepared for unsavory truths (illegitimacy, crime, war)? Are you willing to weigh each source on its own merits no matter your emotional attachment to the individuals you are researching?

And are you willing to learn? The only way to truly bring to light the reality of our ancestors' lives is to learn not only about them, but about the processes of research and genealogy in general. There's a lot to take in on any one subject and general family research will bring you into contact with a great many of them. I know how to research in the U.S. and U.K. With the new information about my great grandmother's parents, I now have to research in Poland and Lithuania. I have to accept the fact that I don't know about naming systems, vital record repositories or general history of the areas in question. I may only scratch enough of the surface to get what I want, or I could become an all-knowing expert on Baltic genealogy (probably not). No matter what I give you in this DNA series, there will be more I don't know. If you want to be well-versed in genetic genealogy, you'll have to do some research. You'll have to read about the various tests, genome science, anthropology, companies that provide the tests, how to share your results, geography and history... no matter what, "There is more that you'll never know than you'll ever know."

The point is to know when you don't know enough,
-Ana