A (very) small representation of the "family tree" of Proto-Indo-European languages
I love talking languages almost as much as I love talking genealogy (sometimes it's a dead heat). One thing I enjoy using when talking about language is my first name, Starr. There is no language that I know of that doesn't have a word for "star", so it's a perfect way to illustrate how languages are connected. I can also use it to illustrate some of the concepts of genetic genealogy you need to understand in order to set expectations and choose the right tests for your research. You'll note that some words in the pictures above and below are in red. Each is the word for "star" in the language next to it. In case the large tree above is hard to read, Proto-Indo-European (PIE) has the word H'ster. As you follow the branches, you'll see similar forms carry through to almost every connected language. Because of accent, slang or some other change driven by a unique environment, each language alters the word just a little bit. But PIE still lives inside.
I put Persian and Urdu in their own script so you could see the spelling similarities, and the other three in Roman characters to show the similarity in pronunciation.
Naturally, no one speaks PIE now. In fact, PIE existed (if it did exist) long before the written word, and the languages we speak are offshoots that keep only small similarities to it. So how do we prove PIE? Well, linguists noticed that Spanish, Italian, French and Romanian (and many others) all had similar words. This branch was easily connected to Latin (the language of the Roman conquerors who tore through most of Europe all the way to Britain), because scholars and churches still used Latin. The gradual change from the mother tongue to what are called the Romance languages was documented in written records. Knowing when a document was created, researchers could identify when a spelling or whole-word change (a mutation) happened and connect it to an earlier form of the language, until they reached Vulgar Latin, the everyday spoken Latin the Romance languages grew from (as opposed to the more formal written Latin, including so-called Church Latin). Linguists took other languages and studied their words, grammar and dates of first recorded use to group the languages together and link them to similar but now dead languages. A parent language was reconstructed from the words all its daughter languages share, leaving out the words each daughter invented on its own. If 3 languages have the same or similar word for bicycle, but different words for car, then their parent language had a similar word for bicycle, but no word for car. As I point out in the photo above, Sanskrit uses the word Tara for "star". Vedic Sanskrit must have had a similar word, because Sanskrit's sister, Prakrit, has children that use the same sounds in their words (Sitara and Takara). Also note that Persian is Urdu's 2nd cousin once removed, yet their spellings of the word for star are very similar. To be that close, linguists argue, Proto-Indo-European (or an intervening, now dead language) must have had a similar word. (The similar spelling is a separate story: Urdu is written in a script adapted from the Persian alphabet, so that part was borrowed rather than inherited.) By tracing when the differences pop up, linguists can make a rough guess at when a population diverged from its siblings.
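The bicycle/car rule above can be sketched in a few lines of Python. It's a toy, not how historical linguists actually work: it assumes each word has already been tagged with a cognate set (a hypothetical label meaning "these words descend from one ancestral word"), and it keeps a concept for the parent language only when every daughter language agrees.

```python
# Toy version of the "bicycle/car" rule. Each word is tagged with a
# hypothetical cognate-set label (words with the same label are assumed to
# descend from one ancestral word). A concept is kept for the parent language
# only if every daughter language uses a word from the same cognate set.

DAUGHTERS = {
    "Language 1": {"bicycle": "BIKE-1", "car": "CAR-1"},
    "Language 2": {"bicycle": "BIKE-1", "car": "CAR-2"},
    "Language 3": {"bicycle": "BIKE-1", "car": "CAR-3"},
}


def reconstruct_parent(daughters: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return {concept: cognate set} for concepts all daughters agree on."""
    vocabularies = list(daughters.values())
    # Only concepts every daughter has a word for are candidates.
    concepts = set.intersection(*(set(v) for v in vocabularies))
    parent = {}
    for concept in sorted(concepts):
        cognate_sets = {v[concept] for v in vocabularies}
        if len(cognate_sets) == 1:  # every daughter agrees
            parent[concept] = cognate_sets.pop()
    return parent


print(reconstruct_parent(DAUGHTERS))  # {'bicycle': 'BIKE-1'}
```

Real reconstruction also has to account for regular sound changes, borrowed words and dates of attestation; this only captures the agreement rule.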
While language "mutation" doesn't follow gene mutation exactly, the way geneticists determine parent genes is very similar to how linguists determine parent languages. When a gene, let's call it Ted, mutates, scientists mark the mutation. Generations later, four people decide to test their DNA. Judy has TedAB, Frank has TedABC, Mikki has TedABD, and Bernardo has TedAE. Judy, Frank and Mikki are clearly more closely related to each other, since they share both the A and B mutations. Frank's family tree is documented back to, say, Italy. Mikki is from Russia, and Judy has documented Indian heritage for at least 10 generations. So scientists reason that mutation C is an Italian mutation and D is a Russian mutation, and that at some point in the past (probably earlier than Mikki's family has documented), Frank's and Mikki's ancestors were in India. Bernardo's family is less closely connected because he doesn't have the B mutation. At some point in ancient, unknown history, Bernardo's family left the main group carrying the TedA mutation, before the B mutation happened. Using historical documents and as many living test subjects as they can, scientists build an algorithm that guesses the most likely migration pattern of the Ted gene. Bernardo's family turns out to be connected to a large population with the E mutation that still lives in China. A scientist's best guess would be that the original TedA group split up, with some going to India and some to China.

But where did A come from? With more tests, scientists find a group of West Africans who have TedF. No A. Then an isolated tribe in South Africa is discovered and tested. They have TedFG. Aha! Since their G mutation sits on top of F, their family must have originated from somewhere in West Africa. But where is the original Ted gene? Since no one has found an unmutated Ted gene, the best anyone can say is that its origin lies somewhere between North and West Africa. That's a lot of mileage to cover.
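The reasoning in the Judy/Frank/Mikki/Bernardo part of the Ted story can be sketched in Python: a mutation carried by a subset of another mutation's carriers must have happened later, on that branch. This is a toy that assumes every mutation happened exactly once and was never lost; real haplogroup trees are built from many more markers and testers, with statistical methods for messier data.

```python
from typing import Optional

# Toy version of the "Ted" example above. Each letter is one mutation.
TESTERS = {
    "Judy": "AB",
    "Frank": "ABC",
    "Mikki": "ABD",
    "Bernardo": "AE",
}


def carriers_by_mutation(testers: dict[str, str]) -> dict[str, set[str]]:
    """Map each mutation to the set of testers who carry it."""
    carriers: dict[str, set[str]] = {}
    for name, mutations in testers.items():
        for mutation in mutations:
            carriers.setdefault(mutation, set()).add(name)
    return carriers


def parent_mutation(carriers: dict[str, set[str]]) -> dict[str, Optional[str]]:
    """For each mutation, find the mutation that most likely came just before it.

    Assumes each mutation happened once and was never lost, so a later
    mutation's carriers are a strict subset of an earlier one's carriers.
    """
    parents: dict[str, Optional[str]] = {}
    for mutation, group in carriers.items():
        # Earlier candidates: mutations carried by a strict superset of this group.
        earlier = [m for m, g in carriers.items() if group < g]
        # The closest ancestor is the smallest such superset (None = oldest known).
        parents[mutation] = min(earlier, key=lambda m: len(carriers[m]), default=None)
    return parents


print(parent_mutation(carriers_by_mutation(TESTERS)))
# {'A': None, 'B': 'A', 'C': 'B', 'D': 'B', 'E': 'A'}
```

Read as a tree, the result says A is the oldest known mutation, B and E branch off A, and C and D branch off B, which is the grouping described in the story.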
The more people who are tested, and the more of them who have good documentation of their own family's migration, the clearer scientists can make the picture of when and how Ted began to mutate.
When Richard III's remains were discovered in 2012, scientists found two living descendants of Richard's sister, Anne of York, through an unbroken female line, and compared their mitochondrial DNA with his. Mitochondrial DNA is passed directly from mother to child with no contribution from the father (with rare exceptions). Its mutations are specific enough that you can watch the maternal family tree of mtDNA grow and see where each change was made. The more mutation markers that match between two people's mtDNA, the more closely related their direct maternal lines. It's still a bit of a best guess, but because of the mtDNA matches (and other physical evidence), scientists felt confident enough to declare they had found Richard III. When you have your own DNA tested, whether it's the ethnicity (autosomal) test, the Y chromosome, or mitochondrial DNA, you'll be matched against living people who have taken the test. You'll be connected to people who have the same mutations, and the more mutations you have in common, the more closely you're connected to that person. To find out whether you are related to someone who is deceased (whether it's your unknown great-grandfather or Charlemagne), you'll need to match living people who have documented proof of their own connection to that person. If no living person has hard proof of a connection to the deceased person in question, you won't be able to prove a connection yourself.
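The matching idea above ("the more mutations in common, the closer you're connected") can be sketched as a simple ranking. This is a toy under stated assumptions: each tester's results are reduced to a set of marker labels, and the names and labels below are made up for illustration. Real testing companies use their own, more involved matching rules, so treat this as the intuition only.

```python
# Hypothetical marker sets for a few testers (labels invented for illustration).
RESULTS = {
    "You": {"m1", "m2", "m3", "m4"},
    "Cousin": {"m1", "m2", "m3", "m5"},
    "Distant match": {"m1", "m6"},
    "Stranger": {"m7"},
}


def rank_matches(me: str, results: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Return (name, shared marker count) for everyone else, closest first."""
    mine = results[me]
    scores = [
        (name, len(mine & markers))
        for name, markers in results.items()
        if name != me
    ]
    # Most shared markers first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


for name, count in rank_matches("You", RESULTS):
    print(f"{name}: {count} shared")
# Cousin: 3 shared
# Distant match: 1 shared
# Stranger: 0 shared
```

Note what the ranking can't do: a high score says two living testers share a line, but tying that line to a specific deceased ancestor still depends on someone's documented proof.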
Now, chances are you're reading this blog in English. Are you from England? Are your parents? If I went through your tree for 10 generations, would I find only English ancestry (1,022 direct ancestors, all from England)? I'm guessing not. English is of Germanic origin, which is easier to see in Old English than in Modern English. Why? Because English has been influenced by several languages since its beginning. We no longer use "thee" and "thou"; the once-plural "you" took over their job. Interestingly, words like these were originally spelled with a letter called thorn (þ) that came to look very similar to a y, and when the letter fell out of use, printers often swapped in a y (that's where the "ye" in "Ye Olde" comes from). We borrow from other modern languages for words like "taco", "kimono" and "aloha". We have thousands of words borrowed or adapted from Latin, because it was the language of scholars and conquerors for so long. Before today's post, you may not have realised that English, Hindi and Albanian are cousins. But if you heard all three, you may have noticed similar sounds, words, or alphabets. Many people even mistakenly believe they understand a foreign language because of these similarities, which reach back beyond written history.
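The 1,022 figure above is just repeated doubling: 2 parents, 4 grandparents, 8 great-grandparents and so on. Here's a quick Python check, assuming you count yourself as generation 1 and that no ancestor appears twice in the tree:

```python
def direct_ancestors(generations: int) -> int:
    """Count direct ancestors in a tree of `generations` generations.

    Counts yourself as generation 1 (so parents are generation 2) and
    assumes no ancestor appears twice in the tree.
    """
    # Generation g holds 2 ** (g - 1) people; skip g = 1 (yourself).
    return sum(2 ** (g - 1) for g in range(2, generations + 1))


print(direct_ancestors(10))  # 1022
```

In real trees the number of distinct people is often smaller, because cousins marrying cousins makes the same ancestor show up in more than one slot (genealogists call this pedigree collapse).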
In genetics, mutations come in several forms. A gene can be deleted (goodbye to "thee" and "panchymagogue"). Or a single DNA letter can be miscopied at just one spot, creating what's called an SNP (single nucleotide polymorphism). This would be like the confusion of there/their or your/you're in English. A stretch of DNA can be inverted (flipped end to end), similar to my new pet peeve: people who use "literally" to mean its opposite. I've already mentioned borrowed words; in genetics, that would be translocation or insertion. Every human gets 23 chromosomes from mom and 23 from dad. One pair is the sex chromosomes, XX or XY. The other 22 pairs are called autosomes, and they provide most of our genetic makeup. Within each chromosome pair, the copy you got from mom and the copy you got from dad can swap segments when your body makes eggs or sperm (this is called recombination, or crossing over). So each chromosome you pass on is a patchwork, with more or less of your mom's or your dad's copy in it, depending on where the swaps happened and which chromosome of each pair makes it into the new cell. That's why your child isn't exactly 25% of each grandparent, and can be missing an ethnicity marker or carry more of one than the documentation would suggest.

You'll notice in the photo above the different words in Irish and Scots Gaelic that don't really seem to fit. There are many words for "star" in many languages, and Irish Gaelic also uses "seren" like Welsh does. And because of the influence of English (due to Germanic invaders and its modern prevalence), Welsh also accepts "star" in conversation. Irish and Scots share a different word, which suggests that an indigenous people predated the Celts and were replaced (the now defunct father language translocated its native word into the Celtic mother language). And the English word "star" has been inserted into the Celtic languages.
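To make the recombination point concrete (why a child isn't exactly 25% of each grandparent), here is a toy simulation in Python. It's a sketch under loud simplifying assumptions: all 22 autosomes are given the same made-up length of 100 positions, each gets one or two crossovers at random spots, and "grandma" and "grandpa" simply label the two copies a parent carries. It is not a model of real human recombination rates.

```python
import random


def transmit_chromosome(length: int, rng: random.Random) -> list[str]:
    """Simulate one parent passing on one chromosome.

    The parent carries two copies: one from their mother ("grandma") and one
    from their father ("grandpa"). We start on a randomly chosen copy and
    switch copies at each crossover point. One or two crossovers per
    chromosome is a simplifying assumption for illustration.
    """
    n_crossovers = rng.randint(1, 2)
    points = sorted(rng.sample(range(1, length), n_crossovers))
    current = rng.choice(["grandma", "grandpa"])
    result = []
    for pos in range(length):
        if pos in points:
            current = "grandpa" if current == "grandma" else "grandma"
        result.append(current)
    return result


def grandma_share(n_chromosomes: int, length: int, seed: int) -> float:
    """Fraction of one parent's contribution that came from grandma."""
    rng = random.Random(seed)
    positions = []
    for _ in range(n_chromosomes):
        positions.extend(transmit_chromosome(length, rng))
    return positions.count("grandma") / len(positions)


# The child gets half its autosomal DNA from this parent, so grandma's share
# of the child's autosomes is half of this fraction.
for seed in range(3):
    share = grandma_share(n_chromosomes=22, length=100, seed=seed)
    print(f"child {seed}: grandma = {share / 2:.1%} of the autosomes")
```

Each seed stands in for a different child of the same parent; the percentages wander around 25% instead of landing exactly on it, which is the point made about recombination above.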
These are so close to each other, they might as well be the "John Smith" of language!
Ideally, both linguists and geneticists want to find isolated groups of people to test. The more remote and insular, the better. Otherwise influences pile up: Latin was so influential that it shows up throughout English despite English's Germanic origins. The above enlargement of the Proto-Balto-Slavic branch of PIE's family tree shows what scientists really hate to find. The Balto-Slavic languages cover Central to Eastern Europe and a great deal of Asia. Where did the branch start? How long ago did it break away from PIE? Is Macedonian really a cousin of Croatian, or are they siblings? Where's the conclusive proof? There really is none. It's all a best guess. Researchers look at when languages were first documented. They identify the earliest known ethnic identity that is specifically different from its neighbors. And then they guess. (Seriously.) Anyone who has studied the history of the countries involved knows that the borders changed more frequently than Taylor Swift's boyfriends. So where does one draw the line? Where does Lithuania end and Poland begin?
In genetics, this problem is everywhere. Anyone dealing with high Scandinavian or missing German ethnic markers knows what I mean. The Eurasian land mass was large and relatively accessible. People were coming and going and conquering and being conquered all the time. One group would win today only to lose tomorrow. So pinning down markers specific to one location is difficult. A mutation may have originated in Scandinavia, but the Vikings ran rampant all over the place (more than once). And we all know about Genghis Khan! While the labs collect more tests and refine the algorithms used to decide where Central Europe turns into Eastern Europe, we just have to be patient. As for Native American markers, many people want to know the specific tribe they belong to. That isn't possible, because so many tribes traded and intermarried, leaving no definitive mutations that point to one tribe over another.
Geneticists are also trying to find our origins, but they have the same problems. Humans move and mate a lot. It's difficult to determine whether the markers in a group's DNA are exclusive to that group or shared with a larger one. It's nearly impossible to say definitively whether they prove a connection to nearby neighbors or indicate a deeper, older connection to a long-dead group. All reasonable research shows a common origin point in Africa and a general migration pattern from there, but genealogists want something more definitive, and it's just not there yet. It really is more guess than science right now. Should we give up on it? Not at all! The more tests taken by more people, the more information we have to narrow down the results. The science, just like humans, is in constant evolution.
And I just realised I also illustrated why no one can define an origin point for a surname either. Dang, I'm good.
-Hoshi
I am also a student of linguistics. I enjoyed your discussion here. I did want to point out that You is actually the object form of Ye. What actually happened with Thou/Thee/Thy/Thine was a social shift. This singular/familiar form was gradually lost and replaced by the plural/formal form as a result of the rise of the middle class and social mobility in England during the Renaissance. I have been thinking recently about the use of the objective form of you as a subject and the loss of the subject form. You may be partially right in that thou and you would have sounded similar, ye sounding also like the objective thee. Anyway, I just wanted to add that minor correction while letting you know I appreciate your post.
Sorry for the lateness of my reply. Thank you for your comments and it's always good to hear of another enthusiast out there. I appreciate your correction. I had read a book about the shift once and couldn't remember the details as well as I had hoped, but thought the example would be adequate to illustrate my point.
What is your Y-haplogroup?
My Y-haplogroup is R-P312.
Forgive me. My Y-haplogroup is now R-DF27. I'm planning to test for SNP Z225. FTDNA is behind in designating and showing these new SNPs in its haplotree.