What does modern genetics have to say about the Basques? From time to time you get information in the press, and it is often contradictory. I will give my understanding of the issue. My source is the book 'The History and Geography of Human Genes' by Cavalli-Sforza, which is considered the classic reference on the subject.


In modern genetics, relationship is studied by measuring the percentage of presence or absence of significant genes. You can make maps depicting the percentages for different peoples and then try to look for relatives. The problem is that there are plenty of genes and the distribution of each one is very different, which makes the data very difficult to interpret. It could be the case that according to one group of genes Basques appear to be related to the inhabitants of Toledo (Ohio) and according to another to the inhabitants of Toledo (Spain). So the data have to be used with care, and a good comparison should use as many genes as possible.

I will use music as an example to understand the concepts. Each gene is a note; so the total description that includes all the genes is a song.

This allows us to define the concept of genetic distance. The genetic distance between two populations is just how different the corresponding songs are. Because a song is made up of many notes, it could happen that some parts of the tune sound rather similar while the overall composition is still very different. Genetic distance is expressed as a value for each pair of populations, more or less in the same way that it is possible to define a distance between any pair of points on the Earth, or that it would be possible to define the degree of difference between two songs by adding up the differences note after note.
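To make the 'note after note' idea concrete, here is a minimal sketch in Python. It assumes that each population is described by a list of gene frequencies and that the distance is a plain sum of squared differences. The populations and numbers are invented for illustration, and I am not claiming this is the formula used in the book; it only shows the intuition.

```python
def genetic_distance(song_a, song_b):
    """Sum of squared differences between two lists of gene frequencies."""
    if len(song_a) != len(song_b):
        raise ValueError("both populations must be described by the same genes")
    return sum((a - b) ** 2 for a, b in zip(song_a, song_b))


# Hypothetical frequencies (between 0 and 1) of five genes in three invented populations.
pop_x = [0.10, 0.50, 0.30, 0.80, 0.20]
pop_y = [0.10, 0.50, 0.35, 0.70, 0.25]
pop_z = [0.10, 0.50, 0.90, 0.10, 0.70]

d_xy = genetic_distance(pop_x, pop_y)
d_xz = genetic_distance(pop_x, pop_z)
print(f"X-Y: {d_xy:.4f}  X-Z: {d_xz:.4f}")
```

Note that X and Z agree exactly on the first two 'notes' and are still far apart overall, while X and Y differ a little almost everywhere and end up close.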

To put a little order into the chaos, the second concept used is called Principal Components. They can be understood as the different musical instruments that can be used to play the different songs. Each instrument has its own harmonies, which fit better or worse in the interpretation of the different songs.

In principle each song can be described as a combination of just a few well-chosen musical instruments. This is much simpler than its 'true' description using a very high number of musical notes.

The musical instrument that is dominant in a given song or set of songs is called the first principal component. The second instrument is the second principal component, and so on. To describe 'completely' all the notes we would need a very high number of instruments. Nevertheless, using just the few most important instruments covers most of the information.
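For readers who want to see what 'the dominant instrument' means numerically, the following is a rough sketch in plain Python. It finds the first principal component of a small table of gene frequencies by power iteration and reports the share of the total variation that it carries. The table is invented, the function name is my own, and the book certainly used far more data and more careful statistics; take it only as an illustration of the idea.

```python
import math
import random


def first_principal_component(table, iterations=200):
    """Return (direction, share) for the dominant component of a data table.

    table: list of rows (populations), each a list of gene frequencies.
    direction: unit vector over genes; share: fraction of the total variation.
    """
    n_rows, n_cols = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in table]

    # Power iteration on the covariance matrix, applied implicitly as X^T (X v).
    rng = random.Random(0)
    v = [rng.random() for _ in range(n_cols)]
    for _ in range(iterations):
        proj = [sum(r[j] * v[j] for j in range(n_cols)) for r in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Score of each population on the component, and the share of variation it carries.
    scores = [sum(r[j] * v[j] for j in range(n_cols)) for r in centered]
    total = sum(x * x for r in centered for x in r)
    share = sum(s * s for s in scores) / total
    return v, share


# Invented table: four populations (rows), three genes (columns).
# Genes 1 and 2 follow the same gradient; gene 3 barely varies.
table = [
    [0.10, 0.20, 0.50],
    [0.20, 0.40, 0.52],
    [0.30, 0.60, 0.49],
    [0.40, 0.80, 0.51],
]

direction, share = first_principal_component(table)
print("direction:", [round(x, 3) for x in direction])
print(f"share of the information: {share:.1%}")
```

In this toy table a single 'instrument' carries nearly all of the variation, because two of the three genes move together. With real data the first component carries much less, which is why several components are needed.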

The key argument is that Principal Components have to be historically important because they are able to put order into the chaos and this has to have a meaning. So the book does most of the historical interpretation based on them.


After the previous explanations it is possible to give the main findings of the book on the Basques:

1) They are the focal point for the fifth European principal component.

2) They are outliers. An outlier is a group whose genetic 'song' includes little of the most usual instruments or includes them in an uncommon combination.

3) They do not lie too far away. That means that, besides being special, the Basque song is not too different from the Spanish or French ones.

The conclusion of the book is that:

Genetics points to a persistence of Paleolithic type in the Basque region that was once more widespread over Europe.


Genetic distance of Basques to selected groups.

Basque to Berber 392

Basque to Middle East 246

Basque to English 119

Basque to Spanish 104

Basque to French 98

Basque to Lapps 629 (largest distance within Europe)

Basque to New Guineans 2149 (largest distance anywhere on Earth)
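If you want to play with the figures in this list, a small Python sketch like the following sorts them from nearest to farthest. The numbers are exactly the ones listed; the variable names are my own.

```python
# Genetic distances from the Basques, copied from the list above.
distance_from_basques = {
    "Berber": 392,
    "Middle East": 246,
    "English": 119,
    "Spanish": 104,
    "French": 98,
    "Lapps": 629,
    "New Guinea": 2149,
}

# Sort from the nearest population to the farthest.
ranked = sorted(distance_from_basques.items(), key=lambda item: item[1])
for name, dist in ranked:
    print(f"{name:<13}{dist:>5}")

nearest = ranked[0][0]
# How many times farther the Berbers are than the Spanish.
berber_vs_spanish = distance_from_basques["Berber"] / distance_from_basques["Spanish"]
print(f"nearest: {nearest}; Berber/Spanish ratio: {berber_vs_spanish:.1f}")
```

Sorted this way, the three nearest groups are the French, the Spanish and the English, and the Berbers come out nearly four times as far away as the Spanish.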

According to the book, the European groups that are 'outliers' are, in order of strangeness: Lapps (distance to the center of around 250), Sardinians (200), Basques (100) and Icelanders (80).

The focus of Europe's first and strongest principal component is located in the Middle East. According to Cavalli-Sforza, it represents the expansion of early agriculturalists from there into Europe. It carries 28% of the information.

The second component has a maximum in East Russia and Lapland and a minimum in Spain. It is associated with a possible immigration from the East or with adaptation to climate. It carries 22% of the information.

The third component has a maximum in Ukraine and a minimum in Spain and north Scandinavia. It is associated with the expansion of the Indo-Europeans. It carries 10% of the information.

The fourth has a maximum in Greece. It is associated with the Greek expansion in classical times. It carries 7% of the information.

The fifth is centered exactly on the Basque country. It is associated with the remains of the original European Paleolithic population. It carries 5.3% of the information.
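As a quick piece of arithmetic on the percentages just quoted, this short Python sketch adds them up to show how much of the information the five components carry together. The figures are the ones given above; nothing else is assumed.

```python
# Share of the information carried by each European principal component (percent).
shares = {"first": 28.0, "second": 22.0, "third": 10.0, "fourth": 7.0, "fifth": 5.3}

running = 0.0
cumulative = {}
for name, pct in shares.items():
    running += pct
    cumulative[name] = round(running, 1)
    print(f"{name:<7}{pct:>5.1f}%  cumulative {cumulative[name]:>5.1f}%")

# What is left for all the remaining components.
remaining = round(100.0 - running, 1)
print(f"left for all other components: {remaining}%")
```

So the five components together carry 72.3% of the information, and 27.7% is left for all the others.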

The book also includes a detailed study of Spain and France. It is interesting to point out that the first principal component of Spain is centered on the Basque country. It can be said that the most important part of the Spanish genetic make-up is how many Basque genes you have. The least Basque areas are Galicia, Portugal and Gerona; Andalucia and Valencia still have a reasonable level of 'Basqueness'.


All the sentences below are taken directly from the book without any comment or modification.

The simplest interpretation of the third and seventh principal components is to associate them with the diffusion of the Kurgan culture which according to Gimbutas spread Indo-European languages to Europe.

The sixth principal component shows a concentric gradient with a maximum located on the eastern shore of the Black Sea, where several north Caucasian languages are spoken. There are also local maxima at Lapland and the Basque area. Interestingly the fifth principal component showed a similar phenomenon.

It is difficult to exclude the possibility that the expansion of proto-Caucasian (proto-Basque) speakers was later than the first expansion of anatomically modern man to Europe.

The similarity between Basques and some other populations, in particular in the Caucasus, which appears in two Principal Components may be a relic of the pre-Neolithic background. An Italian linguist (Trombetti 1923) has identified a pre-Indoeuropean substratum common to Basque and Caucasian populations. Lapps show some relationship in more than one map. This can indicate that part of the Caucasic background of Lapps is of Paleolithic origin.

The Caucasus, the Basque region, and other mountains in Europe may still retain some detectable genetic traces of the upper Paleolithic.

The archaeological tool kit in the Basque region shows a well-defined Azilian culture; there are local peculiarities in the Mesolithic as well. There has been a continuous local development since the Magdalenian with a certain degree of success. It is clear that in this area hunting-gathering remained a reasonable alternative to farming for a longer time and may have delayed the full acceptance of agriculture.


In my opinion, the capital sin when dealing with genetics is to claim that one people is connected with another because this or that gene appears very similar in both. Going back to the analogy of the songs: two songs can be very different and still include a couple of very similar bars.

A good example of this approach is provided by an article that somebody passed on to me. It appeared in the Spanish magazine Tiempo in March 1997. I think that Mundo also produced something along the same lines.

The basis of the article is that, according to some research, if you consider only the HLA group of genes, the genetic distance between Berbers and Spaniards is small, and that the three groups (Berbers, Spaniards and Basques) share a number of rare combinations. The fact that these local similarities exist is important and deserves an explanation, but it cannot be deduced from them that Basques descend from North Africa through Iberia, which is more or less what the article does.

The article is reinforced with a linguistic part, where most of the space is dedicated to Alonso Garcia's interpretations of Iberian through Basque, which were ridiculed some time ago on this list by Larry.

The article contains strange statements like 'According to Solanas Penya, the Welsh, Irish, Basque and Berber languages are members of the Hamitic group, but others think it is related to Caucasian and according to Jorge Alonso it can be assimilated to Iberian and Berber'.

The criticism of this approach is that HLA is only one group of genes, i.e. a part of the song, and that a meaningful comparison has to take into account all the music. This is what Cavalli-Sforza has done, arriving at completely different conclusions. Another example: the Basques have the dubious honor of having the highest incidence in the world of the F-508 mutation associated with the deadly disease cystic fibrosis. Denmark also has a very high incidence, but from this isolated fact we cannot deduce that Danes and Basques are the two most closely related peoples in the world.
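The point about judging from a part of the song can be shown with a toy example in Python. Everything here is invented: three imaginary populations, ten genes, and a simple sum of squared differences as the distance. The first three genes stand in for 'one group of genes'; it is not real HLA data.

```python
def squared_distance(freqs_a, freqs_b):
    """Sum of squared differences between two lists of gene frequencies."""
    return sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b))


# Invented frequencies for ten genes; the first three stand in for one gene group.
pop_a = [0.30, 0.60, 0.20, 0.10, 0.90, 0.40, 0.70, 0.20, 0.80, 0.50]
pop_b = [0.31, 0.59, 0.21, 0.80, 0.20, 0.90, 0.10, 0.70, 0.30, 0.10]
pop_c = [0.40, 0.50, 0.30, 0.15, 0.85, 0.45, 0.65, 0.25, 0.75, 0.45]

group = slice(0, 3)

# Distances using only the small gene group.
partial = {
    "A-B": squared_distance(pop_a[group], pop_b[group]),
    "A-C": squared_distance(pop_a[group], pop_c[group]),
}
# Distances using all ten genes.
full = {
    "A-B": squared_distance(pop_a, pop_b),
    "A-C": squared_distance(pop_a, pop_c),
}

print("gene group only:", {k: round(v, 4) for k, v in partial.items()})
print("all genes:      ", {k: round(v, 4) for k, v in full.items()})
```

Looking only at the small group of genes, B seems to be the closest relative of A. Using all ten genes, the answer is reversed and C is far closer.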

If you look at the list of genetic distances given above you will see that the genetic distance between Basques and Berbers is high. Those distances were computed using the HLA genes and many more, so they carry much more information.