I have owed this blog a post for quite a while on the results of Delfino and Salas, a very interesting paper (Spanish version here, English version here) that looks at the recall vote from a different angle than previous studies. In some sense the delay has actually been good, because an old friend and colleague of mine, Rodrigo Medina, has since expanded the work of Delfino and Salas, showing that it is indeed quite difficult to explain away some of the surprising results from the recall vote. I will jump back and forth between the two papers in my discussion and presentation.
As was the case with other studies of the recall vote, I will try to explain some of these results in as simple a manner as possible. I will do it in sections, so as not to make it too long. Today I will talk about the correlations between the number of signatures for the petition to recall Hugo Chavez and the number of Si (Yes) votes (Vote to recall) at the same voting center, separating the centers into whether they were manual or automated in how the votes were processed and counted.
The first thing to look at is the correlation between the Si vote in the recall referendum and the signatures gathered in order to call for the referendum. One would think that, given the difficulties and limitations in gathering the signatures, as well as the rejection of many signatures by the CNE, the signature values at each center represent a floor. The Carter Center actually cited in its reports the strong correlation obtained between these two variables in the centers which were automated, but said nothing about the manual ones. Indeed, when one calculates the correlation between these two variables for automated centers, one obtains a very strong correlation, as shown by Delfino and Salas in their Table I or by Medina in his Figure 1: the correlation coefficient is a remarkably high 0.989, as shown in the following figure (left side) from Medina’s paper:
Figure 1: Left: Yes votes versus signatures at the automated centers. Right: The same for the manual centers, excluding points from abroad.
For those not too familiar with the concept of correlation, the “cloud” of points in Fig. 1 (left) would be a straight line if the coefficient were 1.00 and would be a roughly circular cloud of points if it were zero. That the correlation is so high is somewhat surprising. First of all, the number of signature centers was restricted; there were only 2600 centers for the signatures versus 8300 voting centers. Moreover, the number of signatures that could be collected in the process was only 30% of the voters, limiting the total possible, while more than 80% of Venezuelans participated in the recall vote. On top of that, the forms were distributed uniformly throughout Venezuela, rather than according to the distribution of voters. Additionally, there were many reasons why some of the signatures were missing or not taken into account: the CNE invalidated a lot of them, the signatures were public, forms were lost and people were pressured to withdraw their signatures. The vote in the recall process, on the other hand, was supposedly secret.
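For readers who like to see the arithmetic, here is a minimal sketch of how a correlation coefficient of this kind is usually computed (the standard Pearson formula; I am assuming, not asserting, that this is the coefficient used in both papers). The numbers are made up purely for illustration and are not data from Delfino and Salas or from Medina.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical centers (illustration only, NOT data from either paper).
signatures = [100, 200, 300, 400, 500]
on_a_line = [150, 300, 450, 600, 750]  # exactly 1.5 x signatures
scattered = [400, 100, 300, 500, 200]  # no straight-line trend

r_line = pearson_r(signatures, on_a_line)       # points on a straight line
r_scattered = pearson_r(signatures, scattered)  # no linear relationship
```

In this toy example the first set of points lies exactly on a line, so its coefficient is 1, while the second set was chosen so that its coefficient is zero. Real data, such as the 0.989 quoted above, sits somewhere in between.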
In contrast to this result, in the centers where the voting process was manual, shown on the right of Figure 1, the correlation was much weaker, being only 0.9264. In the figure, the votes abroad were plotted as squares and not taken into account in the calculation because, as can be seen, they were very different from the other manual centers for reasons that do not have much to do with the study. They were simply excluded.
If you think about what these correlations mean, there is no reason a priori for such a big difference between automated and manual centers. What the correlation is simply telling us is that in centers with few signatures, few people voted against Chavez, and in those with lots of people signing, lots of people voted against Chavez. In fact, what determines whether a center was automated or not is largely the total number of voters at that center, so there is no reason why centers in similar areas in terms of socio-economic conditions would behave differently, but they do, as we will see later.
The surprising difference between manual and automated centers can be shown better by making the scales similar in the two plots above, as was done by Medina in his Fig. 2 to show the behavior when the number of voters and signatures was small in both cases:
Fig. 2. Plots of the number of yes votes as a function of the number of signatures when the number of signatures is less than 600 for both manual (left) and automated centers (right)
Note how different the two are. In the manual case, the dispersion is larger, broadening out as the number of signatures increases. In contrast, in the automated centers it actually narrows down as the number of signatures approaches zero. This is truly unusual, as you would expect fluctuations to be larger as the number of signatures becomes smaller (as the number of signatures goes to zero, there is a higher possibility that a few people will show up and vote against Chavez in some centers). In fact, the manual centers behave the way you would expect: the smaller the number of signatures, the larger the variations one would expect in the total number of anti-Chavez votes in that same center. In technical terms: fluctuations should be larger as the number of people that signed gets smaller.
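One way to put a number on this expectation is to read it in relative terms, under an assumption that is mine and not taken from either paper: if the count at a center behaves like an independent random (Poisson) count, its typical spread is the square root of its mean, so the spread relative to the mean is 1/sqrt(mean). A small sketch of that rule of thumb:

```python
import math

def relative_fluctuation(expected_count):
    """Typical relative spread (std / mean) of a Poisson count with this mean.

    The Poisson model is an illustrative assumption, not the papers' model.
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return math.sqrt(expected_count) / expected_count

# Hypothetical center sizes, from small to large.
sizes = [25, 100, 400, 10000]
spreads = [relative_fluctuation(n) for n in sizes]
```

Under this assumption a center with around 25 signers would typically wobble by about 20%, while one with around 10000 would wobble by about 1%. That is the qualitative behavior one would expect to see at the small end of both plots.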
There is another way of showing how anomalous this is, as done by Delfino and Salas. You order the centers according to the fraction of people that signed the petition to recall Chavez, from the smallest number of signatures to the largest number, in both manual centers and automated centers. Now, you calculate the correlation for only the 150 centers with the smallest number of signatures, that is, you calculate the correlation for centers 1 through 150, and that is your first point. Then, you do the same for centers 2 through 151, then 3 through 152, then 4 through 153, etc. First of all, since it is a matter of numbers, you would expect the same qualitative behavior in both the manual and automated centers. Second, you would expect more fluctuations at the lower end of the graph, since you are calculating the correlations only over a narrow range; thus the centers with the lowest number of signatures should show the largest fluctuations. However, this is not what happens, as shown in the figures below: the manual centers show the expected behavior, but the automated centers show practically no change in the correlation as the size increases. This certainly makes absolutely no sense; as the number gets smaller in both cases, the correlations should definitely fluctuate.
Fig. 3 Correlations calculated for 150 centers as the number of signatures in each center increases, that is, first the correlation is calculated for the 150 centers with the lowest number of signatures, then the smallest center is dropped and the next one with more signatures is included in the sample and so on. Note how in the manual centers (top left) the calculated correlation fluctuates significantly, going even lower than 0.5, and then increases to a fairly constant value above sample #1400. In contrast, the automated centers show practically the same value of the correlation throughout.
The behavior of the automated centers is simply absurd in Figure 3.
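The sliding-window procedure behind Figure 3 can be sketched in a few lines. This is only my reading of the description above, not the authors' actual code: the function names are hypothetical, the sort key here is the raw signature count, and the sample data is invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def sliding_correlations(signatures, yes_votes, window=150):
    """Sort centers by signatures, then correlate each run of `window` centers.

    Entry 0 covers sorted centers 1..window, entry 1 covers 2..window+1, etc.
    """
    if len(signatures) != len(yes_votes):
        raise ValueError("need one yes-vote count per signature count")
    pairs = sorted(zip(signatures, yes_votes))
    results = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        results.append(pearson_r(xs, ys))
    return results

# Tiny invented example (NOT real data), using a window of 4 instead of 150.
toy_signatures = [10, 20, 30, 40, 50, 60]
toy_yes_votes = [12, 25, 33, 47, 52, 70]
toy_curve = sliding_correlations(toy_signatures, toy_yes_votes, window=4)
```

With six centers and a window of four, the toy curve has three points. Applied to the real center-by-center data with a window of 150, the same loop would produce one curve for the manual centers and one for the automated centers.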
Finally, Medina looked at some interesting correlations in municipalities that you would expect to be quite similar:
Figure 4: Three municipalities that should have the same proportionality between the number of signatures and the Si (Yes) vote against Chavez, from left to right: Naguanagua (left), Duaca (middle).
Let us look first at the graph on the left of Figure 4, corresponding to Naguanagua. There are two very clear lines. In one, the one marked with small crosses, the number of Yes (Si) votes in the recall is almost perfectly proportional to the number of signatures to hold the recall vote against Chavez, with all points practically falling on a straight line. In contrast, in the manual centers of the same municipality, the line has a slope which is much larger. Thus, in these centers, the number of people voting to recall Chavez is larger than the number of signatures, while in the automated centers it is roughly the same and follows the same proportionality. Curious, no?
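To make "slope" concrete: if Yes votes are proportional to signatures, then votes are roughly k times signatures, and k can be estimated by a least-squares fit of a line through the origin. The sketch below uses that textbook formula, which may or may not be how Medina drew his lines, and the numbers are invented; they are not the Naguanagua data.

```python
def slope_through_origin(signatures, yes_votes):
    """Least-squares slope k for the model yes_votes ~ k * signatures."""
    if len(signatures) != len(yes_votes):
        raise ValueError("need one yes-vote count per signature count")
    sum_xx = sum(x * x for x in signatures)
    if sum_xx == 0:
        raise ValueError("all signature counts are zero; slope is undefined")
    sum_xy = sum(x * y for x, y in zip(signatures, yes_votes))
    return sum_xy / sum_xx

# Hypothetical centers in one municipality (illustration only).
automated_signatures = [100, 200, 300]
automated_yes = [100, 200, 300]   # votes equal to signatures
manual_signatures = [100, 200, 300]
manual_yes = [180, 390, 540]      # noticeably more votes than signatures

k_automated = slope_through_origin(automated_signatures, automated_yes)
k_manual = slope_through_origin(manual_signatures, manual_yes)
```

In this made-up case the automated slope is exactly 1 and the manual slope is clearly larger, which is the qualitative pattern described for the two lines in the Naguanagua plot.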
In the middle figure, corresponding to the Duaca municipality, the automated centers once again follow proportionality with the number of signatures. But the centers in which automation failed curiously fall all over the place.
What all this suggests, and will be explained in future articles, is that the number of votes in the automated centers was somehow interfered with, and that the final outcome was simply a number generated in such a way that it would be proportional to the number of signatures at that center. Meanwhile, the numbers of votes in the manual centers were the real ones. In the next post on the subject, the correlations will be looked at in a different way that brings out better the significant differences between the automated and manual centers.