Archive for the 'RR Models' Category

On Mathematical Models of the recall vote and fraud: Delfino, Salas and Medina part II: Making the inconsistencies in the results quite evident

May 12, 2006


While in part I of my presentation of the Delfino, Salas and Medina results, I emphasized the correlation between the signatures collected to call for the referendum to recall Hugo Chavez and the number of actual Si (Yes) votes to recall at the recall referendum, I only did that in order to use as simple a language as possible as an introduction to the topic.

What Delfino and Salas did was to plot the data in a different manner in order to bring the anomalies out better in the data.

What they actually plotted was a “normalized” parameter k, equal to

$$k = \frac{\text{Yes (Si) votes}}{\text{Signatures}}$$

as a function of another “normalized” parameter f,

$$f = \frac{\text{Signatures}}{\text{Total votes}}$$

The reason for plotting the data this way is that it magnifies those voting centers in which the number of Yes (Si) votes is much larger than the number of signatures at that center. Think about it. First of all, f is limited to be between zero and one, since the number of signatures at a center can be at most the number of voters at that same center. On the other hand, given the difficulties, limitations and methods for obtaining the signatures, as discussed in part I of these articles on Delfino, Salas and Medina, there should be a number of centers with a low number of signatures but a high number of Yes (Si) votes, where people did go out and vote but could not sign the petition. This effect would be most visible in the centers with low f, since f measures the fraction of signatures. In the centers where it was difficult to gather signatures, the number of people signing should be small, but you would expect the number of people voting Yes (Si) to vary significantly, to fluctuate!
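To make the two ratios concrete, here is a minimal Python sketch of how k and f could be computed for each center. The `Center` record, its field names and the two sample centers are invented for illustration; they are not taken from the Delfino and Salas data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Center:
    """Hypothetical per-center record; the field names are assumptions."""
    signatures: int   # signatures collected for the recall petition
    yes_votes: int    # Si (Yes) votes at the recall referendum
    total_votes: int  # denominator of f, as in the definition above


def k_and_f(c: Center) -> Tuple[Optional[float], float]:
    """Return (k, f) for one center; k is None when there are no signatures."""
    k = c.yes_votes / c.signatures if c.signatures > 0 else None
    f = c.signatures / c.total_votes
    return k, f


# Two invented centers of the same size: one where signing was easy,
# one where few people could sign but many still voted Yes.
easy = Center(signatures=400, yes_votes=500, total_votes=1000)
hard = Center(signatures=20, yes_votes=120, total_votes=1000)

for name, c in (("easy", easy), ("hard", hard)):
    k, f = k_and_f(c)
    print(f"{name}: f = {f:.2f}, k = {k:.2f}")
```

The second invented center lands at low f and high k, which is the kind of point the k versus f plot is meant to make visible.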

Well, remarkably this does not happen in the automated centers as shown in Fig. 1 (left) but does in the manual centers shown in Fig. 1 (right):


Fig. 1 (Left) k versus f for automated centers. (Right) The same k versus f for the manual centers, with the centers abroad separated from the data set because there were special difficulties in gathering signatures from those living abroad.

What is most remarkable about Fig. 1 (Left) is that, despite the difficulties in obtaining the signatures, the data for the automated centers is quite uniform and there are very few centers where the number of actual Si (Yes) votes significantly exceeds the number of signatures. Only seven automated centers have k > 2, and none of them exceeds it by much, which is remarkable given that there were forms for only 30% of the people to sign, while everyone could go and vote.

In contrast with the result for the automated centers, in the manual centers the number of points falling above k > 2 is large and you can see points as high as k close to 10, as would be expected from a process as difficult as that of the signatures. This is what you would expect, as only 30% of the people could sign, while close to 70% of them actually voted in the recall referendum. This should generate the type of fluctuations you see in the manual centers, but they are absent in the automated centers. This is very strange and makes no sense!

If the result above is strange, in my own mind it is its inconsistency with the next graph that proves the fraud. The opposition came out of the recall vote absolutely demoralized, and three months later, in October, there were regional elections. There was not only a campaign to promote abstention, but abstention more than doubled, going from 30% in the recall referendum to over 70% in the regional elections in October 2004. Despite this, take a look at what happens if we plot the pro-opposition votes as a function of the recall signatures, below on the left, for the same centers that were automated for the recall vote:


Figure 2. Left: Opposition votes normalized to the number of signatures, k, at each center as a function of f, the fraction of signatures to voters at each center, for the regional election in October 2004. Right: The automated centers at the recall once again, just for direct one-to-one comparison.

To me this graph is absolutely compelling: there are more than three dozen points above k=2, in contrast to the seven at the recall vote, and there are points as high as k=6. This happened despite the fact that abstention in the regional election was double what it was in the recall, and that it was mostly the opposition that abstained. Nevertheless, the opposition actually increased its number of votes with respect to the signatures in dozens of voting centers, all at once! I repeat the plot for the automated centers in the recall next to that of the regional election just so that you can see how different the two results are.

Personally, I would like to challenge the Carter Center, or whomever they designate, to even attempt to explain how the results of the regional elections could be what they were compared to the recall vote in Figure 2, and what was the mysterious mechanism by which opposition voters in so many centers came out in larger numbers that October to give those results, despite the higher abstention and the demoralized opposition. Where were these people the day of the recall? Why didn’t they go vote then, and why did all of them show up in sync in October 2004? This simply has no other explanation than the Delfino Salas hypothesis, which I advanced in my conclusions of Part I on the correlations:

The official results of the recall vote in the automated centers were forced to follow a linear relation with respect to the number of signatures obtained at each of those centers in the recall petition.


For the sake of completeness, I also include below the graphs of the votes against Chavez in the 1998 and 2000 elections, both at the peak of Chavez’ popularity. Despite this, values as high as k=4 or even above can be seen in both cases. These are magically and mysteriously missing from the automated centers in the recall vote:


Fig. 3 Results for the 1998 and 2000 elections: opposition votes normalized to the number of signatures in the recall petition, k, as a function of f, the number of signatures collected in each center normalized to the total number of voters at that center.

Next, part III: We get a little dense to show that the statistical characteristics of the result of the recall vote show mathematically that the data came from a single set of numbers and not two as expected, indicating the results were obtained from the signatures used to petition the recall.

On Mathematical Models of the recall vote and fraud: Delfino, Salas and Medina part I: The correlations

May 7, 2006


I have had a debt with this blog for quite a while, in not presenting the results of Delfino and Salas, a very interesting paper (Spanish version here, English version here) that has taken a look at the recall vote from a different angle than previous studies. In some sense the delay has actually been good, because now an old friend and colleague of mine, Rodrigo Medina, has expanded the work of Delfino and Salas, showing that it is indeed quite difficult to explain away some of the surprising results from the recall vote. I will jump back and forth between the two papers in my discussion and presentation.

As was the case with other studies of the recall vote, I will try to explain some of these results in as simple a manner as possible. I will do it in sections, so as not to make it too long. Today I will talk about the correlations between the number of signatures for the petition to recall Hugo Chavez and the number of Si (Yes) votes (Vote to recall) at the same voting center, separating the centers into whether they were manual or automated in how the votes were processed and counted.

The first thing to look at is the correlation between the Si vote in the recall referendum and the signatures gathered in order to call for the referendum. One would think that, given the difficulties and limitations in gathering the signatures, as well as the rejection of many signatures by the CNE, the signature values at each center represent a floor. The Carter Center actually cited in its reports the strong correlation obtained between these two variables in the centers which were automated, but said nothing about the manual ones. Indeed, when one calculates the correlation between these two variables for automated centers, one obtains a very strong correlation, as shown by Delfino and Salas in their Table I or by Medina in his Figure 1: the correlation coefficient is a remarkably high 0.989, as shown in the following figure (left side) from Medina’s paper:

Figure 1: Left: Yes votes versus signatures at the automated centers. Right: The same for the manual centers, excluding points from abroad.

For those not too familiar with the concept of correlation, the “cloud” of points in Fig. 1 (left) would be a straight line if the coefficient were 1.00 and would be a circle of points if it were zero. That the correlation is so high is somewhat surprising. First of all, the number of signature centers was restricted; there were only 2600 centers for the signatures versus 8300 voting centers. Moreover, the number of signatures that could be collected in the process was only 30% of the voters, limiting the total possible, while more than 80% of Venezuelans participated in the recall vote. On top of that, the forms were distributed uniformly throughout Venezuela, rather than according to the distribution of voters. Additionally, there were many reasons why some of the signatures were missing or not taken into account: the CNE invalidated a lot of them, the signatures were public, forms were lost, and there were pressures for people to withdraw their signatures. The vote in the recall process, on the other hand, was supposedly secret.
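For readers who want to see what the correlation coefficient actually computes, the following is a minimal sketch of the standard Pearson formula in Python. The function name `pearson` and the tiny data sets are made up for illustration; the papers do not say which software was used.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented points lying exactly on a straight line: coefficient of 1
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))

# Invented points arranged symmetrically around the origin: coefficient of 0
print(pearson([1, 0, -1, 0], [0, 1, 0, -1]))
```

Real data falls between these two extremes, and the closer the cloud of points hugs a straight line, the closer the coefficient gets to 1.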

In contrast to this result, in the centers where the voting process was manual, shown on the right of Figure 1, the correlation was much weaker, being only 0.9264. In the figure, the votes abroad were plotted as squares and not taken into account in the calculation because, as can be seen, they were very different from the other manual centers for reasons that do not have much to do with the study. They were simply excluded.

If you think about what these correlations mean, there is no reason a priori for such a big difference between automated and manual centers. What the correlation is simply telling us is that in centers with few signatures, few people voted against Chavez, and in those with lots of people signing, lots of people voted against Chavez. In fact, what determines whether a center was automated or not is largely the total number of voters at that center, so there is no reason why centers in similar areas in terms of socio-economic conditions would behave differently, but they do, as we will see later.

The surprising difference between manual and automated centers can be shown better by making the scales similar in the two plots above, as was done by Medina in his Fig. 2 to show the behavior when the number of voters and signatures was small in both cases:


Fig. 2. Plots of the number of yes votes as a function of the number of signatures when the number of signatures is less than 600 for both manual (left) and automated centers (right)

Note how different the two are. In the manual case, the dispersion is larger, broadening out as the number of signatures increases. In contrast, in the automated centers it actually narrows down as it reaches zero. This is truly unusual, as you would expect fluctuations to be larger as the number of signatures becomes smaller (as the number of signatures goes to zero, there is a higher possibility that a few people will show up and vote against Chavez in some centers). In fact, the manual centers behave the way you would expect: the smaller the number of signatures, the larger the variations one would expect in the total number of anti-Chavez votes in that same center. In technical terms: fluctuations should be larger as the number of people that signed gets smaller.
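A rough way to see why small numbers should fluctuate more is simple counting (Poisson) statistics, in which a count with expected value n has a typical spread of about the square root of n. This is an illustrative assumption, not a model taken from either paper, and the function name `relative_fluctuation` is made up for the sketch.

```python
import math


def relative_fluctuation(expected_count: float) -> float:
    """Typical relative spread, sqrt(n) / n, under a Poisson counting assumption."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return 1.0 / math.sqrt(expected_count)


# Invented center sizes: the smaller the count, the larger the relative spread
for n in (4, 25, 100, 400):
    print(f"n = {n:4d}: relative spread of about {relative_fluctuation(n):.0%}")
```

Under that assumption a center with 4 signatures would typically vary by about half its value, while one with 400 would vary by only a few percent, which is the qualitative behavior described above.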

There is another way of showing how anomalous this is, as done by Delfino and Salas. You order the centers according to the fraction of people that signed the petition to recall Chavez, from the smallest number of signatures to the largest, in both manual centers and automated centers. Now, you calculate the correlation for only the 150 centers with the smallest number of signatures, that is, for centers 1 through 150, and that is your first point. Then, you do the same for centers 2 through 151, then 3 through 152, then 4 through 153, etc. First of all, since it is a matter of numbers, you would expect the same qualitative behavior in both the manual and automated centers. Second, you would expect more fluctuations at the lower end of the graph, since you are calculating the correlations only over a limited range, and the centers with the lowest number of signatures should show the largest fluctuations. However, this is not what happens, as shown in the figures below: the manual centers show the expected behavior, but the automated centers show practically no change in the correlation as the number of signatures increases. This makes absolutely no sense; as the numbers get smaller in both cases, the correlations should definitely fluctuate.
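The sliding-window procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout `(f, signatures, yes_votes)`, the choice of sorting by f, and the function names are mine, and the sample data is invented.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def rolling_correlation(
    centers: Sequence[Tuple[float, float, float]], window: int = 150
) -> List[float]:
    """Correlation of signatures vs. Yes votes over a sliding window of centers.

    Each center is a hypothetical (f, signatures, yes_votes) tuple. Centers are
    sorted by f, then the window covers centers 1..window, 2..window+1, etc.
    """
    if window < 2 or window > len(centers):
        raise ValueError("window must be between 2 and the number of centers")
    ordered = sorted(centers, key=lambda c: c[0])
    results = []
    for start in range(len(ordered) - window + 1):
        chunk = ordered[start:start + window]
        signatures = [c[1] for c in chunk]
        yes_votes = [c[2] for c in chunk]
        results.append(pearson(signatures, yes_votes))
    return results


# Invented data: 10 centers where Yes votes are exactly twice the signatures
toy = [(i / 100, 10 * i, 20 * i) for i in range(1, 11)]
print(rolling_correlation(toy, window=4))
```

With perfectly proportional toy data every window gives a correlation of 1, which is the flat behavior the text describes for the automated centers; real, noisy data would be expected to wander, especially in the first windows.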

Fig. 3 Correlations calculated for 150 centers as the number of signatures in each center increases, that is, first the correlation is calculated for the 150 centers with the lowest number of signatures, then the smallest center is dropped and the next one with more signatures is included in the sample and so on. Note how in the manual centers (top left) the fluctuations in the calculated correlation go even lower than 0.5 moving around significantly and then increasing to a fairly constant value above the sample #1400. In contrast, the automated centers have the same value for the correlation.

The behavior of the automated centers is simply absurd in Figure 3.

Finally, Medina looked at some interesting correlations in municipalities that you would expect to be quite similar:

Figure 4: Three municipalities that should have the same proportionality between the number of signatures and the Si (Yes) vote against Chavez, from left to right: Naguanagua (left), Duaca (middle).

Let us look first at the graph on the left of Figure 4, corresponding to Naguanagua. There are two very clear lines. In one, the one drawn with small crosses, the number of Yes (Si) votes in the recall is almost perfectly proportional to the number of signatures to hold the recall vote against Chavez, with practically all points falling on a straight line. In contrast, in the manual centers of the same municipality, the line has a slope which is much larger. Thus, in these centers, the number of people voting to recall Chavez is larger than the number of signatures, while in the automated centers it is roughly the same and follows the same proportionality. Curious, no?

In the middle figure, corresponding to the Duaca municipality, the automated centers once again follow proportionality with the number of signatures. But those centers in which automation failed curiously fall all over the place.

What all this suggests, and will be explained in future articles, is that the number of votes in the automated centers was somehow interfered with, and the final outcome was simply a number generated in such a way that it would be proportional to the number of signatures at that center. Meanwhile, the numbers of votes in the manual centers were the real ones. In the next post on the subject, the correlations will be looked at in a different way that better brings out the significant differences between the automated and manual centers.

The August enigma by Brunilde Sansò

August 19, 2005

My good friend Bruni, whom I “met” via this blog (we have never seen each other in person!), wrote this wonderful post in Spanish on her website about her experiences with the Recall Referendum and its aftermath. While we were talking, the idea came up of translating it (I am not sure who suggested it), and I offered my blog to post it in English. I think it is a wonderful story as told from her point of view, even if the ending is not what we all wished for. I am honored to have it become part of my blog.

The August enigma

By Brunilde Sansò


(The original story appeared in Spanish in Cuentos Intrascendentes)

I dedicate this unimportant tale to
all those that, in spite of the pitfalls, passionately used their knowledge,
their means and their free time in a restless search for the truth.

This is a story that many Venezuelans lived in different ways: some with anguish, others with joy, others with stress and others with surprise, but definitely nobody with indifference. It is the story of another one of the riddles [1, 2] that Venezuelans have lived through during the government of Chávez. This is my story.

I am truly amazed that Venezuelans elected a caudillo, a putschist, a populist as President. I never had to vote for or against him, but I would never have voted for him anyway. My idea of a country, with strong and democratic institutions, has nothing to do with the autocratic, clumsy and myopic conception that the obscure lieutenant colonel Chávez began to sell to Venezuelans on February 4, 1992, when he accepted defeat on TV after his failed coup d’état.

I never had to take a stand on Chávez because I have been living abroad for many years. I never had to deal with the destruction of the institutions, with Chavista divisiveness, or with the hatreds or the daily wearing down that Chavismo introduced in the country. I never had to watch “Alo Presidente”. Chávez was five flying hours away from my life, and although I threw stellar tantrums when reading the news or talking with friends, I knew that I was so far away from Chavismo that, in practice, I did not have to live it.


I followed the “firmazos” (signature collection) and the tricks and swings of the signature verifications. I prepared myself for any result in the Recall Referendum. My impression was that everything was possible and that I had to be ready for the possibility of a Chávez victory. But nothing could prepare me for what came later.


My brother,
who lives in California,
had come with his wife and their baby for a few days of vacation with me. By pure chance, his stay coincided with the
Referendum.


I have two brothers. One is a lawyer who is quite funny and amusing. The other is a very serious and strict professor of statistics. As luck would have it, it was the statistician who accompanied me during the Referendum. I have always been amazed by the turns of life. If my brother had chosen another branch of mathematics when he decided on his career, perhaps my perception of the Venezuelan situation nowadays would be different.


We spent the Referendum day pleasantly. In the dining room of my house, we set up our laptops and followed the news from heroic bloggers who were posting online from the voting centers. We also checked the mails from our Venezuelan friends. All reported monumental waiting lines; some had taken up to 9 hours to vote. The news was good: exit polls seemed to clearly favor the opposition. Before going to bed we watched on the web a press conference of the members of the opposition alliance “Coordinadora Democrática”. They were all smiles and had difficulty containing their joy; they implied that they had won, and easily. We also saw Jorge Rodriguez, from the CNE, making a statement. He looked tired and serious. He had the body language and the expression of somebody who has been defeated.


We went to
sleep but first we reminded ourselves that we should not celebrate yet because,
with Chávez, one never knows.


We had jinxed it (“goat mouth”, as the Venezuelan saying goes).


At six
a.m. my husband woke us up with the incredible news that Chávez had won with
exactly the opposite of the votes that were being predicted. We then
read the stunning declarations given by Carrasquero, the president of the CNE, and
followed all news that CNN could present.
We later called some friends who confirmed the situation. They were scared because of the magnitude of what
was happening. We could imagine how unclear
the situation was in Caracas
at that time.


We then began to systematically call all those who could provide us with information and data. We wanted to know the truth. We made no less than forty international calls in a few hours. It was not an easy task. I had cancelled my local long-distance service provider account a week before and now we had to dial seven extra digits every time we made a long-distance call. My brother loudly complained about my bad timing every single time he dialed, but in the end we were able to contact statisticians, engineers, computer scientists, mathematicians and physicist friends, all those we thought could give us information or send data to us. The idea was to collect the data to analyze how large the discrepancy between exit polls and the official results really was.


Finally, my brother was sent the data of the exit polls from Súmate and Primero Justicia. The dining room of my house then became a true computing center. My brother and my sister-in-law (also a Statistics Professor) sat in front of their laptops and meticulously analyzed the data collected from the exit polls. They worked intensely whole days, for several days, despite the little baby that was crawling from the living room to the dining room and that cried sometimes to complain about always being among uncles and cousins.


My husband took care of checking up news on the Internet. I made telephone calls, checked my mail and prepared fast meals, boot-camp style. My children played with the baby, ran errands or were asked to go fetch reference books. Because of my engineering background, I was able to understand the results that my brother and my sister-in-law were obtaining. In most of the voting centers, and in all the states, there was a significant discrepancy between the exit poll results and the results published by the CNE. The differences could not be explained by factors such as the selection of the voting centers. My brother and my sister-in-law, with the professional rigour that characterizes both of them, repeated each analysis meticulously several times to make sure that there were no errors. The study that shows such discrepancies appears in the report that I include below as a reference [3].


As for me, I had heard that there was going to be an audit based on a random sampling of boxes. The word “random” always rings an alarm in my mind. Those who work with stochastic simulations know that randomness is an ephemeral concept that many people do not quite grasp. A random number generator is simply a formula that can be programmed. The formula produces a sequence of numbers that are generated from a seed. If the seed does not change, the outcome of the program will be the same sequence. Random generators have a period after which the sequence of numbers is repeated. That is why they are usually called pseudorandom generators. A good generator is distinguished from a bad one based on the length of the period and on how predictable the generated numbers are. It is well known that many commercial generators, like the one in Excel, are far from being random.
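The role of the seed and the period can be illustrated with a short Python sketch. The toy generator below is a deliberately tiny linear congruential formula with made-up constants, chosen only so that its period is short enough to see; it is not the program used in the audit, and `draw_boxes` with its box counts is equally hypothetical.

```python
import random
from typing import List


def toy_lcg(seed: int, count: int, a: int = 5, c: int = 3, m: int = 16) -> List[int]:
    """Toy pseudorandom formula: each value is (a * previous + c) mod m."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + c) % m
        values.append(x)
    return values


def draw_boxes(seed: int, total_boxes: int, sample_size: int) -> List[int]:
    """Pick box numbers to audit using Python's generator with a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(range(total_boxes), sample_size)


# Same seed, same sequence: nothing here is unpredictable once the seed is known
print(toy_lcg(7, 8))
print(toy_lcg(7, 8))

# With these toy constants the sequence starts repeating after 16 values
sequence = toy_lcg(7, 32)
print(sequence[:16] == sequence[16:])

# The same holds for a real library generator: a fixed seed fixes the sample
print(draw_boxes(2004, 1000, 5) == draw_boxes(2004, 1000, 5))
```

Whoever chooses the seed therefore determines which boxes get selected, which is why the question of who picks it, and how, matters for an audit.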


When I learned of the audit, I tried to raise the alarm about the randomness of the ballot box numbers that were going to be generated. To whom? To whomever I thought should be alerted: I wrote mails, I made phone calls, I posted in Venezuelan blogs. My colleagues made fun of me on that issue. How could I think that the Carter Center was not aware of that?


I had my doubts. To my knowledge, the Carter Center’s experience was in manual elections and it was led mainly by social scientists. This was a problem of mathematics and computer science. I know computer scientists who often forget about the importance of the seed when they have to generate a random number.


I wanted to know what program they were going
to use and how they were going to choose the seed.


My worries were useless. The opposition decided not to participate in the audit, which the Carter Center and the O.A.S. carried out with only CNE personnel present.


During the days that followed, the scientific world related to Venezuela entered a state of effervescence. Several academics around the world began to produce studies on different aspects of the results obtained by the CNE. Many of those studies were thoroughly explained and analyzed in [4]. As for me, I mainly wanted to know how the seed of the random generator used in the last audit had been chosen. I wrote to the Carter Center and I received a cold but courteous answer indicating that, since the opposition was not present at the audit and they considered themselves to be only observers, the seed had been chosen by the CNE.


I could not believe it. The only guarantee that Venezuelans had to
make sure that the audit was indeed random, fell into the hands of the very institution
whose transparency was in doubt!


I got infuriated with all the actors: with the opposition for not being present,
with the Carter Center and the O.A.S. for letting that happen and with the CNE
for accepting to have a leading role, when what was needed from them was to
show their credibility.


Some time later, I read in the Carter Center report that even the computer and the program used in the audit came from the CNE. Also, in an email from one of the people involved with the audit, it was explained that the CNE program had to be used because the input data was not ready in the format required by the Carter Center’s Excel-based program. Some techies on the web then cried out that, as everything had been run on the CNE’s computer, there was not even a need to know the seed beforehand, since a script could have simulated that the program was running and could have produced a pre-established sequence as output. As for me, I was rather scandalized that the Carter Center could even consider using an Excel program, which is one of the worst in matters of random generation.


The idea that crossed my mind then was the realization of how naïve we can be when facing well-known institutions and how, little by little, that naivety is disappearing. It reminded me of the time, when I was a little girl, when I used to think that technology was an invincible thing, created by infallible geniuses. Now that I know the limitations of people, systems, calculations and models, I ask myself, whenever I get into an airplane, how many errors were left in the design software and whether the reliability experts really used a good seed when they ran the program to generate catastrophic events.


I no longer have blind confidence; my motto is now “The Devil is in the details”.


In the days that followed, we saw with frustration and astonishment that the Carter Center and the O.A.S. accepted the results of the audit as final and that the international community and the international press hurried, without asking any further questions, to spread the good news of the victory of Chávez.


At the end of one week of intense work, we all decided to go to St. Lambert, a municipality located south of the island of Montreal. It is a yuppie area, totally removed from the cosmopolitan interests of a city like Montreal. We had lunch on a pleasant terrace and we decided to forget about Chávez, statistics and random numbers. We had already lost our family vacation to contacting, alerting, studying and compiling data, and we thought we could spend a single day relaxing. We took some pictures as tourists and slowly walked from the restaurant to the main street, which, by chance, was closed for the town’s summer party. It was a languid party because there were not many people around. Just two clowns and a kiosk with music tried to animate the few who were present and who were taking advantage of a sunny end-of-summer day.


The music was not very good, but it livened things up a little. Suddenly we stopped in disbelief. The kiosk singer began to sing notes that we recognized immediately, although the song was distorted by the incongruous jazz-style adaptation and the foreign accent of the singer. Astonished, we heard, at a fair far, far away from Venezuela and in an atmosphere also far away from cosmopolitan Montreal, the loudspeakers screaming: “Yoooo, nací en esta rivera del Arauca vibrador..”. It was the “Alma Llanera”, Venezuela’s informal national anthem. We waited for the song to finish and then we asked the singer why he had chosen that song.


“Because I like it”, was his simple
answer.


With
superstition and optimism, I interpreted that as a positive sign. When I got back home, I rushed to my email to
verify if there were some encouraging news.


I was mistaken. In spite of the skepticism of the Venezuelan people concerning the Referendum results, the CNE never opened the ballot boxes, and the international observers never forced it to do so either. To me, that was the most suspicious fact of all. It seemed to me unprecedented that, from a human point of view, faced with the mathematical and technical lucubrations of dozens of statisticians, engineers, physicists and computer scientists, the CNE directors did not take advantage of the opportunity to open the boxes and shut us all up.


If I had been on the board of the CNE, I would have relished opening every single box and showing every single vote to the army of nerds whose sophisticated models were showing discrepancies in the results. Frankly, I am still astonished that the directors of the CNE denied themselves such a satisfaction.


And, from the institutional point of view, not opening the boxes was the most dangerous gesture for democracy: a democratic institution in charge of the elections of a country must, before anything, provide the population with the certainty that the system works and that it is fair and transparent. By refusing to go beyond the audit in spite of the strong doubts that weighed on the results, the CNE sealed forever the box that contains the solution of the August enigma and condemned the Venezuelan people to perpetual doubt about the validity of their democracy and their electoral system.


Not everything was negative. The experience made me appreciate the efficiency and the dedication of Venezuelans of great talent who put their knowledge, their reputation, their abilities and their time into the passionate search for an answer to the enigma of August.


So this is
how, after a year of analysis, readings and memories, I have decided to embody
my memory in this true story that ends with the Alma Llanera.

Unfortunately
for democracy in Venezuela,
it is a story that, for the time being, has just become an unimportant tale.

[1] S. La Fuente and A. Mesa. “El acertijo de Abril.- Relato periodístico de la breve caida de Hugo Chávez”.

[2] F. Toro. “Venezuela’s 2002 Coup Revisited: The Evidence Two Years On”.

[3] R. Prado and B. Sansò. “The 2004 Presidential Recall Discrepancies Between Exit Polls and Official Results”. Technical report, AMS-2004-6.

[4] M. Octavio. “Venezuela Referendum Studies”. The Devil’s Excrement blog.
