A Technical Analysis of An Exit Poll in Venezuela
In Venezuela, the presidential recall referendum was held on August 15, 2004. According to the official electoral council, President Hugo Chávez retained the presidency with 59% of the votes. These results have been endorsed by the designated international observers: the Organization of American States and the Carter Center. However, the opposition has cried "Fraud!" and one of the principal pieces of evidence is an exit poll which showed the mirror-opposite result, with 59% voting to reject the president.
According to the Chicago Tribune,
Alejandro Plaz, president of Súmate, which has worked closely with opposition groups to observe the referendum process, said a quick count by his organization also showed Chavez winning Sunday's referendum. At the same time, Plaz said a nationwide exit poll commissioned by Súmate of 20,000 voters showed the opposition winning 59 percent of the vote. "Our results have given us a doubt the machines produced an outcome that reflects the will of the people," Plaz said.
But Carter, a veteran election monitor, discounted the Súmate and other exit polls, which he explained are "quite unreliable" in predicting electoral outcomes. "There is a high chance even in the best of circumstances that exit polls are biased," Carter said Monday.
The brief pronouncement by Jimmy Carter does not do justice to the technical reasons behind that statement. This article is a technical analysis of the Súmate exit poll based upon publicly available information.
By themselves, exit polls are neither automatically right nor automatically wrong. Exit polls have been conducted around the world. In some cases, they have been spot on; in other cases, they have been wrong beyond all reason. Therefore, it matters to have good design, implementation and analysis in place. Here are some pointers.
First of all, an exit poll is not meant to be a parallel voting system. Exit polls are small samples of voters that can be used to project overall results on a timely and accurate basis. Therefore, the proper selection and projection of the sample is extremely important. When the sample is improperly chosen and/or incorrectly projected, the results may be biased.
Secondly, according to the Miami Herald,
Experts in elections questioned the reliability of exit polls. ''Exit polls are not to be trusted,'' said Horacio Boneo, who has monitored elections in more than 60 countries. Boneo says it is not unusual for voters to be untruthful. ''Sometimes, people will tell the person doing the poll what they think the person wants to hear,'' he said.
In The Economist (via Venezuelanalysis.com), Jennifer McCoy of the Carter Center noted:
In countries as polarised as Venezuela, exit polls are risky. They require those conducting them to avoid bias in choosing whom to query, to avoid socio-economic bias in their dress and speech, and to work in a wide variety of neighbourhoods. They also require voters to tell the truth—despite intimidation and strong peer pressure on both sides. Any of these elements could have been lacking.
That is why it is important to make sure that the exit interviewing process be unobtrusive (i.e. guaranteeing the privacy of the responses) and the interviewers be unintimidating and trustworthy. When the process is intimidating, the exit polls will not reflect true conditions.
DESCRIPTION OF DATA
Initially, the Súmate exit poll was billed as having been validated to a tee by a reputable North American company, Penn, Schoen and Berland. Later on, it would turn out that this was a deceptive positioning. The Súmate and PS&B polls are one and the same -- PS&B designed the poll and Súmate volunteers executed the poll in the field. The mispositioning is irritating, but does not affect this technical analysis.
The presidential recall referendum was scheduled to run from 6am to 4pm on August 15, 2004. As it turned out, a historically unprecedented number of voters turned out to cast their votes, and there were long lines that went on for miles with people waiting for up to 10 hours. As a result, the official electoral council extended the voting hours repeatedly and promised that all who wished to vote would be able to do so. The voting did not stop until past midnight.
The official electoral council also decreed that exit poll results were not allowed to be published while the voting was going on. This is a reasonable requirement that exists around the world in order to allow all voters to vote without being influenced by allegedly foregone results. But at 7:30pm EST, Penn, Schoen & Berland issued the following press release from its office in Washington DC:
PRESS RELEASE ON FINAL EXIT POLL RESULTS FROM PENN, SCHOEN & BERLAND
New York, August 15, 2004, 7:30pm EST - With Venezuela's voting set to end at 8:00pm EST according to election officials, final exit poll results from Penn, Schoen & Berland Associates, an independent New York-based polling firm, show a major victory for the "Yes" movement, defeating Chavez in the Venezuelan presidential recall referendum.
With more than 8 million Venezuelans having cast their ballots so far, the results of a national exit poll show that Chavez has been ousted by referendum.
The Penn, Schoen & Berland Associates exit poll shows 59% in favor of recalling Chavez (the "Si" or "Yes", anti-Chavez vote) and 41% against recalling Chavez (the "No", pro-Chavez vote).
The poll results referred to in this release are based on an exit poll just concluded in Venezuela.
This is a national exit poll conducted in 267 voting centers throughout the country. The centers were selected to be broadly representative of the national electorate in regional and demographic terms.
In these centers, 20,382 voters were interviewed. Voters were selected at random but according to a strict demographic breakdown by age and gender to ensure a representative mix reflective of the national electorate. Those voters who were randomly selected to participate in this exit poll were asked to indicate only their vote ("Si" - for "Yes" - or "No") on a small ballot which they could then personally drop into a large envelope in order to maintain secrecy and anonymity. Data was sent by exit poll workers to a central facility in Caracas, Venezuela for processing and verification.
The margin of error for these final exit poll results referred to in this release is under +/-1%.
Some additional information about the Súmate exit poll was published on their website:
However, to certify this result, Palacios stated that they decided to conduct exit polls at 300 centers, which is a fairly large and far superior sample. To that end they received advice from experts at national universities and international organizations on the procedure for selecting suitable centers, and decided to choose by random techniques 19 in Distrito Capital, 1 in Amazonas, 13 in Anzoátegui, 5 in Apure, 18 in Aragua, 9 in Barinas, 9 in Bolívar, 28 in Carabobo, 1 in Cojedes, 13 in Falcón, 7 in Guárico, 11 in Lara. These centers reflect the sentiment of the universe of Venezuela: leaning toward the Sí and toward the No, rural, urban, large and small.
Palacios said that they recruited and trained the pollsters for more than a month to make sure that they applied the proper emotional-intelligence techniques, so that citizens would respond appropriately and confidentiality would be guaranteed. 267 consultations were carried out over the whole day, from 9 in the morning until 5 in the afternoon, plus an additional one at 11 at night, obtaining 12,097 responses in favor of the Sí and 8,285 responses in favor of the No, which gives a figure of 59.4 in favor of the Sí. Palacios said that in the early hours of the morning, from 7:00 to 9:00 a.m., the figure in favor of the Sí was 56 percent; at 11 in the morning it was 59 percent; at 1:00 p.m. it was 61 percent, showing a clear trend indicating that as the day went on the proportion of citizens voting Sí grew.
This was the information that gave the opposition sectors high confidence that the final result of the tally sheets would favor those who supported the Sí. However, Palacios said, at night when the results started to come in there was this significant deviation from the poll results. They therefore decided to go into detail at the centers where the poll was conducted, looking up the tally sheet, and found some significant deviations. As an example, Palacios cited voting center No. 1323, opposite Plaza Concordia in Distrito Capital, with 2,734 voters registered as of July 2004, where the tally sheet received by telephone indicates that 905 citizens voted for the Sí (53%) and 800 for the No; yet of the 80 exit poll interviews conducted there, 68 supported the Sí option, which represents 85 percent. This means that between the machine results and the poll there is a difference of more than 32 percentage points, which is higher than what statistics indicate for this type of procedure. They also checked the people who signed at that center to request the presidential recall, finding that more than 1,075 come from that voting center; yet on the tally sheet only 905 people appear as voting Sí, a difference of 170 citizens, which is very unlikely.
You will note the deceptive (but arguably technically correct) use of 300 voting centers by Súmate, which seems to make its poll different from the 267 mentioned by PS&B, as well as the omission of PS&B's figure of 20,382 respondents, even though Súmate's 12,097 SI's plus 8,285 NO's add up to exactly 20,382. As I said, this was irritating. They would have been better off being forthright up front.
Based upon these press releases, the following technical points can be made:
Sample Location Selection
There are approximately 8,500 voting centers across Venezuela. It is neither feasible nor necessary to cover all of them because a carefully selected sample can be quite accurate. According to the PS&B press release: "This is a national exit poll conducted in 267 voting centers throughout the country. The centers were selected to be broadly representative of the national electorate in regional and demographic terms."
The Súmate statement identifies these locations as "19 in Distrito Capital, 1 in Amazonas, 13 in Anzoátegui, 5 in Apure, 18 in Aragua, 9 in Barinas, 9 in Bolívar, 28 in Carabobo, 1 in Cojedes, 13 in Falcón, 7 in Guárico, 11 in Lara". By my count, that adds up to 134 sample locations. Do they mean that each point consists of a pair of voting centers in the same local area? This is not clear, and it is confusing because 134 is certainly not the 300 centers that they claimed.
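The count is easy to check. The short Python sketch below simply transcribes the state-by-state figures quoted from the Súmate statement and adds them up; the variable names are illustrative only.

```python
# Voting centers per state, transcribed from the Súmate statement quoted above.
centers_by_state = {
    "Distrito Capital": 19,
    "Amazonas": 1,
    "Anzoátegui": 13,
    "Apure": 5,
    "Aragua": 18,
    "Barinas": 9,
    "Bolívar": 9,
    "Carabobo": 28,
    "Cojedes": 1,
    "Falcón": 13,
    "Guárico": 7,
    "Lara": 11,
}

total_listed = sum(centers_by_state.values())
print(total_listed)  # 134
```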
Generally, there are two ways in which these voting centers can be selected.
In the first approach, the voting centers are selected with probability proportionate to size (=number of registered voters or previous voter turnout). Thus, larger voting centers have higher probabilities of being selected and vice versa. The result is self-weighting -- provided that the abstention rate is uniform across all voting centers (and it is not!).
In the second approach, the voting centers are selected with equal probabilities. Thus, larger voting centers have the same probability of selection as smaller ones, and this is known as disproportionate to size.
In either case, it is recognized that these voting centers will not have the identical number of actual voters on this day, and therefore the results ought to be carefully weighted afterwards. This point will be discussed in detail later.
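To make the two approaches concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the frame of 8,500 centers is filled with made-up numbers of registered voters, the function names are hypothetical, and the proportionate-to-size draw is simplified to sampling with replacement. It is not a description of what PS&B or Súmate actually did.

```python
import random

def select_equal_probability(centers, n, seed=0):
    """Second approach: every center has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(centers, n)

def select_proportionate_to_size(centers, sizes, n, seed=0):
    """First approach: chance of selection proportional to center size.

    Simplification: draws are made with replacement, so a large center
    can be picked more than once.
    """
    rng = random.Random(seed)
    return rng.choices(centers, weights=sizes, k=n)

# Hypothetical sampling frame: 8,500 centers with made-up registered voters.
frame_rng = random.Random(42)
centers = list(range(8500))
registered = [frame_rng.randint(200, 5000) for _ in centers]

equal_sample = select_equal_probability(centers, 267)
pps_sample = select_proportionate_to_size(centers, registered, 267)

def mean_size(sample):
    """Average number of registered voters among the selected centers."""
    return sum(registered[c] for c in sample) / len(sample)

print(round(mean_size(equal_sample)), round(mean_size(pps_sample)))
```

The proportionate-to-size sample should contain noticeably larger centers on average, which is exactly why the two designs call for different weighting afterwards.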
This exit poll is issued under the brand name of Penn, Schoen & Berland, which has three offices in the United States and none in Venezuela. It is not expected that PS&B would be able to import several hundred of its own interviewers, who would all need to speak fluent Venezuelan-accented Spanish, into this country for this one-day-only study. So this project must have been sub-contracted to one or more local field suppliers. In fact, they did not employ professional interviewing services. Rather, they counted on volunteers from Súmate.
According to Associated Press via Yahoo! News (U.S. Poll Firm in Hot Water in Venezuela, by Andrew Selsky on August 19, 2004)
Critics of the exit poll have questioned how it was conducted because Penn, Schoen & Berland worked with a U.S.-funded Venezuela group that the Chavez government considers to be sided with the opposition. The firm had members of Súmate, a Venezuelan group that helped organize the recall initiative, do the fieldwork for the poll, election observers said. Schoen said his firm "worked with a wide variety of volunteers that were provided by Súmate" but that they "were trained to administer the poll."
Venezuelan Minister of Communications Jesse Chacon said it was a mistake for Súmate to be involved because it might have skewed the results of the poll. "If you use an activist as a pollster, he will eventually begin to act like an activist," Chacon told The Associated Press.
Roberto Abdul, a Súmate official, said the nonprofit organization received a $53,400 grant from the National Endowment for Democracy, which in turn receives funds from the U.S. Congress but did not use any of those funds to pay for the exit polling. The issue is potentially explosive because even before the referendum, Chavez himself cited Washington's funding of Súmate as evidence that the Bush administration was financing efforts to oust him — an allegation U.S. officials deny.
Chris Sabatini, senior program officer for the National Endowment for Democracy, defended Súmate as "independent and impartial." "Exit polls are notoriously unreliable," Sabatini said by telephone from Washington. "Just because they're off doesn't mean that the group that conducted them is partial to one side."
The brand name Penn, Schoen & Berland does not automatically confirm quality in the fieldwork. And it was a political mistake to have used Súmate, which is perceived to have a partisan interest in the outcome.
Margin of Error
The press release ends with this sentence: "The margin of error for these final exit poll results referred to in this release is under +/-1%." How did this number come about?
Here it is necessary to introduce some statistical terminology. The margin of error in a survey is related to sampling error. A sample is a subset of a population. When you draw a sample, you get a certain estimate from this sample (e.g. 60%); if you now go and draw another sample from this population, you will get a different estimate (e.g. 59%). So the margin of error reflects how widely these sample estimates can differ.
Among other things, the margin of error depends on the sample size. If I draw a sample of only one person from the total population for an exit poll, I would expect a large margin of error (basically, the margin of error is going to be +/-100% and the estimate is useless). If I draw a sample of seven million people from the 10 million people who actually voted, I would have a very tiny margin of error because I have covered most of the voting population. In practice, there are limitations on resource availability, and PS&B reports having interviewed 20,382 voters.
If these 20,382 voters were a simple random sample from the 10+ million voters, then the margin of error is calculated as follows:
Let P be the percentage of persons who chose SI (=59%)
Let N be the sample size (=20,382)
Then the margin of error = 2 x SQRT[P x (100-P) / N] = 0.7%, rounded up to 1%
The literal interpretation is that there is a 95% chance that the actual percentage of SI's is between 58% and 60%.
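The calculation can be reproduced in a few lines of Python. This is only a sketch of the simple-random-sample formula above, using the 20,382 interviews stated in the PS&B press release; the function name and the multiplier parameter z are illustrative choices.

```python
import math

def srs_margin_of_error(p_percent, n, z=2.0):
    """Margin of error, in percentage points, for a simple random sample.

    p_percent: estimated percentage (0-100)
    n: sample size
    z: multiplier (2 is the usual rounding of 1.96 for 95% confidence)
    """
    return z * math.sqrt(p_percent * (100.0 - p_percent) / n)

moe = srs_margin_of_error(59, 20382)
print(f"+/-{moe:.2f} percentage points")  # about 0.69, i.e. 0.7%
```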
So far so good? Unfortunately, this formula is only good if this is a simple random sample, and this sample definitely is not. Conceptually, a simple random sample can be drawn by putting the identity numbers of the 10 million actual voters in a box. You stir, shake and mix the numbers up and then you select 20,382 of them at random. This is physically not possible because you don't have such a list, and you would have to interview the selected people when they exit the voting center.
Instead, PS&B says that the exit polls were conducted at "267 voting centers throughout the country. The centers were selected to be broadly representative of the national electorate in regional and demographic terms." This is a two-stage sample, also known as a cluster sample, and it incurs an additional source of sampling error.
On one hand, if I told you that I intend to cover all 8,500 voting centers, then there is no margin of error related to the sampling of voting centers because I have total coverage.
On the other hand, if I told you that I intend to select one voting center in the country and conduct my exit poll there, you should be deeply disturbed at the margin of error involved. If that one voting center happens to be in the upper-class El Hatillo, the SI vote may be 95%; if that one voting center happens to be in the lower-class Petare, the SI vote may be 10%. The margin of error from the first stage of selecting a voting center is so big as to make any subsequent survey work useless. I might as well not bother.
Within the limits of the resources, PS&B opted to visit 267 out of the 8,500 voting centers. There is a sampling error component that is absent from the +/-1% margin of error. A correct margin of error will require PS&B to go back and apply the formula appropriate for a two-stage cluster sample. It will lead to an increase in the margin of error, but I don't have the information to tell you what it might be.
Let me give you an extreme illustration. Suppose that the country is so geographically polarized that all voters at a voting center will vote the same way (i.e. either 100% or 0% for SI). At each of the 267 sampled voting centers, you will only have to ask one voter what his/her preference is. All other voters at that voting center will have the identical response. That being the case, it wouldn't matter if you interviewed 267 persons (one per voting center), or 20,382 persons or 500,000 persons. Your sample will behave as if the sample size was only 267 and the margin of error will be +/- 2 x SQRT[59 x 41 / 267]% = +/-6%. PS&B's margin of error should be higher than +/-1% but not as much as +/-6%.
Still, even +/-6% is not enough to account for the 18-point gap between the official 41% for SI and Súmate's 59% for SI. However, this misstatement of the margin of error reflects poorly on the technical competence of PS&B.
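To see how the margin of error moves between these two extremes, here is a rough Python sketch. It relies on the common design-effect approximation deff = 1 + (m - 1) x rho, which is not taken from the PS&B release: m is the average number of interviews per voting center and rho is the intra-cluster correlation, an assumed quantity that would have to be estimated from the actual data. The rho values in the loop are purely illustrative.

```python
import math

def clustered_margin_of_error(p_percent, n, n_clusters, rho, z=2.0):
    """Approximate margin of error (percentage points) for a cluster sample.

    Uses deff = 1 + (m - 1) * rho, where m is the average number of
    interviews per cluster and rho is the intra-cluster correlation
    (0 = clustering is harmless, 1 = everyone at a center votes alike).
    Assumes roughly equal numbers of interviews per cluster.
    """
    m = n / n_clusters
    deff = 1.0 + (m - 1.0) * rho
    n_effective = n / deff
    return z * math.sqrt(p_percent * (100.0 - p_percent) / n_effective)

for rho in (0.0, 0.1, 0.5, 1.0):  # illustrative values only
    moe = clustered_margin_of_error(59, 20382, 267, rho)
    print(f"rho={rho:.1f}: +/-{moe:.1f} percentage points")
```

With rho = 0 the function reproduces the simple-random-sample figure of about +/-0.7, and with rho = 1 the effective sample size collapses to 267 and the result is about +/-6, matching the extreme illustration.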
PS&B claimed to have completed 20,382 exit interviews at 267 voting centers. On average, they interviewed 20,382 / 267 = 76 persons per voting center. This is enough to arouse concern about how the work was conducted.
The recall referendum was scheduled to be held between 6am and 4pm on Sunday, August 15, 2004. In practice, due to the long lines, some voting centers stayed open until past midnight in order to accommodate all voters.
What is the typical field worker's task assignment for the day? He/she is told to proceed to a certain voting center in the country, which may be remote, and then told to ask voters how they voted as they leave. This will take all of 5 seconds to do. Over the 10 hours that were officially scheduled, plus whatever additional hours, they somehow only managed to complete 76 interviews per voting center. There are 8,500 voting centers in the country and about 10,000,000 people voted. The average number of voters per voting center is about 10,000,000 / 8,500 = 1,176, of which 76 (=6.5%) were interviewed. The productivity is incredibly low.
In the Súmate press release, there is an illustrative example for voting center No. 1323. There were 2,734 registered voters, of whom 1,705 checked in to vote, but Súmate was able to interview only 80 of them. What is going on here? What is the basis by which the 80 were interviewed? Did the other (1,705 - 80) = 1,625 voters just spit at the interviewer when asked? Did they walk away? Were they ever asked? And, most importantly, did they vote just like the 80? Commercially, a business survey with a response rate of 6% (=76 / 1,176) is unacceptable, because the completed answers may not be representative of the other 94%.
Here are some possible explanations, and I don't know which ones are true:
- The workers were slacking off, and since this is a one-day study, there is no remedy for correcting their behaviors afterwards. In this case, the quality of the survey results is put into doubt.
- The non-response rate was extremely high -- that is, most voters refused to talk to the interviewers about how they voted. In this case, you have to wonder whether those who talked had the same voting patterns as those who didn't.
- The workers were not required to work the entire day but only certain hours. In that case, the selection of the work period becomes critical. You must take it on faith that people who come at other hours have the same voting patterns. In fact, given that the PS&B press release went out at 7:30pm, they must have stopped the interviewing some time before that (most likely, right on the dot at the originally scheduled 4pm). In this case, you have to wonder if the survey results were skewed.
- Some workers had to cover more than one voting center (e.g. 6am-9am in one place, 9am-10am travel to another place and work there from 10am to 1pm, 1pm-2pm travel to another place and work 2pm-4pm). This is an acceptable use of resources, but for the fact that no one was scheduled to work after 4pm. Of course, they could still be called to do overtime work.
- The workers were told to work in a controlled fashion over the day. The PS&B press release says: "Voters were selected at random but according to a strict demographic breakdown by age and gender to ensure a representative mix reflective of the national electorate." For example, eight interviews (two older males, two older females, two younger males, two younger females) between 6am and 7am, another eight with the same age/gender breaks between 7am and 8am, etc. But do the interviewers get to choose which specific older male out of many? It is natural for people not to want to be yelled at, so they are likely to ask people who look approachable. This is why street-intercept and mall-intercept studies are in such disrepute. This method is also known as quota sampling, and has the distinguished history of never getting the UK voting results right.
- The workers were trained to sub-sample the voters. This is done by the VNS service in the United States that the television networks use. For example, if a voting center has about 1,000 voters according to historical data and 100 interviews are needed, the interviewer is told to intercept every tenth voter exiting. This is a probability sample, and is considered to be superior to quota sampling. The VNS interviewing procedures are described in great detail by Daniel M. Merkle and Murray Edelman in the article Nonresponse In Exit Polls: A Comprehensive Analysis that is collected in the book Survey Nonresponse edited by Robert M. Groves, Don A. Dillman, John L. Eltinge and Roderick J.A. Little (2002, John Wiley & Sons). For example, VNS requires its interviewers to record the age, sex and race of all persons designated to be interviewed (e.g. the tenth person): those who respond will fill the information out themselves and those who refuse to respond will be recorded by the interviewers according to observation. Thus, VNS has a good sense about the response rates as well as the profiles of the responders versus the nonresponders.
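The every-tenth-voter rule in the last explanation is simple enough to sketch in Python. The function names are hypothetical and the code is only an illustration of systematic sampling, not the actual VNS procedure; in practice the starting point is usually chosen at random within the first interval.

```python
def sampling_interval(expected_voters, interviews_needed):
    """Interval k such that taking every k-th voter yields about the target."""
    return max(1, expected_voters // interviews_needed)

def systematic_sample(voters, k, start=None):
    """Select every k-th voter from the stream of exiting voters.

    start is the 0-based position of the first selected voter; by default
    the k-th voter (index k - 1) is taken, matching "every tenth voter".
    """
    if start is None:
        start = k - 1
    return voters[start::k]

k = sampling_interval(1000, 100)        # 1,000 expected voters, 100 interviews
exiting = list(range(1, 1001))          # voters numbered in order of exit
selected = systematic_sample(exiting, k)
print(k, len(selected), selected[:3])   # 10 100 [10, 20, 30]
```

The point of the rule is that the interviewer has no discretion: the count decides who is approached, not how approachable someone looks.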
Consider now the anecdotal example observed by Josh Gindin (Venezuelanalysis.com) at the upper-class Altamira voting center:
Twenty conscripts stand around outside the voting center, clipboard in hand waiting for unsuspecting citizens to emerge, fresh from having voted. “Good afternoon,” they purr, “would you mind telling us if you voted ‘Yes’ or ‘No’?” and “Yes, yes, yes,” is the most common response.
“How many ‘No’ votes have you received?” I asked, playing the naïve reporter.
“Let’s see,” she offered, tapping her tennis shoes, “there are no ‘Nos’ on this page, and one on this page. I have one ‘No’.”
“Just one?” I persisted.
“Well, I don’t know about the others, but I have just one,” she answered, then, spotting some emerging voters in the distance, she scampered off to collect more “Yeses.”
There certainly does not appear to be any evidence of either quota or probability sampling. They were grabbing anyone that they could! This is one person's observation in the field, and may not be endemic. However, this points out PS&B's vulnerability in using non-professional partisan volunteers to execute the fieldwork.
In the THEORY section, I pointed out the significance of guaranteeing confidentiality of the responses. If not, the voters may refuse to answer or else provide untruthful answers. The PS&B press release seemed to take care of this issue:
Those voters who were randomly selected to participate in this exit poll were asked to indicate only their vote ("Si" - for "Yes" - or "No") on a small ballot which they could then personally drop into a large envelope in order to maintain secrecy and anonymity. Data was sent by exit poll workers to a central facility in Caracas, Venezuela for processing and verification.
In Josh Gindin's on-the-scene observations, where were those large envelopes in which small ballots were dropped? All the responses were just written down on listing sheets! Since the PS&B press release went out at 7:30pm on August 15, it is impossible to see how the envelopes with 20,382 'ballots' could have been shipped from all over the country to the central facility in Caracas to be opened up, keypunched, verified and tabulated in time. In fact, if you look at the Súmate press release, they had obtained periodic results of 56% at 9am, 59% at 11am and 61% at 1pm. In other words, the polls opened at 6am and, by 9am, they already had the initial results. Can you see those envelopes being processed according to PS&B's description of methodology on this timeline?
Instead, to make the 7:30pm release time, the more realistic methodology is the one observed by Josh Gindin: the interviewers were checking off the responses as SI or NO on a list. At 4pm (and at 9am, 11am and 1pm), they counted up the total number of SI's and NO's on all the listing sheets at the voting center so far, and then they telephoned the results into the Súmate headquarters. Someone then added up the counts across the voting centers in the country and dropped the result into the pre-written press release (note: if they were smart, they would have written two opposite versions beforehand). There is no getting around the fact that the PS&B statement about the 'ballots' in the sealed envelopes is a lie. It is not physically possible for them to do that. And I am not aware of anyone holding PS&B to account on this.
Not all voting centers have the same number of registered voters eligible to vote there. The CNE recognized that lower-class areas have four times as many registered voters per voting center as upper-class areas. In principle, they should have set up more voting centers. However, they did not want to create confusion among people about where to go, and so the voting centers remained the same as before.
Imagine the following scenarios. In voting center #1, the percentage of SI was 90% from 100 exit interviews. In voting center #2, the percentage of SI was 10% from 100 exit interviews. What is the average percentage of SI?
If the two voting centers had equal number of actual voters, the average percentage would be a pooled average = 100 x (0.90x100 + 0.10x100) / (100 + 100) = 50%.
But if voting center #1 has 1,000 actual voters and voting center #2 has 4,000 actual voters, the average percentage would be 100 x (0.90x1,000 + 0.10x4,000) / (1,000 + 4,000) = (900 + 400) / (5,000) = 26%.
So you cannot just take the number of SI's in the total sample and divide by the total number of respondents as your estimate. That answer would be biased towards the voting centers with fewer actual voters (to wit, the upper-class areas) and away from those with more actual voters (to wit, the lower-class areas) and the percentage of SI's would be overstated. PS&B's press release does not say what they did.
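The two scenarios can be reproduced with a small Python function that weights each center's SI percentage by its number of actual voters. The function name is illustrative, and the figure of 1,000 voters per center in the equal-turnout case is an assumption, since any equal number gives the same answer.

```python
def turnout_weighted_percent(centers):
    """Weight each center's SI percentage by its number of actual voters.

    centers: list of (percent_si, actual_voters) pairs.
    """
    total_voters = sum(voters for _, voters in centers)
    return sum(pct * voters for pct, voters in centers) / total_voters

equal_turnout = [(90, 1000), (10, 1000)]    # same number of actual voters
unequal_turnout = [(90, 1000), (10, 4000)]  # center #2 is four times larger

print(turnout_weighted_percent(equal_turnout))    # 50.0
print(turnout_weighted_percent(unequal_turnout))  # 26.0
```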
This is compounded by a second problem: the centers are unlikely to have equal numbers of interviews. As Josh Gindin noted, it is likely that Súmate got much better productivity from the workers and higher response rates from the voters in their strongholds and vice versa. In the previous quote, he noted that there were about 20 Súmate workers at Altamira. If the number of interviews there equals the overall average of 76, then each of these workers got 3 or 4 interviews for the day's work. This productivity statistic strains credulity. Here is some more anecdotal evidence from Venezuelanalysis.com:
According to one of Súmate’s Altamira volunteers, “we are here to provide food for the people in line, to provide them with water, to help them in any way we can to facilitate the voting process. And to do exit polls, to see if they voted ‘Yes’ or ‘No’.”
“And you have volunteers providing food in all the lines all over the country?”
“Yes, absolutely. Everywhere,” responded another white-clad Súmate pollster.
“But I was just in Petare, a very Chavista neighbourhood, and I didn’t notice anyone from Súmate handing out food or water,” I said coyly.
“That’s because the people in those neighbourhoods don’t like the Coordinadora, not because the Coordinadora doesn’t want to help them,” she exclaimed, visibly perturbed.
“So if you can’t get into Chavista neighborhoods, you can’t do exit polls there, right?” I asked.
“No…” she hesitated, “I’m sure they are doing exit polls everywhere.” End of interview.
Just as we showed in the illustrated numerical example above, the bias can be corrected by weighting. I'll make up some hypothetical numbers:
Altamira: 1,000 actual voters, 400 completed interviews, 360 SI's (% of SI's = 100 x 360 / 400 = 90%)
Petare: 4,000 actual voters, 50 completed interviews, 10 SI's (% of SI's = 100 x 10 / 50 = 20%)
Altamira is upper-class and has a higher percentage of SI's and a higher response rate; Petare is lower-class and has a lower percentage of SI's and a lower response rate.
Unweighted percentage of SI's = (360 + 10) / (400 + 50) = 370 / 450 = 82%.
Weighted percentage of SI's = (1,000x0.90 + 4,000x0.20) /(1,000 + 4,000) = (900+800)/5,000 = 34%.
You can see what a vast difference is made by proper weighting.
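The same correction can be expressed as a weight attached to each completed interview: every respondent stands in for (actual voters / completed interviews) voters at his or her center. The Python sketch below uses only the hypothetical Altamira and Petare numbers given above; the dictionary layout and variable names are illustrative.

```python
# Hypothetical figures from the text: actual voters, interviews, SI answers.
centers = {
    "Altamira": {"voters": 1000, "interviews": 400, "si": 360},
    "Petare": {"voters": 4000, "interviews": 50, "si": 10},
}

# Unweighted: every completed interview counts once.
unweighted = 100 * sum(c["si"] for c in centers.values()) / sum(
    c["interviews"] for c in centers.values()
)

# Weighted: each interview represents voters / interviews actual voters.
weighted_si = 0.0
weighted_total = 0.0
for c in centers.values():
    weight = c["voters"] / c["interviews"]  # 2.5 in Altamira, 80 in Petare
    weighted_si += weight * c["si"]
    weighted_total += weight * c["interviews"]
weighted = 100 * weighted_si / weighted_total

print(round(unweighted, 1), round(weighted, 1))  # 82.2 34.0
```

A single Petare interview carries 32 times the weight of an Altamira interview here, which is why a handful of responses from under-covered centers can swing the estimate so much.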
Did PS&B/Súmate weight their data? From the Súmate press release, they said the results were distributed as 12,097 SI's and 8,285 NO's. And 100 x 12,097 / 20,382 = 59.4% was what they claimed for the SI's. This is a straight average. In other words, they did not weight by voting center according to the voter turnout. Their estimate is therefore potentially quite wrong. Could a professional polling firm such as PS&B be so stupid? Well, all I can say is that I am looking at a straight, unweighted average.
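For the record, the arithmetic behind that observation is the following, using only the two counts reported by Súmate (the variable names are illustrative):

```python
si_count = 12097  # SI answers reported by Súmate
no_count = 8285   # NO answers reported by Súmate

total = si_count + no_count
straight_average = 100 * si_count / total

print(total, round(straight_average, 1))  # 20382 59.4
```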
While the number of registered eligible voters at each voting center was known beforehand, the relevant number to be used in the weighting is the number of actual voters on Sunday, August 15, 2004. This is not the same thing at all, because the abstention rates differ across voting centers. In a country like the United States, there are extensive databases of voter turnout over time, so that it is possible to estimate the number of actual voters at the voting center level with reasonable accuracy.
When Hugo Chávez was elected President, 6 million people voted in total of which 3.8 million voted for him. The recall referendum this time was said to be unprecedented in the history of Venezuela because about 10 million out of 14 million registered voters came out to cast their votes. And this was not a presidential election with multiple candidates; this was the culmination of years of political struggle. In other words, any historical data about voter turnout in Venezuela are inoperative for this day. Weighting by the number of registered voters would have been better than no weighting, but it is still not guaranteed to be correct.
In the report titled En busca del cisne negro: Análisis de la evidencia estadística sobre fraude electoral en Venezuela, Ricardo Hausmann and Roberto Rigobon determined the following statistics:
- The percentage of SI's according to the National Electoral Council across the country = 41.1%
- The percentage of SI's according to the National Electoral Council at the voting centers in which Súmate conducted their polls = 45.0%
- The unweighted percentage of SI's according to the exit poll of Súmate = 59.5%
- The weighted percentage of SI's according to the exit poll of Súmate = 62.0% (note: it is not specified by Hausmann and Rigobon just how the numbers are being weighted, but it is at least an acknowledgement that it needs to be weighted)
The first thing to note is that the Súmate sample had a higher percentage of SI's than the national average. This is not necessarily an indication of fraud by Súmate in selecting favorable voting centers. When the sample was drawn originally, the only information available was the number of registered voters, whereas it was the turnout that really counted. Given that the post-facto turnout produced a significantly different result in this sample of voting centers, it just points out the importance of weighting the exit poll numbers properly.
The most important thing here is this: after weighting, the percentage of SI's went up! Therefore, you are now required to believe that Súmate was able to conduct more interviews than needed in places where the percentage of SI's was lower (i.e. lower-class neighborhoods) and, conversely, fewer interviews than needed in places where the percentage of SI's was higher (i.e. upper-/middle-class neighborhoods). Do you believe that? I know that I am vexed.
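The direction of this argument can be checked with a toy two-center example in Python. All numbers below are hypothetical and the function name is illustrative; the sketch only shows that weighting by turnout raises the SI percentage when the high-SI center was under-interviewed relative to its turnout, and lowers it in the opposite case.

```python
def unweighted_and_weighted(centers):
    """centers: list of (percent_si, interviews, actual_voters) tuples."""
    total_interviews = sum(n for _, n, _ in centers)
    total_voters = sum(v for _, _, v in centers)
    unweighted = sum(p * n for p, n, _ in centers) / total_interviews
    weighted = sum(p * v for p, _, v in centers) / total_voters
    return unweighted, weighted

# Case A: the high-SI center is over-interviewed relative to its turnout.
case_a = [(90, 400, 1000), (20, 50, 4000)]
# Case B: the high-SI center is under-interviewed relative to its turnout.
case_b = [(90, 50, 4000), (20, 400, 1000)]

for name, case in (("A", case_a), ("B", case_b)):
    u, w = unweighted_and_weighted(case)
    direction = "up" if w > u else "down"
    print(f"case {name}: unweighted {u:.1f}%, weighted {w:.1f}% -> SI moves {direction}")
```

Only case B moves the SI percentage up after weighting, and case B is the one in which the pollsters were relatively more productive where SI support was weaker.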
In conclusion, the evidence gleaned from the press releases from PS&B and Súmate, as well as the anecdotal evidence, all points towards some serious problems in this exit poll. Some of the problems are beyond repair at this point (e.g. poor quality fieldwork by the interviewers), while others can be addressed in academic investigations (e.g. re-weighting the data). There is a lot less than meets the eye here, and it seems absurd to hold this exit poll up as the standard of truth versus an audited vote count.
(posted by Roland Soong, 9/2/2004)