Julie de Jong, 2016


If researchers wish to have survey interviews carried out by an interviewer, but face-to-face interviews are not possible, conducting interviews by telephone, through either a landline or a mobile telephone, can be an alternative. Multinational, multiregional, and multicultural surveys (“3MC” surveys) use different standards to determine whether telephone penetration is adequate in a study country. For example, the Gallup World Poll generally uses a telephone survey only in countries where telephone coverage represents at least 80% of the population (Gallup, 2015). Telephone interviews are generally less costly than face-to-face interviews and can be completed in a shorter amount of time. However, response rates are generally lower and, depending on the available sampling frame for a country, a rigorous telephone-administered sample design can be difficult to develop. See Sample Design for a discussion of the challenges and limitations of a telephone-based frame and sample design.
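
As a minimal sketch of how a coverage rule of this kind might be applied when planning modes across study countries, the Python snippet below assigns a mode from a coverage estimate. The function name, the country labels, and the coverage figures are hypothetical placeholders. Only the 80% threshold comes from the Gallup example above, and other projects may set a different standard.

```python
def choose_mode(coverage, threshold=0.80):
    """Return 'telephone' if estimated coverage meets the threshold, else 'face-to-face'.

    coverage is a proportion of the population (0 to 1); threshold defaults to
    the 80% rule cited in the Gallup example and is an adjustable assumption.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a proportion between 0 and 1")
    return "telephone" if coverage >= threshold else "face-to-face"


# Hypothetical coverage estimates, for illustration only.
coverage_by_country = {"Country A": 0.93, "Country B": 0.61}
modes = {country: choose_mode(p) for country, p in coverage_by_country.items()}
```

A rule like this only flags candidate countries for a telephone design; mode effects and the quality of the available sampling frame still need to be weighed separately.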

As discussed in Data Collection: General Considerations, 3MC surveys sometimes employ mixed modes, depending on individual country constraints. However, it is important to note that mode effects may occur if the survey is carried out by telephone in some countries and face-to-face in others (see Study Design and Organizational Structure for discussion on mode effects).

Virtually all questionnaires administered by interviewers in telephone surveys are completed using an electronic computer-based instrument to record survey responses. This data collection mode is most commonly referred to as computer-assisted telephone interviewing (CATI). These guidelines assume that the interviewer will be using a computer-based instrument and will refer to the mode as CATI.

For additional discussion on the advantages and disadvantages of telephone surveys, see Study Design and Organizational Structure.


Goal: To achieve an optimal cross-cultural data collection design by maximizing the amount of information obtained per monetary unit spent within the allotted time, while meeting the specified level of precision and producing comparable results, within the context of a telephone survey.

1.   Develop the computer-based system(s) that the interviewers will use to administer telephone interviews.


Interviewers can conduct telephone interviews from either a central location or remotely. Software systems can be used to distribute sampled telephone numbers, to dial telephone numbers, to manage call records, and to record survey data. When using CATI, it is crucial to design and implement a system that interviewers can use to reliably collect survey data.

Procedural steps

1.1    Decide whether interviewers will work in a centralized and/or decentralized location.

1.1.1    Many survey research firms conducting telephone interviews maintain a “telephone lab”, which is a central calling center where center supervisors oversee a variable number of interviewers. Each interviewer has access to the electronic instrument and records responses directly in the electronic file. Interviews can be monitored in real time.

1.1.2    Sometimes interviewers work from other locations while having access to the electronic system set up by the survey research firm.

1.2     Develop a system and protocol for sample release management, including how cases will be transferred between interviewers when necessary.

1.3     Develop a protocol for dialing sampled telephone numbers. Some projects may use CATI systems that can dial telephone numbers automatically, while other projects may elect to have interviewers dial telephone numbers manually. In some countries it is against the law to use automation to dial specific types of telephone numbers (e.g., in the United States, it is illegal to use automation to dial mobile numbers). If using automation, be familiar with the local laws about its use.

1.4    Consider the cost structure for telephone calls in each study country. In the United States, respondents are responsible for the cost of incoming telephone calls on mobile telephones. In the Persian Gulf countries, by contrast, there is no charge for incoming calls, so interviewers based in Nepal were able to telephone Nepali migrant workers living in Gulf countries for a migration survey without any cost to the respondents (Ghimire, Williams, Thornton, Young-DeMarco, & Bhandari, 2013).

1.5    Decide which telephone number and name will be displayed to respondents in the caller ID, and whether that number should be reachable if respondents call it back.

1.6    Develop an electronic survey instrument used to record survey responses. There are numerous CATI software packages. However, it is also possible to use a web-based survey instrument, which may not be as suitable for more complex projects but is less expensive. Electronic survey instruments in a telephone survey share many of the same requirements as electronic survey instruments administered in the face-to-face mode. For in-depth discussion of these elements, see Guideline 3 in Data Collection: Face-To-Face Surveys.
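
The sample release, case transfer, and dialing steps above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not a description of any particular CATI package: the class names, field names, interviewer IDs, and telephone numbers are all hypothetical, and the rule for which line types must be dialed manually has to come from each country's own laws.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Case:
    phone: str
    line_type: str                      # "landline" or "mobile"
    replicate: int                      # release batch the number belongs to
    interviewer: Optional[str] = None   # current owner of the case
    calls: List[str] = field(default_factory=list)  # call record history


class SampleManager:
    """Hypothetical helper that tracks release, assignment, and call records."""

    def __init__(self, cases, autodial_banned_types=()):
        self.cases: Dict[str, Case] = {c.phone: c for c in cases}
        self.released = set()
        # Line types that local law requires interviewers to dial by hand.
        self.autodial_banned_types = set(autodial_banned_types)

    def release(self, replicate):
        """Make one replicate available for calling and return its cases."""
        self.released.add(replicate)
        return [c for c in self.cases.values() if c.replicate == replicate]

    def assign(self, phone, interviewer):
        """Give a released case to an interviewer."""
        case = self.cases[phone]
        if case.replicate not in self.released:
            raise ValueError("case belongs to a replicate that is not released")
        case.interviewer = interviewer

    def transfer(self, phone, new_interviewer):
        """Move a case to another interviewer, keeping its call history."""
        case = self.cases[phone]
        case.calls.append(f"transferred from {case.interviewer}")
        case.interviewer = new_interviewer

    def dial_method(self, phone):
        """Return 'manual' when automated dialing is not allowed for this line type."""
        banned = self.cases[phone].line_type in self.autodial_banned_types
        return "manual" if banned else "auto"

    def log_call(self, phone, outcome):
        self.cases[phone].calls.append(outcome)


# Example with made-up numbers: mobile numbers must be dialed manually.
manager = SampleManager(
    [
        Case("555-0101", "landline", 1),
        Case("555-0102", "mobile", 1),
        Case("555-0103", "mobile", 2),
    ],
    autodial_banned_types={"mobile"},
)
manager.release(1)
manager.assign("555-0102", "int_01")
manager.log_call("555-0102", "no answer")
manager.transfer("555-0102", "int_02")
```

Keeping the call history attached to the case, rather than to the interviewer, means that whoever receives a transferred case can see every earlier contact attempt.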

Lessons Learned

1.1    While survey mode can affect survey responses, studies are not unanimous in the direction of the effect observed.

1.1.1    A survey of HPV awareness and knowledge, including sexual behavior, was conducted in Singapore, with half participating via CATI and half through an interviewer-administered face-to-face interview. Few differences between survey modes were found in the information disclosed (Smith et al., 2009).

1.1.2    A study in India evaluating the accuracy of health data collection through several different interfaces found that telephone interviewing had the greatest accuracy when compared to electronic forms on PDAs and text messaging (Patnaik, Brunskill, & Thies, 2009).

1.2    CATI can be particularly useful in a panel study setting, especially when there is frequent contact with respondents. Experiences vary by country, however.

1.2.1    In a study of farmers in Tanzania, researchers gave respondents pre-paid mobile phones for the duration of the field period so that they could receive a phone call from an interviewer and complete a survey every three weeks over a ten-month period, resulting in a high quality dataset (Dillon, 2012).

1.2.2    Researchers distributed mobile phones to female sex workers in India for use in a diary study on sexual behavior, which resulted in high response rates and high-quality data (Bradley et al., 2012).

1.2.3    Researchers on a panel study in South Sudan using CATI found that response rates were affected by irregular fluctuations in the mobile network (Demombynes, Gubbins, & Romeo, 2013).

1.3    Beyond the traditional CATI mode, interviewing via text message has recently been used. In this mode, the interviewer sends individual survey questions by text to the respondent, who sends his or her responses back by text to the interviewer (West, Ghimire, & Axinn, 2015; Lau, Lombaard, Baker, Eyerman, & Thalij, 2016).

2.   Train interviewers on interviewing strategies specific to telephone interviewing.


The nature of the interaction between the interviewer and the respondent depends on the mode of data collection. Some interviewing strategies that are accessible in a face-to-face mode, such as interpretation of body language, are not possible to implement over the telephone, contributing in part to lower response rates and potential for non-response bias. However, there are certain telephone-specific strategies that researchers can introduce to assist interviewers in completing telephone interviews.

Procedural steps

2.1    Consider the social context of the study country when hiring interviewers to administer a telephone survey, and whether selection of interviewer based on gender or other characteristics will affect response rates. See Lessons Learned 2.1 below as well as Interviewer Recruitment, Selection, and Training for additional discussion of interviewer recruitment considerations. 

2.2    Develop an introduction appropriate for the interviewer to read upon contact with the respondent.

2.2.1    The introduction is especially important and may differ depending on cultural norms, and the way the opening unfolds between the interviewer and respondent may have significant implications for both survey non-response and data quality (Couper & Groves, 2002). The context of the interview can dictate identification procedures and the pace of the interview.

2.2.2    Establishing and maintaining rapport is especially important in a telephone survey. Particular care should be taken in the translation stage to ensure that the interviewer script does not violate cultural norms involving politeness and the linguistic encoding of status and social distance (Kleiner & Pan, 2006).

2.2.3    The introduction can be particularly critical in achieving cooperation in some countries. Previous respondent exposure to the telephone as a survey mode can differ across countries, and there can be discomfort in sharing personal information over the phone (Hughes, 2004).

2.2.4    In countries where there are linguistic differences depending on actors’ social status, translations must also recognize that interviewers and respondents are strangers and cannot rely on visual cues to establish social distance and the appropriate linguistic level. The script should therefore allow for some social interaction at the beginning of the survey so that this social distance can be established.

Lessons Learned

2.1    Gender norms of the study country can have a significant impact on response rates in CATI surveys.

2.1.1    In France, researchers have found that female interviewers generally have higher refusal rates in telephone surveys (Verger, Baruffol, & Rotily, 2001).

2.1.2    In Nepal, a highly gendered society, women generally prefer to speak to other women, and men to men, even over the telephone. However, in a CATI survey using Nepali-based interviewers contacting (mostly male) Nepali migrant workers in Persian Gulf countries, researchers obtained high response rates using predominantly female interviewers, because of the cultural perception that women would not call a male unless it was an important matter (Ghimire et al., 2013).

2.1.3    There is also anecdotal evidence that male respondents in the highly gendered countries in the Middle East are more likely to participate in a telephone survey when contacted by a female interviewer.

2.2    Immediate identification by name is standard telephone practice in the United States, but is uncommon in China (Kleiner & Pan, 2006).

2.3    Acceptable pace of the interview introduction can vary across even otherwise similar cultural contexts. For example, an examination of reaction to phone calls in Hong Kong and Beijing found that Beijing residents were more resistant to a fast-paced, business-like telephone conversation when compared to those from Hong Kong (Pan, Scollon, & Scollon, 2002). Similarly, a comparison of Greeks and Germans showed that Greeks prefer social interaction before reaching the main point of a telephone conversation, while Germans prefer to discuss the main point immediately (Pavlidou, 1994). 

2.4    Acquiescence bias differs across cultures and can be particularly problematic in a telephone survey where otherwise difficult issues can be exaggerated. For example, in many Asian cultures, people tend to avoid “no” answers to yes/no questions, particularly when there is an asymmetrical relationship between speakers as in a survey interview (Kleiner & Pan, 2006).

2.5    Introductory scripts can differ dramatically across cultures. For example, in Chinese, expressions like “please” and “thank you” are not normally used in daily conversation and imply a large social distance between speakers. The mandated repetitive use of such words in a survey among Chinese speakers would be detrimental, particularly in a telephone survey where rapport is especially important, in sharp contrast to a survey in American English, where such phrases are acceptable and expected (Pan et al., 2002).

3.   Decide whether a subset of survey questions would best be collected in a self-administered section of the interview.


Interviewer-administered telephone interviewing is subject to social desirability biases similar to those in face-to-face interviewing. Interactive Voice Response (IVR) is a telephone mode in which the computer plays recordings of the questions over the telephone to respondents, who then respond by using the keypad of the telephone or by saying their answers aloud. IVR can be used as a self-administered mode (SAQ) to administer a portion of an interview otherwise conducted by CATI, typically a portion that is particularly sensitive in nature and where accuracy might improve without the presence of an interviewer. It can also be used exclusively as a self-administered mode, with the computer automatically telephoning the respondent and then administering the questionnaire (see Data Collection: Self-Administered Surveys for further discussion of IVR in a completely self-administered mode).

Procedural Steps

3.1    Design the IVR system so that it is technically well-integrated into the CATI system in use by the project and that switching from the CATI to the IVR system is straightforward for the interviewer.

3.2    Decide whether to program the IVR system as touchtone, voice input, or a combination of the two.

3.2.1    When deciding on the programming, consider the target population. Studies in rural India and Botswana found that respondents with less education and lower literacy do better with touchtone, and respondents also cited privacy as a reason for preferring touchtone (Kuun, 2010; Patel et al., 2009).

3.2.2    A study in Pakistan found that a well-designed speech interface was more effective than a touchtone system for respondents regardless of literacy level (Sherwani et al., 2009).

3.3    Devote sufficient time to the development of a high-quality IVR system to maintain respondent interest and continued cooperation.

3.3.1    The IVR system must have a high quality recording, as the respondent is likely to break off the survey if quality is poor.

3.3.2    See Oberle (2008) for a guide to the development of an IVR system and the associated speech characteristics which need consideration.

Lessons Learned

3.1    Consider the voice used for recording.

3.1.1    In a health helpline project in Botswana, researchers employed a well-known local actress for the IVR recording, and users reacted very positively (Kuun, 2010).

3.1.2    Depending on the social context, using an IVR recording of a male for male respondents and of a female for female respondents may elicit more accurate reporting, particularly of sensitive information.

3.2    Plauche, Nallasamy, Pal, Wooters, and Ramachandran (2006) developed an innovative approach to the challenge that dialectal variation and multilingualism pose to speech-driven interfaces for IVR in India, applicable to other settings as well. In their approach, people from specific villages are recorded during interactions, and their speech is semi-automatically integrated into the acoustic models for that village, thus generating the linguistic resources needed for automatic recognition of their speech.

3.3    A survey of teachers in Uganda resulted in a number of useful considerations when designing an IVR system to improve response rates and data quality (Lerer, Ward, & Amarasinghe, 2010).

3.3.1    The IVR call began with the immediate information that “This is a recorded call from Project X. You are not talking to a real person.”

3.3.2    The IVR call provided very specific instructions about whether to use the keypad or to speak.

3.3.3   Respondents were initially confused by the automation of the IVR system. Researchers had better results when using a chime to get respondents’ attention before the automated voice gave instructions.

3.3.4    Leveraging the turn-taking conventions of normal conversation in the IVR system led to more success than detailed instructions in eliciting the desired user behavior.

3.3.5    An IVR system which projected a loud voice, with prompts recorded as if the speaker were using a poor cell connection, resulted in a survey that was easier for respondents to follow. 

3.3.6    When producing the IVR recording, use slow speech to get slow speech – respondents will emulate the voice, and resulting data will be easier to understand. 

3.3.7    The IVR recording included 3 seconds of silence before the recorded speaker says “thank you” and moves on to the next question, which was reported as well-received by respondents.
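
Several of the design points above (an attention chime, immediate disclosure that the call is recorded, explicit input instructions, and a pause before the acknowledgement) can be written down as an ordered call script. The Python sketch below is one hypothetical way to represent such a script as data. The step names, function names, instruction wording, and sample question are illustrative assumptions and are not taken from Lerer et al. (2010); only the disclosure sentence and the 3-second pause follow the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str          # "chime", "play", "listen", or "silence"
    detail: str = ""     # prompt text, or the expected input mode for "listen"
    seconds: float = 0.0


def opening(project_name: str) -> List[Step]:
    """Chime first, then disclose immediately that the call is automated."""
    return [
        Step("chime"),
        Step("play", f"This is a recorded call from {project_name}. "
                     "You are not talking to a real person."),
    ]


def question(prompt: str, input_mode: str, pause_seconds: float = 3.0) -> List[Step]:
    """One question: prompt plus input instruction, response, pause, acknowledgement."""
    if input_mode not in ("keypad", "voice"):
        raise ValueError("input_mode must be 'keypad' or 'voice'")
    # Hypothetical instruction wording; real scripts need translation and testing.
    how = "Press a key on your phone." if input_mode == "keypad" else "Say your answer."
    return [
        Step("play", f"{prompt} {how}"),
        Step("listen", input_mode),
        Step("silence", seconds=pause_seconds),  # pause before saying thank you
        Step("play", "Thank you."),
    ]


# Made-up question, for illustration only.
script = opening("Project X") + question(
    "How many days did you teach last week?", "keypad"
)
```

Holding the script as plain data makes it easier to review the order of prompts with local staff and to adjust the pause length or input mode per country without touching the telephony layer.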

