www.gov.uk/defra
Economic Needs for Improving Intelligence
within the Food Authenticity Programme
Final report
November 2014
www.europe-economics.com
© Crown copyright 2014
You may re-use this information (excluding logos) free of charge in any format or medium,
under the terms of the Open Government Licence v.2. To view this licence visit
www.nationalarchives.gov.uk/doc/open-government-licence/version/2/ or email
This publication is available at www.gov.uk/government/publications
Any enquiries regarding this publication should be sent to us at
Contents
1. Executive Summary
a. Literature review
b. Methodology
c. Case study
d. Limitations
e. Recommendations
2. Introduction
a. Food fraud
b. Project objectives
c. Structure of the report
3. Literature Review
a. Food fraud
b. The economics of fraud
c. Fraud in other areas
d. Conclusions
4. Factors that Affect the Risk of Fraud
a. Economic factors and market characteristics
b. Production and distribution
c. Product characteristics and detection technologies
d. Institutional and enforcement characteristics
5. Methodology
a. Selecting a methodology
b. An econometric methodology
c. Interpretation and use of the results
d. Single and multiple products
e. New types of fraud
f. Comparison of the proposed approach and the literature
6. Case Study: Basmati Rice
a. Global market for Basmati rice
b. UK market for Basmati rice
c. Basmati rice adulteration
d. Data
e. Descriptive statistics
f. Econometric results
7. Conclusions and Recommendations
a. Case study
b. Limitations
c. Recommendations
8. Annex I: Detailed Review of Selected Literature
a. Food fraud – economics
b. Food fraud – biological science
c. Credit card fraud (empirical)
d. Credit card fraud (theoretical)
e. Credit card fraud and computer science
f. Automobile insurance and car accidents
g. Consumer goods
h. Fraud in general
i. Insurance and tax fraud
9. Annex II: Methodologies Used to Study Fraud
a. Construction of risk indices
b. Econometric models
c. Data mining
10. Annex III: Data Sources
a. Food fraud data
b. Economic data
c. Other data considerations
11. Annex IV: Econometric methodology
12. Annex V: Linear Correlations
13. Annex VI: Econometric Estimation
1. Executive Summary
This report explores the scope for applying economic intelligence to the analysis and
prediction of food fraud in the UK. Food fraud is defined as “the deliberate placing on the
market, for financial gain, goods which are falsely described or otherwise intended to
deceive the consumer”.1 Given that for our purposes in this report we regard food fraud as
economically motivated, we explore whether it is possible to predict the likelihood of fraud
based on the economic variables that drive the potential profits to be made by committing
such fraud.
a. Literature review
The first part of the report consists of a review of the literature on food fraud, the
economics of fraud, statistical methodologies used to predict fraud and the potential data
sources that could be used for this purpose. Based on this review, it was possible to
identify some fundamental characteristics of food fraud in the UK and other countries; the
general approach to modelling fraud from an economic theory perspective; the factors that
have been postulated as contributors to the risk of food fraud; methodologies used to
detect and predict fraud, either based on economic analysis or other approaches; and
variables and data sources that can be employed in statistical models of fraud.
b. Methodology
After reviewing the different methodologies proposed in the literature, we consider that an
econometric approach would be the most suitable to predict the risk of food fraud. The
methodology section provides details of which variables could be included (based on our
data assessment), the estimation methods that can be employed and criteria for selecting
among the multiple possible models. In addition, we propose an approach to use the
estimation outcomes based on past fraud for prediction of future fraud. Based on the
evolution of observable economic variables, the model produces a prediction of the risk of
fraud.
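As an illustration only, the step from past fraud to a predicted risk can be sketched with a one-variable logit model. The data below are hypothetical and are not taken from the case study; the function names `fit_logit` and `predict_risk` are ours, and the simple gradient-ascent fit stands in for whatever estimation routine is actually used.

```python
import math

def fit_logit(x, y, lr=0.5, steps=5000):
    """Fit P(fraud = 1 | x) = 1 / (1 + exp(-(a + b*x))) by gradient ascent
    on the average log-likelihood. Returns the intercept a and slope b."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p          # derivative w.r.t. the intercept
            grad_b += (yi - p) * xi   # derivative w.r.t. the slope
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def predict_risk(a, b, x_new):
    """Predicted probability of fraud for a new value of the regressor."""
    return 1.0 / (1.0 + math.exp(-(a + b * x_new)))

# Hypothetical monthly observations: a price-gap measure and whether
# fraud was detected in that month (1) or not (0).
price_gap = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
fraud = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logit(price_gap, fraud)
risk_low = predict_risk(a, b, 0.25)   # month with a small price gap
risk_high = predict_risk(a, b, 0.85)  # month with a large price gap
```

In this sketch a positive slope `b` means that a wider price gap is associated with a higher predicted risk, which is the kind of relationship the estimation would test for.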
c. Case study
The final section of the report focusses on testing the methodology via a case study. The
selected type of fraud is the adulteration of Basmati rice using other varieties of rice. The
selection of this type of fraud was based on well documented instances of past fraud and
availability of economic data. It was possible to gather monthly data for the period 2010-
2013 on previous incidents of food fraud, prices of Basmati rice in India and Pakistan (the
1 Elliott, Chris (2014) “Elliott Review into the Integrity and Assurance of Food Supply Networks – Final
Report”.
two countries that produce this variety), the volume of production of rice, the volume of
exports of Basmati rice to the UK, the world price of long-grain rice and the consumption of
rice in the UK. After conducting a large number of regressions, we conclude that the only
variable that is statistically significant in predicting the risk of fraud is the gap between the
price of Basmati rice and other varieties of rice. Based on the estimations, we classify the
observations according to low and high risk of fraud. We find that this classification would
have predicted food fraud correctly with 66.6 per cent accuracy.2 This level of accuracy
suggests that the test proposed in this report may contain useful information that would
indicate a higher risk of fraud. However, we would like to stress the indicative nature of this
accuracy level. Given its limitations, the fact that the test indicates high risk of fraud should
not be interpreted as conclusive evidence that fraud would occur. The 66.6 per cent level of
accuracy is better than that obtained from a trivial predictor based only on the price ratio
between authentic Basmati rice and world non-Basmati rice. The maximum accuracy this
predictor would yield is 62 per cent, reached when the threshold for the price ratio is set at
58 per cent, so that any price ratio above 58 per cent would be considered suspicious and
require an investigation by the authorities.
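The trivial threshold predictor described above can be sketched as a search for the cut-off that classifies the most observations correctly. This is a minimal illustration under stated assumptions: the observations are hypothetical (not the report's data), and the function names `accuracy` and `best_threshold` are ours.

```python
def accuracy(ratios, fraud, threshold):
    """Share of observations classified correctly when a price ratio
    above the threshold is treated as 'high risk of fraud'."""
    correct = sum((r > threshold) == bool(f) for r, f in zip(ratios, fraud))
    return correct / len(ratios)

def best_threshold(ratios, fraud):
    """Try each observed ratio as a cut-off and keep the most accurate one."""
    candidates = sorted(set(ratios))
    return max(candidates, key=lambda t: accuracy(ratios, fraud, t))

# Hypothetical monthly price ratios and fraud outcomes (1 = fraud detected).
price_ratio = [0.40, 0.45, 0.52, 0.55, 0.60, 0.63, 0.70, 0.75]
fraud_found = [0, 0, 1, 0, 0, 1, 1, 1]

t = best_threshold(price_ratio, fraud_found)
acc = accuracy(price_ratio, fraud_found, t)
```

With these made-up figures the best cut-off is 0.60, which classifies seven of the eight observations correctly. The accuracy of a model-based classification can then be compared against this benchmark.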
d. Limitations
The case study shows that the proposed methodology is feasible to implement. However,
the case study also served to illustrate the considerable limitations that could be faced
when applying the proposed methodology to a particular product or fraud type. The most
important limitation is the small sample size. The case study is based on 21 valid food
fraud observations over a time span of 3 years. The normal minimum sample size for
obtaining meaningful statistical results of more general (out-of-sample) applicability is 30,
i.e. more than the 21 available here. With 21 data points it was possible to conduct some
statistical analysis and there were potentially interesting indicative results. However
(unsurprisingly, given the data limitations), at most one explanatory variable was
significant in any model, the results being substantially weakened when more than one
variable was considered. Attempts to apply this methodology to other products may
encounter the same or even greater data limitations.
In addition to the number of observations, other data limitations include the use of
low-quality or missing data and the difficulty (or impossibility) of measuring variables that the
literature has identified as relevant, such as key features of the supply chain.
e. Recommendations
We consider that the proposed methodology — i.e. econometric modelling (especially the
deployment of OLS and logit models) including controlling for variables such as the level
2 The level of accuracy ranges from 50 per cent (the test has no capacity of predicting fraud) to 100 per
cent (perfect prediction accuracy).
and change in price differences between authentic product and adulterated product and
the number of samples taken — is appropriate and solidly founded in the literature.
However, due to limitations in the data currently available, the results obtained when
applying the methodology might not be entirely satisfactory. The accuracy and reliability of
the results of the methodology would improve substantially with additional data in the
following categories. First, more data on testing and detection of past food fraud is
necessary. In the case study, the main constraint to the number of observations is the
number of months in which authenticity testing was conducted in the UK. We have used,
to the best of our knowledge, the most extensive data coverage available from the UK
Food Surveillance System (UKFSS). However, this source is fairly recent. We expect that
the quality of the estimations would increase rapidly as more data becomes available in
the near future.
Second, additional data sources on prices and other economic variables should be
explored (including, if available, panel data). The present report conducts an extensive
review of publicly available data sources. However, there might be relevant data available
from private providers. This data may allow the inclusion of new variables and improve the
quality of the data for the variables already considered in the case study. In addition, it
may provide measures for variables identified in the literature, such as the complexity of
the supply chain, for which an appropriate quantification has not yet been found.
f. Conclusion
Our provisional results are that the main identified risk factors are:
the level of price differentials between authentic product and close substitutes that
can be used as adulterants (i.e. ceteris paribus, the greater the difference, the
greater the risk of adulteration);
changes in price differentials between authentic product and close substitutes that
can be used as adulterants (i.e. ceteris paribus, a sudden increase in the differential
is associated with a greater risk of adulteration).
Pending the gathering of more data and the development of more robust models, it would
be possible, in principle, to use these identified factors as a “rule of thumb” indicator of
where food fraud is more likely.
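A “rule of thumb” of this kind could, in principle, be applied mechanically to a price series. The sketch below is illustrative only: the prices are hypothetical, the two cut-off values are arbitrary assumptions rather than estimates from this report, and `flag_months` is a name we introduce for the example.

```python
def flag_months(authentic, substitute, level_cutoff, jump_cutoff):
    """Flag each month in which the price differential between the
    authentic product and its substitute is high, or has risen sharply
    compared with the previous month."""
    flags = []
    prev_gap = None
    for p_auth, p_sub in zip(authentic, substitute):
        gap = p_auth - p_sub
        high_level = gap > level_cutoff
        sudden_rise = prev_gap is not None and (gap - prev_gap) > jump_cutoff
        flags.append(high_level or sudden_rise)
        prev_gap = gap
    return flags

# Hypothetical monthly prices per tonne for an authentic product and a
# close substitute that could be used as an adulterant.
basmati = [900, 920, 1000, 1150, 1100]
other = [500, 510, 500, 520, 600]

flags = flag_months(basmati, other, level_cutoff=550, jump_cutoff=80)
```

Here the third month is flagged because the differential jumps by 90, and the fourth because the differential is both high (630) and rising. Flagged months would indicate where closer scrutiny might be warranted, not that fraud has occurred.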
2. Introduction
Europe Economics, with the collaboration of FoodChain Europe, is advising the
Department for Environment, Food and Rural Affairs (Defra) to explore the potential
benefits of using economic intelligence to assist existing efforts to address food fraud. The
central objective of the project would be to construct a conceptual economic model which
will be able to inform authorities about the areas where enforcement against food fraud
should be prioritised. The methodology developed in this report is preliminary and would
inform potential future work to develop the model further.
a. Food fraud
Some recent food fraud cases have received considerable media attention. Most notably,
horse meat DNA was detected in frozen beef hamburgers sold in several European
countries, including the UK, in January 2013. A number of experts, including Prof. Chris
Elliott, have suggested that economic intelligence could exploit existing market data to
better direct enforcement efforts to detect and prevent food fraud. For example, the Elliott
final review recommends:
“The FSA should take the lead in the collection, analysis and distribution of information
and intelligence from a wide range of sources (including Governmental e.g. local
authorities, police, EU counterparts) acting as an ‘intelligence hub’. Through this
intelligence hub, the FSA needs to develop its links with the research sector to produce
and share horizon scanning analyses of the commodities or markets considered at most
risk from crime due to trade route complexity, commodity price fluctuations, crop failures,
fishing restrictions, the development of premium markets through labelling, and criminal
ingenuity.”
Food fraud is defined as “the deliberate placing on the market, for financial gain, goods
which are falsely described or otherwise intended to deceive the consumer”.3 This includes
the “substitution, addition, tampering, or misrepresentation of food, food ingredients, or
food packaging; or false or misleading statements made about a product, for economic
gain”.4 Common types of food fraud are adulteration, misbranding and counterfeiting.5
According to Spink and Moyer (2010), the main types of economically motivated
adulteration (EMA) of food are:
Dilution.
Substitution.
3 Elliott, Chris (2014) “Elliott Review into the Integrity and Assurance of Food Supply Networks – Final
Report”, July 2014.
4 Spink, John and Douglas C. Moyer (2011) “Defining the Public Health Threat of Food Fraud” Journal of
Food Science Vol. 76, Nr. 9.
5 Food fraud is, by definition, economically motivated. However, we should note that its consequences are
not only financial but also include public health and safety concerns.
Artificially increasing weight.
Trans-shipment, disguising true country-of-origin.
Port shopping.
Theft.
Mislabelling, counterfeit, etc.
Food fraud generates costs to various members of society. These include not only the
direct costs to the producers that committed fraud, but also indirect costs to buyers or
sellers of that product and of food in general, and impacts on consumer confidence in the
integrity of the food supply. The following are some notable examples of these costs:
Final consumer: food quality and food safety, overpayment for non-authentic
products, ethical and religious considerations associated with consuming products
that do not respect their beliefs.
Retailer: reputational damage, costs associated with recall and disposal of
fraudulent merchandise, costs of performing quality assurance.
Other producers and the food industry in general: reputational costs due to
diminished consumer confidence.
By distorting prices, food fraud is able to interfere with the efficient allocation of
resources for the society as a whole.
Taxpayer: costs of food authenticity enforcement, potential loss of tax
revenue from VAT and customs duties.
b. Project objectives
The project objectives are the following:
Conduct a review of the literature on food fraud and the literature on the use of
economic intelligence to detect and prevent fraud in areas beyond food.
Develop a framework to estimate the risk of food fraud in various food products.
Scope potential data sources that would feed into the model.
Assess the feasibility of the application of the framework to a wide selection of food
products given the information constraints.
Validate the framework using a case study.
This project would be the initial phase of a potential future programme of follow-on work,
yet to be confirmed, where the proposed framework would be applied systematically to
several other sectors beyond the case study.
c. Structure of the report
The remainder of the report is organised into the following main sections:
Summary of the literature review on food fraud and the economics of fraud.
Compilation of the factors that affect the risk of food fraud identified in the literature.
Description of the methodology.
Testing of the methodology, using adulteration of Basmati rice as a case study.
Conclusions and recommendations.
The literature review identifies the approaches that have been used to construct economic
models of fraud, whether in food or other areas and limitations associated with these
approaches. This review informed our judgement in developing a methodology that could
be used to predict the risk of food fraud in the UK based on economic variables.
Based on the literature review, this report presents a comprehensive compilation of factors
that affect the risk of food fraud pointed out by various authors. These are classified into:
Economic factors and market characteristics.
Production and distribution factors.
Product characteristics and detection technology.
Institutional and enforcement factors.
3. Literature Review
The outcomes obtained in this report have been informed by a review of the literature on
food fraud, the economics of fraud, statistical methodologies used to predict fraud and the
potential data sources that could be used for this purpose.6 The literature covered
includes mainly scholarly articles or policy reports. Based on this review, it was possible to
identify:
Some fundamental characteristics of food fraud in the UK and other countries.
The general approach to modelling fraud from an economic theory perspective.
The factors that have been postulated as contributors to the risk of food fraud.
Methodologies used to detect and predict fraud (not necessarily on food), either
based on economic analysis or other approaches.
Variables and data sources that can be employed in statistical models of fraud.
This section presents a short review of the first two items. A more detailed review of
literature on fraud and how it is applied in economics is included as an Annex (Annex I).
The factors that affect the risk of food fraud that have been identified in the literature are
discussed in the next section. Reviews of the methodologies and data sources are
presented in two separate Annexes (Annexes II and III).
a. Food fraud
The UK food and beverage market in 2013 was estimated to be worth £196bn.7 Food fraud
is relevant to a wide variety of food products. For example, Shears (2010) reports that in
1999, 8 per cent of on-licensed outlets in the UK were substituting at least one spirit
brand.8 The value of this particular fraud is estimated at £43 million per year.
Approximately 17,000 litres of fake vodka worth £1m were seized in one interception in
2013 alone.9 HM Revenue and Customs estimates that beer smuggling costs the Treasury
around £500m a year. In 2007 the FSA set up a food-fraud database. The amount of
testing has increased sharply in the last couple of years, with 1,538 cases of food fraud (in
all products) identified in 2013 alone. Other examples with well-documented instances of
food fraud include fish (e.g. salmon and cod), basmati rice, honey, olive oil and asparagus.
6 We note that the approach taken was that of a standard literature review. For other approaches, see
http://www.civilservice.gov.uk/networks/gsr/resources-and-guidance/rapid-evidence-assessment/what-is. Other approaches to reviewing evidence such as Rapid Evidence Assessment were not followed due to the short timescales of this project and the wide scope of the economics literature on various types of fraud.
7 https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/315418/foodpocketbook-2013update-29may14.pdf.
8 Shears, Peter (2010) "Food fraud–a current issue but an old problem" British Food Journal 112.2, 198-213.
9 The Economist (2014) “Food Crime: A la cartel”, March 15, 2014, available at: http://www.economist.com/news/britain/21599028-organised-gangs-have-growing-appetite-food-crime-la-cartel
In an effort to expand the literature on food fraud, FERA has commissioned a report on the
identification of information concerning food fraud in the UK and elsewhere.10 The key
findings of the report include:
Identifying 35 different sources of information regarding food fraud including
individual companies, trade associations, consumer groups and private sector
laboratories.
Due to the wide range of food fraud cases that may occur, it is almost impossible to
identify a single source of information.
The UK Food Surveillance System (UKFSS) provides a good starting point for
future research that would monitor enforcement efforts in the food authenticity area.
The collection and analysis of food fraud data at a European and international level
is not satisfactory.
Food adulteration is mainly driven by economic incentives and although in most cases it
does not pose any health risks it should not be overlooked by public authorities.
Furthermore, a few cases have shown that food adulteration can pose
considerable health risks. Such cases include the Czech Republic case in which fake
alcohol caused 19 deaths among a small category of allergic consumers, and the 2012 case of
substitution of almonds with peanuts in the UK, which put allergic consumers at severe
health risk. Consequently, the need for monitoring food authenticity on an on-going basis
becomes even more essential.
Despite the striking figures associated with food fraud, it has been recognised by experts
that the existing data on food fraud does not give a reliable estimate of its overall extent.
For instance, Everstine et al. (2013) find that there are gaps in quality assurance testing
methodologies that could be exploited for economic gain.11 They claim that large-scale
EMA incidents have been described in the scientific literature, but smaller incidents have
been documented only in media sources. For this reason, the authors have devoted
substantial effort in recent years to constructing the EMA database (see Annex I for a
description). In a similar vein, Johnson (2014) emphasises that it is typically not possible
for enforcement agencies to prosecute every instance of food fraud given the wide variety
of known types of fraud and constraints in resources.12
The literature has made significant progress in identifying a large number of potential
determinants of food fraud. Fairchild et al. (2003) provide a typology of these factors.13
They note that one motivation behind economic adulteration is typically the opportunity to
reduce costs and increase profits per unit sold by increasing prices to the level of
10 Dennis, J., & Kelly, S. (2013) “The identification of sources of information concerning food fraud in the UK and elsewhere”.
11 Everstine, K., Spink, J., & Kennedy, S. (2013). Economically motivated adulteration (EMA) of food:
common characteristics of EMA incidents. Journal of Food Protection, 76(4), 723-735.
12 Johnson R. (2014) “Food Fraud and ‘Economically Motivated Adulteration’ of Food and Food Ingredients”, Congressional Research Service.
13 Fairchild, G. F., Nichols, J. P., & Capps, O. (2003). Observations on economic adulteration of high-value
food products: The honey case. Journal of Food Distribution Research, 34(2), 38-45.
unadulterated products, or to reduce input costs and lower selling price to increase sales
volume and/or market share. Cost differences can be significant enough that firms selling
adulterated product can cause economic injury to competing firms, sometimes selling
below product cost for pure products and sometimes driving producers and packers out of
business.
Spink and Moyer (2011) provide additional insights into the motivations for seeking food
fraud opportunities:
“Brand growth and increased brand recognition of a product actually increases the fraud
opportunity (that is, more victims, spending and brand equity). Finally the guardian or
hurdle gaps lead to a greater fraud opportunity. Guardians include entities that monitor or
protect the product and could include customs, federal or local law enforcement, trade
associations, nongovernmental organizations, or individual companies themselves.
Hurdles include components or systems that exist (or are put in place) to reduce the fraud
opportunity by assisting in detection or providing a deterrence.”
They note that “fraud opportunities could be reduced by increasing the risk of detection, or
increasing the costs of the necessary technology to commit the fraud and/or of developing
quality levels that would attract consumers. Countermeasures are intended to reduce the
fraud opportunity, but a refinement to a process or a narrowing of focus in detection could
inadvertently create new gaps that could be exploited by fraudsters. An example of this
uncertain nature is that fraudsters may shift ports of entry by conducting strategic ‘port
shopping’ and by shipping fraudulent product through less monitored entry points.”14
The Elliott Review notes that the global nature of current food markets gives UK
consumers access to all types of products even when they are out of season. This means
that the supply chain for food has become much more complex as a number of these
products must be imported from abroad. Consumers have become used to variety, taste
and access at low cost. All of these factors have increased opportunities for mislabelling,
substitution and for food crime.
The literature that uses economic methods to study the adulteration of food products is
very limited. A notable exception is provided by Pouliot (2012). This study focuses on the
economics of adulteration in food imports, particularly by applying principles of economic
theory to analyse the case of imported fish and seafood in the USA.15 The report aims to prove
that economic incentives can be the main driver of food adulteration in the USA. Economic
variables such as prices, supply and demand levels and country of origin were found to be
significant in predicting adulteration in food imports. Pouliot also makes reference to the
PREDICT forecasting system in the US, which assesses the risk of adulterated imports
and identifies those products that are more likely to be fraudulent, therefore helping
inspectors to concentrate their efforts on riskier imports. The PREDICT system employs a
14 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood
imports. Cahier de recherche/Working paper, 2012, 15.
15 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood
imports. Cahier de recherche/Working paper, 2012, 15.
data mining technique which analyses information regarding the provenance of the
product, the type of product, weather information, the name of the exporting firm and
labelling information.
Using a theoretical perspective, Liang and Jensen (2007) construct a model of imperfect
food certification, opportunistic behaviour and detection.16 The analysis finds that farmers
are expected to respond to monitoring and enforcement very swiftly. Not only do the levels
of fraudulent activity decrease but also high-safety output increases. The study notes that
the optimal monitoring effort depends on the characteristics of the farms, such as size and
costs of production. Finally, fraudulent activity should be tackled with a combination of
penalties, sales bans and monitoring activities. Similarly, the Elliott Review recommends
an approach that would increase the difficulty for criminals to operate in food networks by
introducing new measures to check, test and investigate any suspicious activity. In
addition, this report suggests that those caught engaging in food fraud activity must be
severely punished by the law to deter further fraud.
b. The economics of fraud
Food fraud is motivated by economic gain. Therefore, to estimate the risk of fraud, it is
necessary to identify the economic profits that a potential food fraudster could expect to
earn. The main components of the profits for fraudsters include:
Benefits: difference between prices of authentic and adulterant, multiplied by
volume.
Costs: penalties, reputational damage and the expense of developing new supply chains or technologies.
Probability of detection: according to research by Spink (2011), increasing the risk
of detection or increasing the cost of the technology required to adulterate a product
can reduce fraud opportunities.17
The benefits are straightforward to model: they consist of the gain per unit of final product
from replacing the authentic ingredient with a fraudulent one, multiplied by the number of
units of final product sold. The difference in prices could arise from the lower quality of
the adulterant or the availability of a surplus amount of this ingredient.
The costs of committing fraud are somewhat more complex. There are two sources of
costs for the fraudster. First, the fraudster may have to incur expenses to make the substitution of the
authentic product feasible. These costs may include modifying the productive process,
logistics and research into how to modify the product most effectively. In addition, fraud
might constrain the type of markets in which a producer may operate without exposing
themselves to a high risk of detection. The second category includes the costs that will
have to be paid only in case of detection. They can be modelled as the probability of
detection multiplied by the penalties. The latter can include costs such as fines, bans and
16 Liang, J., & Jensen, H. H. (2007). Imperfect food certification, opportunistic behaviors and detection. Selected Paper, 175174.
17 Spink J, and Moyer D (2011), Defining the Public Health Threat of Food Fraud, Journal of Food Science.
reputation effects. It should be noted that a potential fraudster could face a trade-off
between these two costs. For example, a higher investment in the substitution process
might result in a lower probability of detection. This approach would be informed by the
FSA database of previous fraud cases to identify the relevant food products to be
modelled.
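The components described above can be combined into a single expected-profit calculation. The following is a minimal sketch under simplifying assumptions: the function name, its parameters and all figures are hypothetical and chosen only to illustrate the arithmetic, and the fraudster is assumed to weigh penalties purely by their expected value.

```python
def expected_fraud_profit(price_authentic, price_adulterant, units_sold,
                          setup_cost, detection_probability, penalty):
    """Expected profit from substituting an adulterant for the authentic ingredient."""
    # Benefit: gain per unit multiplied by the number of units sold.
    benefit = (price_authentic - price_adulterant) * units_sold
    # Costs paid only in case of detection, weighted by the probability of detection.
    expected_penalty = detection_probability * penalty
    # Setup cost: expense incurred to make the substitution feasible.
    return benefit - setup_cost - expected_penalty


# Hypothetical figures (not taken from any data source).
profit = expected_fraud_profit(
    price_authentic=10.0, price_adulterant=4.0, units_sold=1_000,
    setup_cost=2_000.0, detection_probability=0.25, penalty=8_000.0,
)
print(profit)  # 6000 - 2000 - 0.25 * 8000 = 2000.0
```

In this sketch, the trade-off noted above would appear as a higher setup_cost paired with a lower detection_probability.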
Becker (1974)18 provided a pioneering economic approach to understand criminal activity.
Becker emphasises the role of the costs and benefits of criminal activity, both from the
criminal’s and society’s perspective. From the criminal’s point of view, the relevant factors
that determine whether to commit offences are the potential gains and the probability of
conviction with its associated punishment. From the point of view of society, optimal
enforcement policies would depend on the damages caused by crime together with the
costs of increasing penalties or the level of enforcement. Becker discusses how different
combinations of penalties and probability of conviction might result in different levels of
crime and its associated costs to society. For example, public policy might focus on
increasing the likelihood of detection or increasing the associated fines. Becker shows how
different public policy instruments provide different incentives to criminals.
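One way to illustrate the interaction between penalties and the probability of conviction is to compute the detection probability at which the expected penalty just offsets the gain from the offence. This is a simplified sketch rather than Becker's full model: it assumes a risk-neutral offender, and the function name and figures are hypothetical.

```python
def break_even_detection_probability(gain, penalty):
    """Smallest detection probability at which the expected penalty offsets the gain."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    # The offence is unprofitable when probability * penalty >= gain.
    # A probability cannot exceed 1, so the result is capped.
    return min(1.0, gain / penalty)


# Hypothetical gain of 4,000 under three different penalty levels.
for penalty in (5_000.0, 10_000.0, 20_000.0):
    probability = break_even_detection_probability(gain=4_000.0, penalty=penalty)
    print(penalty, probability)
```

Under these assumptions, doubling the penalty halves the detection probability needed to deter the offence, which is one reading of how the two policy instruments can substitute for each other.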
The economics literature that addresses food fraud is relatively small. It includes the work
of Liang and Jensen (2007)19 and Pouliot (2012a20, 2012b21). Pouliot (2012a) claims that
the decision by an exporting firm to adulterate its output depends on the relative price of
inputs and the ability of the importing country to detect adulteration. Pouliot (2012a)
concludes that the country of origin, the port of entry, product code and product description
are determinants of fraud. Liang and Jensen (2007) emphasise the effect of the monitoring
agency’s effort on the risk of fraud. Reducing this risk can be achieved through a combination
of penalties, sales bans and monitoring activities. The authors do not elaborate further on
how these can be implemented.
c. Fraud in other areas
There is abundant literature on fraud in a number of different areas beyond food.
Moreover, economic analysis has been applied to a number of them. Examples of these
activities that are subject to fraud are presented in Table 3.1. The conclusions obtained by
the literature in these areas are presented as a separate Annex (Annex I). Table 3.1 also
discusses the main methods used to analyse fraud in these areas. A description of these
methods is included in the Methodology section below and Annex II.
18 Becker, G. S. (1974) “Crime and punishment: An economic approach” In Essays in the Economics of Crime and Punishment, Gary S. Becker and William M. Landes, eds. UMI.
19 Liang, J., & Jensen, H. H. (2007) “Imperfect food certification, opportunistic behaviors and detection.”
20 Pouliot, S. (2012) “Using economic variables to identify adulteration in food imports: application to US seafood imports.”
21 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood imports. Cahier de recherche/Working paper, 2012, 15.
Table 3.1: Activities subject to fraud studied in the literature
Type of fraud | Area | Method | Data requirements
Misrepresentation | Credit Card | Data Mining | Very high
Misrepresentation | Credit Card | Econometrics - Logit Model | High
Misrepresentation | Tax Compliance | Panel Data Techniques | Moderate
Misrepresentation | Insurance | Econometrics | Moderate - High
Misrepresentation | Insurance | Survey | Moderate
Misrepresentation | E-commerce | Data Mining | High
Counterfeit | Pharmaceuticals | Econometrics | Moderate
Counterfeit | Luxury Goods | Econometrics - Indices | Moderate
Counterfeit | Tobacco | Econometrics and Simulations | High - Data mostly confidential
Counterfeit | Art - paintings | Indices | High - Highly specialised data
d. Conclusions
The main lesson learned from the literature review is that there is very little, if any, existing
research that applies economic intelligence to model or predict food fraud. However, the
literature on food fraud is very rich in identifying factors that may determine the likelihood
of this type of fraud. The next section discusses these factors, informed by the reviewed
literature.
The literature review has found that economic intelligence has been applied frequently to
other types of fraud. The above discussion presents the general approach shared in the
vast majority of economics reports to conceptualise fraud. Annex I provides a detailed
review of the literature of economic intelligence used to address fraud in areas other than
food.
The key remaining question is whether the identified methodologies used in other areas
can be applied to food fraud. The methodology section gives a positive answer to this
question, by proposing an approach that adopts many of the elements used in the
literature.
4. Factors that Affect the Risk of Fraud
This section presents a comprehensive list of factors that are potential determinants for the
risk of food fraud. We have collected these from various sources, including our literature
review and consultations with subject experts. The identified factors are grouped into four
categories:
Economic factors and market characteristics.
Production and distribution.
Product characteristics and detection technology.
Institutional and enforcement factors.
a. Economic factors and market characteristics
Economic information would allow us to estimate what the potential gains are for a
producer to engage in food fraud. Such gains would depend primarily on prices and
volumes which, in turn, determine total revenue and profit margins.
Economic data would include, at least, market prices and volumes produced. Price data
should cover historical series, up-to-date prices and current prices from futures markets,
where relevant. The longer the data series, the better placed we will be to draw
accurate conclusions and validate our methodology. From price and volume data it would
be possible to construct additional variables, such as whether there are rapid changes in
market conditions. We would also evaluate whether events such as crop failures should
be included in the analysis. This would be done only when there is reason to believe
that their effects might not be appropriately captured by commodity prices and volumes.
The relevant economic variables that we have identified include the following:
Price gap between authentic product/ingredient and adulterant: the larger this gap is
the higher the likelihood of fraud. This is because it allows the supplier to gain a
higher profit margin for each transaction. This gap widens as the price of the
authentic food increases or when the price of the adulterant decreases. Food
products whose prices are rising steadily would be more likely to be
adulterated.
High profit margin: when the difference between price and cost is large there are
strong incentives to commit fraud even at a small scale. For example, fraudsters may substitute or
adulterate branded vodka with cheap non-branded vodka. Examples of food
products that have high profit margins are alcohol, poultry and chocolate.
Scale: even with small margins, products that are sold on a large scale can generate
large total profits and, consequently, are attractive for fraudsters. At the same time,
however, frauds at a smaller scale are less likely to be detected, which provides
an additional incentive for fraudsters to operate at a small scale. Thus, ceteris paribus, it is
expected that fewer instances of fraud would occur at a large scale.
Brand ownership: owners of recognised brands have an incentive to protect them
and increase the assurance controls imposed upon suppliers. At the same time,
brands that have high values might incentivise counterfeiting because they are in high
demand among consumers.
Known imbalances in quantities between primary production and final distribution:
increases in demand or reductions in supply can create unsatisfied demand (at
least in the short run) and create an incentive for fraud. Similarly, a large supply of
a cheap ingredient may increase the incentive to use it as an adulterant. Such
imbalances are more likely when the particular product is vulnerable to
environmental conditions. For instance, tomatoes are susceptible to bad winter
weather, so a bad winter may reduce the supply of tomatoes.
The tomatoes themselves may not be adulterated, but there is a high probability that
their derivatives, such as tomato juice and ketchup, will be adulterated. Another
example would be caviar. The supply of caviar eggs has remained constant
whereas demand for this good has been increasing steadily which has pushed its
price upwards.22 This provides a good opportunity for suppliers to adulterate the
food and gain higher profit margins over time.
Relation to organised crime: recent opinion suggests that organised crime groups
are increasingly involved in food fraud adulteration.23
b. Production and distribution
Food products have various production and distribution processes. Some of these
processes are influenced by features such as the geographical location of producers and
consumers or product characteristics (e.g. the number of ingredients, whether a product is
fresh or frozen - see below). Production and distribution factors that may affect the risk of
fraud include:
Long/complex supply chain: this occurs when products have a large number of
ingredients, when ingredients have in turn several other ingredients and/or a large
number of companies are involved in the supply chain. Complexity in the supply
chain is difficult to manage for the producer (which incurs the risk of purchasing
adulterated ingredients) and to audit by buyers or authorities.
Rapid increases in supplies and sales: this may occur due to changes in consumer
preferences. A particular food could turn into a “super food” overnight because a
news article has argued that it is particularly healthy. A sudden surge in demand for
that particular food is more likely to be met by an increase in the supply of the
adulterated version of the product.
22 http://www.cnbc.com/id/100838720
23 Dennis, J., & Kelly, S. (2013) “The identification of sources of information concerning food fraud in the UK and elsewhere”.
Point of entry into the UK: some ports are suspected to have less strict checks and
border controls than others. This could encourage fraudsters to import adulterated
ingredients through those particular ports as they are less likely to be detected.24
Supply chain assurance: a number of retailers and wholesalers use sophisticated
operations systems that ensure that all their suppliers satisfy the necessary
requirements and the food they sell to the final consumers is as authentic as
possible. Nonetheless, some retailers and distributors may not have the necessary
resources to engage in such detailed audits, thus the products that they sell to the
final consumers are more likely to be adulterated.
c. Product characteristics and detection technologies
The physical characteristics of food products may influence the ease of committing fraud
as certain adulterations are more difficult to detect. In addition, technological and
economic constraints exist in the detection methods available to local authorities. The
present section discusses these two sets of factors.
Product characteristics
These characteristics would include supply chain drivers such as country of origin, shelf-
life, amount of time required for production and trade route complexity, as well as other
variables such as the physical similarity between the authentic and fraudulent products.25
Physical state of the food: the physical state of the food affects the chances of
detection by buyers and authorities. The Outsmart project has identified the
following product characteristics that might make detection difficult (in decreasing
order of difficulty): liquid, ground, prepared, powder, mixed consistency, non-
characteristic colour, homogeneous consistency, dried, colourless and frozen.26
Sold and transported in bulk processed form: if the product has already been
processed it becomes more difficult for the retailer to notice if anything suspicious
has gone into the production of the good.
New product: there might be a short period, after the introduction or rapid increase
in the scale of a product, where detection is less likely. This might be because there
is little experience and proper enforcement procedures might not be in place. That
could be the case, for example, for products that increase suddenly in popularity
having been the subject of health and/or beauty claims.
Cheaper adulterant not easily detectable: an example is provided by fish of the
same species. It is challenging to distinguish farmed salmon from wild salmon
24 http://www.sundaypost.com/news-views/uk/after-horsemeat-scandal-food-fraud-is-still-rife-1.457754
25 For example, horsemeat can be distinguished from beef via DNA testing, whilst the difference between wild and farmed salmon or organic and non-organic products might be more difficult to test.
26 NSF Safety and Quality UK Ltd (2014), “Risk Modelling Of Food Fraud Temptation - 'Outsmart' Intelligent Risk Model Scoping Project”.
based on DNA testing. Thus, the fraudster has a stronger incentive to switch the two
fish and increase profit margins.
Cost of detection methodologies: the more expensive it is to detect the presence of
an adulterant in a product (see below) the higher the probability of fraud.
Low concentration of the adulterated ingredient: a smaller amount of the
ingredient being adulterated might reduce the probability of detection.
Cost of the process of adulteration: in addition to variable costs (i.e. costs that
depend on the volume of production, such as the amount of adulterant), there might
be fixed costs. In order to achieve the adulteration the fraudster might need to
invest in a new type of technology, which could be expensive enough to deter him
from committing fraud.
Labelling/tamper proofing: the easier it is to remove or recreate the label of the
original product and attach it to the adulterated product the higher the probability of
fraud. This is not confined to the label; it also includes the packaging of the
authentic food product.
Detection methods
There is a wide variety of methods available to local authorities to test for the authenticity
of food products. Some of these methods, such as immunoassays, microscopy and
analysis of nitrogen content have a low cost associated with them. However, many other
detection technologies exist that have a higher cost and, consequently, are used less
frequently. Despite the fact that these technologies exist, cost barriers might restrict their
availability. Examples of such technologies are:27
Stable isotope ratio analysis (SIRA): geographic origin, production method. This
method is used mostly for agricultural products, where nutrients are drawn from the
earth and are generally restricted to a certain geographical area. Applications include the
identification of Italian tomatoes, as Italian growing areas have a different isotopic
profile from those of other countries. In the case of counterfeit wine, isotopic analysis is the
most appropriate way to help assess the extent of fraudulent wine present in the UK
market.
DNA methods: used for species/variety identification e.g. meat breeds, fish. An
interesting example of how useful DNA identification methods are is the case of
olive oil. DNA analysis can be performed to determine the species of the olive and
thus ensure that the olive oil is authentic (Pafundo et al 2005).28
Proteomics: identifies peptide biomarkers in complex samples; searched against
databases (known protein sequences) to identify protein origin. There is potential
use for this method in meat, fish and products with defined protein ratios.
27 See Rollinson, S. (2014) “The UK Food Authenticity Programme”, Presentation at the Food Fraud Analytical Tools Conference. Available at: https://secure.fera.defra.gov.uk/foodintegrity/downloadDocument.cfm?id=101
28 Pafundo, S., Agrimonti, C., & Marmiroli, N. (2005). Traceability of plant contribution in olive oil by amplified fragment length polymorphisms. Journal of agricultural and food chemistry, 53(18), 6995-7002.
Others: metabolomics (used to identify biomarkers in fruit juices), low molecular
weight compounds in cells/tissues (to obtain “finger print” profile for honey),
metagenomics (quantification of DNA in products, such as the amount of a species
DNA in composite animal products) and lectin chips (analysis of glycoproteins,
glycolipids and polysaccharides used for cheese and milk adulteration analysis).
d. Institutional and enforcement characteristics
Measures of the level of enforcement and other institutional characteristics could provide
information about the probability of detection. These variables could be obtained, for
example, from data on past investigations and the legal framework for particular products.
Potential variables of interest include:
Testing frequency: this refers to how often tests are conducted. More frequent
testing would increase the probability of detection. Similar results were reported in
the tobacco industry where, according to a report by Deloitte, the amount of illicit
tobacco trade decreased when authorities increased the level of enforcement.29
Testing intensity: this could be captured by the number of producers that are
surveyed. A higher testing intensity would increase the probability of detection.30
Penalties in case of detection: these might include direct monetary penalties,
reputation effects (loss of trust by buyers/consumers), seizure, prohibition to
continue trading and/or prosecution. If the adulteration has health risk
consequences penalties might be even more severe. Liang and Jensen (2007)31
find that, in a theoretical framework, using severe enough penalties in case of
detection could completely eliminate the probability of fraud.
Consumer effects: the level of harm inflicted on consumers might affect the costs of
committing fraud beyond the formal penalties applied in case of detection. These
wider responses might include increased government intervention (e.g. in the form
of additional regulation), consumer retaliation (e.g. the existence or suspicion of
fraud might lead consumers to be reluctant to purchase certain products or
purchase from certain outlets) and reputational damage.
Association with organised crime: some authors claim that food fraud might be
closely related to organised crime.32 It might be possible to identify this link from tax
and financial data, although this would generate extra costs for the local authorities
and such data may not even be available to institutions other than the police.
The FSA Food Fraud Database, however, can prove a useful tool as it documents
all possible suspicions that have been reported regarding a given supplier. Thus,
29 http://www.bata.com.au/group/sites/bat_7wykg8.nsf/vwPagesWebLive/DO7WZEX6/$FILE/medMD8EHAM5.pdf?openelement
30 We note that the level of monitoring conducted by local authorities and other enforcement agencies is constrained by the detection technologies available to them. The preceding section provides details of these methods.
31 Liang, J., & Jensen, H. H. (2007). Imperfect food certification, opportunistic behaviors and detection. Selected Paper, 175174.
32 See, for example, the Elliott review.
the more reports that are gathered that mention the same individual/company the
higher the probability that the individual is a fraudster and should be investigated.
This finding is corroborated by the findings of Dennis and Kelly (2013).33
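The idea of counting how often the same supplier appears in reported suspicions can be sketched as follows. The record layout, the supplier names and the threshold of two reports are all hypothetical; they do not reflect the actual structure of the FSA Food Fraud Database.

```python
from collections import Counter


def suppliers_to_review(reports, min_reports=2):
    """Return (supplier, report count) pairs with at least min_reports reports."""
    # Count how many reports mention each supplier.
    counts = Counter(report["supplier"] for report in reports)
    # most_common() sorts by count, highest first.
    return [(name, n) for name, n in counts.most_common() if n >= min_reports]


# Hypothetical reports, for illustration only.
reports = [
    {"supplier": "Supplier A"}, {"supplier": "Supplier B"},
    {"supplier": "Supplier A"}, {"supplier": "Supplier C"},
    {"supplier": "Supplier B"}, {"supplier": "Supplier A"},
]
print(suppliers_to_review(reports))
# [('Supplier A', 3), ('Supplier B', 2)]
```

A count of this kind only indicates where to look first; repeated reports are not in themselves evidence of fraud.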
33 Dennis, J., & Kelly, S. (2013) “The identification of sources of information concerning food fraud in the UK and elsewhere”.
5. Methodology
a. Selecting a methodology
Based on the literature review we have considered several methodologies that have been
used in the past to quantitatively estimate the risk of fraud. A description of these can be
found in Annex II.
These were grouped into three broad categories:
Risk indices: these are statistical measures represented by numbers placed on a
given scale that identify how high the risk of food fraud is. Indices take into account
a number of factors that have been identified to have a significant effect on the
probability of fraud. Each variable is given an appropriate weight in determining the
risk of fraud and then all the variables are aggregated to construct the index.
Econometric methods: these methods apply mathematical and statistical methods
to economic data in order to give empirical content to economic relations. The
objective of econometrics is to identify (causal) relationships in economic data. The
main tool used by econometricians is regression analysis which is described in
more detail in the appendix.
Data mining: these are computational methods mostly used when there is access to
large data sets, e.g. banks observe thousands of daily transactions when trying to
identify credit card fraud. The method employs computerised processes to identify
patterns in the data. For example, these methods might flag behaviour that
deviates from existing patterns.
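To make the first category concrete, a risk index of the kind described above can be sketched as a weighted average of factor scores. The factor names, the 0 to 10 scoring scale and the weights below are hypothetical and chosen only for illustration; in practice the choice of weights is a matter of judgement.

```python
def risk_index(scores, weights):
    """Weighted average of factor scores, using the given weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Each factor contributes its score multiplied by its weight.
    weighted_sum = sum(weights[name] * scores[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical factor scores on a 0-10 scale and hypothetical weights.
scores = {"price_gap": 8, "supply_chain_complexity": 4, "testing_cost": 2}
weights = {"price_gap": 2, "supply_chain_complexity": 1, "testing_cost": 1}
print(risk_index(scores, weights))  # (16 + 4 + 2) / 4 = 5.5
```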
We have established criteria for evaluating the methodologies, also taking into account the
data sources available. These criteria are that:
The methodology addresses the desired objectives. In other words, it needs to
identify which factors are relevant to the determination of the fraud risk and to
quantify the relative importance of each. The latter objective would be important to
assess the forward-looking risk of fraud based purely on the identified factors.
It is feasible to satisfy data requirements of the methodology using the data sources
available.
It must be feasible to implement within the time constraints of the project.
The methodology would ideally be applicable both to a single product and multiple
products. For example, the methodology must be testable via a case study.
However, if it is applied to further food products / types of fraud in the future the
methodology should be able to estimate the impact of the differences across
product characteristics.
The methodology efficiently exploits the information available in the data. That is, a
methodology is preferable when it assumes magnitudes or relations in the variables
that can be derived from or verified by the data.
After the review of literature and data sources we have arrived at the following conclusions
when comparing the three approaches listed above:
An econometric approach would produce estimated coefficients that may be used
for prediction of the explained variable. In addition, various statistical methods could
be used to evaluate whether a particular model is a good fit for the data and,
consequently, to what extent the model is addressing the project objectives
satisfactorily.
Data mining methods (such as clustering) typically do not provide a quantification
of the relative importance of different factors. These methods are well-suited for
identifying patterns and hypotheses from existing data. However, the lack of
quantification would limit the model’s predictive power for values of the explanatory
variables that have not been observed in the past. Moreover, these methods might
not be well-suited to compare estimations for different products and obtain general
lessons.
With the exception of the most basic techniques for cluster analysis, data mining
methods typically require a very large number of observations to obtain robust
conclusions. Given our review of available data sources in Annex III, we believe that
this project will not gather sufficient data for any possible case study that would
satisfy these requirements.
The creation of ad-hoc indices does not require large amounts of data. However,
this approach relies on a number of arbitrary judgements (e.g. the weight that is
given to each of the explanatory variables). Econometric methods would instead estimate
the explanatory power of each variable from the
data, extracting more information from it.
Based on the criteria and conclusions above, we believe an econometric methodology
would provide the best available approach to predict the risk of food fraud given the
existing constraints. Compared to data mining techniques, this methodology would allow
for a clear quantification of the effects that can be validated using a variety of statistical
tests. Moreover, the quantification of past effects would feed straightforwardly into the
prediction of future fraud. Compared to ad-hoc indices, econometric methods identify the
factors that have proven to be relevant to predict fraud from the data and determine their
relative importance. While the econometric approach would be more demanding in terms
of data than ad-hoc indices, we believe that there will be sufficient data available to conduct
the analysis. Although the data on past fraud available at the moment are limited,
the number of observations will become less constraining in the near future.34
34 For example, the case study on Basmati rice in this report has data on past authenticity tests conducted in the UK for 21 months, which might be considered a bare minimum number of observations to obtain statistically significant results. However, at the present rate the sample size would double in two years' time, considerably increasing the reliability of the statistical results.
b. An econometric methodology
The main tool used in econometrics is regression analysis. Regressions estimate and
quantify the relationship between an explained variable and (potentially multiple)
explanatory variables.
The sign of each estimated coefficient indicates the direction of the effect that the
corresponding explanatory variable has on the explained variable.
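As an illustration of how a regression quantifies such a relationship, the following sketch fits an ordinary least squares line with a single explanatory variable. It is a deliberately simplified stand-in for the multi-variable models discussed in this report, and the data points are made up for the example.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: price gap (x) and share of tested samples found fraudulent, in % (y).
price_gap = [1, 2, 3, 4, 5]
fraud_share = [2, 3, 5, 4, 6]
intercept, slope = fit_simple_regression(price_gap, fraud_share)
print(round(intercept, 2), round(slope, 2))  # 1.3 0.9
```

In this made-up example the positive slope would be read as a wider price gap being associated with a higher share of fraudulent samples.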
The econometric methodology detailed below follows previous economic approaches to
fraud such as the one applied by Manuela and Paba (2010) to address credit card fraud.35
The methodology presented in this report applies that approach to the case of food fraud. We
note that the literature that applies econometric methods to fraud generally follows a
similar approach.
Choice of variables
The review of the literature has identified a large number of variables that could be used
as explanatory variables of past fraud and, consequently, predictors of future fraud. We
consider that our model is well suited to include such variables. In addition, our data
scoping exercise (see Annex III) indicates that it could be feasible to include the following
key variables:36
Explained variable: the most important variable of our methodology is the history of past
fraud, since correlations between this and other variables are the main source of the
methodology’s predictive power. The history of past fraud could potentially be measured in
different ways. For example, it could be a binary variable that takes the value of one if
fraud was detected in the same period and zero otherwise. A more sophisticated measure
would be the percentage of products surveyed that were found to be fraudulent.
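The two measures of past fraud described above can be derived from the same test records. The sketch below assumes a hypothetical layout in which each period maps to a list of test outcomes (True meaning the sample was found to be fraudulent); the period labels and results are illustrative only.

```python
def fraud_measures(tests_by_period):
    """Map each period to (binary indicator, percentage of samples found fraudulent)."""
    measures = {}
    for period, results in tests_by_period.items():
        # Binary measure: one if any fraud was detected in the period, zero otherwise.
        detected = 1 if any(results) else 0
        # Percentage measure: undefined (None) when no samples were tested.
        share = 100.0 * sum(results) / len(results) if results else None
        measures[period] = (detected, share)
    return measures


# Hypothetical test outcomes for two periods.
tests = {"2013-01": [False, False, True, False], "2013-02": [False, False]}
print(fraud_measures(tests))
# {'2013-01': (1, 25.0), '2013-02': (0, 0.0)}
```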
Explanatory variables: these are observed variables, identified either in
the literature or through empirical findings, that can have an effect on the risk of fraud and thus
can “explain” the probability of food fraud. The objective of the methodology will be to
establish and quantify the relationship between the explanatory variables and the
explained variable (i.e. the risk of fraud).
Country of origin of products: food safety regulations and their enforcement vary by
country.
Prices of authentic product and adulterant: fraud is, by definition, economically
motivated. When the price of the authentic product is significantly higher than the
price of an adulterated product, the benefit of committing fraud is given by the gap
between them (ceteris paribus). Prices can be measured in absolute terms for each
35 Manuela, P. and Paba A. (2010), "A discrete choice approach to model credit card fraud".
36 The availability of these variables would depend crucially on the product and type of fraud chosen.
ingredient as levels or indices or in relative terms between the authentic ingredient
and adulterant as differences or ratios.
Volumes: these may also have an impact on the incentive to commit fraud. A
shortage of a particular product could drive its price up, thereby increasing the
incentive to commit fraud and produce more adulterated products. Increases in
demand or reductions in supply can create unsatisfied demand (at least in the short
term) and create an incentive for fraud. For instance, such imbalances are likely
when the particular product is vulnerable to environmental conditions.
Conversely, unexpected availability of a cheap ingredient might incentivise its use
as an adulterant of a more expensive one. The key variables of volume to be
included are:
o Production: domestic and in country of origin.
o Consumption: domestic and in country of origin.
o Trade: particularly imports from producing countries to UK.
Rapid changes in the above variables: this may occur due to changes in consumer
preferences. For instance, a particular food product could become very popular
suddenly due to alleged health benefits in the media. A sudden surge in demand for
that particular food may be met by an increase in the supply of the adulterated
version of the product.
Level of enforcement, measured by the intensity of testing: more frequent and
intense testing can increase the probability of being caught, thus decreasing the
incentive to commit fraud.
Product specific variables: these are the idiosyncratic characteristics of a product
that make it more susceptible to food fraud:
o Cost of testing adulteration: higher costs of detecting the presence of an
adulterant in a food product might decrease the intensity of testing and
increase the probability of fraud. Therefore, local authorities with low
resources to audit the food ingredients are more likely to be supplied with
adulterated food.
o Points of entry into the UK typically used: it has been reported that fraudulent
products are more likely to use points of entry with certain characteristics,
possibly due to lower likelihood of detection.
o Physical characteristics of the product such as:
State of the product (e.g. minced, frozen): the physical state of the
food affects the chances of detection by buyers and authorities. The
state of the product could be, for example, liquid, ground, prepared,
powder, mixed consistency, non-characteristic colour, homogeneous
consistency, dried, colourless or frozen.
Shelf life: may affect the probability of detection. In addition, the
financial risk borne by producers or retailers differs if the product is
perishable, which may in turn affect their incentives to commit fraud.
o Form in which the product is traded. For example, it could include the
percentage typically sold in bulk. If the product has already been processed it
becomes more difficult to detect potential adulteration.
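The price measures described in the list above (levels, indices, differences and ratios) can be sketched as follows. This is a minimal illustration only: the function names and the example prices are assumptions made for the sketch and are not taken from the report's dataset.

```python
def price_index(prices, base=0):
    """Express a price series as an index equal to 100 in the base period."""
    return [100.0 * p / prices[base] for p in prices]


def price_gap_measures(authentic_price, adulterant_price):
    """Return the absolute difference and the ratio between two prices."""
    difference = authentic_price - adulterant_price
    ratio = authentic_price / adulterant_price
    return difference, ratio


# Hypothetical prices in USD per metric tonne (illustrative values only).
authentic = [900.0, 950.0, 1000.0]
adulterant = [600.0, 600.0, 600.0]

index_authentic = price_index(authentic)
gaps = [price_gap_measures(a, b) for a, b in zip(authentic, adulterant)]
```

Either the difference or the ratio could then enter a model as an explanatory variable; which one is more appropriate depends on the product and type of fraud.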
We recognise that the literature has identified additional variables that are presumed to be
very relevant in the determination of the risk of fraud. Prime examples are the
characteristics of the supply chain, such as its length, complexity and visibility (i.e. the
ability of a supplier/retailer to conduct quality assurance of links far removed from them).
Unfortunately, some of these variables are difficult to establish due to issues such as the
following:
They do not have an unambiguous or commonly agreed definition. For example,
complexity of the supply chain might be related to factors such as the number of
ingredients and the number of suppliers or their geographical locations. However, a
precise definition is not available.
They are not measurable. While some commentators emphasise the effectiveness
of supplier quality assurance, some crucial aspects of this process are not
quantifiable.
No reliable source of data is available. Even in the case of straightforward variables,
such as the number of suppliers, there is no systematically collected data available
to track them back in time for a particular food product.
Therefore, it may be that it is not feasible to develop a statistically-based methodology that
incorporates some of these variables into the analysis.
Data requirements
In general, for the methodology to be implemented, most data sources should be rich in
terms of time coverage and frequency, product and ingredient disaggregation and
geographical coverage.
The methodology requires suitable time series for all the variables that coincide in the time
period covered and the frequency of the data.37 It is possible that the data available for a
variable does not include the complete coverage of the dataset. For example, if variable A
is available for the period 2010-2013 and variable B for 2012-2014, a regression method
that includes these two variables can only be applied to the period for which there is a perfect
overlap (i.e. 2012-2013). Therefore, when overlap is not perfect, some information will be
discarded. Based on the conceptual importance of the variable, it would be necessary to
decide whether the smaller sample size is justifiable or whether it would be advisable to
exclude the variable for which there is less data available.
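The overlap rule in the example above can be sketched in a few lines. The function name and the dictionary layout are assumptions made for illustration; only the two coverage periods come from the text.

```python
def common_period(coverage):
    """Return the (first, last) year covered by every variable, or None.

    coverage maps a variable name to a (first_year, last_year) tuple.
    """
    start = max(first for first, _ in coverage.values())
    end = min(last for _, last in coverage.values())
    return (start, end) if start <= end else None


# Coverage periods from the example in the text.
coverage = {"A": (2010, 2013), "B": (2012, 2014)}
overlap = common_period(coverage)

# Number of yearly observations of each variable that fall outside the overlap.
discarded = {
    name: (last - first + 1) - (overlap[1] - overlap[0] + 1)
    for name, (first, last) in coverage.items()
}
```

Comparing the discarded counts across variables is one simple way to weigh the loss of sample size against the conceptual importance of a variable.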
37 We note that in certain cases it is possible to modify the frequency of time series by aggregation or interpolation. However, these approaches might not be advisable in all cases, depending on the nature of the variable.
In addition to the above criterion, some further conditions would be desired for the data on
previous incidents of food fraud. The two main criteria would be:
It contains detail about the number of investigations carried out and the number of
incidents detected. This would allow the methodology to assess the extent to which
there is fraud in a given product.
It covers a significant period of time in which potential fraud was investigated.
Differences in potential factors (such as prices) across this period would allow the
methodology to draw conclusions about the relative impact of them on the risk of
fraud.
Descriptive statistics
Before proceeding to construct an econometric model that would identify the probability of
food fraud given a certain number of variables, we would run a series of diagnostic checks.
These checks would flag potential issues or biases in the estimates of the econometric
model.
The checks would include:
Whether variables are significantly correlated with one another.
The minimum, maximum and average values of each of the variables.
Charts to illustrate the evolution of the key variables over time in order to ensure
that no shocks (structural breaks) have occurred during the period of time under
examination.
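The first two checks can be sketched with the standard library alone. The variable names and values below are hypothetical, and the Pearson coefficient is used here as one common measure of correlation; the report does not specify which measure it applies.

```python
from math import sqrt


def summarise(values):
    """Minimum, maximum and average of a series."""
    return {"min": min(values), "max": max(values),
            "mean": sum(values) / len(values)}


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly series (illustrative values only).
price_gap = [300.0, 350.0, 400.0, 450.0, 500.0]
imports = [10.0, 12.0, 11.0, 15.0, 16.0]

stats_price_gap = summarise(price_gap)
corr = pearson_correlation(price_gap, imports)
```

A coefficient close to 1 or -1 between two explanatory variables would flag a potential multicollinearity problem before any model is estimated.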
Model specification
Given the nature of food fraud data (in particular of the explained variable), the
methodologies we will test fall into the following three classes of methods:
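Ordinary Least Squares (OLS): this method postulates a relationship of the form:
$$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \varepsilon_t,$$
where $y_t$ is the explained variable (the risk of fraud in period $t$), the $x_{j,t}$ are the explanatory variables, $\varepsilon_t$ is an error term, and the $\beta$s are the estimated coefficients which weigh the significance of each factor in
determining the risk of fraud.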
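For a single explanatory variable, the OLS coefficients have a simple closed form, sketched below. The data are invented so that the fitted line is exact; the function name is an assumption, and a real application would use a statistics package that also reports standard errors.

```python
def ols_single_regressor(x, y):
    """Closed-form OLS for y = intercept + slope * x + error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x (illustrative only).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_single_regressor(x, y)
```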
Binary methods: these methods are particularly appropriate when the explained variable
can only take the values zero or one. In the case of food fraud, the explained
variable would take the value of one if fraud was detected in a given period or zero
otherwise. The method would estimate the probability of fraud using the functional form of
a cumulative probability distribution instead of a linear function, as postulated by OLS.
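In standard textbook notation, a binary model of this kind can be written as below. The symbols are assumptions introduced for illustration: $y_t$ is the fraud indicator, $x_{j,t}$ are the explanatory variables, $\beta_j$ are the coefficients and $F$ is a cumulative distribution function. The logit and probit choices of $F$ are the two most common cases.

```latex
P(y_t = 1 \mid x_t) = F\left(\beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t}\right),
\qquad
F_{\text{logit}}(z) = \frac{1}{1 + e^{-z}},
\qquad
F_{\text{probit}}(z) = \Phi(z)
```

Here $\Phi$ denotes the standard normal cumulative distribution function. Because $F$ maps any real number into the interval between zero and one, the fitted values can be read as probabilities, which is not guaranteed under OLS.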
Multinomial methods: these models are an extension of the binary methods whereby the
explained variable can take more than two values. These values may or may not be
ordered.
c. Interpretation and use of the results
Model selection
Based on the diagnostics and tests described above, the methodology would select a
reduced set of models based on:
The statistical significance of the coefficients: it is possible to establish that the
coefficients associated with key variables are different from zero with confidence of
at least 90 per cent.
The lack of data issues that might bias or provide spurious estimates (e.g.
heteroskedasticity): Appendix IV presents a list of the most common issues that may
bias econometric results together with tests used to detect them. The selection of a
most preferred model would take into account the extent to which these issues are
present or can be corrected.
Goodness of fit indicators (e.g. adjusted R-square): several measures exist that
establish the extent to which a particular model “fits” the data.
The interpretability of the results: complex functional forms might impair a clear and
usable interpretation of the estimated coefficients.
We note that the criteria above do not provide an automatic selection of the appropriate
model, particularly when there is a trade-off between them. A degree of judgement will be
inevitable in weighing the criteria and forming a view as to the preferred model (or set of
models) when these are in conflict.
Limitations of the approach
The econometric methodology proposed in this report follows best practices established in
the literature of fraud and other areas of economics. Consequently, the limitations
associated with this approach have been already identified and discussed extensively.
The issues that are particularly relevant for the application of econometrics to food fraud
are the following:
Sample size. The predictive power of the methodology could be considerably
limited by the number of observations. For example, the case study presented
below is based on 21 observations. This is a very small sample size that might lead
to inconclusive results in the form of coefficients that are not statistically significant.
It is possible that the methodology does not establish a statistically proven
relationship between variables because the number of observations is low rather
than because the relationship does not exist. As shown in the case study, this
small number of observations can establish a relationship between fraud and one
explanatory variable (the price gap between authentic and adulterated products).
However, the model cannot identify statistically significant relationships with more
than one explanatory variable. A larger sample size would allow for the possibility
of establishing these relationships.
Data quality. The reliability of the results of the methodology depends on the quality
of the data used to generate them. If the underlying data is inaccurate, the
estimated coefficients are likely to misrepresent the true relation between variables.
High variation in the estimated coefficients. The proposed methodology suggests
estimating several alternative model specifications, using different combinations of
variables to establish their capacity to predict fraud. Since the coefficient of the
same variable is likely to change across specifications, the outcome of this
exercise might sometimes be more accurately represented as a range, rather than
a point estimate. However, it is possible that the variation in the estimates is too
large to provide a good indication of the true magnitude of the effect of a given
variable. This problem would certainly be exacerbated in small samples.
Omitted relevant variables. Due to difficulties in quantifying some explanatory
variables or lack of available data it might not be possible to include all relevant
explanatory variables in the models to be estimated. In this case, the coefficients of
the variables that are included might be biased, because they would pick up the
effects of other omitted (but correlated) variables.
Lagged effects. It is conceivable that certain economic variables would
affect the risk of fraud only after some period of time. Therefore, the
contemporaneous coefficient might not necessarily be the best indicator of the true
effect of an explanatory variable. It is possible to address this problem by including
lagged variables. However, the form in which lagged effects take place might be
complex and difficult to capture accurately.
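Constructing a lagged variable, as mentioned in the last item above, can be sketched as follows. The helper name and the series are hypothetical; how many lags to include is a modelling choice that the report leaves open.

```python
def lag(series, periods=1):
    """Shift a series back by `periods`, padding the start with None."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    if periods == 0:
        return list(series)
    padding = [None] * min(periods, len(series))
    return padding + list(series[:-periods])


# Hypothetical monthly price gap (illustrative values only).
price_gap = [300.0, 350.0, 400.0, 450.0]
price_gap_lag1 = lag(price_gap, 1)
price_gap_lag2 = lag(price_gap, 2)
```

Each additional lag removes one usable observation from the start of the sample, which matters when the sample is already small.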
Prediction of future fraud
Binary choice models can be used to estimate the risk of food fraud. The key characteristic
of these models is that the dependent variable is always drawn from a dichotomous set of
options. Examples of such options include a “yes/no” answer which can easily translate
into “occurrence/not-occurrence” of food fraud.
Binary choice models are usually estimated using maximum likelihood methods. The
estimation process aims at finding the coefficients (weights) for each independent variable
that would maximise the likelihood of observing that particular sample of outcomes
(dependent variables).38 The coefficients can then be used to calculate the expected
probability of fraud, based on a set of observed characteristics.
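The maximum likelihood estimation described above can be sketched for a logit model with an intercept and one explanatory variable. This is an illustrative Newton-Raphson implementation under stated assumptions: the data are invented, the function names are hypothetical, and a real application would rely on an established statistics package.

```python
from math import exp, log


def sigmoid(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + exp(-z))


def log_likelihood(b0, b1, x, y):
    """Log-likelihood of a logit model with intercept b0 and slope b1."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * log(p) + (1 - yi) * log(1 - p)
    return total


def fit_logit(x, y, iterations=25):
    """Maximise the log-likelihood by Newton-Raphson, starting from zero."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient with respect to b0
            g1 += (yi - p) * xi       # gradient with respect to b1
            h00 += w                  # entries of the negative Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: price gap (arbitrary units) and fraud indicator.
price_gap = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fraud = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(price_gap, fraud)
```

The example data are deliberately not perfectly separable: if fraud occurred only above some price gap and never below it, the likelihood would have no finite maximum and the iteration would not settle.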
Expected probabilities can be mapped to an “occurrence/not-occurrence” discrete
outcome by comparing them to a pre-determined threshold value (the choice of the
threshold can be made arbitrarily but an obvious option is to set it to 0.5, which means
treating as “occurrence” those observations with an estimated probability greater than 0.5,
and treating them as “non-occurrence” otherwise).
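The mapping from an expected probability to a discrete outcome can be sketched as follows, assuming a logit functional form. The coefficient values are hypothetical and are not estimates from the report.

```python
from math import exp


def predicted_probability(intercept, coefficients, characteristics):
    """Expected probability of fraud under a logit model."""
    z = intercept + sum(c * v for c, v in zip(coefficients, characteristics))
    return 1.0 / (1.0 + exp(-z))


def classify(probability, threshold=0.5):
    """Treat probabilities strictly above the threshold as occurrence."""
    return "occurrence" if probability > threshold else "non-occurrence"


# Hypothetical coefficients and one observed characteristic (a price gap).
probability = predicted_probability(-2.0, [0.5], [6.0])
outcome = classify(probability)
```

Lowering the threshold flags more periods as "occurrence", at the cost of more false alarms; raising it does the opposite.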
38 Logit and probit models are typically used for discrete choice models.
The predicted outcomes can be compared to the actual outcomes in order to test the
accuracy of the model (this is known as measuring the model’s goodness of fit). Four
potential scenarios are generated through this process: the model correctly predicted
“non-occurrence” in situations where no fraud was observed; the model incorrectly predicted
“non-occurrence” in situations with fraud; the model incorrectly predicted “occurrence”
in situations with no fraud; the model correctly predicted “occurrence” in situations where fraud
was observed. These scenarios are shown in the table below as A, B, C and D,
respectively.
Table 5.1: Measuring the accuracy of the model

| Model prediction | Observed outcome: Non-occurrence | Observed outcome: Occurrence |
|---|---|---|
| Non-occurrence | A | B |
| Occurrence | C | D |
The method is accurate when the predicted risk is low in periods where fraud was not observed
and high in periods where fraud was observed. Therefore, the accuracy measure of the
prediction is given by (A+D)/(A+B+C+D). Expressed as a percentage, the measure of accuracy
ranges between 0 and 100 (in a model that predicts fraud perfectly, B and C would be equal to zero).
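The accuracy measure can be computed directly from lists of predicted and observed outcomes. The sketch below uses the cell labels A to D from Table 5.1; the example data are invented.

```python
def confusion_counts(predicted, observed):
    """Count the four cells of Table 5.1 (True means occurrence of fraud)."""
    pairs = list(zip(predicted, observed))
    a = sum(1 for p, o in pairs if not p and not o)  # correct non-occurrence
    b = sum(1 for p, o in pairs if not p and o)      # missed fraud
    c = sum(1 for p, o in pairs if p and not o)      # false alarm
    d = sum(1 for p, o in pairs if p and o)          # correct occurrence
    return a, b, c, d


def accuracy(a, b, c, d):
    """Share of correctly predicted observations, (A+D)/(A+B+C+D)."""
    return (a + d) / (a + b + c + d)


# Hypothetical predictions and outcomes for five periods.
predicted = [False, False, True, True, False]
observed = [False, True, False, True, False]
cells = confusion_counts(predicted, observed)
share_correct = accuracy(*cells)
```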
Similar predictions can be made using out-of-sample data. In this situation, the coefficients
of the model would be used to calculate the expected probability of fraud based on a set of
observed characteristics for which there is no observed outcome of fraud. As we see in the
example below, this can be used to forecast a “high” or “low” probability of occurrence for
a set of predictors (for example, variables related to observed differences in prices of real
and adulterated products).
d. Single and multiple products
Applying the methodology to a single product or type of fraud, as is done in the case study
below, introduces certain limitations to the number of variables that can be included. In
particular, it does not allow for testing the effects of features that are characteristics of the
product, market or testing technologies that do not vary over time. Any statistical method
would, by necessity, exploit variation in these characteristics in order to assess their
impact on the likelihood of fraud. However, if these features do not change, the
econometric method would not be able to attribute any impact.39
39 Technically, the regression constant would capture the impact of all time-invariant variables in the case of a single product.
Applying the methodology to multiple products would allow testing for the impact of certain
effects that cannot be tested with a single product. These include:
Physical characteristics of the product.
Points of entry into the UK.
Cost of testing the particular adulteration type.
Based on the scope of the project, the case study below implements the methodology for a
single type of fraud (and product) – adulteration of Basmati rice. Therefore, this test case
is not able to estimate the effect of the variables listed above. A future extension to
implement multiple products simultaneously is highly recommended. The estimation of
regressions with multiple products simultaneously would require the use of panel
techniques since the variables of interest would vary not only with time but also with the
different products considered. For most of the methods described above, estimators exist
that address this additional level of complexity.
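One possible way to write such a multi-product specification is shown below. The notation is an assumption introduced for illustration and does not appear in the report: $i$ indexes products, $t$ indexes time periods, $x_{it}$ collects the time-varying variables (such as prices and volumes) and $z_i$ collects the time-invariant product characteristics listed above.

```latex
y_{it} = \alpha + x_{it}'\beta + z_i'\gamma + \varepsilon_{it},
\qquad i = 1, \dots, N, \quad t = 1, \dots, T
```

With a single product ($N = 1$), the term $z_i'\gamma$ is constant and cannot be separated from $\alpha$, which is why product characteristics can only be assessed once several products are pooled.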
e. New types of fraud
The methodology described above depends crucially on having data on past instances of
fraud and establishing the relationship between fraud and other variables. Once these
effects are estimated, prediction of future fraud is performed under the assumption that
these relations persist over time. However, it is not unusual for authorities to discover new
types of fraud that had not been detected in the past. A direct application of the
methodology presented above would not be feasible in these cases.
If the methodology is applied to multiple types of fraud for which past data is available, it
might be possible to use these estimates for an indirect approach to new types of fraud.
We note that this approach would not be as reliable as the direct approach proposed
above. The only case in which it would be advisable is for fraud in products that had not
been detected before and, therefore, have no data available to implement the alternative.
The indirect approach would rely on:
Using the multiple product estimation of known types of fraud to quantify the effect
on fraud risk of each of the product characteristics for the new product / type of
fraud.
Construction of an index for the new type of fraud accounting for the effect of its
product characteristics and the effect of other economic variables shown to be
significant in the estimations for known types of fraud.
We note, however, that this indirect approach would have much less statistical reliability
than the direct approach, since it would increase the risk of bias due to the omission of
relevant variables that would be significant only for the new type of fraud but not for those
estimated directly.
f. Comparison of the proposed approach and the literature
The methodology proposed in this section is firmly based on the same principles applied
by the econometrics literature on fraud. Table 5.2 provides a comparison with each of the
individual reports that were identified. As can be seen in the comparison column, the key
features of the approach proposed in this report agree with the methods used elsewhere in
the literature.
There is ample overlap between the literature and the proposed methodology. In
particular, the literature has extensively used OLS and binary choice models (such as logit
or probit) with past fraud as the explained variable. Some reports have suggested the use
of panel estimation techniques. While these are not part of the main approach proposed
above, their use would become necessary if multiple products are included in the analysis.
Finally, our main approach does not include tools that correct for sample selection issues,
as proposed by Artís, Ayuso and Guillén (1999) and Greene (1998). However, if there
were evidence of a sample selection bias, the proposed methodology could be extended
as suggested by these reports to correct for this problem.
Table 5.2: Comparison with the econometric literature on fraud

| Name of Article | Authors | Fraud Area | Econometric approach taken | Comparison |
|---|---|---|---|---|
| A discrete choice approach to model credit card fraud | Manuela, P. and Paba A | Credit Card | This report uses binary choice models with fraud as a dependent variable and a set of explanatory variables such as gender, location and currency used for transactions. | The approach used in this report is closely related to the one proposed in this section. |
| Sample selection in credit-scoring models | Greene, W. | Credit Card | The paper employs a binary choice model to decide whether to extend credit or not. Additionally, it suggests an OLS regression model for predicting expenditures. | The proposed method consists also of a combination of OLS and binary choice regressions. The dependent variable is also a key past outcome: whether a credit was extended. |
| Modelling different types of automobile insurance fraud behaviour in the Spanish market | Artís, M., Ayuso, M., & Guillén, M. | Insurance | Maximum likelihood estimation with the correction for choice-based sampling in order to take into account the effect of the over-representation of fraud claims. | The method suggested above employs logit estimations which are based on maximum likelihood functions. It does not include a correction for sample selection. |
| The Economic Impact of Counterfeiting and Piracy | OECD | Consumer Goods | The authors construct an index known as the General Trade-Related Index of Counterfeiting for products (GTRIC-p) based on econometric techniques. This index estimates the total amount of counterfeiting based on seizure outcomes. | The report uses OLS regressions to establish the link between counterfeiting and key factors, such as institutional variables. |
| Economic institutions and individual ethics: A study of consumer attitudes toward insurance fraud | Tennyson, S. | Insurance | Ordered probit and OLS regressions to link various factors to measured attitudes towards fraud. | The methods (OLS and ordered probit) are aligned with the ones proposed above. A key conceptual difference is that the explained variable is the attitude towards fraud by consumers. This difference is due to the fact that this report addresses a different research question. |
| Detecting counterfeit antimalarial tablets by near-infrared spectroscopy | Floyd E. Dowell, Elizabeth B. Maghirang, Facundo M. Fernandez, Paul N. Newton and Michael D. Green | Pharmaceuticals | Regressions and indices. | Similarly to the proposed methodology, this report also used regressions to detect the probability of food fraud and then developed an index which shows the risk level of fraud. |
| Analysis of the demand for counterfeit goods | Pamela S. Norum | Luxury goods | T-tests and logit regressions. | Despite trying to quantify a different effect, this report makes use of t-tests and logit regression in a similar manner as the proposed methodology. |
| Estimating dynamic demand for cigarettes using panel data: the effects of bootlegging, taxation and advertising reconsidered | Baltagi, B. H., & Levin, D | Tobacco | Panel data techniques. | The methods used by this report are closely related to the proposed approach in the case with multiple products, where panel techniques are recommended. |
| The Research and Application of Art Price Index | Danting Chang | Art / Paintings | Indices: the "Art Price Index". | Despite not having a fully econometric approach, this report constructed indices to classify the different degrees of risk of fraud, similarly to the proposed methodology. |
| On the Economics of Adulteration in Food Imports: Application to US Fish and Seafood Imports | Sébastien Pouliot | Food | Simulations and calibrations. The paper aims to simulate the effect of the Mexican Gulf disaster on food fraud. The author therefore creates a theoretical model and simulates it using the values estimated in the literature for the relevant coefficients. | This report does not follow an econometric approach. However, it uses OLS regressions, as the ones proposed here, to estimate some of the parameter values used for the simulations. |
| Observations on economic adulteration of high-value food products: The honey case | Fairchild, G. F., Nichols, J. P., & Capps, O. | Food | Analysis of price and revenue impacts of honey adulteration. | Rather than providing economic intelligence, this report aims at quantifying economic impacts of fraud. They use econometric results that estimate demand elasticities for honey. |
6. Case Study: Basmati Rice
For the purposes of testing the validity of our methodology we have chosen Basmati rice to
be our case study. The reasons for doing so were:
Availability of economic data on prices, quantities and exports of Basmati rice
for a long enough period to allow the possibility of conducting econometric
analyses.
Documented instances of food fraud in the past (relative to other products) which
provide a minimum number of observations for the sample.
Well defined dates of when the fraud has occurred. Using data provided by the FSA
it was possible to identify the precise dates that the fraud had occurred or been
detected.
Clarity of the definition of the product. During the process of choosing the most
appropriate case study we had to reject a number of candidates because the
definition of what constitutes the specific product was too broad (e.g. vodka or fish)
or ambiguously defined. While failing to satisfy this condition would not be an
insurmountable obstacle, it would require additional effort to ensure that the data
employed are consistent in their definitions.
a. Global market for Basmati rice
Basmati rice is a popular variety of rice. It has a legally enforced regional denomination,
which means that it can only be produced in India and some parts of Pakistan. In India
Basmati rice is grown in the states of Punjab, Uttar Pradesh, Haryana and Uttaranchal; in
Pakistan it is only grown in the Punjab area. The name Basmati means “the fragrant one”
which indicates that it has a distinct and pleasant aroma. The grain is long and slender and
when it is cooked it becomes longer and acquires a dry and fluffy texture.
Basmati is one of the most expensive grains in the market. More specifically, on average
Basmati rice commands double the price of other types of rice. In 2012 the average price of
Basmati rice stood at about $1000 USD per metric tonne whereas the average price of
other rice varieties was about $600 USD per metric tonne.40
According to research conducted by Horizon Research, the global rice industry is
approximately worth $275 billion USD, out of which $5.8 billion or 2.1 per cent is attributed
to Basmati rice.41 In 2012 India accounted for about 72% of the world production of
Basmati rice which is about 4.8 million metric tonnes. Pakistan accounted for the rest –
about 1.9 million metric tonnes. It is estimated that the demand for Basmati rice grew
at an average annual rate of 10.5% between 2001 and 2012. This is a significant growth rate
compared to the one for non-Basmati rice, which stands at about 1.2% per annum.
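The shares quoted in this paragraph can be checked from the underlying figures. The only assumption is the one the text itself makes, namely that India and Pakistan together account for all Basmati production.

```latex
\frac{5.8}{275} \approx 0.021 = 2.1\%,
\qquad
\frac{4.8}{4.8 + 1.9} = \frac{4.8}{6.7} \approx 0.716 \approx 72\%
```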
40 http://www.apeda.gov.in/apedawebsite/index.asp
41 http://horizonresearchpartners.com/wp-content/uploads/2012/08/Indian-Basmati-Rice-Industry-7-26-12.pdf
Horizon Research has found that rice producers are seemingly identical and are
characterised by the following features:
Production of rice requires high working capital availability.
The producer needs to be able to leverage its debt quite highly because Basmati rice
takes time to age and be ready for sale.
Producers have a limited pricing power.
Limited brand recognition.
b. UK market for Basmati rice
In the UK, rice is an important staple for the average household. According to Mintel, sales
of rice in the UK in 2010 were worth £415m.42 CBI has found that the UK is one of the largest
consumers of milled rice in the EU.43 UK households consumed about 268 tonnes of rice
in 2004, which represents a 59% increase in rice consumption from 2003. Furthermore, in
2003 the UK imported about 70% of the total imports of Basmati rice in the EU.44 Basmati
rice is imported into the UK either directly from India and Pakistan or indirectly from millers
in the Netherlands, France or Belgium. In 2004, sales of Basmati rice were increasing by
about 12% annually and were expected to overtake sales of other long grain rice in the
following years. Prices of Basmati rice follow the global trends in relation to other types of
rice. A study conducted by the Food Authenticity Programme found that the price of
Basmati rice (£1.40 per kilogram) was double the price for other varieties of rice (£0.70 per
kilogram on average).
The increasing popularity of the grain and significant price differential between the Basmati
rice and other types of rice have made it a good target for food fraudsters to adulterate.
The most common type of food fraud we observe in Basmati rice is adulteration of the
authentic grain with other types of rice. In the UK there is no specific legislation regarding
the authenticity of Basmati rice, however, under the Food Safety Act it is illegal to sell
“food that is not of the nature, substance or quality demanded by the consumer or to
falsely or misleadingly describe or present food”.45 In the UK, the term Basmati should only
be used to describe the 11 Indian varieties and 5 Pakistani rice varieties that are
characterised by the Basmati properties. Despite this legislation, a number of fraud
instances were detected in the past. The majority of them were detected by the Food
Authenticity Programme. This dataset shows that during 2012, 3 out of the 33 samples
collected by the FSA were found to be fraudulent.
42 http://www.marketingmagazine.co.uk/article/1071314/sector-insight-pasta-rice-noodles
43 http://www.cbi.eu/system/files/marketintel/201020-20Rice20and20pulses20-20UK1.pdf
44 http://multimedia.food.gov.uk/multimedia/pdfs/fsis4704basmati.pdf
45 http://www.legislation.gov.uk/ukpga/1990/16/contents
c. Basmati rice adulteration
Basmati rice is the customary name given to specific varieties of rice with unique
organoleptic characteristics and grown exclusively in the northern part of the Western
Punjab in both Pakistan and India; and in Haryana State and Western Uttar Pradesh in
India. Due to these organoleptic qualities, Basmati rice attracts a premium price, hence its
attractiveness to potential Economically Motivated Adulteration (EMA). It has been
reported that, since 2002, Indian traders have been selling a lesser quality rice, CSR 30,
as Basmati rice in major markets such as the US, Canada and the EU.46 Rice exports in India
are exempt from the duty accorded to pure Basmati in the EU, making it even more
profitable for fraudsters to adulterate Basmati rice. The authentic stock of traditional
Basmati grain usually gets depleted on Indian farms (i.e. there is an excess demand for
this product). Ricesearch, a DNA rice authenticity verification service in India, has found
that more than 30 per cent of the Basmati rice sold in the retail markets of the US and
Canada is adulterated with inferior quality grains. It is suspected that this number may be
higher in Europe.
In Europe, Commission Regulation 1549/04 grants a lower import tax on nine basmati
varieties: Basmati 370, Dehradun (Type 3), Basmati 217, Taraori, Ranbir Basmati, Kernel,
Basmati 386, Pusa Basmati and Super Basmati. Other basmati rice varieties approved by
India, Pakistan and the UK include Basmati 198, Basmati 385, Haryana Basmati, Kasturi,
Mahi Suganda and Punjab Basmati; and are outlined within the Basmati Rice Code of
Practice, agreed between the UK, Indian and Pakistani industry and enforcement bodies.
This code of practice also allows for the inclusion of no more than 7% non-Basmati rice
content.
The determination of Basmati rice varieties follows established testing protocols developed
by the UK FSA, using DNA based analysis to obtain a qualitative (positive or negative
presence) and quantitative (percentage of basmati and non-basmati DNA) result. Results
obtained from samples are compared against known references based on each of the
approved varieties of Basmati rice. From this, a determination is made on authenticity and
amount of basmati rice present in the sample. The associated costs per sample are
between £150 and £200 before any courier costs, which may be a significant
expense in the budget of local authorities. Therefore, these costs are unlikely to be borne
unless suspicion of EMA exists or unless specific funding is provided.
d. Data
For the purposes of testing our methodology using a case study we have used various
datasets from a number of publicly available sources. We have mainly used data from the
World Bank for our macroeconomic indicators (GDP and CPI) in order to avoid any issues
of harmonisation. Our main source for the variables relating to Basmati rice coming from
India has been the All India Rice Exporters Association. Our datasets are presented in
more detail in the table below. For some missing values and for some data available at a
lower frequency (e.g. yearly instead of monthly), we have employed interpolation
techniques.
46 http://articles.economictimes.indiatimes.com/2007-07-06/news/28467196_1_basmati-rice-india-s-basmati-basmati-export
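The two frequency adjustments mentioned for the case study data (filling gaps by interpolation, and averaging daily values into monthly ones) can be sketched as follows. The report does not state which interpolation technique was used, so linear interpolation is an assumption here, as are the function names and the example values.

```python
def interpolate_linear(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Gaps before the first or after the last known value are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def simple_average(daily_values):
    """Monthly value obtained as the simple average of daily data."""
    return sum(daily_values) / len(daily_values)


# Hypothetical series with two missing periods (illustrative values only).
series = [10.0, None, None, 16.0]
completed = interpolate_linear(series)

# Hypothetical daily exchange rates within one month.
monthly_rate = simple_average([60.0, 62.0, 64.0])
```

Interpolated points add no new information, so they increase the number of observations without increasing the evidence behind the estimates.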
Table 6.1: Case study data sources

| Data Source | Variable | Dates | Frequency | Units | Link |
|---|---|---|---|---|---|
| Food Standards Agency (FSA) | Basmati Rice Samples Tested | 2010-2014 | Monthly | Number of tests | http://www.food.gov.uk/enforcement/enforcework/foodfraud/foodfrauddatabase#.U7wkn-kU_Gg |
| Food Standards Agency (FSA) | Number of Tests Failed | 2010-2013 | Monthly | Number of instances of fraud | http://www.food.gov.uk/enforcement/enforcework/foodfraud/foodfrauddatabase#.U7wkn-kU_Gg |
| All India Rice Exporters Association | Price of Basmati Rice India | 2010-2013 | Monthly | USD per MT FOB | http://www.airea.net/page/53/statistical-data/basmati-rice-monthly-average-price-analysis |
| All India Rice Exporters Association | Exports of Basmati Rice from India to UK | 2011-2014 | Yearly | MT, Value in Rs. Lacs | http://www.airea.net/page/58/statistical-data/export-statistics-of-basmati-rice |
| All India Rice Exporters Association | Quantity of Basmati rice produced in India | 2010-2013 | Monthly | MT | http://www.airea.net/page/53/statistical-data/basmati-rice-monthly-average-price-analysis |
| | Exchange rates (US dollar to the Indian rupee, Pakistani rupee and U.K. pound sterling) | 2010-2014 | Daily | Monthly data were obtained by simple average of daily data | http://www.imf.org/external/np/fin/ert/GUI/Pages/CountryDataBase.aspx |
| APEDA Agri Exchange | Price of Basmati Rice Pakistan | 2008-2014 | Monthly | USD per MT | http://agriexchange.apeda.gov.in/int_prices/international_price.aspx |
| Index Mundi | Price of non-Basmati Rice | 2010-2014 | Monthly | USD per MT | http://www.indexmundi.com/commodities/?commodity=rice&months=60 |
| World Bank | GDP UK, India, Pakistan | 1980-2014 | Yearly | Per capita value at current prices in USD | http://data.worldbank.org/indicator/NY.GDP.MKTP.CD |
| Department for Environment, Food and Rural Affairs | Consumption of rice in UK (dried rice, cooked rice and take-away rice) | 1974-2014 | Yearly | g per household per week | https://www.gov.uk/government/statistical-data-sets/family-food-datasets |
e. Descriptive statistics
Before conducting the econometric analysis, we present a set of descriptive statistics in
order to gain a preliminary understanding of the characteristics of the data. Table 6.2
reports the minimum, maximum and mean values of our variables, together with the
number of observations and the period that they cover.
Table 6.2: Summary statistics
Variable | Description | Min. value | Max. value | Mean | Number of obs. | Period
Samples | Number of samples tested | 0 | 45 | 3.1 | 53 | Jan-10 - May-14
Non compliance | Number of non-compliant samples | 0 | 4 | 0.3 | 53 | Jan-10 - May-14
Fraud percentage | Percentage of fraud = non-compliance/samples | 0 | 1 | 0.1 | 21 | Jan-10 - May-14
Fraud binary | Fraud was detected = 1, binary variable | 0 | 1 | 0.4 | 21 | Jan-10 - May-14
Basmati rice production in India | Basmati rice produced in India in metric tonnes | 164004 | 418782 | 277088 | 36 | Apr-10 - Mar-13
Basmati price India | Price of Indian Basmati in USD per metric tonne | 845 | 1209 | 1049 | 36 | Apr-10 - Mar-13
Non-basmati rice world price | World price of non-basmati rice in USD per metric tonne | 404 | 616 | 529 | 50 | Apr-10 - May-14
Consumption of rice UK | Rice consumption in UK in kgs | 25105 | 25440 | 25275 | 36 | Jan-10 - Dec-12
GDP Pakistan | GDP in Pakistan in million USD | 85 | 108 | 100 | 48 | Jan-10 - Dec-13
GDP UK | GDP in UK in million USD | 3048 | 3279 | 3198 | 48 | Jan-10 - Dec-13
Exported quantity of Basmati from India | Quantity of Basmati rice exported from India to UK in tonnes | 5245 | 7083 | 6292 | 36 | Apr-11 - Mar-14
Average price Pakistan | Average price of Pakistani rice in USD per metric tonne | 860 | 1407 | 1174 | 50 | Apr-10 - May-14
High risk | If fraud percentage is > 0 then the sample is of high risk | 0 | 1 | 0.4 | 21 | Jan-10 - May-14
On first inspection we see that the average price of Basmati rice is roughly twice as high as
the price of non-Basmati rice. On average, there is a 10% probability that a given sample of
Basmati rice tested by the authorities will be adulterated. Additionally, we observe that the
UK has the highest GDP per capita relative to the producing countries (roughly 26 times
higher than GDP per capita in India).47 Finally, we observe that the average price of Indian
Basmati rice is slightly lower than that of Pakistani Basmati rice, with a mean value of
1049 USD per MT compared with 1174 USD per MT for the Pakistani one.
47
In UK GDP per capita stands at 39,350.64 USD while in India this number stands at 1,498.87 USD.
We continue our analysis by looking at the linear correlation between variables. A
significant correlation between two potential explanatory variables means that using both
variables in the regression specification can lead to the problem of multicollinearity.48
An analysis of linear correlations between the variables indicates that there is a significant
correlation between our explanatory variables and our dependent variable (non-
compliance), which suggests that our candidate variables could potentially be causal
drivers of changes in the dependent variable (see Appendix V for details). However, we
also notice a number of significant correlations between the independent variables
themselves. This may imply that we would have to choose which variable is more
appropriate to go into the regression specification. More particularly, we observe that the
GDP in India is highly correlated with the one in Pakistan, indicating that multicollinearity
would be a problem if both measures were to be included in the same regression.
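The screening step described above can be sketched as a pairwise correlation check among candidate explanatory variables. The series below and the 0.8 cut-off are hypothetical and chosen only for illustration; the report's own correlations are those given in Appendix V.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical candidate regressors (not the report's data).
candidates = {
    "gdp_a": [1.0, 1.2, 1.5, 1.7, 2.0],
    "gdp_b": [0.5, 0.62, 0.74, 0.86, 1.0],
    "price_gap": [520, 610, 480, 700, 560],
}

CUTOFF = 0.8  # assumed screening threshold, not taken from the report

# Flag pairs that are too strongly correlated to enter the same regression.
flagged = [
    (name_1, name_2)
    for (name_1, s1), (name_2, s2) in combinations(candidates.items(), 2)
    if abs(pearson(s1, s2)) > CUTOFF
]
print(flagged)
```

In this sketch only the two GDP series are flagged, which mirrors the situation described for GDP in India and Pakistan: one of the two would be kept and the other dropped.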
We look at a simple graphical illustration of the analysis of linear correlations identified in
Appendix V. More precisely, Figure 6.1 tries to establish whether there is a visible
relationship between the samples of Basmati rice that were tested and found non-
compliant and the gap between price of Basmati and non-Basmati rice (both Indian and
Pakistani Basmati rice). The graphs contain a straight line that represents the best linear fit
(according to OLS) of the data. Both the figures for India and Pakistan confirm a positive
correlation between price differences and detected fraud.49 It should be noted that the
small number of observations and the large residuals (i.e. the differences between the
observations and the linear fit) raise questions about the robustness of the identified
correlation. In order to obtain more reliable correlation estimates, it would be necessary to
have a larger number of observations. Finally, we also note that there is a risk of outlier
bias in these results. For example, it is quite possible that the observation shown in Figure
6.1, in which the fraud percentage is 100, is an extremely low probability event. However,
we have repeated the analysis excluding potential outliers and the results were not
affected significantly.
48
Multicollinearity (also collinearity) is a statistical phenomenon in which two or more variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a considerable degree of accuracy.
49 We note that it is difficult to assess the strength of this correlation by visually inspecting Figure 6.1. This is
because the appearance of the slope would be affected by the scaling of the vertical axis. A more reliable test is to evaluate the statistical significance of the corresponding regression coefficient, as it is performed below.
Figure 6.1: Correlation between fraud percentage and the gap between Basmati rice prices
and non-Basmati rice (in India at the left and Pakistan at the right)
[Two scatter plots with a linear fit. Vertical axis: percentage of fraudulent samples (0 to 1). Horizontal axes: price gap India (200 to 1000, left panel) and price gap Pakistan (400 to 1000, right panel).]
We note that because data on the price of Basmati rice coming from India are missing for
the period between April 2013 and July 2014, we have extrapolated the price series using
the Indian CPI. We appreciate that this might not be a satisfactory approach, as it
fails to take into consideration any significant variations in prices which might explain the
increased occurrence of Basmati rice fraud during 2014. However, we do observe that the
trend of the extrapolated Indian price follows a similar pattern to the actual Pakistani price.
In any case, because the data on Pakistani Basmati rice cover a longer period, we would
prefer to base our inference on the findings that include the Pakistani rice.
Table 6.3: T-test of mean price difference of Indian Basmati rice and non-Basmati rice
Group | Obs. | Mean | Std. Err. | Std. Dev. | [95% Conf. Interval]
No fraud detected | 12 | 525.74 | 36.13 | 125.14 | 446.23 to 605.25
Fraud detected | 9 | 680.73 | 62.32 | 186.95 | 537.08 to 824.42
Combined | 21 | 592.16 | 37.01 | 169.62 | 514.95 to 669.37
Difference | | -154.99 | 68.00 | | -297.32 to -12.66
Pr(|T| > |t|) = 0.034
Table 6.3 presents the results of t-tests (see methodology
section), comparing the means of price differences between Basmati and non-Basmati rice
depending on whether fraud was detected. The analysis indicates that there is a
statistically significant difference (at a 95 per cent confidence level) between the mean gap
in prices (Basmati price – Non-Basmati price) when fraud is detected versus when fraud is
not detected. In other words, the t-test assesses whether the observed mean price gap
when fraud was detected can be distinguished from the mean price gap when fraud was
not detected, or whether both could be realisations of the same probability
distribution. The result shows that the two means can be distinguished with at least 95 per
cent confidence. This suggests that there is a statistically significant relationship between
fraud occurrence and Basmati price gaps.
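The figures in Table 6.3 can be checked from the summary statistics alone. The sketch below assumes a two-sample t-test with pooled (equal) variances; that assumption is ours, although it reproduces the reported standard error of the difference (68.00). The critical value 2.093 is the standard two-sided 5 per cent value of the t distribution with 19 degrees of freedom.

```python
import math


def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t-test with pooled variance, computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2) / se_diff
    return t_stat, se_diff, df


# Group sizes, means and standard deviations from Table 6.3.
t_stat, se_diff, df = pooled_t_test(12, 525.74, 125.14, 9, 680.73, 186.95)

CRITICAL_T_95_DF19 = 2.093  # two-sided 5% critical value for 19 degrees of freedom
significant = abs(t_stat) > CRITICAL_T_95_DF19

print(round(se_diff, 2), round(t_stat, 2), df, significant)
```

The absolute t statistic exceeds the critical value, which is consistent with the reported p-value of 0.034.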
f. Econometric results
Estimated models
We have estimated a large number of potential specifications, using several estimation
methods, to explain the risk of fraud in Basmati rice in the UK. In this section we present
and discuss the key results of the econometric analysis. The complete series of
regressions that were estimated can be found in Appendix VI.
We have tested whether variables such as the GDP in India, the export quantity
of Basmati rice and the total quantity of Basmati rice produced in India have a significant
effect on fraud percentage, both contemporaneously and with a lagged impact. We did not
find any significant relationship between these variables and fraud percentage and
therefore we do not report them in the main body of our report. We further note that these
variables might still be significant drivers of fraud; however, due to limited data
availability, our quantitative analysis is not able to reach such a conclusion.
Table 6.4 and Table 6.5 present the results of the regressions for India and Pakistan
respectively. In both cases, regressions 1-3 are the estimates from OLS models and
regressions 4-6 are estimates of logit models. The top rows contain the coefficients
obtained for each variable while the bottom rows report basic information and tests
performed for each model.
The regressions were estimated in logarithms. Consequently, the coefficients are to be
interpreted as “elasticities”.50 The interpretation of the OLS regressions differs from that
of the logit regressions in two respects. First, OLS models are linear (i.e. the effect of the
explanatory variables is constant) while logit models are not. The reported coefficient of
the logit model is the marginal effect when the explanatory variable is equal to its mean value.
Second, the explained variable used in each model differs. In the case of OLS, this
variable is the percentage of samples in which fraud was detected. For logit models, the
explained variable is binary, taking a value equal to one for periods in which fraud was
detected (and tested for) and zero otherwise. Therefore, the estimated effect of logit
models is on the probability that fraud will occur at all. It should be noted that the logit
models use less information than the OLS models.
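To illustrate why the logit effect has to be evaluated at a particular point, the sketch below computes the marginal effect of a logit model with a single log-transformed regressor. The intercept, slope and evaluation point are hypothetical values chosen for illustration; they are not the estimates underlying Table 6.4 or Table 6.5.

```python
import math


def logistic(z):
    """Logistic (inverse logit) function."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect_at(log_x, intercept, slope):
    """dP/d(log x) for P = logistic(intercept + slope * log x)."""
    p = logistic(intercept + slope * log_x)
    return slope * p * (1.0 - p)


# Hypothetical coefficients and evaluation point (not the report's estimates).
intercept, slope = -22.0, 3.5
mean_log_gap = math.log(600.0)

me_at_mean = marginal_effect_at(mean_log_gap, intercept, slope)
me_far_from_mean = marginal_effect_at(math.log(1000.0), intercept, slope)

# A 1% rise in x raises log x by ln(1.01), so the change in probability is roughly:
delta_p_one_percent = me_at_mean * math.log(1.01)

print(round(me_at_mean, 3), round(me_far_from_mean, 3), round(delta_p_one_percent, 4))
```

The marginal effect is largest where the predicted probability is near 0.5 and shrinks towards the tails, whereas an OLS slope is the same at every point.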
50
For example, if the estimated coefficient of variable x is y, the interpretation is the following: an increase in variable x of one per cent is correlated with an increase in the explained variable of y per cent.
In addition to the variables included in Table 6.4 and Table 6.5, we have conducted
regressions using other variables for which data was obtained (i.e. the variables
summarised in Table 6.1 or others derived from them). These regressions included
variables such as the GDP in the countries that export Basmati rice (India and Pakistan),
the export quantity of Basmati rice and the total production of Basmati rice (in India). We
did not find any significant relationship between these variables and fraud percentage and
therefore we do not report them in the main body of the report. The negative results were
obtained using these variables both contemporaneously (i.e. the explained and
explanatory variables correspond to the same period) and with a lag (i.e. the explanatory
variables correspond to a period earlier than the explained variable). It is worth noting that
the analysis indicates that it is not possible to establish a statistically significant effect of
these variables on fraud given the (limited) data availability. However, these might be
found to be significant in the future if more data becomes available.
Table 6.4: India regression results
Regression number | 1 | 2 | 3 | 4 | 5 | 6
Estimation method | OLS | OLS | OLS | Logit | Logit | Logit
Constant | -0.2 | 0.09 | 0.11 | | |
Price difference | 0.05 | | 0 | 0.84* | | 0.26
Change in price difference | | 0.97* | | | 2.69 |
Number of samples | | | 0.03 | | | 0.29*
Number of observations | 21 | 21 | 21 | 21 | 21 | 21
R2 | 0.00 | 0.16 | 0.02 | 0.14 | 0.10 | 0.30
F statistic | 0.08 | 3.65 | 0.15 | 4.04 | 2.90 | 8.62
Prob > F | 0.77 | 0.07 | 0.86 | 0.04 | 0.09 | 0.01
Heteroskedasticity (Breusch-Pagan test), chi2(1) | 0.04 | 6.16* | 0.16 | | |
Adjusted R2 | -0.05 | 0.12 | -0.09 | | |
Akaike Information Criterion | 1.11 | -2.49 | 2.86 | | |
Note: All explanatory variables were expressed in logarithms. A variable is significant at * =90% level, **=95% level, ***=99% level. Null hypotheses not rejected at ˆ =90% confidence level, ˆˆ =95% confidence level, ˆˆˆ =99% confidence level.
Table 6.5: Pakistan regression results
Regression number | 1 | 2 | 3 | 4 | 5 | 6
Estimation method | OLS | OLS | OLS | Logit | Logit | Logit
Constant | -0.82 | 0.14*** | -0.61 | | |
Price difference | 0.14 | | 0.11 | 0.91* | | 0.18
Change in price difference | | -0.89* | | | -0.70 |
Number of samples | | | 0.01 | | | 0.31*
Number of observations | 21 | 21 | 21 | 21 | 21 | 21
R2 | 0.02 | 0.12 | 0.03 | 0.11 | 0.01 | 0.30
F statistic | 0.43 | 2.62 | 0.23 | 3.23 | 0.32 | 8.47
Prob > F | 0.52 | 0.12 | 0.80 | 0.07 | 0.57 | 0.01
Heteroskedasticity (Breusch-Pagan test), chi2(1) | 1.3 | 15.19*** | 0.52 | | |
Adjusted R2 | -0.03 | 0.07 | -0.08 | | |
Akaike Information Criterion | 0.73 | -1.51 | 2.67 | | |
Note: The variable is significant at * =90% level, **=95% level, ***=99% level. Null hypotheses not rejected at ˆ =90% confidence level, ˆˆ =95% confidence level, ˆˆˆ =99% confidence level.
Table 6.4 and Table 6.5 each present six regression models. The first three correspond to OLS
regressions, while models 4-6 correspond to logit regressions. Within each group, the models
differ in the set of explanatory variables included in the estimated equation.
We note that the values for the R-squared and adjusted R-squared are low. This indicates that the
overall variation in fraud cannot be explained with these variables alone. However, the objective of
this exercise is to quantify the specific effect of selected variables and to determine whether this
effect is statistically significant. The models shown in Table 6.4 and Table 6.5 establish a
statistically significant relationship between the price difference and the level of fraud. Furthermore,
these results are intuitive in the sense that the direction of the estimated effect agrees with what is
expected according to the literature.
Analysis of regression output
Table 6.4 and Table 6.5 already include a subset of the full set of regressions that were
conducted, based on the statistical significance of the coefficient estimates.51 However,
there is still a large number of different variables and estimates in these tables. This
section will use the criteria set out in the methodology to select the “preferred” model (or
subset of models).
The relevant criteria for selecting models are:
51
The complete results are presented in Appendix VI.
1. The signs of the coefficients are in the expected direction and consistent across
regressions. This is particularly relevant when comparing the regression outputs
from India and Pakistan.
2. The coefficients are statistically significant. While this was a general pre-requisite,
some variables might be significant in the OLS model but not in the logit estimation
(or vice versa), for India but not Pakistan (or vice versa), etc.
3. Joint significance (F test). This test checks the joint statistical significance of all the
coefficients in the regression, including the constant.
4. Coefficient of determination (R2). This measure increases as the residuals of the
fitted model (the difference between the model prediction and the actual
observations) decrease. In other words, this is a measure of goodness of fit. It
should be mentioned that these measures are constructed differently for OLS and
logit models. Therefore, they are not comparable across these two types of models.
However, the R2 can be used to choose between different regressions that use the
same estimation technique.
5. Information criteria (Akaike and adjusted R2). Similarly to the R2, these are other
“goodness of fit” measures. However, they include other criteria, such as penalising
the inclusion of each explanatory variable. These measures are more likely to select
a parsimonious model (i.e. a model that includes a small number of explanatory
variables) than the unadjusted R2. Information criteria are available for OLS
regressions, although there is no direct counterpart for binary choice models. The
Akaike criterion selects the model with the lowest value of its index while the
adjusted R2 would select the model with the highest value.
The variables that were found to be significant and therefore are included in Table 6.4 and
Table 6.5 are:
The price difference between Basmati and non-Basmati rice.
The change (relative to the previous period) in the price difference between Basmati
and non-Basmati rice.
The number of samples taken.
It is necessary to point out immediately that the number of samples taken is not an
economic variable. Moreover, the likelihood of fraud detection will increase by definition as
more samples are taken. Therefore, logit models that use the likelihood of fraud detection
as the explained variable will almost tautologically find that the number of samples is
highly significant. This is because, even in the case where the fraction of fraud is constant,
more testing will inevitably lead to more identified cases of fraud and, therefore, an
increase in the probability that at least one sample will be found to be non-compliant. In
contrast, OLS regressions use the percentage of fraudulent samples as the explained
variable, which is not necessarily correlated with the number of samples taken. Given the
risk of spurious correlations in these regressions and the very low number of observations
in our sample, we do not consider the models that include the number of samples
(models 3 and 6).
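The near-tautology can be made concrete with a short calculation. Assuming, for illustration only, that samples are independent and that the share of fraudulent product is fixed at 10 per cent (roughly the average fraud percentage in Table 6.2), the probability of detecting at least one non-compliant sample rises mechanically with the number of samples taken:

```python
def prob_at_least_one_detection(fraud_share, n_samples):
    """Probability that at least one of n independent samples is non-compliant."""
    return 1.0 - (1.0 - fraud_share) ** n_samples


FRAUD_SHARE = 0.10  # assumed constant share of fraudulent product

for n in (1, 5, 10, 45):
    print(n, round(prob_at_least_one_detection(FRAUD_SHARE, n), 3))
```

The binary "fraud detected" variable therefore depends on sampling intensity even when the underlying fraud rate does not change, which is the reason given above for setting aside the specifications that include the number of samples.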
Of the remaining OLS regressions, the coefficient of the price differential is not significant
in either India or Pakistan, while the coefficient of the change in the price differential is
significant in both cases. Therefore, model 2 is preferred over model 1. The opposite is
true for the remaining logit estimations: the coefficient of the change in the price differential
is not significant in either India or Pakistan, while the coefficient of the price differential is
significant in both cases. Therefore, model 4 is preferred over model 5.
We conduct a full analysis of the criteria detailed above for the two remaining models. The
conclusions are summarised in Table 6.6. The first criterion looks at the magnitude (and
sign) of the coefficients. In model 2, the sign of the coefficient goes in the expected
direction in the case of India (positive; i.e. a higher price gap is correlated to more fraud)
while the estimate for Pakistan goes in the opposite direction. Moreover, the difference
between both coefficients is quite considerable. Model 4, on the other hand, obtains similar
coefficients for India and Pakistan, with the expected sign in both cases. The second
criterion looks at the statistical significance of each individual coefficient. Given that our
previous argument discarded models that did not satisfy conditions along these lines, it is
not surprising that both models fulfil the criterion in both countries. The third criterion
requires joint significance of all coefficients according to the F-test. The only model that
does not satisfy the condition with at least 90 per cent confidence is the estimate for
Pakistan of model 2. The fourth criterion evaluates the estimates according to the
goodness of fit. While the R2 of OLS models cannot be compared with that of logit
models, it would be desirable that the selected model has the highest R2 within its class.
This is indeed the case for model 2 but not for model 4, which is outperformed by model 6.
However, we do not find this a compelling argument against model 4 given our previous
discussion of the spurious nature of the results in model 6. Finally, the fifth criterion
evaluates the models according to the information measures of Akaike and adjusted R2.
While these measures are not available for logit models, model 2 outperforms the other
OLS regressions in this regard.
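The fifth criterion can be illustrated with the goodness-of-fit figures reported for the OLS models in Table 6.4 (India). The sketch assumes that the three values in the adjusted R2 and Akaike rows correspond to models 1 to 3 in that order; the dictionary layout and variable names are ours.

```python
# Adjusted R2 and Akaike Information Criterion for OLS models 1-3 (India, Table 6.4).
ols_models = {
    1: {"adj_r2": -0.05, "aic": 1.11},
    2: {"adj_r2": 0.12, "aic": -2.49},
    3: {"adj_r2": -0.09, "aic": 2.86},
}

# Akaike selects the lowest value; adjusted R2 selects the highest value.
best_by_aic = min(ols_models, key=lambda m: ols_models[m]["aic"])
best_by_adj_r2 = max(ols_models, key=lambda m: ols_models[m]["adj_r2"])

print(best_by_aic, best_by_adj_r2)
```

Both measures point to model 2, in line with the conclusion drawn in the text.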
Table 6.6: Criteria for selecting best performing econometric model
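Criterion | OLS (Model 2), India | OLS (Model 2), Pakistan | Logit (Model 4), India | Logit (Model 4), Pakistan
Sign of coefficients as expected and consistent across regressions | Yes | No | Yes | Yes
Coefficients are statistically significant | Yes | Yes | Yes | Yes
Joint significance (F test) | Yes | No | Yes | Yes
Coefficient of determination (R2) | Yes | Yes | No | No
Information criteria (Akaike and adjusted R2) | Yes | Yes | N/A | N/A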
Based on the argument above, we conclude that the preferred model across our sample is
one that runs a logistic regression on the price difference between Basmati and non-
Basmati rice. The corresponding model for India shows that a one per cent increase in the
price difference (using the mean observed value of this difference as a reference)52
will lead to a 0.84 percentage point increase in the probability that fraud will be committed.
The model for Pakistan shows that a one per cent increase in the price difference will
lead to a 0.91 percentage point increase in the probability of fraud. Since Basmati rice is
imported to the UK from both India and Pakistan, the average elasticity is estimated to be
somewhere in the range from 0.84 to 0.91.
Risk levels
In this section we explore how to map Basmati and non-Basmati price differences into
different levels of fraud risk. The econometric analysis above provides a link between
explanatory variables (e.g. prices of Basmati and non-Basmati rice) and past fraud. In
particular, it quantifies how movements in these variables would affect the risk of fraud.
Therefore, it would be possible to classify the risk of fraud in different categories
quantitatively. For example, low risk would correspond to a probability of fraud smaller
than 33.3 per cent, medium risk when the probability is between 33.3 and 66.6 per cent
and high risk when the probability of fraud is larger than 66.6 per cent. The econometric
52
Since the logit model is non-linear, the marginal effect is not constant and needs to be evaluated at a particular level of the price difference.
model would allow for the construction of thresholds in the explanatory variables that
would lead to the risk of fraud being in each of the specified categories. Potential
applications of this model would track the (observable) variables and determine the risk
level predicted by the model.
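As a minimal sketch, the three-category example above could be applied to a predicted probability as follows. The cut-offs of 33.3 and 66.6 per cent are those given in the text; the function name and the treatment of values exactly on a boundary are our assumptions.

```python
def risk_category(probability, low_cut=0.333, high_cut=0.666):
    """Map a predicted probability of fraud to a low, medium or high risk label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < low_cut:
        return "low"
    if probability <= high_cut:  # boundary values are assigned to "medium" (assumption)
        return "medium"
    return "high"


for p in (0.20, 0.50, 0.90):
    print(p, risk_category(p))
```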
The analysis will be based on the best performing model of our econometric estimation:
the logit model using only the price gap as explanatory variable (model 4 in Table 6.4 and
Table 6.5). While there is only one relevant factor in this model, the methodology can be
directly extended to models that include multiple explanatory variables. For the sake of
illustration, we will focus on the results obtained for India.53
Figure 6.2 plots the probability of fraud predicted by our econometric model as a function
of the price gap between Basmati and non-Basmati rice. The fitted values represent the
probability of fraud occurring given the coefficients estimated by the regression and the
values taken by the explanatory variables. These values were obtained by substituting the
observed prices into the model to generate the predicted (“fitted”) probability of fraud. The
graph has a positive slope, which indicates that the higher the price gap, the higher the
probability of fraud and, consequently, the higher the probability of its
detection. It can also be noted that, as a consequence of the chosen econometric model,
this positive relation is non-linear.
Figure 6.2: Predicted probability of fraud in a logit model (India)
[Plot of the predicted probability of fraud (vertical axis, 0 to 0.8) against the price difference India (horizontal axis, 200 to 1000).]
53
Alternatively, the prediction could be based on the results for Pakistan or a weighted combination of both.
Figure 6.3 repeats the same exercise using the fitted values of a linear OLS regression
model. In this case, however, the variable measured on the vertical axis differs from that
in Figure 6.2. In a logit estimation,
the predicted variable is the probability that fraud will be detected, irrespective of the
number of cases. The OLS model uses the fraction of samples taken that tested positive
for fraud as its (continuous) dependent variable. Despite this difference in interpretation,
the overall shape of the curves predicting the risk of fraud is similar.
Figure 6.3: Predicted proportion of fraud in a linear model (India)
[Plot of the fitted values (vertical axis, 0.08 to 0.14) against the logarithm of the price difference India (horizontal axis, 5.5 to 7).]
Finally, we classify the Basmati rice price gap into two groups based on the probability of
food fraud: low and high risk. The threshold probability used is 0.5 (50 per cent). Based on
the logit model, the corresponding threshold in the price differential between Basmati and
non-Basmati rice is 628 USD. Therefore, the econometric model predicts that a price
differential below (above) the threshold of 628 USD corresponds to a low (high) risk of fraud.
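The mapping from a threshold probability to a threshold price gap follows from inverting the logit function. The sketch below assumes a logit model with the logarithm of the price gap as its single regressor. The slope is hypothetical, and the intercept is deliberately chosen so that the 50 per cent threshold falls at 628 USD; neither is an estimate from the report.

```python
import math


def price_gap_threshold(intercept, slope, probability=0.5):
    """Price gap at which logistic(intercept + slope * ln(gap)) equals `probability`."""
    log_odds = math.log(probability / (1.0 - probability))
    return math.exp((log_odds - intercept) / slope)


# Hypothetical coefficients: the intercept is back-solved so that P = 0.5 at 628 USD.
slope = 3.5
intercept = -slope * math.log(628.0)

threshold_50 = price_gap_threshold(intercept, slope, 0.5)
threshold_666 = price_gap_threshold(intercept, slope, 0.666)

print(round(threshold_50, 1), round(threshold_666, 1))
```

At a probability of 0.5 the log-odds are zero, so the threshold reduces to exp(-intercept/slope); a higher threshold probability maps to a larger price gap whenever the slope is positive.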
The accuracy of this prediction of the model can be tested by comparing past prices and
the binary variable that takes the value of one if past fraud was detected in the same
period and zero otherwise. Table 6.7 presents a cross-tabulation. When the model
predicted low risk of fraud, this prediction was correct in 9 out of 13 cases. Similarly, when
the model predicted high risk of fraud, the prediction was correct in 5 out of 8 cases.
Assessment of the accuracy of prediction has been done using in-sample data (over the
sample used for obtaining the estimates). Out-of-sample assessment was not possible due
to the small sample size available for the analysis.
Using the accuracy measure defined in the methodology section, we find that the
prediction based on the observed price difference attains an accuracy level of 66.7 per
cent (9+5=14 out of 21 observations). In other words, the model has a non-trivial predictive
power (over 50 per cent), although this power is far from perfect (100 per cent). This level
of accuracy suggests that the test proposed in this report may contain useful information
that would indicate a higher risk of fraud. However, we would like to stress the indicative
nature of this accuracy level. Given its limitations, the fact that the test indicates high risk
of fraud should not be interpreted as conclusive evidence that fraud would occur.
Furthermore, the 66.7 per cent level of accuracy is better than the level of accuracy
obtained by using a trivial predictor based only on the price ratio between the original
Basmati rice and the world non-Basmati rice. The maximum level of accuracy this predictor
would yield is 62 per cent (this is reached when the threshold for the price ratio between
the two prices of rice is set at 58 per cent, so that any price ratio above 58 per cent would
be considered suspicious and require an investigation by the authorities).
We note that the model is more accurate when predicting low risk (9/13 = 69.2 per cent
accuracy) than when predicting high risk (5/8 = 62.5 per cent accuracy). In other words,
a high-risk prediction is more likely to be a false positive than a low-risk prediction is to
be a false negative.
Table 6.7: Risk Classification
Predicted risk | Fraud observed: No | Fraud observed: Yes | Total
Low | 9 | 4 | 13
High | 3 | 5 | 8
Total | 12 | 9 | 21
Note: The threshold used to define low and high risk was 628 USD.
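The accuracy figures can be recomputed directly from the counts in Table 6.7. The data structure and variable names below are ours.

```python
# Cross-tabulation from Table 6.7: rows = predicted risk, columns = fraud observed.
crosstab = {
    "low": {"no": 9, "yes": 4},
    "high": {"no": 3, "yes": 5},
}

total = sum(sum(row.values()) for row in crosstab.values())
correct = crosstab["low"]["no"] + crosstab["high"]["yes"]

overall_accuracy = correct / total
low_risk_accuracy = crosstab["low"]["no"] / sum(crosstab["low"].values())
high_risk_accuracy = crosstab["high"]["yes"] / sum(crosstab["high"].values())

print(round(overall_accuracy, 3), round(low_risk_accuracy, 3), round(high_risk_accuracy, 3))
```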
We note that choice of the threshold probability is arbitrary. This approach may be further
improved by choosing several threshold probabilities and selecting the “optimal” one (i.e.
the one that leads to the highest accuracy level).
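A search for the "optimal" threshold could be sketched as below. For simplicity the scan is over price-gap thresholds rather than threshold probabilities; in a single-variable logit model with a positive slope the two are equivalent, since each probability maps to exactly one price gap. The observations are hypothetical and are not the 21 observations used in the case study.

```python
def accuracy_at(threshold, gaps, fraud):
    """Share of periods classified correctly when 'high risk' means gap > threshold."""
    hits = sum((gap > threshold) == bool(observed) for gap, observed in zip(gaps, fraud))
    return hits / len(gaps)


def best_threshold(gaps, fraud):
    """Candidate threshold (taken from the observed gaps) with the highest accuracy."""
    candidates = sorted(set(gaps))
    return max(candidates, key=lambda t: accuracy_at(t, gaps, fraud))


# Hypothetical price gaps (USD) and fraud indicators, for illustration only.
gaps = [410, 480, 520, 590, 640, 700, 760, 830]
fraud = [0, 0, 1, 0, 0, 1, 1, 1]

optimal = best_threshold(gaps, fraud)
print(optimal, accuracy_at(optimal, gaps, fraud))
```

Because the threshold is tuned on the same observations used to measure accuracy, the resulting figure is an in-sample measure and is likely to overstate out-of-sample performance.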
Limitations of the case study analysis
The methodology section identified a number of potential limitations on the proposed
approach. Below we assess the extent of each of these limitations for the results obtained
in the case study:
Sample size. The preceding analysis was based on 21 observations, corresponding
to the 21 months in which authenticity tests were conducted on Basmati rice
according to the UKFSS database. After assessing alternative data sources of past
fraud, we are confident that this is the largest dataset available for the UK.
Moreover, in time this database would become significantly richer and would allow
for more reliable results. However, for the time being, the available dataset has a
small number of observations. We consider that there is a reasonable expectation
that many of the inconclusive results obtained (in the form of coefficients that are
not statistically significant) are a consequence of the small sample size. In addition,
the small number of observations makes it difficult to disentangle the effects if
several variables are included simultaneously. We expect that a larger sample size
would allow for the inclusion of many explanatory variables and the estimation of
statistically significant effects for each of these variables simultaneously.
Data quality. The case study combines data from different sources, with different
degrees of reliability. Data used from trade associations, such as the All India Rice
Exporters’ Association (AIREA),54 contains information not found in other sources.
However, the statistical rigour used to collect and aggregate the data might not be
comparable to the one found in national statistics offices. Additional quality
concerns include the data gap found in the price series for Basmati rice in India.
While the gap was interpolated using India’s general price evolution, this solution is
less than satisfactory and might fail to capture key movements that are specific to
the Basmati rice market.
High variation in the estimated coefficients. The different estimations conducted
above lead to a wide range of estimates for the effect of prices on detected fraud.
We consider this limitation to be a side effect of a small sample size. In other
words, a larger sample is expected to reduce this limitation.
Omitted relevant variables. This potential limitation is pervasive in econometric
analysis, particularly when it is not possible to include in the analysis suspected
factors due to lack of data. In light of the literature of food fraud, we consider that
the largest risk of the present analysis is to omit variables that capture key features
of the supply chain. As mentioned in the literature, the length and complexity of the
supply chain are expected to be key explanatory factors in the likelihood of food
fraud. Unfortunately, the data assessment exercise performed during this project
has not identified a viable approach to capturing these characteristics.
Lagged effects. Appendix VI has addressed this issue by including explanatory
variables with lags. However, as noted above, this approach might fail to capture
more complex interactions between fraud and past prices and volumes.
54
http://www.airea.net/
7. Conclusions and Recommendations
We consider that the main contributions made by this report are as follows:
A review of the literature on food fraud that led to the identification of a large set of
variables that are considered relevant to predict the risk of food fraud. These
variables were classified into economic factors and market characteristics;
production and distribution factors; product characteristics and detection
technology; and institutional and enforcement factors.
A review of several approaches that have been used in the economic literature to
explain or predict the risk of fraud. While there are few examples of previous work
that has applied economic intelligence to study food fraud, relevant methodologies
were found in other areas such as insurance and credit card fraud.
We have compiled and reviewed multiple sources of data that can be used to
conduct estimations of the risk of food fraud for various products. Of particular
importance is the review of data sources that document past instances of food fraud
in the UK. We have concluded that the UKFSS provides the best available source
of this data.
In addition to past fraud, we have explored sources of economic data, such as
prices and volumes of authentic ingredients and adulterants. We have focused on
publicly available data produced by recognised institutions. However, additional
data is provided by private institutions on a subscription basis. We recognise that
further exploration of these sources may be required to arrive at a comprehensive
assessment of economic data currently available.
The literature review of the economics of fraud identified alternative methodological
approaches that could be applied to food fraud. We provide criteria to assess the
advantages and disadvantages of these approaches. It was concluded that an
econometric methodology would be the most suitable approach based on economic
intelligence for the case of food fraud.
The proposed methodology identifies alternative econometric models and variables
that could be estimated to predict the risk of fraud. Additionally, it provides criteria to
select between these models based on the interpretability of the results, the
statistical significance of the coefficients and goodness of fit indicators.
The proposed econometric models would provide a quantification of the link
between movements in economic variables and past fraud. The methodology also
suggests how to use these estimations in a forward looking manner. By tracking the
(observable) variables, it would be possible to determine the risk level predicted by
the model. The proposed approach suggests a method to construct thresholds for
the explanatory variables that would map into a small and discrete set of categories
of fraud risk (e.g. two categories: low and high risk).
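The mapping from an observable variable to a discrete risk category can be sketched as follows. This is a minimal illustration that assumes a one-variable logit with a positive slope. The coefficient values, the 0.5 probability cut-off and all function names are hypothetical and are not taken from the estimations in this report.

```python
import math

def logit_probability(intercept, slope, x):
    """Predicted probability of detected fraud from a one-variable logit."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def threshold_for_probability(intercept, slope, p_cut=0.5):
    """Value of x at which the predicted probability equals p_cut."""
    return (math.log(p_cut / (1.0 - p_cut)) - intercept) / slope

def risk_category(x, threshold):
    """Two-category risk rule; assumes the slope of the logit is positive."""
    return "high" if x > threshold else "low"

# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
INTERCEPT = -3.0
SLOPE = 0.005

threshold = threshold_for_probability(INTERCEPT, SLOPE)
category_above = risk_category(700, threshold)
category_below = risk_category(500, threshold)
```

A different probability cut-off would shift the threshold, so the choice of cut-off is itself a judgement about how the two types of misclassification are weighed.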
a. Case study
The proposed methodology was tested using the adulteration of Basmati rice with cheaper
varieties of rice as a case study. The selection of this particular type of fraud for the case
study was based on well documented instances of past fraud and availability of economic
data. It was found that the model that performed best is a logit regression in which the
price gap between the price of Basmati rice and other varieties of rice was the only
statistically significant predictor of fraud. While other variables were included, no evidence
was found that there is a link between them and the risk of fraud. We note, however, that
this is not a definitive finding and results might change considerably if a larger sample
were available. Based on the regression results, the effect of the price gap between the
authentic ingredient and the adulterant on fraud was used to predict future fraud. It was
determined that the risk of Basmati rice fraud would be high when the price gap exceeds
USD 628. This result was tested using past data and it was determined that it predicts
fraud correctly with 66.6 per cent accuracy.55 This level of accuracy suggests that the test
proposed in this report may contain useful information that would indicate a higher risk of
fraud. However, we would like to stress the indicative nature of this accuracy level. Given
its limitations, the fact that the test indicates high risk of fraud should not be interpreted as
conclusive evidence that fraud would occur.
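A back-test of a threshold rule of this kind can be sketched as follows. The example computes the simple share of periods classified correctly, which is an assumption: the accuracy measure used in this report is stated on a scale from 50 to 100 per cent and may be defined differently. The price gaps and fraud indicators below are hypothetical and are not the case study data.

```python
def backtest_threshold(price_gaps, fraud_detected, threshold):
    """Share of periods in which the rule 'gap above threshold' matches detected fraud."""
    if len(price_gaps) != len(fraud_detected) or not price_gaps:
        raise ValueError("inputs must be non-empty and of equal length")
    predictions = [gap > threshold for gap in price_gaps]
    correct = sum(pred == bool(actual)
                  for pred, actual in zip(predictions, fraud_detected))
    return correct / len(price_gaps)

# Hypothetical series: price gap per period and whether fraud was detected (1) or not (0).
gaps = [500, 650, 700, 600, 640, 580, 610, 690]
fraud = [0, 1, 0, 0, 1, 1, 0, 1]

hit_rate = backtest_threshold(gaps, fraud, threshold=628)
```

With so few periods, a single reclassified observation moves the hit rate by several percentage points, which is one reason to read such a figure as indicative only.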
b. Limitations
The case study also served to illustrate the considerable limitations that could be faced
when applying the proposed methodology to a particular product or fraud type. The most
important limitation is the small sample size. Despite the fact that Basmati rice was
chosen as a case study due to having abundant data (relative to other products), the
results obtained suggest that the sample size was barely sufficient to establish the most
basic relationships between the variables considered. Attempts to apply this methodology
to other products may encounter the same or even greater data limitations.
The negative consequences of having a small sample size are multiple. First, some
variables that are in fact important may not be found to be statistically significant, simply
because the sample size is too small. Second, a small sample size is likely to cause high
variation in the estimated coefficients across different models, undermining the
robustness of the results.
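The first consequence can be illustrated with a stylised calculation. The sketch below assumes that the standard error of an estimate falls with the square root of the sample size, as it does for a sample mean. The effect size, the noise level and the function name are hypothetical and do not correspond to any estimate in this report.

```python
import math

def t_statistic(effect, noise_sd, n):
    """t-statistic of an estimate whose standard error is noise_sd / sqrt(n)."""
    return effect / (noise_sd / math.sqrt(n))

# Hypothetical true effect and noise level, held fixed while the sample grows.
EFFECT = 0.5
NOISE_SD = 2.0

# Maps each sample size to the t-statistic it would produce.
t_by_sample_size = {n: t_statistic(EFFECT, NOISE_SD, n) for n in (16, 64, 256)}
```

Under these assumptions the same underlying effect falls short of the conventional large-sample critical value of about 1.96 with 16 observations and exceeds it with 64 or more.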
The main constraint on the sample size is the number of periods in which past instances
of fraud were tested (and possibly detected). These observations are obtained from the
UKFSS. While we found this database to be the most complete out of the ones identified,
it is important to highlight that it is relatively new, with very few observations for dates
earlier than 2010.
55
As explained above, this is measured on a scale from 50 to 100 per cent.
Other limitations of the analysis include the use of low quality or missing data and the
difficulty (or impossibility) to measure variables that the literature has identified as
relevant, such as key features of the supply chain.
c. Recommendations
We consider that the proposed methodology is appropriate and solidly founded in the
literature. However, due to limitations in the data currently available, the results obtained
when applying the methodology might not be entirely satisfactory. We recommend that this
approach is applied when the availability of data on past fraud is more abundant.
It is not possible to determine a priori which sample size would be sufficient to obtain
statistically reliable estimates. However, given the UKFSS database is constantly
expanding at an encouraging rate, the quality of the output of the approach proposed in
this report would increase significantly in the medium term.
We note that the choice of econometric methods was not guided by the time and resource
constraints of this project. While more sophisticated methods could be employed, these
were not prioritised at present due to their higher requirements in terms of sample size.
There is scope for future work in this area. It would be desirable to identify additional
sources of economic data, especially reviewing those provided by third parties. In addition,
more progress could be made in measuring some of the variables identified by the
literature but omitted from the analysis due to insufficient data.
Finally, we consider that the application of the methodology to multiple products
simultaneously using panel techniques might be worth exploring in more detail. The
advantage of this approach would be an increase in the number of observations. However,
we note that there would be additional complications associated with this approach, since
the effect of economic variables on fraud, such as prices, might differ significantly across
products. Therefore, there would be an increased risk of non-significant estimates and
biased results. We consider that a multi-product approach would not be excessively
challenging from a conceptual point of view. However, data quantity (and quality) will be
the deciding factors in whether this approach is successful. Therefore, the choice of
products for this analysis should be based primarily on this criterion.
8. Annex I: Detailed Review of Selected Literature
a. Food fraud – economics
Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish
and seafood imports. Cahier de recherche/Working paper, 2012, 15.
Objective: Show the role of economics in the adulteration of food imports.
Data: Simulations and empirical data on import refusals in the US from the FDA 2012
database.
Variables: The country of origin, the port of entry, product code, product description and
lists of the charges that motivated rejection. The dataset, however, does not include
information regarding the quantity and the value of products refused entry.
Method: The mechanism of impact in the model is the choice of input quality by exporting
firms. One implication of the model is that economic variables can be used to predict
adulteration in food imports. The author performed structural break tests on the weekly
number of import refusals using the structural change package in R, which implements an
algorithm to find breakpoints. Here, the procedure tests for structural breaks in the average
weekly import refusals. Regression outcomes show means for the weekly number of
import refusals for fish and seafood in the periods delimited by the structural break tests.
Before the first break in November 2005, the FDA refused entry to about 41 shipments per
week. The weekly number of refusals of seafood shipments then declined to almost 33 per
week until December 2010. In addition, the author provides graphical evidence that the
Deepwater Horizon incident had no impact on import quantities of seafood.
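The idea of locating a break in the mean of a series can be sketched as follows. This is a simplified search for a single break that minimises the sum of squared deviations from the segment means. It is not the procedure used by the author, it does not test whether the break is statistically significant, and the weekly figures are hypothetical values chosen to resemble the reported means.

```python
def segment_sse(values):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_single_break(series, min_segment=2):
    """Return (break_index, mean_before, mean_after) for the best single break."""
    if min_segment < 1 or len(series) < 2 * min_segment:
        raise ValueError("series too short for the requested segment length")
    candidates = range(min_segment, len(series) - min_segment + 1)
    k = min(candidates,
            key=lambda i: segment_sse(series[:i]) + segment_sse(series[i:]))
    before, after = series[:k], series[k:]
    return k, sum(before) / len(before), sum(after) / len(after)

# Hypothetical weekly counts of import refusals.
weekly_refusals = [41, 40, 42, 41, 33, 32, 34, 33]

break_index, mean_before, mean_after = best_single_break(weekly_refusals)
```

Procedures for multiple breaks apply the same least-squares criterion across several candidate break dates at once and add a formal test for the number of breaks.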
Advantages: The model offers a framework that can be used to identify adulteration risk
for products other than food such as drugs or medical devices. With a few modifications,
the model can also help guide inspection of domestic facilities.
Disadvantages: Does not account for other covariates that could have impacted the
number of import refusals. In particular, the increase in import refusals could be due to
increased oversight by the FDA of all food imports.
Tested Method: No.
Factors that affect the likelihood of fraud: The closing of fisheries in the Gulf of Mexico
because of the Deepwater Horizon platform oil spill.
Liang, J., & Jensen, H. H. (2007). Imperfect food certification, opportunistic behaviours and
detection. Selected Paper, 175174.
Objective: Present a theoretical framework to analyse the performance of the “Good
Agricultural Practices” program with respect to output and quality based on the assumption
of predetermined productive capacity (farm size), heterogeneous farms and exogenous
detection.
Data: Theoretical model.
Variables: Available resources to farms, farm size, effort by the monitoring agency, price
differences, reputation and advertising, cost of production.
Summary: The authors build a model which illustrates how, and under what conditions,
monitoring and enforcement activities might mitigate the fraudulent activities of food
growers under a voluntary GAPs program. The analysis brings out the following results:
first, the farms respond to monitoring and enforcement not only by reducing fraudulent
output, but also by increasing truly high-safety output until the perfect compliance level is
achieved. Second, optimal monitoring policy depends on the exogenous parameters of the
farms. If the monitoring budget is not enough to cover the inspection cost necessary to
achieve the perfect high-safety output level, the monitoring agency will allocate resources
to farms of larger size and lower costs; if the budget is enough to obtain the perfect level of
high-safety output but is not enough to preclude fraudulent output, the monitoring agency
will expend equal effort on all the farms. Third, fraudulent behaviours can be eliminated
using a combination of penalties, sale bans and monitoring activities, but cannot be
excluded completely under an endogenous detection rate.
Advantages: N/A.
Disadvantages: First, a more complicated analysis could be developed for the case where
the parameters are dependent. Second, the monitoring budget is assumed to be exogenous
in the model and the authors do not address the question of how the budget is decided.
Tested method: No.
Factors that affect the likelihood of fraud: Monitoring and enforcement.
Pouliot, S. (2012). Using economic variables to identify adulteration in food imports:
application to US seafood imports. Working paper.
Objective: Show that inspection policies may integrate economic data to better target risk.
Data: Data on US seafood imports from China. Buzby, Unnevehr and Roberts (2008)
describe US food import refusal data. The authors find that between 1998 and 2004, 65
percent of refusals were due to adulteration, 33 percent to misbranding and 2 percent to
other violations. The most common type of adulteration was filth.
Variables: Losses in revenue if inspection is failed, quality of inputs, inspection rate,
substitutability of inputs.
Method: The model considers exporting firms that can buy inputs of two qualities: low and
high. The low quality input does not meet quality standards in the importing country such
that its use adulterates the output of the exporting firm. The decision by an exporting firm to
adulterate its output depends on the relative price of inputs and the ability of the importing
country to detect adulteration.
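The trade-off described above can be sketched with a stylised decision rule. This is not the author's model: it assumes a risk-neutral firm that compares the input cost saving per unit with the expected loss from detection, and the function name, parameters and numbers are all hypothetical.

```python
def adulterates(price_high, price_low, detection_prob, loss_if_detected):
    """Stylised rule: adulterate when the input cost saving exceeds the expected loss."""
    saving = price_high - price_low              # gain from using the low quality input
    expected_loss = detection_prob * loss_if_detected
    return saving > expected_loss

# Hypothetical values: the same price gap under weak and strong detection.
weak_detection = adulterates(price_high=100, price_low=60,
                             detection_prob=0.1, loss_if_detected=200)
strong_detection = adulterates(price_high=100, price_low=60,
                               detection_prob=0.5, loss_if_detected=200)
```

In this sketch a wider price gap or a lower detection probability tips the decision towards adulteration, which is the channel through which price data could inform inspection targeting.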
Advantages: Model applies to food, but applications of the model to drugs or other
products are possible with slight modifications. The model can apply to many other issues
such as the domestic inspection policies of food plants, the inspection of medicines and
the detection of counterfeits.
Disadvantages: Empirical investigation of food import quality is still limited by the availability
of inspection data. Not clear how to find suitable instruments for import inspection effort over
time.
Tested Method: No. Although the FDA recognises that adulteration happens for economic
reasons, it fails to take them into account when developing a model for assessing
economic fraud.
Factors that affect the likelihood of fraud: By monitoring prices, an inspection agency
can identify threats even before they materialize in imports because of lags between
production and change in prices. In the long run, an inspection agency can learn about
adulteration by observing high rates of rejection or obtaining information from other
inspection agencies.
Hoffmann, S. (2010). Food safety policy and economics. Resources for the Future.
Objective: Overview of developments in food safety policy in major industrial countries
and of economic analysis of this policy.
Data: N/A.
Variables: Willingness to pay for health, cost of illness, lost productivity, direct compliance
cost.
Summary: It describes the elements of a risk-based, farm-to-fork food safety system as it
is emerging in OECD countries guided by discussions through Codex Alimentarius and
traces its roots in the development of risk management policy in the United States.
Empirical research estimating the benefits of food safety policy has used multiple methods
including hedonic estimates of demand for safety from market data, stated preference
surveys, and experimental auctions. Many areas of applied economics are increasingly
looking to meta-analysis, a method of using statistical analysis to look at systematic
patterns across related studies, to derive valuation estimates for use in policy analysis.
Advantages: N/A.
Disadvantages: This system relies on risk management practices developed in the public
sector to guide environmental and health and safety policy and in the private sector to
reduce risk of failure in process engineering.
Tested Method: No, conventional approaches to food safety policy that have been in
place since the turn of the last century are not adequate to meet these new food safety
challenges.
Factors that affect the likelihood of fraud: A globalised supply chain.
Buzby, J. C., Unnevehr, L. J., & Roberts, D. (2008). Food safety and imports: An analysis
of FDA food-related import refusal reports (No. 58626). United States Department of
Agriculture, Economic Research Service.
Objective: Examines U.S. Food and Drug Administration (FDA) data on refusals of food
offered for importation into the United States from 1998 to 2004.
Data: U.S. Food and Drug Administration (FDA) refusals of food import shipments for
1998-2004 by food industry group and by type of violation.
Variables: IRR data which include those shipments ultimately refused entry into U.S.
commerce. For each refusal, FDA reports the violation or charge codes, which document
the reasons for refusal.
Method: Researchers analysed FDA Import Refusal Reports (IRR) for food shipments
refused entry into U.S. commerce between 1998 and 2004. Tabulations were created of
refusals by industry group and FDA violation code (e.g., type of violation). Adulteration
violations were examined closely, particularly those linked to pathogen contamination.
Advantages: Risk based.
Disadvantages: The scope of the report does not include the imported meat, poultry, and
processed egg products regulated by FSIS.
Tested Method: Yes, the sampling strategies by the FDA and other agencies are
designed to focus enforcement and inspection efforts on areas that have the highest
probability of risk. Import refusals highlight food safety problems that appear to recur in
trade (i.e., the FDA thought they would be a problem and they are) and where the FDA
has focused its import alerts and monitoring efforts.
Factors that affect the likelihood of fraud: Types of violations, country of origin, and
product characteristics.
Starbird, S. A. (2005). Moral hazard, inspection policy, and food safety. American Journal
of Agricultural Economics, 87(1), 15-27.
Objective: Examine the sampling inspection policies in the 1996 Pathogen
Reduction/Hazard Analysis Critical Control Point Act.
Data: Theoretical discussion.
Variables: Quality of goods, wage or price, inspection rate.
Summary: To gather information about safety, buyers often employ sampling inspection.
Sampling inspection exhibits sampling error so some unsafe product passes inspection
and some safe product does not. This uncertainty influences buyer and supplier behaviour.
In this article, the author uses a principal–agent model to examine how sampling
inspection policies influence food safety. The author found that the sampling inspection policy,
the internal failure cost, and the external failure cost have a significant impact on the price
that the buyer is willing to offer for safer food and, therefore, on the supplier’s willingness
to exert the effort required to deliver safe food. The internal failure cost has a significant
impact on the behaviour of the supplier and the external failure cost has a significant
impact on the behaviour of the buyer. The author found the minimum external failure cost
that will motivate a risk-neutral buyer to demand high effort and showed that it depends on
the rate of lot acceptance, the contribution margin, and the safety level under high effort
and under low effort.
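The sampling error mentioned above can be made concrete with a standard acceptance sampling calculation. The sketch assumes a simple plan in which a lot is accepted when the number of defective units in a random sample does not exceed an acceptance number, with defects treated as independent draws. The plan parameters and defect rates are hypothetical and are not taken from the article.

```python
from math import comb

def lot_acceptance_probability(sample_size, acceptance_number, defect_rate):
    """Binomial probability that a sample contains at most acceptance_number defects."""
    return sum(
        comb(sample_size, k) * defect_rate ** k * (1 - defect_rate) ** (sample_size - k)
        for k in range(acceptance_number + 1)
    )

# Hypothetical plan: inspect 10 units and accept the lot only if none is defective.
p_accept_unsafe = lot_acceptance_probability(10, 0, defect_rate=0.10)
p_reject_safe = 1 - lot_acceptance_probability(10, 0, defect_rate=0.01)
```

Both probabilities are strictly positive under this plan, so some unsafe lots pass and some safe lots fail, which is the uncertainty that shapes buyer and supplier behaviour in the principal–agent setting.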
Advantages: Clarifies the relationship between inspection rate and risk of fraud.
Disadvantages: Analysis has focused on the behaviour of a single buyer and a single
seller in the short run. It is likely that repeated failures could affect the seller’s reputation
and market share in the long run.
Tested Method: No.
Factors that affect the likelihood of fraud: Size of penalties.
b. Food fraud – biological science
Moore, J. C., Spink, J., & Lipp, M. (2012). Development and application of a database of
food ingredient fraud and economically motivated adulteration from 1980 to 2010. Journal
of Food Science, 77(4), R118-R126.
Objective: To collect information from publicly available articles in scholarly journals and
general media, organize them into a database, and review and analyse the data to identify
trends.
Data: Reports of food fraud (food ingredients specifically).
Variables: N/A.
Summary: Literature search information was analysed and coded into a database by the
authors and other supporting researchers. Considerations were given to the most
appropriate and useful characteristics that could be extracted into a concise format for
tabular and database presentations that allow further data analysis and insights for
understanding and predicting food fraud and identifying analytical detection methods. The
authors analysed the database by sorting all records into 2 datasets by report type, and
then they determined top ingredients and ingredient categories in each dataset. The
scholarly records dataset included a total of 1054 records based on 584 literature
references, and the media and other reports dataset included 251 records based on 93
articles. The authors analysed the scholarly reports dataset to determine the 25 food
ingredients with the greatest number of records or hits.
Advantages: The database provides information that can be useful for risk assessors
evaluating current and emerging risk for food fraud. The authors claim that it provides a
baseline understanding of the susceptibility or vulnerability of individual ingredients to
fraud.
Disadvantages: Many articles collected in the database do not have enough information
to facilitate classification into specific incidents.
Tested Method: Yes, the website
http://www.usp.org/food-ingredients/food-fraud-database uses their database.
Factors that affect the likelihood of fraud: Government surveillance reports and
information from criminal prosecution cases for some types of food fraud.
Elliott review into the integrity and assurance of food supply networks: final report - a
national food crime prevention framework.
Summary: The Elliott Review aims at shedding light on the problem of food fraud so as to
make it much more difficult for criminals to operate in food supply networks, thus providing
the UK consumer with safer and more authentic food.
The author recommends a systems approach which is intended to provide a framework to
allow the development of a national food crime prevention strategy. The aim is to make it
much more difficult for criminals to operate in food networks by introducing new measures
to check, test and investigate any suspicious activity. The author suggests that those caught
engaging in food fraud activity must be severely punished by the law to send a clear
message to those thinking of conducting similar criminal activity.
To complete this report a data collection process took place together with well-structured
surveys with people related to the food industry. The report finds that the global nature of
the current food markets enables UK consumers access to all types of products even
when they are out of season. This means that the supply chain for food has become much
more complex as a number of these products have to be imported from abroad. Consumers
have become used to variety, taste, and access at low cost. All of these factors have
increased opportunities for mislabelling, substitution and for food crime. Based on a
number of consultations with the industry the report makes a number of recommendations.
Such recommendations are shown in the table below:
Everstine, K., Spink, J., & Kennedy, S. (2013). Economically motivated adulteration (EMA)
of food: common characteristics of EMA incidents. Journal of Food Protection, 76(4),
723-735.
Summary: The paper reveals gaps in quality assurance testing methodologies that could
be exploited for intentional harm. EMA incidents present a particular challenge to the food
industry and regulators because they are deliberate acts that are intended to evade
detection. Large-scale EMA incidents have been described in the scientific literature, but
smaller incidents have been documented only in media sources. The authors have
reviewed journal articles and media reports of EMA since 1980. They identified 137 unique
incidents in 11 food categories: fish and seafood (24 incidents), dairy products (15), fruit
juices (12), oils and fats (12), grain products (11), honey and other natural sweeteners
(10), spices and extracts (8), wine and other alcoholic beverages (7), infant formula (5),
plant-based proteins (5), and other food products (28). They also identified common
characteristics among the incidents that may help better evaluate and reduce the risk of
EMA. These characteristics reflect the ways in which existing regulatory systems or testing
methodologies were inadequate for detecting EMA and how novel detection methods and
other deterrence strategies can be deployed.
Johnson, R. (2014). Food Fraud and “Economically Motivated Adulteration” of Food and
Food Ingredients, Congressional Research Service.
Objective: This report provides an overview of the issues pertaining to food fraud and
“economically motivated adulteration” or EMA, a category within food fraud.
Data: The data comes from two databases: (1) the United States Pharmacopeial
Convention (USP) Food Fraud Database and (2) the National Centre for Food Protection
and Defence (NCFPD) EMA Incident Database.
Variables: Profit margins, field protection and control during harvesting.
Summary: First, the report provides general background information on food fraud and
EMA, including how it is defined and the types of fraud, as well as how food fraud fits into
the broader policy realm of food safety, food defence, and food quality. Second, the report
provides available information about foods and ingredients. Individual records therefore
have been further grouped by adulterant (e.g. melamine) and time period when the
incident is estimated to have occurred.
Advantages: It is able to highlight emerging concerns about food fraud involving “clouding
agents”.
Disadvantages: It may not be possible for FDA and DOJ to prosecute every instance of
food fraud given each agency’s myriad other responsibilities and limited personnel and
resources. Also, oftentimes inadequate evidence exists to effectively enforce against all
alleged or suspected cases of fraud.
Tested method: Yes.
Factors that affect the likelihood of fraud: N/A.
Fairchild, G. F., Nichols, J. P., & Capps, O. (2003). Observations on economic adulteration
of high-value food products: The honey case. Journal of Food Distribution
Research, 34(2), 38-45.
Objective: To highlight the issue of economic adulteration of high-value food products
and provide a context for discussion and analysis based on experiences with the U.S.
honey industry.
Data: A mail survey of fourteen U.S. honey packers was conducted at the request of the
National Honey Board in 1999.
Variables: Percentage of economically adulterated product purchased from various
sources, the answer to the question of whether or not they believe economic adulteration
is affecting their operation or creating unfair competition.
Method: One approach would be to begin with an estimation of the retail demand for a
given product, then develop estimates for own-price elasticity of demand at the retail and
producer levels of the market channel, and finally develop estimates for the upper bounds
of own-price flexibility at the producer and retail levels. Assume that high-value-product
prices are relatively sensitive to quantity changes.
Advantages: N/A.
Disadvantages: The survey was not a statistically representative (random) sample and
thus the information generated only represents the experience and opinions of the
responding firms.
Tested method: Honey case study.
Factors that affect the likelihood of fraud: One motivation behind economic adulteration
is the opportunity to reduce costs and increase profits per unit sold at prices comparable to
pure products, or to reduce input costs and lower selling price to increase sales volume
and/or market share. Cost differences can be significant enough that firms selling
adulterated product can cause economic injury to competing firms, sometimes selling
below product cost for pure products and sometimes driving producers and packers out of
business.
Spink, J., & Moyer, D. C. (2011). Defining the public health threat of food fraud. Journal of
Food Science, 76(9), R157-R163.
Objective: To provide a base reference document for defining food fraud - it focuses
specifically on the public health threat - and to facilitate a shift in focus from intervention to
prevention. This will subsequently provide a framework for future quantitative or innovative
research. The fraud opportunity is deconstructed using the criminology and behavioural
science applications of the crime triangle and the chemistry of the crime. The research
provides a food risk matrix and identifies food fraud incident types.
Data: CDC annual reports and FoodNet surveys.
Variables: Level of tariffs and anti-dumping duties, profit margins, cost of inputs.
Summary: Through a literature review and peer consultation, this report was created as a
“backgrounder” on the topic. The intent of this research paper is to provide a base
reference document for defining food fraud—it focuses specifically on the public health
threat—and to facilitate a shift in focus from intervention to prevention.
Advantages: The major outcome of this study was to clarify that while the motivation may
be economic, public health remains vulnerable.
Disadvantages: None stated.
Tested method: Not yet but the authors suggest it should be used to define public policy.
Factors that affect the likelihood of fraud: Focusing on the criminal component of the
crime triangle56 provides insights to the motivations for seeking food fraud opportunities.
Brand growth and increased brand recognition of a product actually increase the fraud
opportunity (that is, more victims, spending and brand equity). Finally the guardian or
hurdle gaps lead to a greater fraud opportunity. Guardians include entities that monitor or
protect the product and could include customs, federal or local law enforcement, trade
associations, nongovernmental organizations, or individual companies themselves.
Hurdles include components or systems that exist (or are put in place) to reduce the fraud
opportunity by assisting in detection or providing a deterrence. Fraud opportunities could
be reduced by increasing the risk of detection, or increasing the costs of the necessary
technology to commit the fraud and/or of developing quality levels that would attract
consumers. Countermeasures are intended to reduce the fraud opportunity, but a
refinement to a process or a narrowing of focus in detection could inadvertently create new
gaps that could be exploited by fraudsters. An example of this uncertain nature is that
fraudsters may shift ports of entry by conducting strategic “port shopping” and by shipping
fraudulent product through less monitored entry points.
Woolfe, M., & Primrose, S. (2004). Food forensics: using DNA technology to combat
misdescription and fraud. TRENDS in Biotechnology, 22(5), 222-226.
56
There are three elements of crime opportunity (or, more generally, fraud opportunity), as illustrated by the crime triangle: the victim; the fraudster, referred to in criminology research as the “criminal”; and the guardian, including hurdle gaps. It is important to emphasize that there may be very capable guardians and hurdles in place, but the nature of an evolving, emerging threat is that new gaps always occur. The term “fraudster” is used because in many incidents the food fraud is not a criminal or even a civil law violation, and may not be considered unethical in many cultures (this last point is a behavioural sciences and social anthropology phenomenon). To adapt the concept, note that as the legs increase in length, the area of the triangle increases, which represents an increase in the crime opportunity. Manipulating any leg of the triangle affects the area of the triangle and the crime opportunity.
Objective: Proving that fraud has occurred through the detection and quantification of food
constituents by DNA.
Data: Basmati rice, olive oil.
Variables: DNA sample.
Summary: The paper presents the many different chemical and biochemical techniques
that have been developed for determining the authenticity of food and, in recent years,
methods based on DNA analysis. These methods have gained increased prominence in
the past years. This is because some techniques, such as immunoassays, work well with
raw foods but lose their discrimination when applied to cooked or highly processed foods.
Also many techniques do not easily distinguish between closely related materials at the
chemical level. For example, olive and hazelnut oils are similar chemically so the usual
analytical methods cannot be applied to detect the adulteration of olive oils with hazelnut
oil. Conventional chemical methods are also not always able to detect country or region of
origin of olive oil. DNA analysis has discriminating power because ultimately the definition
of a variety or species is dependent on the sequence of the DNA in its genome. DNA is
more resilient to destruction by food processing (particularly cooking and sterilization) than
other marker substances. According to the authors, the main problems with using DNA
technology in food forensics are (i) the recovery of quality DNA from the vast array of
complex food matrices and (ii) the impact of food processing on the size of DNA that can
be recovered.
Robust DNA-based methods now exist for detecting or confirming the identity of various
meat, poultry and fish species, for identifying potato varieties, for distinguishing true-line
and hybrid basmati rice varieties from other long grain rice and for detecting offal and
neuronal tissue in processed meat products. These methods are being extended to the
identification of premium tea varieties and the regional origin of cold-pressed olive oil.
Krissoff, B., Kuchler, F., Calvin, L., Nelson, K., & Price, G. (2004). Traceability in the US
food supply: economic theory and industry studies (pp. 3-10). US Department of
Agriculture, Economic Research Service.
Objective: show how exogenous increases in food traceability create incentives for farms
and marketing firms to supply safer food by increasing liability costs.
Data: Theoretical paper.
Variables: Theoretical paper.
Method: The authors model formally the linkage between traceability and food safety and
establish the implications of an increase in traceability-liability for food safety. In this
context, liability is defined as the responsibility to pay compensation for damages such as
caused by foodborne illnesses. The capacity to trace the origin of food increases the
possibility of legal remedy and compensation in the case of a food safety incident. The
authors show explicitly the mechanism through which traceability systems create
incentives for firms to supply safer food. Traceability also allows parties to more easily
document that they are not responsible for harm. The authors show that food safety
declines with the number of farms and marketers and imperfect traceability from
consumers to marketers dampens liability incentives to supply safer food by farms.
Advantages: model implies several propositions that can be tested empirically.
Disadvantages: The authors do not discuss how such improved traceability would be
accomplished.
Tested Method: No.
Factors that affect the likelihood of fraud: Traceability.
Troop report (2013), Review of Food Standards Agency response to the incident of
contamination of beef products with horse and pork meat and DNA.
Objective: To review the response by the FSA to incidents of the adulteration of
comminuted beef products with horse and pig meat and DNA, and to make
recommendations to the FSA Board on the relevant capacity and capabilities of the FSA
and any actions that should be taken to maintain or build them.
Summary: 35 interviews were conducted with around 50 individuals, including a wide
range of FSA officials, officials in other Government Departments and Bodies, Ministers,
the Food Safety Authority of Ireland, industry representatives, Local Authority bodies and
consumer representatives. The evidence gathering for the review took place over a six
week period from 17 April to 31 May and included the review of documentation and
interviews with a wide range of individuals and organisations involved in the response to
the incident. The findings show that it was generally recognised that meat is a high value
product, which can be open to adulteration. Species substitution was known about and
action had been taken, but this focussed on, for example, cases of pork or chicken in beef,
or lamb substituted by beef. The desire of companies to source cheap meat was
recognised but thinking was around expected meat such as chicken or pork, or cheaper
sources of beef.
National Audit Office (2013) Food safety and authenticity in the processed meat supply
chain.
Summary: The authors considered the horsemeat adulteration incident as a way to
examine the effectiveness of government’s monitoring and enforcement of legislation for
food safety and composition in England for processed meat products. The authors report
on the clarity of responsibilities, the effectiveness of food intelligence gathering and
analysis, food sample testing and the targeting of resources across the food supply chain.
The data used include the total number of food samples, budget allocated by local
authorities towards food inspection, number of public analysts. The authors do not
examine the nutritional labelling of food or the robustness of the checks on nutrition.
c. Credit card fraud (empirical)
Manuela, P. and Paba A. (2010): "A discrete choice approach to model credit card fraud",
1.
Data: dataset of 320,000 observations from a portfolio of credit cards (Classic, Gold and
Revolving) issued in Italy. The paper employs data on every individual whose application
for a given card was accepted. Clients with a poor credit history are not accepted.
Variables: Gender, civil status, age, occupation and urbanisation affect the risk of fraud.
Method: A logit model. Fraud (dependent variable) and a set of explanatory variables (e.g.
gender, location, credit line, number of transactions in euros and in non-euro currencies).
Advantages: not mentioned.
Disadvantages: None stated.
Tested method? Yes, already used.
Factors to affect the likelihood of fraud: Gender, location, circuit, ownership or “holder”,
outstanding balance, number of transactions in euros, number of transactions in non-
euros, credit line. Nationality (foreign customers 22.25 times more likely to perpetrate a
fraud than nationals).
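As a rough illustration of how a logit model of this kind turns explanatory variables into a fraud probability, the Python sketch below uses made-up coefficients. It assumes that the reported figure of 22.25 for foreign customers can be read as an odds ratio, which the summary above does not state; the intercept, the second coefficient and the variable names are hypothetical and are not taken from the paper.

```python
import math

def fraud_probability(intercept, coefficients, features):
    """Logit model: P(fraud) = 1 / (1 + exp(-(intercept + sum of coef * x)))."""
    z = intercept + sum(coefficients[name] * x for name, x in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Hypothetical values; only the figure 22.25 appears in the text.
coefficients = {"foreign": math.log(22.25), "non_euro_transactions": 0.05}
intercept = -7.0

p_national = fraud_probability(
    intercept, coefficients, {"foreign": 0, "non_euro_transactions": 2})
p_foreign = fraud_probability(
    intercept, coefficients, {"foreign": 1, "non_euro_transactions": 2})
odds_ratio = odds(p_foreign) / odds(p_national)

print(f"P(fraud | national) = {p_national:.5f}")
print(f"P(fraud | foreign)  = {p_foreign:.5f}")
print(f"odds ratio          = {odds_ratio:.2f}")
```

Under these assumptions the odds ratio between the two otherwise identical customers equals the exponential of the "foreign" coefficient, while both fraud probabilities remain small in absolute terms.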
d. Credit card fraud (theoretical)
Greene, W. (1998). Sample selection in credit-scoring models. Japan and the World
Economy, 10(3), 299-316.
Objective: how sample selection affects the measurement of some variables of interest to
credit card vendors.
Data: sample of observations generated by a major credit-card vendor in 1991. The
sample used was ‘choice based’. At the time the data were generated, the true acceptance
rate was closer to 60%. The credit-card vendor provided the choice based sample so as to
facilitate analysis of the very low default rate. Of 13 444 applications received, 10 499
were accepted. The purpose of the study is more theoretical than empirical and so the
data is only used to illustrate the theoretical points.
Variables: propensity to default (dependent variable), the number of derogatory reports in
an applicant's credit history, income, credit history, the ratio of credit-card burden to
current income, age, average expenditure, dependents, home owner, type of
employment, months at current address, number of credit bureau enquiries, months
employed.
Method: A binary choice model is used to examine the decision of whether or not to
extend credit. A selectivity aspect is introduced because such models are based on
samples of individuals to whom credit has already been given. A regression model with
sample selection is suggested for predicting expenditures, or the amount of credit. The
same considerations as in the binary choice case apply. Finally, a model for counts of
occurrences is described which could, in some settings also be treated as a model of
sample selection.
Advantages: Acceptance/rejection decisions are based on simple, easy to interpret and
justified criteria.
Disadvantages: If there are factors which enter the acceptance decision but do not appear
explicitly in the rule, and these same factors influence the response in the performance equation,
then the latter equation may produce biased predictions. Thus a predictor of default risk can be
systematically biased because it is constructed from a non-random sample of past applicants, that
is, those whose applications were accepted.
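The selection problem described above can be illustrated with a small simulation. This is a sketch under invented assumptions, not the author's model: a single hidden risk factor influences both the acceptance decision and the later default, and all thresholds and distributions are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable

population, accepted = [], []
for _ in range(20000):
    # Hidden factor that enters both decisions but not the scoring rule.
    hidden_risk = random.gauss(0, 1)
    is_accepted = hidden_risk + random.gauss(0, 1) < 0.5
    defaults = hidden_risk + random.gauss(0, 1) > 1.5
    population.append(defaults)
    if is_accepted:
        accepted.append(defaults)

population_rate = sum(population) / len(population)
accepted_rate = sum(accepted) / len(accepted)
print(f"default rate, all applicants:      {population_rate:.3f}")
print(f"default rate, accepted applicants: {accepted_rate:.3f}")
```

Because riskier applicants are screened out, the default rate observed among accepted applicants understates the rate in the full applicant pool, so a predictor fitted only to accepted applicants would be biased when applied to new applicants.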
Tested method: Yes, the most common technique used for credit scoring is linear
discriminant analysis. The technique of discriminant analysis rests on the assumption that
there are two populations of individuals, which the authors denote `1' and `0,' each
characterized by a multivariate distribution of a set of attributes, x, including such factors
as age, income, family size, credit history, occupation, and so on.
Factors to affect the likelihood of fraud: the number of derogatory reports in an
applicant's credit history.
e. Credit card fraud and computer science
Chan, P. K., Fan, W., Prodromidis, A. L., & Stolfo, S. J. (1999). Distributed data mining in
credit card fraud detection. Intelligent Systems and their Applications, IEEE, 14(6), 67-74
Objective: Improve fraud warning systems using large scale data mining.
Data: Chase and First Union Bank members of the FSTC provided real credit card data for
the study. The two datasets contain credit card transactions labelled as fraudulent or
legitimate. Each bank supplied 0.5 million records spanning over a year with 20% fraud
and 80% non-fraud distribution for Chase bank and 15% versus 85% for First Union Bank.
Note that in practice fraudulent transactions are much less frequent than the 15-20% in the
data.
Variables: N/A.
Method: Combine multiple learned fraud detectors under a “cost model”. They divide a
large dataset of labelled transactions (either fraudulent or legitimate) into smaller subsets,
apply mining techniques to generate classifiers in parallel and combine the resultant base
models by meta-learning from the classifier’s behaviour to generate a meta-classifier.
Their approach treats the classifiers as black-boxes so that a variety of algorithms can be
employed.
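The partition, train and combine steps described in this method can be sketched as follows. The base learner (a single amount threshold), the lookup-table meta-learner and the synthetic data are all assumptions made for illustration; the paper's own classifiers and cost model are not reproduced here.

```python
import random
from collections import Counter, defaultdict

def train_stump(rows):
    """Base learner: pick the amount threshold with the fewest training errors."""
    best_threshold, best_errors = None, len(rows) + 1
    for candidate, _ in rows:
        errors = sum((amount > candidate) != label for amount, label in rows)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return lambda amount: amount > best_threshold

def train_meta(base_models, rows):
    """Meta-learner: map each pattern of base predictions to its majority label."""
    votes = defaultdict(Counter)
    for amount, label in rows:
        pattern = tuple(model(amount) for model in base_models)
        votes[pattern][label] += 1
    table = {p: counts.most_common(1)[0][0] for p, counts in votes.items()}

    def predict(amount):
        pattern = tuple(model(amount) for model in base_models)
        # Unseen pattern: fall back to a majority vote of the base models.
        return table.get(pattern, sum(pattern) * 2 > len(pattern))
    return predict

random.seed(0)
# Synthetic, hypothetical data: a transaction is "fraud" if its amount exceeds 500.
data = [(a, a > 500) for a in (random.uniform(0, 1000) for _ in range(600))]
train, validation, test = data[:300], data[300:450], data[450:]

subsets = [train[i::3] for i in range(3)]          # disjoint training subsets
base_models = [train_stump(s) for s in subsets]    # could be trained in parallel
meta = train_meta(base_models, validation)         # learns from base behaviour
accuracy = sum(meta(a) == y for a, y in test) / len(test)
print(f"meta-classifier accuracy on held-out data: {accuracy:.3f}")
```

The meta-learner only sees the predictions of the base models, never their internals, which is what allows the base classifiers to be treated as black boxes.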
Advantages: Efficient approach to generating a large number of classifiers and a
direct approach to sharing knowledge without sharing data.
Disadvantages: Not as efficient as the fine-grained parallelisation approaches.
Tested method: No.
Factors that affect likelihood of fraud: N/A.
Stolfo, S., Fan, W., Lee, W., Prodromidis, A., & Chan, P. (1997). Credit card fraud
detection using meta-learning: Issues and initial results. In AAAI-97 Workshop on Fraud
Detection and Risk Management.
Objective: Describe initial experiments using meta-learning techniques to learn models of
fraudulent credit card transactions.
Data: A large database, 500,000 records, of credit card transactions from one of the
members of the Financial Services Technology Consortium (FSTC, URL: www.fstc.org).
Each record has 30 fields and a total of 137 bytes. Under the terms of their nondisclosure
agreement, they cannot reveal the details of the database schema, nor the contents of the
data. The data were sampled from a 12-month period, but do not reflect the true fraud
rate.
Variables: N/A.
Method: Meta-learning is used to combine different (base) classifiers from different
learning algorithms and generate a (meta-) classifier that has better performance than any
of its constituents.
Advantages: None stated.
Disadvantages: Lack of effective metrics to guide the selection of base classifiers that will
produce the best meta-classifier.
Tested method: No.
Factors that affect likelihood of detecting fraud: good quality training data.
Brause, R., Langsdorf, T., & Hepp, M. (1999). Neural data mining for credit card fraud
detection. In Tools with Artificial Intelligence, 1999. Proceedings. 11th IEEE International
Conference on (pp. 103-106). IEEE.
Objective: Show how advanced data mining techniques and a neural network algorithm can
be combined successfully to obtain a high fraud coverage combined with a low false alarm
rate.
Data: For the analysis the authors used a sample set of 5,850 fraud transactions and
542,858 legal transactions, ordered by their time stamps.
Variables: N/A.
Summary of findings: In this contribution they develop concepts for the statistic-based
credit card fraud diagnosis. They showed that this task has to be based on the very special
diagnostic situation imposed by the very small proportion of fraud data of 1:1000.
Additionally, they showed that, by algorithmically generalizing the transaction data, one
may obtain higher levels of diagnostic rules. Combining this rule-based information and
adaptive classification methods yields very good results.
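The effect of a very small fraud share on the usefulness of alarms can be shown with a short base-rate calculation. The detection and false alarm rates below are hypothetical and are not results from the paper; the fraud share of one in a thousand follows the proportion quoted above.

```python
def precision(prevalence, detection_rate, false_alarm_rate):
    """Share of raised alarms that are genuine fraud (Bayes' rule)."""
    true_alarms = prevalence * detection_rate
    false_alarms = (1.0 - prevalence) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)

fraud_share = 1 / 1000  # the 1:1000 proportion, read as one fraud per 1000 transactions

# Hypothetical classifiers with the same detection rate but different false alarm rates.
naive = precision(fraud_share, detection_rate=0.99, false_alarm_rate=0.01)
strict = precision(fraud_share, detection_rate=0.99, false_alarm_rate=0.0001)

print(f"false alarm rate 1%:    {naive:.1%} of alarms are fraud")
print(f"false alarm rate 0.01%: {strict:.1%} of alarms are fraud")
```

With a 1% false alarm rate, fewer than one alarm in ten would be genuine fraud despite the high detection rate, which is why a low false alarm rate matters as much as fraud coverage in this setting.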
Advantages: Fraud decisions are about 80% valid.
Disadvantages: None stated.
Tested method: No.
Factors that affect likelihood of fraud: N/A.
Chan, P. K., & Stolfo, S. J. (1998). Toward Scalable Learning with Non-Uniform Class and
Cost Distributions: A Case Study in Credit Card Fraud Detection. In KDD (Vol. 1998, pp.
164-168).
Objective: Find a method that will reduce loss significantly due to illegitimate credit card
transactions.
Data: The Chase Manhattan Bank provided them with a data set that contains 500,000
transactions between 1995 and 1996, about 20% of which are fraudulent.
Method: Their approach is based on creating data subsets with the appropriate data class
distribution, applying learning algorithms to the subset independently and integrating to
optimise cost performance of the classifiers by learning from their classification behaviour.
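One way to create subsets with a chosen class distribution while still using every training example is to pair all fraud records with successive slices of the legitimate records. The sketch below assumes a desired 50:50 distribution and uses placeholder records; the function name and the data are hypothetical.

```python
import math

def balanced_subsets(fraud, legitimate, desired_fraud_share=0.5):
    """Pair every fraud record with a different slice of the legitimate records."""
    per_subset = round(len(fraud) * (1 - desired_fraud_share) / desired_fraud_share)
    n_subsets = math.ceil(len(legitimate) / per_subset)
    return [fraud + legitimate[i * per_subset:(i + 1) * per_subset]
            for i in range(n_subsets)]

# Placeholder records with a 20% fraud share, as in the data description above.
fraud = [("fraud", i) for i in range(200)]
legitimate = [("legitimate", i) for i in range(800)]

subsets = balanced_subsets(fraud, legitimate)
for number, subset in enumerate(subsets, start=1):
    n_fraud = sum(label == "fraud" for label, _ in subset)
    print(f"subset {number}: {len(subset)} records, {n_fraud / len(subset):.0%} fraud")
```

A learning algorithm can then be applied to each subset independently, and the resulting classifiers combined in a later step.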
Advantages: Their method utilises all available training examples and does not change
the underlying learning algorithm. It also handles non-uniform cost per error and is cost
sensitive during the learning process.
Disadvantages: The user needs to run preliminary experiments to determine the desired
distribution based on a defined cost model.
Tested Method: No.
Variables that affect the likelihood of fraud: N/A.
Srivastava, A., Kundu, A., Sural, S., & Majumdar, A. K. (2008). Credit card fraud detection
using hidden Markov model. Dependable and Secure Computing, IEEE Transactions
on, 5(1), 37-48.
Objective: Model the sequence of operations in credit card transaction processing using a
Hidden Markov Model (HMM) and show how it can be used for the detection of frauds.
Data: A simulator is used to generate a mix of genuine and fraudulent transactions.
Variables: N/A.
Method: An HMM is initially trained with the normal behaviour of a cardholder. If an
incoming credit card transaction is not accepted by the trained HMM with sufficiently high
probability, it is considered to be fraudulent. At the same time, they try to ensure that
genuine transactions are not rejected.
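The acceptance test described above can be sketched with the forward algorithm, which gives the probability that an HMM generates a sequence of observations. The model parameters below are set by hand for illustration (training is omitted), and the three spending symbols, the window of five transactions and the threshold of 0.5 are assumptions rather than values from the paper.

```python
def sequence_probability(observations, start, transition, emission):
    """Forward algorithm: probability that the HMM generates the observations."""
    states = range(len(start))
    alpha = [start[s] * emission[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [sum(alpha[p] * transition[p][s] for p in states) * emission[s][obs]
                 for s in states]
    return sum(alpha)

def is_suspicious(history, new_symbol, model, threshold=0.5):
    """Flag the transaction if appending it makes the sequence much less likely."""
    old = sequence_probability(history, *model)
    new = sequence_probability(history[1:] + [new_symbol], *model)
    return (old - new) / old >= threshold

LOW, MEDIUM, HIGH = 0, 1, 2  # hypothetical spending categories

# Hand-set parameters for a cardholder who mostly makes low-value purchases.
model = (
    [0.5, 0.5],                                   # start probabilities
    [[0.7, 0.3], [0.4, 0.6]],                     # state transitions
    [[0.80, 0.15, 0.05], [0.60, 0.30, 0.10]],     # emissions per state
)
history = [LOW, LOW, LOW, LOW, LOW]

print("high-value purchase flagged:", is_suspicious(history, HIGH, model))
print("low-value purchase flagged: ", is_suspicious(history, LOW, model))
```

The threshold controls the trade-off mentioned above: a lower value catches more fraud but rejects more genuine transactions.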
Advantages: Comparative studies reveal that the accuracy of the system is close to 80
percent over a wide variation in the input data. The system is also scalable for handling
large volumes of transactions.
Tested Method: Yes, banks use detection systems similar to the one in the paper and
when the system confirms the transaction to be malicious, it raises an alarm, and the
issuing bank declines the transaction. The concerned cardholder may then be contacted
and alerted about the possibility that the card is compromised.
Factors that affect likelihood of fraud: Previous amounts spent on transactions.
Figure 8.1: An intuitive way of how the model works
Quah, J. T., & Sriganesh, M. (2008). Real-time credit card fraud detection using
computational intelligence. Expert Systems with Applications, 35(4), 1721-1732.
Objective: To detect fraud in real time and to present a new approach to understanding
spending patterns to decipher potential fraud cases.
Data: The data used is from the test database (an extraction from the actual banking
database) of a well-known bank.
Variables: Number of transactions performed during the past hours and number of
transactions beyond $X, branch code of the transaction, account number, debit currency in
which transaction is done, debit or credit, terminal or PoS preference, transaction amount.
Method: A multi-layered approach consisting of: the initial authentication and screening
layers, the risk scoring and behaviour analysis layer (core layer), a layer of further review
and decision-making.
Advantages: The approach is dynamic: it can adapt to changing patterns in the e-marketplace and
bring together more and more information from all possible avenues for decision-making.
Disadvantages: It requires information not only on the customer profiles but also on
merchant profiles, their selling patterns, rules and policies in the market for accurate fraud
detection.
Tested Method: No.
Factors that affect likelihood of fraud: Deviation from previous pattern of behaviour.
Panigrahi, S., Kundu, A., Sural, S., & Majumdar, A. K. (2009). Credit card fraud detection:
A fusion approach using Dempster–Shafer theory and Bayesian learning. Information
Fusion, 10(4), 354-363.
Objective: Identify an effective way for detecting credit card fraud.
Data: Simulation data.
Variables: Spending pattern which is further categorised into: risk loving, risk neutral and
risk averse.
Method: Their method combines evidence from current as well as past behaviour. The
fraud detection system (FDS) consists of four components, namely, rule-based filter,
Dempster–Shafer adder, transaction history database and Bayesian learner. In the rule-
based component, they determine the suspicion level of each incoming transaction based
on the extent of its deviation from the good pattern. Dempster–Shafer theory is used to
combine multiple such pieces of evidence and an initial belief is computed. The transaction is
classified as normal, abnormal or suspicious depending on this initial belief. Once a
transaction is found to be suspicious, belief is further strengthened or weakened according
to its similarity with fraudulent or genuine transaction history using Bayesian learning.
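The evidence-combination step can be illustrated with Dempster's rule of combination over the two hypotheses "fraud" and "genuine". This is a minimal sketch: the masses assigned by the two rules and the two classification thresholds are hypothetical, and the Bayesian learning stage is not shown.

```python
from itertools import product

FRAUD = frozenset({"fraud"})
GENUINE = frozenset({"genuine"})
EITHER = FRAUD | GENUINE  # mass left uncommitted between the two hypotheses

def combine(m1, m2):
    """Dempster's rule of combination (undefined if the evidence conflicts totally)."""
    combined, conflict = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

def classify(belief_in_fraud, lower=0.3, upper=0.8):
    """Hypothetical thresholds separating normal, suspicious and abnormal."""
    if belief_in_fraud < lower:
        return "normal"
    if belief_in_fraud > upper:
        return "abnormal"
    return "suspicious"

# Two rules each assign some mass to fraud and leave the rest uncommitted.
rule_1 = {FRAUD: 0.6, EITHER: 0.4}
rule_2 = {FRAUD: 0.7, EITHER: 0.3}

belief = combine(rule_1, rule_2)
print(f"combined belief in fraud: {belief[FRAUD]:.2f}")
print("classification:", classify(belief[FRAUD]))
```

Two moderately suspicious pieces of evidence reinforce each other here, giving a combined belief of 0.88, higher than either rule alone.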
Advantages: It generates fewer false alarms relative to other methods. The architecture
is flexible enough so that new rules can also be included at a later stage to further
augment the rule-based component. In addition, Bayesian learning takes place so that the
FDS adapts to the changing behaviour of genuine customers as well as fraudsters over
time.
Tested method: No.
Factors that affect likelihood of fraud: Suspicion score.
Bhattacharyya, S., Jha, S., Tharakunnel, K., & Westland, J. C. (2011). Data mining for
credit card fraud: A comparative study. Decision Support Systems, 50 (3), 602-613.
Objective: To evaluate two advanced data mining approaches, support vector machines
and random forests, together with the logistic regression, as part of an attempt to better
detect (and thus control and prosecute) credit card fraud.
Data: The dataset was obtained from an international credit card operation. The study
uses Artificial Neural Networks (ANN) tuned by Genetic Algorithms (GAs) to detect fraud.
This dataset covers 13 months, from January 2006 to January 2007, and contains about 50 million
credit card transactions (49,858,600) on about one million credit cards (1,167,757)
from a single country.
Variables: Retail purchase, cash advance, transfer.
Advantages: accessibility for practitioners, ease of use, and noted performance
advantages in the literature.
Disadvantages: instability and reliability issues.
Tested method: Yes, one of the three methods that are presented is used in practice -
logistic regression. “It is well-understood, easy to use, and remains one of the most
commonly used for data-mining in practice”.
Factors that affect likelihood of fraud: N/A.
Dheepa, V. and Dhanapal, R. (2009) "Analysis on credit card fraud detection methods."
International Journal of Recent Trends in Engineering, Vol 2.
Objective: The main task is to explore different views on credit card fraud and see what
can be learned from the application of each different technique.
Data: Theoretical approach.
Variables: N/A.
Summary: Three methods to detect fraud are presented. First, a clustering model is
used to classify legal and fraudulent transactions by clustering regions of
parameter values. Secondly, a Gaussian mixture model is used to model the probability
density of a credit card user’s past behaviour so that the probability of current behaviour can
be calculated to detect any abnormalities from the past behaviour. Lastly, Bayesian
networks are used to describe the statistics of a particular user and the statistics of
different fraud scenarios.
Conclusion: To improve the fraud detection system, the combination of the three
presented methods could be beneficial.
f. Automobile insurance and car accidents
Artís, M., Ayuso, M., & Guillén, M. (1999). Modelling different types of automobile
insurance fraud behaviour in the Spanish market. Insurance: Mathematics and
Economics, 24(1), 67-81.
Objective: To model fraud behaviour in the automobile insurance industry. The model
should account for different types of fraud and explain individual behaviour. The authors
propose to distinguish the type of fraud chosen, because they assume that different kinds
of fraud may produce different benefits to the individual.
Data: The database has been obtained from a sample of claims of a Spanish company.
Data were collected from 1993 to 1996. Half of the claims are legitimate, the other half are
claims that had been identified as fraudulent. Each fraud has been classified as being for
personal benefit or for a third party benefit. The sample is not strictly random. The
estimation was obtained using maximum likelihood with the correction for choice-based
sampling in order to take into account the effect of the over-representation of fraud claims.
Therefore, weights were included in the estimation procedure.
Variables: type of claim, number of files associated to the claim, insurer did not accept
fault, police officer reported about the accident, presence of witnesses, accident took place
in a nonurban area, number of previous claims and number of years since vehicle
fabrication.
Method: Based on discrete-choice models for fraud behaviour. The authors estimate the
influence of the insured and claim characteristics on the probability of committing fraud.
They correct for choice-based sampling in the estimation due to the oversampling of fraud claims.
In the framework of discrete-choice models, their fraud model classifies each claim into
one of several different classes: legitimate, fraud for personal profit and fraud for a third
party benefit. They use two different approaches: firstly the authors consider a multinomial
logit model (MNL) and, secondly, a nested multinomial logit model (NMNL) is used.
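The two ingredients of this method, multinomial logit class probabilities and a weighting that corrects for choice-based sampling, can be sketched as follows. The weighting shown (population share divided by sample share) is one common correction and is assumed here rather than taken from the paper; the utilities, the population shares and the split of the fraud half of the sample between the two fraud types are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P(class) = exp(V_class) / sum of exp(V) over all classes."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(v - top) for c, v in utilities.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def choice_based_weights(population_shares, sample_shares):
    """Weight each class by population share / sample share."""
    return {c: population_shares[c] / sample_shares[c] for c in sample_shares}

def weighted_log_likelihood(claims, weights):
    """claims: list of (observed class, utilities per class) pairs."""
    return sum(weights[observed] * math.log(mnl_probabilities(utilities)[observed])
               for observed, utilities in claims)

# Half of the sample is legitimate, as in the data description; the rest is assumed.
sample_shares = {"legitimate": 0.5, "personal": 0.3, "third_party": 0.2}
population_shares = {"legitimate": 0.9, "personal": 0.06, "third_party": 0.04}
weights = choice_based_weights(population_shares, sample_shares)

# Hypothetical utilities for two claims, e.g. from fitted claim characteristics.
claims = [
    ("legitimate", {"legitimate": 1.2, "personal": 0.1, "third_party": -0.4}),
    ("personal", {"legitimate": 0.2, "personal": 0.9, "third_party": 0.0}),
]
probabilities = mnl_probabilities(claims[0][1])
log_likelihood = weighted_log_likelihood(claims, weights)
print(weights)
print(probabilities)
print(f"weighted log-likelihood: {log_likelihood:.4f}")
```

Over-represented fraud claims receive weights below one and under-represented legitimate claims weights above one, so that the estimation reflects the population rather than the sample mix.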
Advantages: Their results are comparable to those proposed by other authors, but they
have accounted for a unified framework with several kinds of fraud.
Disadvantages: Some claims classified as legitimate might be fraudulent claims that were not detected,
and thus the authors could improve the model by introducing measurement error in the
dependent variable.
Tested Method: No.
Factors that affect the likelihood of fraud: The number of files associated to a claim is
related to a higher probability of a legitimate claim. All variables are significant at a 5%
significance level except the location of the accident.
g. Consumer goods
Grossman, G. M., & Shapiro, C. (1988). Counterfeit-product trade.
Objective: To characterize a counterfeiting equilibrium and explore its properties.
Data: Theoretical model.
Variables: Price and quality of genuine product and aggregate supply of adulterated
product.
Model: In the presence of counterfeiting, trademark owners compete subject to two
constraints. First, each price-quality offer must be credible, i.e., the manufacturer must
find it optimal to supply the promised quality rather than to run down his supply, so that
each firm prices its product above marginal cost and earns a flow of quasi-profits that
provide a competitive rate of return to the firm's reputation. Second, each firm must
account for (actual and potential) competition by counterfeiters. Brand name
manufacturers must avoid price/quality combinations that offer positive profits to
counterfeiters. Counterfeiters produce abroad and enjoy a cost advantage, but face the
possibility of confiscation at the border. Detection is more likely if the genuine product is of
higher quality. Counterfeiting also becomes more costly as the aggregate supply of
counterfeits rises, driving up foreign factor prices. In this model, counterfeiting provides an
additional avenue of export for the foreign country.
Factors that affect the likelihood of fraud: price and quality of genuine product and
aggregate supply of adulterated product.
h. Fraud in general
Becker, G. S. (1974). Crime and punishment: An economic approach. In Essays in the
Economics of Crime and Punishment (pp. 1-54). UMI.
Objective: Provide answers to the following questions: what determines the amount and
type of resources and punishments used to enforce a piece of legislation? In particular,
why does enforcement differ so greatly among different kinds of legislation?
Data: It is a theoretical paper so data is not very widely used. He uses an indicative
number on the costs of crime. Economic costs of crime are calculated by using the
President’s commission report 1967 which looks at the expected cost of being caught
when committing a crime. The Crime Commission estimates the direct costs of various
crimes.
Variables: value of crime and cost of crime.
Model: He breaks down the variables that determine the risk of fraud into five categories:
the relations between (1) the number of crimes, called "offenses" and the cost of offenses,
(2) the number of offenses and the punishments meted out, (3) the number of offenses,
arrests, and convictions and the public expenditures on police and courts, (4) the number
of convictions and the costs of imprisonments or other kinds of punishments, and (5) the
number of offenses and the private expenditures on protection and apprehension.
Contribution: demonstrates that optimal policies to combat illegal behaviour are part of an
optimal allocation of resources. Since economics has been developed to handle resource
allocation, an "economic" framework becomes applicable to, and helps enrich, the analysis
of illegal behaviour. At the same time, certain unique aspects of the latter enrich economic
analysis: some punishments, such as imprisonments, are necessarily nonmonetary and
are a cost to society as well as to offenders; the degree of uncertainty is a decision
variable that enters both the revenue and cost functions.
Disadvantages: In reality people usually differ on the amount of damages or benefits
caused by different activities. To some, any wage rates set by competitive labour markets
are permissible, while to others, rates below a certain minimum are violations of basic
rights; to some, gambling, prostitution, and even abortion should be freely available to
anyone willing to pay the market price, while to others, gambling is sinful and abortion is
murder. These differences are basic to the development and implementation of public
policy but have been excluded from his inquiry. The author assumes consensus on
damages and benefits and simply tries to work out rules for an optimal implementation of
this consensus.
Tested Method: No.
Factors that affect the likelihood of fraud: Level of enforcement and the number of past
fraud instances that were detected.
Kou, Y., Lu, C. T., Sirwongwattana, S., & Huang, Y. P. (2004). Survey of fraud detection
techniques. In Networking, sensing and control, 2004 IEEE international conference
on (Vol. 2, pp. 749-754). IEEE.
Objective: presents a survey of current techniques used in credit card fraud detection,
telecommunication fraud detection, and computer intrusion detection. The goal of this
paper is to provide a review of different techniques to detect frauds.
Data: survey paper.
Method 1: Outlier Detection. An outlier is an observation that deviates so much from other
observations as to arouse suspicion that it was generated by a different mechanism.
An unsupervised learning approach is employed in this model. Usually, the result of
unsupervised learning is a new explanation or representation of the observation data,
which will then lead to improved future responses or decisions. Unsupervised methods do
not need the prior knowledge of fraudulent and non-fraudulent transactions in historical
database, but instead detect changes in behaviour or unusual transactions. These
methods model a baseline distribution that represents normal behaviour and then detect
observations that show greatest departure from this norm. Outliers are a basic form of
non-standard observation that can be used for fraud detection. In supervised methods,
models are trained to discriminate between fraudulent and non-fraudulent behaviour so
that new observations can be assigned to classes. Supervised methods require accurate
identification of fraudulent transactions.
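A very simple unsupervised detector of this kind models the baseline as the mean and standard deviation of a cardholder's past amounts and flags observations that depart strongly from it. The sketch below is one possible baseline, not a method taken from the survey; the amounts and the cut-off of three standard deviations are hypothetical.

```python
import statistics

def is_outlier(history, new_amount, max_z=3.0):
    """Flag an amount that lies more than max_z standard deviations from the baseline."""
    baseline_mean = statistics.mean(history)
    baseline_sd = statistics.stdev(history)
    z_score = abs(new_amount - baseline_mean) / baseline_sd
    return z_score > max_z

# Hypothetical past transaction amounts for one cardholder.
past_amounts = [20, 25, 22, 30, 27, 24, 26, 23]

print("amount 28 flagged: ", is_outlier(past_amounts, 28))
print("amount 500 flagged:", is_outlier(past_amounts, 500))
```

No labelled fraud cases are needed: the detector relies only on the cardholder's own history, which is the main difference from the supervised methods described above.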
Method 2: Neural Networks. A neural network is a set of interconnected nodes designed
to imitate the functioning of the human brain. Each node has a weighted connection to
several other nodes in adjacent layers. Individual nodes take the input received from
connected nodes and use the weights together with a simple function to compute output
values. Neural networks come in many shapes and forms and can be constructed for
supervised or unsupervised learning. The user specifies the number of hidden layers as
well as the number of nodes within a specific hidden layer. Cardwatch features neural
networks trained with the past data of a particular customer. It makes the network process
the current spending patterns to detect possible anomalies.
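The node computation described above (a weighted sum of inputs passed through a simple function) can be sketched as a forward pass through one hidden layer. The weights, biases and inputs are arbitrary illustrative values, the sigmoid is one common choice of function, and no training is shown.

```python
import math

def sigmoid(z):
    """Simple squashing function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_output(inputs, weights, biases):
    """Each node: weighted sum of its inputs plus a bias, passed through a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]

# Hypothetical, already scaled features of the current transaction.
inputs = [0.2, 0.9, 0.4]

# One hidden layer with two nodes, then a single output node.
hidden = layer_output(inputs, [[0.5, -1.0, 0.3], [1.2, 0.8, -0.7]], [0.0, 0.1])
score = layer_output(hidden, [[1.5, -2.0]], [0.0])[0]
print(f"anomaly score: {score:.3f}")
```

In a trained network the weights would be fitted to a customer's past data so that the output score separates usual from unusual spending patterns.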
Method 3: Model-based Reasoning. Model-based detection is a misuse detection
technique that detects attacks through observable activities from which an attack signature can be inferred.
It relies on a database of attack scenarios, each containing the sequence of behaviours that makes up
the attack.
Method 4: Data Mining. Data mining approaches can be applied to intrusion detection. An
important advantage of the data mining approach is that it can develop a new class of models
to detect new attacks before they have been seen by human experts. A classification
model using an association rules algorithm and frequent episodes is developed for anomaly
intrusion detection. This approach can automatically generate concise and accurate
detection models from a large amount of audit data. However, it requires a large amount of
audit data in order to compute the profile rule sets. Moreover, this learning process is an
integral and continuous part of an intrusion detection system because the rule sets used
by the detection module may not be static over a long period of time.
Tested Method: Yes, these methods are widely used to detect credit card fraud and computer
intrusion.
OECD (2008), The Economic Impact of Counterfeiting and Piracy, OECD Publishing.
doi: 10.1787/9789264045521-en.
Objective: The report suggests ways to develop information and analysis, and calls on
governments to consider strengthening legal and regulatory frameworks.
Data: An analysis of international trade data (landed customs value basis57) was carried
out. The authors also conducted a survey to identify the extent of counterfeiting taking place and
used data on the value of seizures and infringements of IP.
57 Customs value is the value of merchandise assigned by customs officials; in most instances this is the same as the transaction value appearing on accompanying invoices.
Variables: Intensity and frequency of infringing activities, GDP per capita. Other
dependent variables include the share of products within given categories in total exports
from a given economy, dummy variables for preferential agreements, volume of inflowing
FDI, population size and openness rank. They construct a relative propensity index for
importing counterfeit goods from source economies.
Summary: The overall degree to which products are being counterfeited and pirated is
unknown and there do not appear to be any methodologies that could be employed to
develop an acceptable overall estimate. However, insights can be gained through an
examination of various types of information, including data on enforcement and information
developed through surveys. This information has significant limitations, however, and falls
far short of what is needed to develop a robust overall estimate. The General Trade-
Related Index of Counterfeiting for products (GTRIC-p) is constructed in three steps: (1)
first, the general seizure percentages are calculated for each reporting economy; (2) from
these, each product category’s counterfeiting factor is established; and (3) based on these
factors, the GTRIC-p is derived.
Advantages: N/A.
Disadvantages: GTRIC-p is formed on a 2-digit HS basis and establishes the relative
likelihood for products in one chapter to be counterfeit relative to another. Within any one
chapter, there could be considerable variation among products and the relative
counterfeiting propensities must therefore be seen as averages for the hundreds of goods
covered by each HS chapter.
Tested Method: No, OECD suggests that this method should be used by governments
internationally.
Factors that affect the likelihood of fraud: profitability and technology.
Peck, H. (2005). Drivers of supply chain vulnerability: an integrated framework.
International Journal of Physical Distribution & Logistics Management, 35(4), 210-232.
Objective: This paper aims to report on findings of a cross-sector empirical study of the
sources and drivers of supply chain vulnerability.
Data: Data collection involved semi-structured interviews with 47 managers, representing
five tiers of the network involved in the production of four distinct aircraft types.
Interviewees were selected using snowball sampling. The managers concerned performed
a range of supply chain management related roles. They were drawn from across the
aircraft programs (product lines/families) of the prime contractor (the assembler), its first-
and second-tier suppliers, industry associations – including one representing small and
medium sized enterprises – and customers in the UK Ministry of Defence.
57 (continued) Landed customs value includes the insurance and freight charges incurred in transporting goods from the economy of origin to the economy of importation.
Variables: N/A.
Framework: This paper develops a framework rather than analysing the data econometrically.
It has taken the findings of exploratory research into sources and drivers of
supply chain vulnerability and, drawing on systems theory, developed a multi-level
framework for analysis, providing the basis of a model (Figure 6.1) to explain the scope
and dynamic nature of supply chain risk.
Advantages: A starting point for developing more complete predictive simulations of the
likely effects of specific actions on dynamic supply chain networks.
Disadvantages: It would have been desirable to conduct in-depth multi-tier case studies in
each of the sectors used to validate the findings of the aerospace case study, immediately
after the initial study was undertaken. Unfortunately this was not possible due to time and
resource limitations.
Tested method: No.
Factors that affect the likelihood of fraud: Unanticipated side-effects or consequential
risks to supply chain processes, arising from specific managerial decisions, requirements
or industry trends. Demands for shorter lead-times, outsourcing and increasing use of
global sourcing and supply, as well as “off-set” (politically determined counter trade
agreements) were among the legitimate and well-intentioned measures identified by
interviewees as sources of risk to supply chain performance.
i. Insurance and tax fraud
Tennyson, S. (1997). Economic institutions and individual ethics: A study of consumer
attitudes toward insurance fraud. Journal of Economic Behavior & Organization, 32(2),
247-265.
Objective: To explore the determinants of consumers’ attitudes toward filing exaggerated
automobile insurance claims.
Data: The data for the study are obtained from a national survey of 1,987 adult individuals
taken in May 1991. The survey was developed by the Insurance Research Council, an
insurance industry research and information organization.
Variables: Age, white, male, years of school, executive, fraction of others who agree with
fraud statement: acceptable to not report income to IRS, good idea to reduce mandatory
insurance, good idea to give up right to sue, serious if any insurer bankrupt, confident of
financial stability of insurer, auto insurance premiums a major problem. The Herfindahl
Index of seller concentration, the percentage of cars insured in the residual market and the
average insurance premium per car (lagged two years).
Method: Binary response variables are constructed which represent “agreement” versus
“disagreement” with each statement about insurance fraud. The variable is assigned a
value of 1 if the respondent “strongly agrees,” “agrees” or “probably agrees” with a
statement, and is assigned a value of 0 if the respondent “probably disagrees,” “disagrees”
or “strongly disagrees.” Respondents who replied “don’t know” to the question are
eliminated from the sample. Given the uneven distribution of responses to the fraud
questions, the advantages of an ordered probit in this study are uncertain. In
preliminary analysis the author explored both a 6-class and a 4-class multinomial probit
model. The estimation results were not markedly different from those of the binomial
probit, and the predictive accuracy was poor for most response cells. Hence, only the
results for the binary response variables were reported.
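The coding rule described above can be sketched in a few lines of Python. The response labels follow the wording in the text, while the function name and the sample answers are hypothetical.

```python
AGREE = {"strongly agrees", "agrees", "probably agrees"}
DISAGREE = {"probably disagrees", "disagrees", "strongly disagrees"}

def to_binary(responses):
    """Code agreement as 1 and disagreement as 0, dropping "don't know" answers."""
    coded = []
    for answer in responses:
        if answer in AGREE:
            coded.append(1)
        elif answer in DISAGREE:
            coded.append(0)
        elif answer == "don't know":
            continue  # eliminated from the sample
        else:
            raise ValueError(f"unexpected response: {answer!r}")
    return coded

# Hypothetical survey answers to one fraud statement
sample = ["agrees", "don't know", "strongly disagrees", "probably agrees"]
coded = to_binary(sample)
```

Collapsing six categories into two is what discards the information on strength of agreement noted under Disadvantages below.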
Advantages: Instead of viewing the prevalence of deviant attitudes as a function of
exogenously determined initial conditions, this view acknowledges that the attitudes of a
given population may also depend upon their institutional environment, and the perceived
legitimacy of the institutions in question.
Disadvantages: Using a binary response variable reduces the efficiency of estimation, by
failing to incorporate information regarding the strength of agreement or disagreement with
the statements; however, predictive accuracy of a more general ordered probit model will
be low for those cells in which there are few observations.
Tested method: No.
Factors that affect the likelihood of fraud: The Herfindahl Index of seller concentration,
the percentage of cars insured in the residual market and the average insurance premium
per car (lagged two years).
Andreoni, J., Erard, B., & Feinstein, J. (1998). Tax compliance. Journal of economic
literature, 818-860.
Objective: Characterising and explaining the observed patterns of tax non-compliance
and ultimately finding ways to reduce it.
Data: Theoretical discussion which suggests a number of sources to find appropriate data.
Such data include the household TCMP, state tax amnesty data and the IRS annual
report.
Variables: Tax rates, income, form of penalties, distribution of income.
Method: Tobit model of evasion using data from the Taxpayer Compliance Measurement
Program (TCMP), including as independent variables after-tax income, the
combined state and marginal tax rate and a variety of other socio-economic indicators.
Advantages: Considers the interaction between tax payers and tax authorities.
Disadvantages: More psychological factors need to be included to analyse non-
compliance behaviour.
Tested model: No.
Factors that affect the likelihood of fraud: Social environment.
Derrig, R. A. (2002). Insurance fraud. Journal of Risk and Insurance, 69(3), 271-287.
Objective: To examine alternative cooperative arrangements that could reduce or
eliminate the potential inefficiency arising from the behaviour of insurance companies that
consider the possible cost savings of the total claim which can reduce the effectiveness of
investigations.
Data: Theoretical model.
Variables: Cost of insurance, subrogation expenses, cost of investigating claims.
Summary: The relationships between the cost of investigation and expected savings, as
well as the determination of the optimal levels of investigation under different
circumstances, are illustrated. The optimal level of investigation is determined when the
slopes of the cost of investigation line and the savings are equal.
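The equal-slopes condition is the usual first-order condition of a net-savings maximisation. As a sketch, assume notation that is not in the paper reviewed: $e$ is the level of investigation, $S(e)$ the expected savings (increasing and concave) and $C(e)$ the cost of investigation (increasing and linear or convex). Then:

```latex
\max_{e \ge 0} \; S(e) - C(e)
\quad \Longrightarrow \quad
S'(e^{*}) = C'(e^{*}),
\qquad \text{with } S''(e^{*}) - C''(e^{*}) < 0 .
```

Below $e^{*}$ an extra unit of investigation saves more than it costs, and above $e^{*}$ it costs more than it saves.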
Kornhauser, M. E. (2008). Normative and cognitive aspects of tax compliance: Literature
review and recommendations for the IRS regarding individual taxpayers. In 2007 Annual
Report to Congress (Vol. 138, pp. 138-180).
Objective: This report offers the IRS several concrete suggestions for improving individual
taxpayer compliance based on the tax morale literature.
Data: Experimental data.
Variables: Personal sense of integrity, degree of altruism, procedural justice, trust in
government, labels, rules of thumb, framing.
Summary: This Report surveys recent literature concerning the “tax morale” model of tax
compliance as it relates to individuals. It examines some of the cognitive processes
involved such as framing, but it concentrates on the moral, psychological, and social
factors influencing tax compliance.
Factors that affect the likelihood of fraud: rewards for not committing fraud.
Slemrod, J. (2007). Cheating ourselves: the economics of tax evasion. The journal of
economic perspectives, 21(1), 25-48.
Objective: To review what is known about the magnitude, nature, and determinants of tax
evasion, with an emphasis on the U.S. income tax.
Data: U.S. Department of the Treasury, Internal Revenue Service (2006).
Variables: Sources of income.
Summary: The paper reviews the current state of knowledge on the economics of tax
evasion with an emphasis on the U.S. income tax. The main themes of the paper revolve
around tax evasion by source of income, the type of people who have been
identified in the literature as more likely to evade (lower income people), the role of big
businesses in the tax system (how much corporation tax is evaded and why) and how the
standard deterrence model of tax evasion performs in practice.
Factors that affect the likelihood of fraud: Married filers and taxpayers younger than 65
have significantly higher average levels of noncompliance than others, and econometric
studies by Clotfelter58 (1983) and Feinstein59 (1991) that control for income and marginal
tax rates come to similar conclusions. Baldry60 (1987) found evidence, in an experimental
setting, that men evade more than women.
58 Clotfelter, C. T. (1983). Tax evasion and tax rates: An analysis of individual returns. Review of Economics and Statistics, 65(3), 363-373.
59 Feinstein, J. S. (1991). An econometric analysis of income tax evasion and its detection. The RAND Journal of Economics, 14-35.
60 Baldry, J. C. (1987). Income tax evasion and the tax schedule: Some experimental results. Public Finance = Finances publiques, 42(3), 357-83.
9. Annex II: Methodologies Used to Study Fraud
We have identified three main methodologies to assess the risk of fraud for different food
products:
Indices.
Econometrics.
Data mining.
For each of the methodologies we provide a description, examples of how they have been
applied to fraud detection and discuss some advantages and disadvantages for the
purposes of this project.
a. Construction of risk indices
The main objective of this report is to develop a methodology that can establish the level of
risk of fraud for various food products. An approach that has been used in the literature
would be to directly construct one or several indices that capture the risk of fraud.
The construction of an adequate risk index would rely crucially on the following steps:
The selection, a priori, of the relevant factors to be included in the index.
The determination of the relation and relative importance between these factors.
An example of a simple index would be as follows. Suppose that there are n factors
considered, with values $x_1, \dots, x_n$, and that the relative importance is given by weights (denoted $w_i$) for each of
them. The simplest possible risk index could be estimated by the following formula:
$$\text{Risk index} = \sum_{i=1}^{n} w_i x_i, \qquad \sum_{i=1}^{n} w_i = 1$$
We note that this formula is simply an example that consists of a weighted arithmetic
average. Other specifications, such as geometric or other non-linear functions would also
be possible (see examples below).
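A minimal Python sketch of the two specifications mentioned, a weighted arithmetic average and a weighted geometric average, is given here. The function names, factor scores and weights are hypothetical, and the weights are normalised inside the functions so that they need not sum to one.

```python
import math

def arithmetic_index(factors, weights):
    """Weighted arithmetic average of the factor scores."""
    if len(factors) != len(weights):
        raise ValueError("factors and weights must have the same length")
    return sum(w * x for w, x in zip(weights, factors)) / sum(weights)

def geometric_index(factors, weights):
    """Weighted geometric average; requires strictly positive factor scores."""
    if len(factors) != len(weights):
        raise ValueError("factors and weights must have the same length")
    log_sum = sum(w * math.log(x) for w, x in zip(weights, factors))
    return math.exp(log_sum / sum(weights))

# Hypothetical scores (between 0 and 1) for three risk factors and their weights
factors = [0.8, 0.2, 0.5]
weights = [0.5, 0.3, 0.2]

risk_arithmetic = arithmetic_index(factors, weights)
risk_geometric = geometric_index(factors, weights)
```

The geometric form penalises low scores on any single factor more heavily, so the two specifications can rank the same products differently. This illustrates how much the result depends on choices made a priori.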
The main advantage of this approach is the very low data requirement. Even with a small
sample of observations it would be possible to construct one or more indices. There are
two main disadvantages. First, there are significant arbitrary decisions in this approach.
The arbitrariness would include the selection of factors but, more importantly, their
associated weights. Second, this approach would not have any associated statistical
methods to determine how robust the estimates are.
A relevant example of this method is given by OECD (2008). According to this study, the degree to
which products are being counterfeited and pirated is unknown and there do not appear to
be any methodologies that could be employed to develop an acceptable overall estimate.61
This report constructs the so-called General Trade-Related Index of Counterfeiting for
products (GTRIC-p) that measures the relative propensity to counterfeit different product
categories in international trade. It is based on two assumptions: i) the counterfeiting factor
of a given product category is positively related to the actual intensity of international trade
in counterfeit goods and ii) differences in counterfeiting factors may be due to the fact that
some products are easier to detect than others.
The GTRIC-p is constructed in three steps. First, the general seizure percentages are
calculated for each reporting economy. This is done by dividing the sum of the seizure
values or seizure incidents of product k over time by the total value of all seizures over
time.
From these, each product category’s counterfeiting factor is established. These
counterfeiting factors capture the sensitivity of product counterfeiting in a given category
relative to its share in international trade. Counterfeiting factors are defined as:
Based on these factors, the GTRIC-p is derived. It is estimated by taking a transformation
of the counterfeiting factor (CP) which would give relative weights to lower counterfeiting
factors. The GTRIC-p is finally obtained by re-scaling the counterfeiting factor. The
GTRIC-p is formed on a 2-digit HS basis and establishes the relative likelihood for
products in one chapter to be counterfeit relative to another.
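The three steps can be sketched in Python. This is an illustrative simplification for a single reporting economy, not the OECD's exact formulas. It assumes that the counterfeiting factor is the ratio of a category's seizure share to its trade share, that the transformation is log(1 + CP), and that re-scaling divides by the largest transformed value. The product categories and figures are hypothetical.

```python
import math

def seizure_shares(seizures):
    """Step 1: share of each product category in the total value of seizures."""
    total = sum(seizures.values())
    return {k: v / total for k, v in seizures.items()}

def counterfeiting_factors(seizures, trade):
    """Step 2 (assumed form): seizure share divided by the category's trade share."""
    gamma = seizure_shares(seizures)
    trade_total = sum(trade.values())
    return {k: gamma[k] / (trade[k] / trade_total) for k in seizures}

def gtric_p_sketch(seizures, trade):
    """Step 3 (assumed form): log-transform the factors and rescale to a maximum of 1."""
    cp = counterfeiting_factors(seizures, trade)
    # log(1 + CP) compresses large factors, giving relatively more weight to low ones
    transformed = {k: math.log(1.0 + v) for k, v in cp.items()}
    top = max(transformed.values())
    return {k: t / top for k, t in transformed.items()}

# Hypothetical seizure values and import values by product category
seizures = {"footwear": 60.0, "beverages": 30.0, "cereals": 10.0}
trade = {"footwear": 200.0, "beverages": 300.0, "cereals": 500.0}

index = gtric_p_sketch(seizures, trade)
```

The resulting values are only meaningful relative to one another, which is consistent with the index being a measure of relative propensity.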
b. Econometric models
Econometrics is a body of statistical methods used to analyse economic data. The main
tool used in econometrics is regression analysis. Regressions estimate the (often linear)
relation between a so-called explained variable and (potentially multiple) explanatory
variables. Various statistical methods are used to obtain the estimation that provides the
best fit for the data. The most popular of these methods is Ordinary Least Squares (OLS),
although there are several others that are used to address particular data structures and
data issues. In particular, when the explained variable is binary (or more generally,
discrete), methods such as logistic (or logit) and probit regressions are usually employed.
Logit and probit models estimate the probability that the explained variable takes a
particular value, assuming a logistic or normal distribution, respectively.
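The difference between the two models lies only in the function that maps a linear combination of the explanatory variables into a probability. The sketch below uses the standard logistic function and the standard normal cumulative distribution function. The coefficients and factor values are hypothetical and are not estimates from any dataset.

```python
import math

def logit_probability(z):
    """Logistic cumulative distribution function evaluated at z."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_probability(z):
    """Standard normal cumulative distribution function evaluated at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_index(intercept, coefficients, factors):
    """Linear combination of the explanatory variables."""
    return intercept + sum(b * x for b, x in zip(coefficients, factors))

# Hypothetical coefficients for two explanatory factors
z = linear_index(-1.0, [0.8, 0.3], [2.0, 1.0])

p_logit = logit_probability(z)    # probability of fraud under the logit model
p_probit = probit_probability(z)  # probability of fraud under the probit model
```

Because the two distributions have different scales, estimated logit and probit coefficients are not directly comparable, although the fitted probabilities are often close.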
61 OECD (2008), “The Economic Impact of Counterfeiting and Piracy”, OECD Publishing.
For this project, the explained variable would be a measure that captures the extent of
fraud. Ideally, this measure could be constructed from the number of fraud incidents
detected expressed as a percentage of the total investigations conducted by authorities.
Alternatively, this could be a binary variable that indicates whether any instance of fraud
was detected in a given point in time. The explanatory variables would include the factors
that are expected to affect the likelihood of fraud. Many of these factors have been
suggested in the literature. A comprehensive list of these factors was presented above
together with a discussion of each.
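In broad terms, regression analysis will use a particular method (e.g. OLS, logit or probit)
to estimate a relation of the following form:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$$
where $y$ is the measure of fraud, $x_1, \dots, x_n$ are the explanatory factors, $\beta_0, \dots, \beta_n$ are the coefficients to be estimated and $\varepsilon$ is an error term.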
The chosen method will estimate the coefficients that best fit the data. Therefore, this
method “lets the data speak” when quantifying the relative importance of the various
factors affecting the risk of fraud. In contrast, a methodology based on indices would
choose the weights of these factors arbitrarily, potentially giving high importance to factors
that have little real explanatory power for assessing the likelihood of fraud.
In addition to the estimates of the coefficients, regression analysis provides the following:
Confidence intervals around coefficients. A smaller interval is interpreted as a
higher degree of confidence in the estimation. Moreover, it is possible to establish
whether a coefficient is statistically significantly different from zero (a coefficient of zero would mean that the factor
does not affect the risk of fraud).
Measures of goodness of fit, such as R-squared. These capture what fraction of the
variation in the risk of fraud can be explained by the proposed factors.
A series of diagnostic tests that identify the appropriateness of the particular
statistical method chosen.
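These outputs can be illustrated with a single-factor OLS regression written in plain Python. The data are hypothetical (a price-gap factor and a detected fraud rate for eight periods). The critical value 2.447 is the two-sided 95% Student t value for 6 degrees of freedom. A real application would use several factors and a statistical package.

```python
import math

def ols_simple(x, y, critical_value):
    """Fit y = intercept + slope * x by OLS and report basic diagnostics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of the slope, with n - 2 residual degrees of freedom
    slope_se = math.sqrt(ss_res / (n - 2) / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1.0 - ss_res / ss_tot,
        "slope_ci": (slope - critical_value * slope_se,
                     slope + critical_value * slope_se),
    }

# Hypothetical observations: explanatory factor and detected fraud rate (%)
price_gap = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fraud_rate = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]

fit = ols_simple(price_gap, fraud_rate, critical_value=2.447)
low, high = fit["slope_ci"]
# The factor is statistically significant if the interval excludes zero
slope_is_significant = not (low <= 0.0 <= high)
```

With fewer observations or noisier data the interval widens and may include zero, which is the data-requirement problem discussed below.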
Econometric estimations typically have higher data requirements than the construction of
indices. If the number of observations is too low, the estimated coefficients will have large
confidence intervals, to the point that they might not be statistically distinguishable from
zero. In addition, it is necessary to have data on past incidents of fraud and the
explanatory factors for the same points in time.
The economics literature has used econometric modelling to address different types of
fraud, including food, insurance and tax among others.
Pouliot (2012) attempts to estimate food fraud based on data for import refusal in the
United States.62 His approach uses a particular estimation method aimed at detecting
structural breaks in the data. He concludes that economic variables can be used as
leading indicators of fraud. As an additional point, Pouliot (2012) acknowledges that, while
it is not accounted for in his model, the enforcement level should be introduced as a
control variable to avoid biases in the estimations.
62 Pouliot, S. (2012). On the economics of adulteration in food imports: application to US fish and seafood imports. Cahier de recherche/Working paper, 2012, 15.
In other sectors, Manuela and Paba (2010) used a logit model to estimate the effect of
various factors on the risk of credit card fraud.63 Artıs, Ayuso and Guillén (1999) develop a
discrete choice methodology based on a multinomial logit model to estimate the risk of
automobile insurance fraud.64 They use a multinomial (instead of binary)
model since they consider different categories of claims: legitimate, fraudulent for
personal profit and fraudulent for third party benefit. Their dataset includes claims data for
three years, of which half were legitimate and the other half fraudulent. Another example of
econometric modelling is Andreoni, Erard, and Feinstein (1998), who survey a variety of
papers that employ econometric methods (such as the tobit model) to estimate the
determinants of tax compliance.65 These analyses are typically based on the US Taxpayer
Compliance Measurement Program (TCMP) data.
c. Data mining
With the increasing availability of very large data sets in some fields (sometimes referred
to as ‘big data’), several methods have been developed in computer science to identify
patterns in the data. These methods are sometimes labelled collectively as data mining.
The main advantage of these methods is that they are largely automated and require
relatively few assumptions on the part of the designer on the particular form of the
relationship between the variables. In fact, these methods attempt to discover these
relationships themselves from the data.
When compared to econometric methods, these methods have the advantage of allowing
for the identification of highly non-linear and/or clustered patterns. In other words, they are
less restrictive in terms of functional forms. The main disadvantages are two-fold. First, the
data requirements are substantially higher (typically in the order of at least tens of
thousands of data points). Not surprisingly, if more information is to be extracted from the
data, more data is needed. Second, the scope for testing the statistical validity of the
established relationships is more limited.
Data mining includes a variety of methods. These can be broadly classified in the following
categories:
Cluster analysis / nearest-neighbour methods. Clustering consists of grouping
observations according to their similarities. There are a large number of algorithms
used to detect patterns in the data. An example of cluster analysis is the class of
nearest-neighbour methods, where the notion of closeness is determined by a
dissimilarity function.
63 Manuela, P. and Paba A. (2010): "A discrete choice approach to model credit card fraud".
64 Artıs, M., Ayuso, M., & Guillén, M. (1999). Modelling different types of automobile insurance fraud behaviour in the Spanish market. Insurance: Mathematics and Economics, 24(1), 67-81.
65 Andreoni, J., Erard, B., & Feinstein, J. (1998). Tax compliance. Journal of economic literature, 818-860.
Artificial neural networks (ANN). ANNs are models based on biological nervous
systems in that nodes (or ‘neurons’) are connected by a network. These models
allow for multiple logic ‘layers’ (i.e. multiple types of patterns).
Machine learning / decision trees / rule-based learning. This type of model
optimises prediction rules according to experience and some performance
measure. Decision trees and rules are particular forms of representing prediction
rules.
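As an illustration of the first category, the sketch below classifies a new observation by the labels of its nearest neighbours. Euclidean distance is used as the dissimilarity function and k = 3 as the number of neighbours. Both are assumptions made for the example, as are the two-dimensional data points and their labels.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Dissimilarity function: straight-line distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(query, labelled_points, k=3, dissimilarity=euclidean):
    """Assign the majority label among the k points closest to the query."""
    ranked = sorted(labelled_points, key=lambda item: dissimilarity(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical observations described by two features, with known labels
labelled = [
    ((1.0, 1.0), "legitimate"),
    ((1.2, 0.8), "legitimate"),
    ((0.9, 1.1), "legitimate"),
    ((5.0, 5.2), "fraudulent"),
    ((5.1, 4.9), "fraudulent"),
    ((4.8, 5.0), "fraudulent"),
]

prediction = knn_classify((4.9, 5.1), labelled)
```

A different dissimilarity function can be supplied through the `dissimilarity` argument, which changes what counts as "close" without changing the rest of the procedure.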
Data mining techniques are used to detect fraud in areas where there is abundant data
available. The main field in which these methods are applied is credit card fraud. For this
type of fraud banks typically possess millions of observed transactions, with a high
proportion of them identified as fraudulent. Data mining techniques are then used to detect
patterns in fraudulent transactions and predict the likelihood of fraud in new transactions.
Stolfo, Fan, Lee, Prodromidis, and Chan (1997)66 and Chan, Fan, Prodromidis, and Stolfo
(1999)67 apply data mining techniques to look for patterns in credit card transactions data.
Their dataset contains several months of data and at least half a million credit card
transactions, with substantial numbers of both legitimate and fraudulent ones. The
specific methods employed are meta-learning via classifiers and parallelisation. Also in this
literature, Srivastava, Kundu, Sural, and Majumdar (2008) used an alternative method:
Hidden Markov Models (HMM).68
Panigrahi, et al. (2009) propose the use of Bayesian learning to tackle credit card fraud.69
They describe a fraud detection system (FDS) that determines the suspicion level of each
incoming transaction based on the extent of its deviation from a good pattern. The
transaction is classified as normal, abnormal or suspicious depending on this initial belief.
Once a transaction is found to be suspicious, the belief is further strengthened or weakened
according to its similarity with fraudulent or genuine transaction history using Bayesian
learning.
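The strengthening or weakening of a belief can be illustrated with a single application of Bayes' rule. This sketch covers only the updating step. The way the initial belief is formed, the classification thresholds and all probabilities are hypothetical and are not taken from Panigrahi et al.

```python
def update_belief(prior, p_evidence_if_fraud, p_evidence_if_genuine):
    """Posterior probability of fraud after observing one piece of evidence."""
    numerator = p_evidence_if_fraud * prior
    denominator = numerator + p_evidence_if_genuine * (1.0 - prior)
    return numerator / denominator

def classify(belief, lower=0.3, upper=0.7):
    """Map a belief to a label using hypothetical thresholds."""
    if belief < lower:
        return "normal"
    if belief > upper:
        return "abnormal"
    return "suspicious"

# A transaction starts out as suspicious
initial_belief = 0.5

# Evidence that resembles past fraudulent transactions strengthens the belief
strengthened = update_belief(initial_belief, 0.6, 0.2)

# Evidence that resembles the genuine transaction history weakens it
weakened = update_belief(initial_belief, 0.1, 0.5)
```

Repeating the update as further transactions arrive lets the belief accumulate evidence in either direction before the transaction is reclassified.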
Bhattacharyya, Siddhartha, et al. (2011) compared the performance of Artificial Neural
Networks (ANN) methods with that of logistic regression (with a binary fraud
variable).70 They find that logistic regression performs competitively with, and often surpasses,
more sophisticated data mining techniques on some performance measures. Their analysis
is based on a dataset of approximately 50 million credit card transactions.
66 Stolfo, S., Fan, W., Lee, W., Prodromidis, A., & Chan, P. (1997). Credit card fraud detection using meta-learning: Issues and initial results. In AAAI-97 Workshop on Fraud Detection and Risk Management.
67 Chan, P. K., Fan, W., Prodromidis, A. L., & Stolfo, S. J. (1999) “Distributed data mining in credit card fraud detection” Intelligent Systems and their Applications, IEEE, 14(6), 67-74.
68 Srivastava, A., Kundu, A., Sural, S., & Majumdar, A. K. (2008) “Credit card fraud detection using hidden Markov model” Dependable and Secure Computing, IEEE Transactions on, 5(1), 37-48.
69 Panigrahi, Suvasini, et al. "Credit card fraud detection: A fusion approach using Dempster–Shafer theory and Bayesian learning." Information Fusion 10.4 (2009): 354-363.
70 Bhattacharyya, Siddhartha, et al. "Data mining for credit card fraud: A comparative study." Decision Support Systems 50.3 (2011): 602-613.
In food, cluster analysis has been used by the Economically Motivated Adulteration (EMA)
database of food fraud compiled by the NCFPD (see the Annex on data sources) to establish
the susceptibility to fraud of several food ingredients. This method is based on five criteria
identified by expert evaluations: the level of complexity of composition of the ingredient,
variability of the ingredient, selectivity of the ID test(s), specificity of the assay(s), and the
ability to detect EMA based on a loss of function in the final food product. The resulting
scores of the evaluations were used to perform a cluster analysis to yield distinct groups of
ingredient monographs based on EMA susceptibility. These groups are separated by the
following characteristics: higher susceptibility based on ID tests and assays, generally
lower susceptibility to EMA based on all attributes, generally higher susceptibility to EMA
based on all attributes, and pending review. In contrast to direct index construction, this
categorisation provides no ranking or score for any of the ingredients on the
website.
10. Annex III: Data Sources
In this section we identify and describe some of the main candidate data sources that
would feed the proposed methodology. We primarily review databases with previous
incidents of food fraud and economic data.
A large number of the data sources presented here are publicly available, although some
of them are only available by subscription and a small minority only allow access to policy
makers and institutions. We used electronic search engines and articles (both from
the academic and policy literatures) to identify the relevant datasets. We explored the
databases that were accessible to us analytically and catalogued the available data
offered in each one of them. We also report the existence of other data sources that might
be relevant but to which we currently have no access.
In the remainder of this section we describe the content of these data sources. In a
separate Annex screenshots of some of these databases are presented, providing more
detail on the way in which they are structured.
a. Food fraud data
First we explore the available datasets that contain information on previous instances of
food fraud. For each database, we use the information available to us to identify:
Time coverage.
Geographical coverage.
Products covered.
Data on the extent of fraud / adulteration detected.
At the moment we have identified the following datasets:
USP Food Fraud Database:71 the dataset lists observed and reported food adulterants
since 1980 and a directory of possible detection methods reported in peer-reviewed
scientific journals. The database includes at present 1305 records, including 1000 records
with analytical methods collected from 677 references. In the future, this database may
expand to include additional publicly available articles published before 1980 and in
other languages, as well as data outside the public domain. Most of the data entries refer
to incidents that occurred in the US.
Time period covered: 1980-2010 for all food ingredients. The database contains all
the reports that have been published since 1980 on each possible food ingredient.
Data on the extent of fraud: this is not uniform and depends on the available
reports. Some reports are able to provide more details on the extent of the fraud
such as how many units have been adulterated and which geographical areas have
been affected while others merely report a particular incident without providing any
background information to allow us to analyse how significant that incident has
been.
71 http://www.foodfraud.org/node?destination=node
EMA research database:72 every year the NCFPD compiles documented
incidents of Economically Motivated Adulteration in the USA since 1980 into an online, searchable
database. This database provides information about the food product,
adulterant, the type of EMA, known health consequences and how the incident was
discovered. The dataset is initially available on a free trial basis and subsequently on a
paid-for-access basis. We do not have information on the price for this service.
Time period covered: 1980 – present for all products.
Data on the extent of fraud: the products are organised in the following categories:
alcoholic beverages, animal food products, coffee and tea, dairy products, eggs,
fish and seafood products, fruit juices and concentrates, functional food ingredients,
grains and grain products, honey and other natural sweeteners, infant formula,
meat and meat products, oils and fats, spices, other food products and other
beverages. For each incident the user can find details on the year that it began and
ended, the number of illnesses or deaths that it caused, the type of adulteration
together with the number of references, the name of the consumer brands that were
affected, the adulterant, the location of production and the location of distribution.
The FSA Food Fraud Database:73 the information in this database is only available to local
authorities and other governmental organisations. The database includes reported
incidents of food fraud in the UK. The information included in the database is received
from a range of sources including local authorities, consumers, industry, government
departments and other enforcement bodies.
Time period covered: 2007 – present.
Data on the extent of fraud: the database connects food fraud incidents as reported
by local authorities. The scope of the dataset is best understood through an
example. In the dataset there are a number of nodes which are interconnected.
These nodes can be of different types, such as local authorities, retailers, suppliers
and the food products involved. When a local authority reports suspicions of
fraudulent behaviour by a given supplier, this is documented in the data. If that
supplier has been reported before, the new information is immediately linked to its
profile. At the same time, the database contains
information about the retailers who bought products from that specific supplier and
sometimes intelligence about the people who have supplied the given supplier with
various inputs. In this way a complex network is created, which can be investigated
if there is enough evidence or concern to justify an investigation. It contains about
1400 investigations so far and covers the entire UK. While the unit of analysis of
this database is the report received by the FSA, it would be theoretically possible
to aggregate the reports by food product.
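The linked structure described above can be pictured with a minimal sketch. This is purely illustrative and is not the FSA's implementation: the class name, the node types and the sample entries are all assumptions made for the example. It shows how a new report attaches to an existing supplier profile and how reports could be re-aggregated by food product.

```python
from collections import Counter, defaultdict


class FraudNetwork:
    """Toy network of typed nodes linked by fraud reports (hypothetical design)."""

    def __init__(self):
        self.node_types = {}            # node name -> node type
        self.links = defaultdict(set)   # node name -> names of connected nodes
        self.reports = []               # (authority, supplier, product) tuples

    def _link(self, a, b):
        self.links[a].add(b)
        self.links[b].add(a)

    def add_report(self, authority, supplier, product):
        # Register the nodes, then attach the report to the supplier's profile.
        for name, kind in ((authority, "local authority"),
                           (supplier, "supplier"),
                           (product, "food product")):
            self.node_types.setdefault(name, kind)
        self._link(authority, supplier)
        self._link(supplier, product)
        self.reports.append((authority, supplier, product))

    def add_customer(self, retailer, supplier):
        # Record that a retailer bought products from a supplier.
        self.node_types.setdefault(retailer, "retailer")
        self.node_types.setdefault(supplier, "supplier")
        self._link(retailer, supplier)

    def supplier_profile(self, supplier):
        # All reports linked to this supplier so far.
        return [r for r in self.reports if r[1] == supplier]

    def reports_by_product(self):
        # Re-aggregate the reports by food product.
        return Counter(product for _, _, product in self.reports)


# Invented sample data, for illustration only.
network = FraudNetwork()
network.add_report("Authority 1", "Supplier A", "minced beef")
network.add_report("Authority 2", "Supplier A", "minced beef")
network.add_report("Authority 2", "Supplier B", "honey")
network.add_customer("Retailer X", "Supplier A")

print(len(network.supplier_profile("Supplier A")))
print(dict(network.reports_by_product()))
```

Storing each report as a tuple keeps the report as the unit of analysis while still allowing a product-level count.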
72 https://www.foodshield.org/
73 http://www.food.gov.uk/enforcement/enforcework/foodfraud/foodfrauddatabase#.U4WhFXJdVBk
HorizonScan:74 this database is available by subscription only. It can be accessed for a
short period of time on a free trial basis, after which continued access costs £1500.
Data are gathered, wherever possible, from official government sources and to date
66 countries around the world are monitored daily. Data for countries other than the UK
are gathered from reliable web sources.
Time period covered: 1982-present for all reported products.
Data on the extent of fraud: the database contains reports on hundreds of products,
however, the level of detail available is limited. The user is able to find information
on the type of fraud that has occurred (adulteration or imitation, expiry date
changes, fraudulent health certificate/adulteration, produced without inspection,
unapproved premises, unauthorised/ unsuitable transport), the exporting and
importing country and finally the date when the incident was reported. However, it is
not possible to determine the extent of the fraud and how many people have been
affected.
Rapid Alert System for Food and Feed (RASFF):75 the system was developed in order to
provide food and feed control authorities with a way
to exchange and access information about measures taken in response to serious risks
detected in relation to food or feed. The RASFF portal features an interactive searchable
online database. The information included in the database is arranged by classification,
hazard, product category and country of origin of the product notified. An ‘alert
notification’ is transmitted when a food that presents a considerable risk is detected
on the market and immediate action is required in a country other than the
notifying country. Alerts are triggered by the Member State that has detected the problem
and has proceeded to take relevant measures, such as withdrawal or recall. The data is
only available to the relevant authorities.
Time period covered: 1991-present for all reported products.
Data on the extent of fraud: the dataset contains only information on the products
that were detected to be hazardous or adulterated. However, there could be a
number of products that have crossed borders without being detected even though
they represented a serious hazard. The possibility of detecting follow-up
notifications and volumes traded allows us to get a picture of the extent of fraud. In
2013, a total of 3205 original notifications were transmitted through the RASFF, of
which 596 were classified as alert, 442 as information for follow-up, 705 as
information for attention and 1462 as border rejection notification.
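As a quick check of the 2013 figures quoted above, the following sketch confirms that the four notification classes sum to the reported total and expresses each as a share. The counts are taken from the text; the variable names are our own.

```python
# RASFF original notifications in 2013, as quoted in the text.
notifications_2013 = {
    "alert": 596,
    "information for follow-up": 442,
    "information for attention": 705,
    "border rejection": 1462,
}

total = sum(notifications_2013.values())

# Share of each notification class, in percent of all original notifications.
shares = {kind: round(100 * count / total, 1)
          for kind, count in notifications_2013.items()}

print(total)
for kind, share in shares.items():
    print(f"{kind}: {share}%")
```

On these figures border rejections account for a little under half of all original notifications, and alerts for just under a fifth.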
Zhichuchuangwai database:76 This is a database of past instances of food fraud in China.
The data is publicly available in Chinese. The data was collected from the Chinese food
safety issues News Archive.
Time period covered: 2004-2011 for all products.
74 https://secure.fera.defra.gov.uk/deskcheck/
75 http://ec.europa.eu/food/food/rapidalert/index_en.htm
76 www.zccw.info
Data on the extent of fraud: the data is broken down by Chinese region and
statistics are provided for each region on the level of fraud detected there.
The products that are analysed include milk, tea powder, pork,
rice, wine, soy sauce, cooking oil, waste oil, beef, beverage, moon cake, candied
honey, jelly bean, sprouts, egg, flour, dumplings, drinking bottled water, seeds
and vegetables, health food, ginger bread, mineral water, instant noodles, cabbage,
cake, ham, beans, ice-cream cakes, steamed apple juice, chocolate biscuits,
lamb, dried bean milk, canned mushroom etc.
FSA Food Authenticity Programme:77 this programme conducted a number of studies to
investigate whether the food purchased by the consumer matches its description. These
studies included consumer research, the introduction of markers that characterise the
authenticity of foods, validating testing methods and undertaking surveys. The latter
activity produced a considerable amount of data that could be used for the purposes of this
project. A report with the findings is published for every examination conducted.
These reports are publicly available.
Time period covered: 2000-2008.
Data on the extent of fraud: the reports published are very detailed in terms of the
laboratory findings and the extent of fraud that was identified. The surveys report
the number of producers investigated, together with the number of incidents. While
the time coverage of the whole programme is significant, each survey was conducted
for a specific product (or group of products) and covered a much more limited period
(typically a few months).
Database on International Intellectual Property (DIIP) Crime:78 this database is compiled
by INTERPOL and fed by sources such as Operation Opson. It collects information about
trafficking in illicit, adulterated and counterfeit goods and deals with transnational cases.
The data gathered is then analysed in order to establish potential links between
transnational and organised cross-sector criminal activity and also to develop strategic
illicit trade crime reports.
INTERPOL does not disclose specific information contained in the DIIP. However, the
affected industries and stakeholders are notified in the form of referrals indicating that two
or more industries are being targeted by the same transnational organized criminals. While
we currently do not have access to this database, we consider that it might be possible to
obtain it in the near future.
Time period covered: 2008 - present for all reported products.
Data on the extent of fraud: the dataset could provide information about who is
involved in counterfeit and illicit foodstuffs in the UK, how the supply chains are
organised and whether there are any key enablers of these supply chains.
77 http://tna.europarchive.org/20100929190231/http://www.food.gov.uk/science/research/choiceandstandardsresearch/authenticityresearch/
78 http://www.interpol.int/Crime-areas/Trafficking-in-illicit-goods-and-counterfeiting/Databases
POISON:79 this database is part of an electronic tool denominated FoodFraudster, which
utilises its data. The developers are a small private undertaking based in the
United States named FoodQuest TQ.80 The dataset is built by scraping the web
for any new information that becomes available
relating to food fraud and food safety. The information is obtained using a number of
sophisticated algorithms that retrieve the data, structure it and apply probabilistic
filters that weight information according to the relevance and reliability of its source.
The process is then supervised by subject experts. At present, the database contains 1100
types of fraud for a selected group of products. In addition to food fraud data from
POISON, FoodFraudster uses similar web scraping methods to obtain relevant economic
data on the selected products.
Time period covered: any date found.
Data on the extent of fraud: the database focuses on six products (beef, honey, fish,
olive oil, rice and cocoa) on a worldwide basis. It contains intelligence on when a
fraud took place, how many instances were detected, what steps the authorities
took, what the possible symptoms are, the country from which the fraudulent
product came, etc. Each data entry is given a specific weight depending on its
reliability, and this feeds through to raise an alarm whenever a particular area is
at high risk of fraud because many reports have surfaced. The system on its
own is able to create a risk profile for each category of food and for a number of
countries internationally.
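The weighting and alarm logic described above can be illustrated with a minimal sketch. We do not know how FoodFraudster actually scores or thresholds reports, so the weights, the threshold value and the function names below are all hypothetical.

```python
def risk_score(reports):
    """Sum the reliability weights of the reports for one product and country.

    `reports` is a list of (source description, weight) pairs, where the weight
    is a hypothetical reliability score between 0 and 1.
    """
    return sum(weight for _source, weight in reports)


def high_risk(reports, threshold=2.0):
    # Raise an alarm once the weighted volume of reports reaches the threshold.
    # The threshold of 2.0 is an arbitrary value chosen for illustration.
    return risk_score(reports) >= threshold


# Invented reports for one product in one country.
reports = [
    ("official recall notice", 1.0),
    ("national newspaper", 0.6),
    ("anonymous blog post", 0.2),
]
print(high_risk(reports))

# One further moderately reliable report tips the area into high risk.
reports.append(("trade press", 0.6))
print(high_risk(reports))
```

Weighting by reliability means that a few well-sourced reports can trigger an alarm that many unreliable ones would not.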
UK Food Surveillance System:81 this is a national database for central storage of analytical
results from feed and food samples taken by enforcement authorities (local authorities and
port health authorities) as part of their official controls. Information about each sample and
the results of analysis are entered onto the system, and then validated, using the data
entry tool. The database is password protected and can be accessed by enforcement
authorities and laboratories to search for anonymised local, regional, and national
datasets, and identify trends and areas of non-compliance that can help develop sampling
plans. This database might provide some measures of the level of investigative effort
devoted by local authorities to different food products. The Food Surveillance System is a
database compiled and managed by the FSA. The database maintains a record of food
and feed samples taken by local authorities and examined by public analysts across the UK.
The dataset covers almost all of Northern Ireland, 62% of all the local authorities in
England and 77% of all the local authorities in Wales. For every
food sample taken by local authorities a new record is created which documents all the
tests carried out on that sample. The outcome of these tests is also documented so that
public authorities are able to take action when a number of hazardous incidents are
reported. The dataset contains a range of useful information on each sample. This
information includes the premises where it was taken (retail, manufacturing etc.), reason for
taking the sample, follow ups, description of the product, category of the product,
79 http://nfpcportal.com/FQTools/FoodFraudster/tabid/329/Default.aspx
80 http://foodquesttq.com/
81 http://www.food.gov.uk/enforcement/monitoring/fss/#.U4d6mnJdVBk
packaging and labelling information and a picture of the label. Public authorities are able to
extract and read the public analyst’s report and also track the samples that have been taken
but not yet analysed. In 2013 about 31000 samples were collected and over 200000 tests
were conducted on these samples.
Time period covered: 2006 – present for all reported products.
Data on the extent of fraud: public authorities are able to filter their searches according
to the tests that were taken, the type of food tested or whether the test
deemed the sample satisfactory or not. Public authorities can therefore
find out how many times a given food was
tested and, of those tests, how many detected an instance of fraud.
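The kind of query described above can be sketched as follows. The record layout and field names are assumptions for illustration and do not reflect the actual schema of the Food Surveillance System. Note also that an unsatisfactory result is only a proxy for fraud, since a sample may fail for other reasons.

```python
def summarise_tests(records, food):
    """Return (number of tests, number of unsatisfactory results) for one food."""
    results = [r["satisfactory"] for r in records if r["food"] == food]
    return len(results), sum(1 for ok in results if not ok)


# Invented sample records; field names are hypothetical.
records = [
    {"food": "honey", "test": "sugar profile", "satisfactory": True},
    {"food": "honey", "test": "sugar profile", "satisfactory": False},
    {"food": "honey", "test": "pollen analysis", "satisfactory": True},
    {"food": "olive oil", "test": "fatty acids", "satisfactory": True},
]

tested, failed = summarise_tests(records, "honey")
failure_rate = failed / tested if tested else None
print(tested, failed, failure_rate)
```

Dividing failures by tests, rather than counting failures alone, controls for the fact that some foods are sampled far more often than others.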
b. Economic data
This database category will include all the identified datasets that provide economic
information which could affect the probability of food fraud. We believe that prices and
volumes traded are the most significant variables in determining the possibility of food
fraud and therefore we break down our analysis into datasets that contain information on
prices and datasets that contain information on volumes.
Economic data – prices
HMRC Database catalogue – imports and exports:82 this database provides overseas trade
statistics on trade between the UK and the rest of the world. The data can be
broken down by commodity code or international trade classification. The data is collected
by HMRC's statistical and administrative systems and is available for a considerable
number of food products. The primary dataset relates to EU trade in goods arrivals
(imports) and dispatches (exports) data. The database is a publicly available dataset.
Period covered: 1996 - present, available by month, quarter and year.
Variables: commodity (HS2 to CN8, or SITC 1-5 hierarchy), EU indicator (world,
EU, non-EU and continent groupings), country, year/month (or quarter for RTS), flow
(import, export, arrival, dispatch), port (UK place of clearance, for non-EU trade
only), total value and total volume.
Products: Live animals, meat and edible meat offal, fish and crustaceans, molluscs
and other aquatic invertebrates, dairy produce; birds' eggs; natural honey; edible
products of animal origin, not elsewhere specified or included, products of animal
origin not elsewhere specified or included, live trees and other plants; bulbs, roots
and the like; cut flowers and ornamental foliage, edible vegetables and certain roots
and tubers, edible fruit and nuts; peel of citrus fruits or melons, coffee, tea, mate
and spices, cereals, products of the milling industry; malt; starches; inulin; wheat
gluten, oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit;
industrial or medical plants; straw and fodder.
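Because this dataset records total value and total volume rather than prices directly, one way to obtain a price proxy is to compute a unit value (value divided by volume). The sketch below is our own assumption about how this could be done; the units (pounds and kilograms) and the sample figures are invented for illustration.

```python
def unit_value(total_value, total_volume):
    """Implied price per unit of volume, or None when no volume was traded."""
    if total_volume <= 0:
        return None
    return total_value / total_volume


# Invented monthly import flows for one commodity code: (month, value in GBP, volume in kg).
monthly_flows = [
    ("2013-01", 120000.0, 40000.0),
    ("2013-02", 150000.0, 60000.0),
    ("2013-03", 0.0, 0.0),
]

unit_values = {month: unit_value(value, volume)
               for month, value, volume in monthly_flows}
print(unit_values)
```

Unit values reflect changes in the product mix within a commodity code as well as changes in price, so they should be read as an approximation only.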
82 https://www.uktradeinfo.com/Statistics/BuildYourOwnTables/Pages/Home.aspx
Eurostat:83 the Absolute Agricultural Prices database includes the prices of the main
agricultural outputs and inputs. Since 2006 only annual prices have been collected. Before
that the dataset also included monthly observations. The Member States provide Eurostat
with the required annual price series. Eurostat measures prices by
considering the amount they directly contribute to farmers’ income. Therefore, selling
prices are recorded at the first marketing stage (the price at which the producer sells to the
trader), while purchase prices are recorded at the last marketing stage (the price from the
trader to the producer).
Period covered: for several countries data availability goes back to 1970; for most
of the countries data is available from 1990. Monthly data are available until 2006.
Variables: time, country, currency, product, producer price, consumer price.
Products: rice, chick peas, dried peas, dried beans, broad beans, lentils, main crop
potatoes, sugar, tobacco, soya, lemons, cherries, apricots, garlic, melons,
asparagus, cauliflower, chicory.
FAO GIEWS Food Price Data and Analysis Tool:84 the GIEWS tool started in 2008-2009 and
contains basic food prices. This activity was part of the FAO Initiative on Soaring Food
Prices (ISFP). The database currently includes 1168 monthly domestic retail and/or
wholesale price series of major foods consumed in 190 countries covering a total of 20
different food commodity categories. Sources include meteorological information, agencies
operating satellites for earth observation, news services such as Reuters, Associated
Press and other news organizations, information from national institutions available through
publications or web sites, and various reports and studies. Questionnaires are also sent to
various partners (FAO offices, government agencies and NGOs).
Period covered: 2000-present. Since 2012, data can also be downloaded on a
monthly basis.
Variables: country, market, commodity, price, year, currency, weight.
Products: bread, wheat, rice, soybean oil, sugar, mutton meat, potatoes, beef meat,
maize, millet, milk, prawns, bananas, barley, yam.
FAOSTAT database:85 this database provides a large selection of time series and cross
sectional data that are related to hunger issues, food and agriculture for 245 countries and
territories and 35 regional areas from 1961 to present. It also provides a number of tools
for the visualisation of the data and some basic statistical analysis (univariate and
multivariate regressions). Most of the data originate from country sources received
through the FAO Questionnaire. The periodicity of national data collection varies, as countries
follow different national practices and methodologies. Data collection at the national level
is normally monthly, but can be weekly for some countries. FAO collects the
average prices from the countries on an annual basis. All the data is publicly available.
Period covered: 1966-present for some 200 commodities, representing over 97
percent of the world’s value of gross agricultural production in 2006.
83 http://epp.eurostat.ec.europa.eu/portal/page/portal/agriculture/data/database
84 http://www.fao.org/giews/pricetool/
85 http://faostat3.fao.org/faostat-gateway/go/to/home/E
Variables: producer prices, producer price indices, consumer price indices, country,
region, item, year.
Products: apples, barley, cabbages and other brassicas, carrots and turnips,
cauliflowers and broccoli, leeks, other alliaceous vegetables, lettuce and chicory,
milk, whole fresh cow, mushrooms and truffles, onions, dry, pears, potatoes, wheat.
Farmers Weekly:86 this is a UK-focused database that provides statistical information for
farmers. It is constructed by the Farmers Weekly magazine and is publicly available. The
information is provided to the magazine by local businesses and local authorities.
Period covered: past three months, on a daily, weekly and monthly basis.
Variables: prices, quotas, region, time period, currency.
Products: HGCA grain prices, potatoes, grains, oilseeds, pulses, vegetables, hay
and straw, straights, milk, meat, cattle, sheep, pig.
Seafish authority market data:87 the database contains market data on the UK seafood
market, which includes information on retail price fluctuations at species level, and
category-based consumption trends. The reports produced by the authority, together with
the monthly syndicated market data, are not publicly available and only industry-related
businesses can gain access to this information. Higher-level information such as
summaries of the market is publicly available online. The sources of the data presented on
the website are the BTS Trade Statistics and the MMO reports.
Period covered: 1974-present, more detailed statistics are available for the period
between 2011 and 2014.
Variables: retail sales, top species by value and volume, seafood sales, share of
trade between major retailers, household purchases of fish, UK landings, UK ports,
import countries, export countries.
Products: salmon, tuna, cod, haddock, warm-water prawns, mackerel, pollack,
scampi, whiting, tilapia, sardines, trout, sea bass, mussels, sole, crabstick, kipper,
crab, scallops, basa, anchovy, pilchards, herring, squid, coley, sea bream,
monkfish, halibut, crayfish, lobster, hake.
IMF Primary Commodity Prices:88 this dataset presents data on primary commodity prices
on a monthly, quarterly and yearly basis. The prices are quoted in nominal terms in US
dollars. The data is collected directly by the IMF and is publicly available.
Period covered: 1980 - present, annual, quarterly and monthly.
Variables: price (2005=100), country, index, currency.
Products: food and beverage, beverage, industrial input, timber, cotton, wool,
rubber, and hides price indices, copper, aluminium, iron ore, tin, nickel, zinc, lead,
and uranium, crude oil (petroleum), natural gas, and coal, bananas, barley, beef,
cocoa, coffee, rapeseed oil, fishmeal, groundnut, lamb.
86 http://www.fwi.co.uk/business/prices-trends/
87 http://www.seafish.org/research--economics/market-insight/market-data
88 http://www.imf.org/external/np/res/commod/index.aspx
Defra - Wholesale fruit and vegetable prices:89 The fruit and vegetable wholesale price
database contains data on the average price at wholesale markets in England. Prices are
collected for a selection of the most common home-grown fruit, vegetables and flowers.
Prices are collected from Birmingham, Bristol, Liverpool and New Spitalfields with flower
prices also collected from New Covent Garden. The five most usual prices are collected
from each market along with the percentage sold at each price. Additionally, information on
the supply of produce at each market is recorded.
Period covered: 2004-2014 on a monthly basis.
Variables: fruit or vegetable, quality, units, average monthly price.
Products: blackberries, blackcurrants, cherries, cooking apples, dessert apples,
gooseberries, pears, plums, raspberries, strawberries, asparagus, beetroot, Brussels
sprouts, onions, cabbage, cauliflower, leeks, lettuce, spinach, tomato, turnip,
swede, sweet corn, rhubarb.
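Since each market reports its five most usual prices together with the percentage sold at each, a natural summary is a share-weighted average price. The sketch below assumes that this is how an average could be formed; we have not confirmed that Defra computes its published averages in this way, and the quotes are invented.

```python
def weighted_average_price(quotes):
    """Average price weighted by the percentage of produce sold at each price.

    `quotes` is a list of (price, percentage sold at that price) pairs.
    """
    total_share = sum(share for _price, share in quotes)
    if total_share == 0:
        return None
    return sum(price * share for price, share in quotes) / total_share


# Invented quotes for one product at one market: (price per kg in GBP, % sold).
quotes = [(1.00, 40), (1.20, 30), (0.90, 20), (1.50, 5), (0.80, 5)]
average_price = weighted_average_price(quotes)
print(round(average_price, 3))
```

Dividing by the sum of the shares, rather than by 100, keeps the calculation valid when the five most usual prices do not cover all sales.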
Defra – commodity prices dataset:90 this database contains the prices for selected
agricultural and horticultural produce and is published on a weekly or monthly basis. The
data source depends on the item but includes prices taken from trade journals or other
organisations in addition to prices collected by the Department for Environment, Food and
Rural Affairs.
Period covered: 2009 - present, on a weekly, monthly and yearly basis.
Variables: price per tonne, product, time.
Products: animal feed, bananas, cattle compensation prices, hay and straw,
livestock, cereals, poultry, eggs, butter, cheese, potatoes, sugar, sheep and pigs.
Defra – UK milk and composition of milk:91 the data available in this dataset is gathered
through a monthly survey run by Defra in England and Wales to collect information on the
volume, value and protein content of milk purchased from farms. Similar surveys are run in
Scotland and Northern Ireland. Additional information is collected by the Rural Payments
Agency (RPA) on the protein and butterfat content of the milk. The UK average farm-gate
milk price, protein content and butterfat content are then calculated.
Period covered: 1991 - present, on a weekly, monthly and yearly basis.
Variables: price excluding bonus payment, price including bonus payment, butterfat
%, protein %, time.
Products: milk.
Gov.uk - Overseas trade in food, feed and drink:92 this database contains a variety of
statistics on UK imports and exports of food, feed and drink based on data collected by
HM Revenue and Customs. These statistics include a long-term Defra series showing the
value of UK imports, exports and balance of trade for total food, feed and drink from 1936,
detailed figures on the value and volume of UK imports and exports of food, feed and
89 https://www.uktradeinfo.com/Statistics/BuildYourOwnTables/Pages/Home.aspx
90 https://www.gov.uk/government/statistical-data-sets/commodity-prices
91 https://www.gov.uk/government/publications/uk-milk-prices-and-composition-of-milk
92 https://www.gov.uk/government/statistical-data-sets/overseas-trade-in-food-feed-and-drink
drink, degree of processing and commodity type from 1988, and a series showing the UK’s
food production to supply ratio (commonly referred to as the “self-sufficiency” ratio) from
1956. The dataset is publicly available.
Period covered: 1936 - present on a yearly basis.
Variables: imports, exports, balance of trade, time, value and volume of UK imports
and exports, food production to supply ratio.
Products: animal oils and fats, not chemically modified, apples, fresh, apricots,
cherries, peaches, plums and sloes (fresh), bacon and ham, bananas, barley,
unmilled, beef and veal, beef products (incl. corned beef), beer, bread, crispbreads,
savoury biscuits, butter, cereal, milled, cereal, rolled or flaked, cheese, chocolate,
cider & other fermented beverages, cocoa, coffee, condensed milk, crustaceans,
dog or cat food for retail, edible offal and other meat, eggs & egg products, fish &
crustaceans prepared or preserved, fish fresh or chilled, fish frozen, fish live, flours,
meals and pellets of meat, offal or fish, grapes, fresh or dried, honey, ice cream,
infant food for retail, jams, juice, lamb and mutton, lemons and limes, lettuce and
chicory, fresh or chilled, margarine, milk and cream.
Index Mundi:93 this website contains country-level statistics on commodities and trade. It
also contains charts and maps compiled from various data sources. The datasets are
publicly available.
Period covered: 1992 – present, yearly.
Variables: production, consumption, exports, imports, prices.
Products: crude oil, jet fuel, gasoline, diesel, coffee, tea, barley, maize, rice, wheat,
bananas, oranges, beef, poultry, lamb, swine, salmon, shrimp, sugar, coconut oil,
palm oil, peanut oil, rapeseed oil, cotton, rubber, wood.
Brand View:94 this is an international price and promotions intelligence tool that provides
information to retailers and manufacturers so that they can measure and manage their
price position. It is available on a 14-day free trial and after that on a paid-for-access
subscription basis.
Period covered: 2009 – present, on a daily, weekly, monthly and yearly basis.
Variables: the database contains information on price fluctuations across retailers
for specific products.
Products: we have been given access to data on chilled meals as part of a trial.
However, the website keeps track of all the products that are presented on major
retailers’ websites and the prices at which those products are sold through the
websites. These are consumer prices and not wholesale prices.
Economy Watch – Economics Statistics Database:95 the price index indicators that are
available on this website have been constructed using IMF data from 1980 onwards. Using
that data the website also makes forecasts about future price indicators up to the end of
93 http://www.indexmundi.com/
94 http://www.brandview.com/
95 http://www.economywatch.com/
2014. The price indices measure the average cost of either single items or baskets of
goods on a global basis for the year in question. The data is publicly available but is not
UK specific. The indices available are world indicators.
Period covered: 1980 – present, on a yearly basis.
Variables: prices and year.
Products: cereal, vegetable oils, meat, seafood, sugar, bananas, and oranges.
Economic data – volumes
Comtrade:96 the United Nations Commodity Trade Statistics Database contains detailed
imports and exports statistics reported by statistical authorities of close to 200 countries or
areas. It concerns annual trade data from 1962 to the most recent year and is publicly
available.
Period covered: 1962 – present, on a yearly and monthly basis.
Variables: import and export prices and volumes.
Products: meat and edible meat offal, fish, crustaceans, molluscs, aquatic
invertebrates, dairy products, eggs, honey, edible animal product, products of
animal origin, live trees, plants, bulbs, roots, cut flower, edible vegetables and
certain roots and tubers, edible fruit, nuts, peel of citrus fruit, melons, coffee, tea,
mate and spices, cereals, milling products, malt, starches, inulin, wheat gluten, oil
seed, grain, seed, fruit, gums, resins, vegetable saps and extracts, vegetable
plaiting materials, vegetable products, animal, vegetable fats and oils, cleavage
products, meat, fish and seafood food preparations, sugars and sugar
confectionery, cocoa and cocoa preparations, cereal, flour, starch, milk preparations
and products, vegetable, fruit, nut, food preparations, miscellaneous edible
preparation, beverages, spirits and vinegar, residues, wastes of food industry,
animal fodder, tobacco and manufactured tobacco substitutes.
Eurostat – international trade data: this database covers both extra- and intra-EU trade.
Extra-EU trade statistics cover the trading of goods between Member States and
non-member countries. Intra-EU trade statistics cover the trading of goods between Member
States. "Goods" means all movable property including electricity. The main source of
statistical information is traders, on the basis of Customs (extra-EU) and
Intrastat (intra-EU) declarations. Data are collected by the national authorities of the
Member States and compiled according to a harmonised methodology established by EU
regulations before transmission to Eurostat.
Period covered: 1999 – present, on a yearly basis.
Variables: reporting country, reference period, trade flow, product, trading partner,
mode of transport, trade value (in Euro), trade quantity in 100 kg, trade quantity in
supplementary units, gross and seasonally adjusted trade value (in million Euro),
unit-value indices, gross and seasonally adjusted volume indices, growth rates of
96 http://comtrade.un.org/db/
trade values and indices, trade value (in billion Euro), shares of Member States in
EU and world trade, shares of main trading partners in EU trade.
Products: Food, drinks and tobacco.
ONS - Fish production:97 this database covers catch and trade statistics for the UK fishing
industry. The catch and landings data available include information on the quantity, value,
species and area of capture by UK vessels landing into the UK and abroad, and foreign
vessels landing into the UK. The overseas trade statistics bring together the data on the
fish and fish products available for consumption, imports, exports and household
consumption. The data sources include logbooks, landing declarations, sales notes and
personal contact with fishermen and merchants. The method used for collecting data
depends upon the size of vessel and location of landings. All the data are publicly
available.
Period covered: 1866 – present, on a yearly basis.
Variables: landings by UK vessels, production, UK vessels into key ports, size of
UK fishing fleet, number of UK fishermen, imports and exports of fish, GDP for fish,
world catch by sea area.
Products: fish, tuna and mackerel.
FAOSTAT – Agricultural Production Index:98 this database reports indices of the relative level of the
aggregate volume of agricultural production for each year in comparison with the base
period 2004-2006. The indices are based on the sum of price-weighted quantities of different
agricultural commodities produced, after deduction of the quantities used as seed and feed,
weighted in a similar manner. Production quantities of each commodity are weighted by
2004-2006 average international commodity prices and summed for each year. The data is
publicly available.
Period covered: 1961 – present, on a yearly basis.
Variables: area harvested, yield, production quantity, seed.
Products: crops, processed crops, live animals, livestock primary, livestock
processed.
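The construction of the production index described above (quantities net of seed and feed, weighted by fixed 2004-2006 average prices and compared with the base period) can be sketched as follows. This is a simplified illustration under our own assumptions, not FAO's exact procedure; the prices and quantities are invented.

```python
def production_index(quantities, base_prices, base_years, year):
    """Price-weighted production in `year` relative to the base-period average (= 100).

    quantities:  {year: {commodity: quantity net of seed and feed}}
    base_prices: {commodity: average international price in the base period}
    """
    def aggregate(y):
        # Sum of quantities valued at fixed base-period prices.
        return sum(base_prices[c] * q for c, q in quantities[y].items())

    base_level = sum(aggregate(y) for y in base_years) / len(base_years)
    return 100 * aggregate(year) / base_level


# Invented figures for a two-commodity example.
base_prices = {"wheat": 150.0, "milk": 300.0}
quantities = {
    2004: {"wheat": 10.0, "milk": 5.0},
    2005: {"wheat": 11.0, "milk": 5.0},
    2006: {"wheat": 9.0, "milk": 5.0},
    2013: {"wheat": 12.0, "milk": 6.0},
}

index_2013 = production_index(quantities, base_prices, [2004, 2005, 2006], 2013)
print(index_2013)
```

Because prices are held fixed at base-period levels, movements in the index reflect changes in volumes only.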
NOAA – National Marine Fisheries Service:99 the Fisheries Statistics Division of NOAA
Fisheries has automated data summary programs that anyone can use to rapidly and easily
summarize U.S. commercial fisheries landings.
Period covered: 1990 – present, on a yearly basis.
Variables: number of landings.
97 http://www.statistics.gov.uk/hub/agriculture-environment/fish/fish-production/index.html
98 http://faostat3.fao.org/faostat-gateway/go/to/download/Q/*/E
99 http://www.st.nmfs.noaa.gov/commercial-fisheries/
c. Other data considerations
The selected methodology is likely to include other factors beyond the economic ones. We
expect them to be, depending on the product, a subset of the factors listed in the literature
review above. Given the wide variety of variables that could potentially be included we do
not provide a systematic review of all these sources.
We note that many of the non-economic factors, such as product and distribution
characteristics, are likely to be product specific and not change significantly over time.
Therefore, if the analysis is performed on a single product these factors would not
introduce any variation to contribute to the explanatory power of the model. To conclude
this annex, we discuss potential gaps in data availability.
Data sources not identified
The type of data that we are currently missing involves variables that are difficult to
measure or even define. For example, there are multiple sources in the literature that
indicate that the complexity of the supply chain plays an important role in determining the
likelihood of food fraud (e.g. the Elliott review). According to the literature, the longer and
more complex the supply chain the higher the probability of fraud. However, there are
multiple metrics that could be used to capture this variable. Moreover, we have not found a
universal data source that could be used to construct this variable, in whatever form it is
defined. We envisage that expert judgement and advice would be a major input in
elaborating these measures for the case study and other future applications.
11. Annex IV: Econometric methodology
While not necessarily an econometric requirement, it is good practice to describe
the data before conducting any estimations. The recommended descriptive statistics
are:
Summary table: it would contain the maximum, minimum and average value of each
variable. Additional information could include the standard error.
Linear correlation table: this square table reports the pairwise linear correlation
between all variables. Correlations with the explained variable provide a less
sophisticated quantification of the effect of each explanatory variable (e.g. they do
not hold other variables constant). More importantly, this table is
useful for anticipating multicollinearity issues. These occur when two or more
explanatory variables are highly correlated, in which case regression methods might
struggle to attribute the effect of these variables separately, especially in
small datasets.
Bi-variate charts, typically between the explained variable and other variables: this
type of chart usually provides insight into the nature of the correlation with the
explanatory variables.
T-tests: these are particularly useful when the explained variable is binary (e.g. if
the variable is whether fraud was observed in the period). It would be possible to
calculate the mean of the explanatory variables for observations where the
explained variable is equal to zero and one, respectively. These means would
typically differ. However, the t-test would evaluate whether the difference is
statistically significant.
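The descriptive statistics listed above can be sketched in a few lines of Python. This is a minimal illustration only: the data values, the variable names (fraud_detected, price_gap) and the choice of a Welch t-statistic are assumptions made for the example and are not taken from the case study.

```python
import math
import statistics

# Hypothetical yearly data (illustrative values only, not from the report).
fraud_detected = [0, 0, 1, 1, 0, 1, 0, 1]  # explained variable (binary)
price_gap = [0.10, 0.12, 0.30, 0.28, 0.15, 0.35, 0.11, 0.25]  # explanatory variable


def summary(values):
    """Minimum, maximum, mean and sample standard deviation of one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def pearson(x, y):
    """Pairwise linear (Pearson) correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def welch_t(group_a, group_b):
    """Welch t-statistic for the difference in means of two groups."""
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(va / len(group_a) + vb / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


# Split the explanatory variable by the value of the binary explained variable.
gap_when_fraud = [g for g, f in zip(price_gap, fraud_detected) if f == 1]
gap_when_clean = [g for g, f in zip(price_gap, fraud_detected) if f == 0]

stats_table = summary(price_gap)
corr = pearson(price_gap, fraud_detected)
t_stat = welch_t(gap_when_fraud, gap_when_clean)
print(stats_table, round(corr, 2), round(t_stat, 2))
```

The t-statistic would then be compared with the critical value for the chosen confidence level to judge whether the difference in means is statistically significant.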
Models: specifications and estimation methods
The proposed econometric approach is not a single model but a family of them. In fact, it is
considered good practice to estimate the desired relationships using different models, to
explore the robustness of the estimations. The models can vary depending on their
specification (the set of variables that are chosen) and the estimation method.
Based on a number of statistical diagnostics and tests (see below), it is possible to
determine which model and specification fits the data better and, therefore, provides the
most reliable estimates.
Given the nature of food fraud data (in particular of the explained variable), the
methodology consists of the three following classes of methods:
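Ordinary Least Squares (OLS): This method postulates a relationship of the form:
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon $$
where $y$ is the explained variable, the $x$s are the explanatory variables, $\varepsilon$ is
the error term and the $\beta$s are the estimated coefficients. This method chooses the coefficients so that
the sum of squared errors is minimised. It is the most popular method used in economics
and has many advantageous properties, such as producing unbiased estimates. The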
explained variable for this method can be defined as the fraction of non-compliant tests out
of the total number of samples taken, maximising the amount of information available in
the data. However, it has an important disadvantage: the linearity of the model allows in
principle for the predicted risk of fraud to be unbounded. This method could therefore lead to
contradictory results, since the risk of fraud should always be bounded between zero and one.
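The unboundedness problem can be illustrated with a small sketch of OLS with a single explanatory variable. The data and the variable names (log_price_gap, fraud_share) are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the case study data.

```python
def ols_fit(x, y):
    """Closed-form OLS for one regressor: minimises the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(x_value, intercept, slope):
    """Linear prediction; nothing constrains it to lie between zero and one."""
    return intercept + slope * x_value


# Hypothetical data: share of non-compliant samples against a price variable.
log_price_gap = [0.0, 0.1, 0.2, 0.3, 0.4]
fraud_share = [0.05, 0.20, 0.45, 0.60, 0.85]

intercept, slope = ols_fit(log_price_gap, fraud_share)
high = predict(0.6, intercept, slope)   # above one: not a valid share
low = predict(-0.1, intercept, slope)   # below zero: not a valid share
print(round(intercept, 2), round(slope, 2), round(high, 2), round(low, 2))
```

With these illustrative numbers the fitted line predicts a fraud share above one for a large price gap and below zero for a small one, which is the contradiction described above.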
Binary methods: These methods are particularly appropriate when the explained variable
can take only two values, zero or one. The method estimates the probability of
fraud using a cumulative distribution function as the functional form instead of a linear
function, as postulated by OLS. This approach is very popular since, by definition,
cumulative distribution functions are non-decreasing and range from zero to one, so the
estimated probability is always bounded. The most
common methods are logit and probit, which use the logistic and normal cumulative
distribution functions, respectively. In the case of fraud, the explained variable would take a value of one
if fraud was detected in that period and zero otherwise.
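The two link functions can be sketched as follows. The coefficients are hypothetical and are fixed by hand rather than estimated; in practice they would be obtained by maximum likelihood using a statistical package.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function (used by logit)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal cumulative distribution function (used by probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraud_probability(x, intercept, slope, link):
    """Probability of fraud: the linear index is passed through a CDF."""
    return link(intercept + slope * x)


# Hypothetical coefficients, for illustration only.
intercept, slope = -2.0, 8.0
grid = [-0.5, 0.0, 0.25, 1.0]

logit_probs = [fraud_probability(x, intercept, slope, logistic_cdf) for x in grid]
probit_probs = [fraud_probability(x, intercept, slope, normal_cdf) for x in grid]
print([round(p, 3) for p in logit_probs])
print([round(p, 3) for p in probit_probs])
```

However large or small the explanatory variable becomes, both functions return a value between zero and one, unlike the linear form used by OLS.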
Multinomial methods: These models are an extension of the binary methods whereby the
explained variable can take more than two values and these values are ordered. The most
commonly used methods include the (ordered) multinomial logit (for categorical variables) and
Poisson or negative binomial methods (for integers). In the case of categorical values, the
explained variable could be constructed using categories such as low, medium and high
risk. In the case of integers, commonly referred to as count methods, the explained
variable could be defined as the number of identified cases of fraud.
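As a rough sketch of the count case, a Poisson model links the expected number of fraud cases to the explanatory variable through an exponential function, which keeps the expected count positive. The coefficients below are hypothetical and not estimated from any data.

```python
import math


def expected_cases(x, intercept, slope):
    """Expected number of fraud cases; the exponential keeps it positive."""
    return math.exp(intercept + slope * x)


def poisson_pmf(k, rate):
    """Probability of observing exactly k cases when the expected count is rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Hypothetical coefficients, for illustration only.
intercept, slope = 0.2, 1.5
rate = expected_cases(0.4, intercept, slope)

# Probability of observing 0, 1, ..., 5 identified cases in a period.
probabilities = [poisson_pmf(k, rate) for k in range(6)]
print(round(rate, 2), [round(p, 3) for p in probabilities])
```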
In addition, for each model it is possible to try different specifications either by using
different sets of explanatory variables or by defining these variables in one of the following
forms:
Levels: the variable is expressed in its original form.
Differences: the variable is expressed as the difference between the current and
previous period. A regression using differences would establish the relation
between changes in the variables from one period to the next.
Logarithms: the variable is expressed as the logarithm of the original. This might
serve two purposes. First, it modifies the functional form of the regression
equations, which might provide a better fit for the data. Second, the interpretation
of the coefficients is made in terms of “elasticities”. That is, instead of capturing the
effect of an increase in one unit, the coefficient captures the effect of an increase in
one per cent.
Lagged: the variable is expressed as the level of the previous period (or periods).
The regression is then well-suited to capture changes that manifest themselves with
a delay.
Polynomial: the variable is expressed as the different powers of the original.
Therefore, instead of estimating a linear equation, the regression estimates a
polynomial.
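The five forms listed above can be sketched as simple transformations of a time series. The price series and the helper names are hypothetical; the series grows by roughly ten per cent per period so that the logarithmic differences are easy to interpret.

```python
import math


def differences(series):
    """Change between the current and the previous period."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def logarithms(series):
    """Natural logarithm of each (strictly positive) observation."""
    return [math.log(v) for v in series]


def lagged(series, periods=1):
    """Value observed `periods` periods earlier (periods >= 1); None where unavailable."""
    return [None] * periods + series[:-periods]


def polynomial(series, degree):
    """Powers 1..degree of each observation, one row per period."""
    return [[v ** p for p in range(1, degree + 1)] for v in series]


# Hypothetical price series (levels).
prices = [100.0, 110.0, 121.0, 133.1]

price_changes = differences(prices)
log_changes = differences(logarithms(prices))  # approximately the growth rate
prices_lag1 = lagged(prices, 1)
price_powers = polynomial(prices, 2)
print(price_changes, [round(c, 4) for c in log_changes], prices_lag1)
```

Differences and lags shorten the usable sample by one observation per period of lag, which matters for small datasets.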
Diagnostics and tests
Given the large number of possible models and the conceptual differences in the methods
proposed above, it is important to have criteria to select the most appropriate
estimations. This selection is assisted by the following statistics and tests:
Statistical significance of individual and joint variables. It is advisable to perform a
statistical test that evaluates the hypothesis that the variables have no explanatory
power at all (i.e. that the real coefficients are equal to zero). These tests are
standard in any econometric methodology. The null hypothesis that the variables
have no statistical significance can be evaluated at different levels of confidence,
typically ranging between 90 and 99 per cent. It is important to note that these
tests might fail to reject the null hypothesis, even when it is false, if the data sample is small.
Explanatory power (e.g. R-squared) and goodness of fit (adjusted R-squared or
Akaike information criterion). These statistics capture the amount of variation in the
explained variable that can be attributed to variation in the explanatory variables. In
the case of the goodness of fit statistics, specifications that include a large number
of variables are penalised.
Heteroskedasticity (White test). OLS and other methods work under the assumption
that the errors are independent and identically distributed. If this assumption is not
satisfied, the estimated standard errors, and therefore the significance tests, might be
biased. Heteroskedasticity refers in particular to
the case in which the variance of the errors is not uniform. The White test
investigates whether this is observed in the data. If heteroskedasticity is
detected, it is possible to conduct a “robust” estimation that would correct
for this bias.
Auto-correlation (Breusch-Godfrey test). Another violation of the methods’
assumptions occurs when the errors are correlated with each other over time,
creating biased results. The Breusch-Godfrey test investigates whether this is
observed in the data.
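The penalty applied by the adjusted R-squared can be made concrete with the standard formula, 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables. The sketch below applies it to the R2 values reported for the first specification in Tables 13.1 and 13.3 (21 observations each), under the assumption that this specification contains a single explanatory variable plus a constant.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared: penalises R-squared for each added regressor."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Reported R2 of the first specification; one regressor is an assumption.
india_model_1 = adjusted_r2(0.00, 21, 1)
pakistan_model_1 = adjusted_r2(0.02, 21, 1)

# Holding R2 fixed, adding regressors lowers the adjusted statistic.
with_three_regressors = adjusted_r2(0.02, 21, 3)

print(round(india_model_1, 2), round(pakistan_model_1, 2))
```

Under that assumption the results round to the adjusted values shown in those tables for the first specification. With only 21 observations, a specification needs a noticeable gain in R2 to justify each additional variable.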
12. Annex V: Linear Correlations
Table 12.1: Linear Correlations
Variable                                     (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)
(1) non compliant samples                    1
(2) fraud percentage                         0.62*    1
(3) log of price differences india           0.28*    0.07*    1
(4) log of price differences in pakistan     0.29*    0.15*    0.76*    1
(5) log of indian exchange rate              0.3*     -0.19*   0.42*    0.49*    1
(6) log of basmati export from india to uk   0.07*    -0.28*   0.17*    0.08*    0.6*     1
(7) log GDP India                            -0.01*   -0.74*   -0.11*   0.24*    0.27*    -0.79*   1
(8) log GDP Pakistan                         0.16*    -0.73*   0.06*    0.3578*  0.71*    0.69*    0.82*    1
(9) log GDP UK                               0.14*    -0.76*   0.14*    0.43*    0.61*    0.14*    0.9*     0.98*    1
(10) log Basmati production India            -0.11*   0.0      0.19*    0.24*    0.40*    0.25*    0.20*    0.42*    0.39*    1
(11) log of number of samples                0.62*    0.13*    0.55*    0.54*    0.60*    -0.04*   -0.1*    0.29*    0.26*    -0.23*   1
Note: * means the correlation is significant at the 5% significance level (95% confidence).
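As an illustration of how Table 12.1 can be used to anticipate multicollinearity, the sketch below flags pairs of explanatory variables whose correlation is large in absolute value. The coefficients are copied from the table; the 0.8 cut-off is an assumption made for the example, not a threshold stated in this report.

```python
# Selected pairwise correlations between explanatory variables (Table 12.1).
correlations = {
    ("log GDP India", "log GDP Pakistan"): 0.82,
    ("log GDP India", "log GDP UK"): 0.90,
    ("log GDP Pakistan", "log GDP UK"): 0.98,
    ("log of price differences india", "log of price differences in pakistan"): 0.76,
    ("log of basmati export from india to uk", "log GDP India"): -0.79,
}

THRESHOLD = 0.8  # assumed cut-off for "highly correlated"

# Keep the pairs whose absolute correlation reaches the cut-off.
flagged = sorted(
    pair for pair, r in correlations.items() if abs(r) >= THRESHOLD
)
for first, second in flagged:
    print(f"{first} / {second}: {correlations[(first, second)]}")
```

With this cut-off the three GDP series are flagged against each other, which suggests that including more than one of them in the same specification may make their individual effects hard to separate.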
13. Annex VI: Econometric Estimation
Table 13.1: Complete OLS results - India
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Constant -0.2 0.44 0.03 0.18 -0.46 -0.84 0.09 0.11*** 0.14** 0.01 0.04 0.11 -0.08 -0.21 2.09 4.85 2.82 2.79 2.79 48.25 -228.21 395.71*** 0.11 -1.33
log_pr_diff_ind_ext
L0 0.05 0.98* 1.60*** 1.51** 1.40** 1.41** 0.05 0.04 0.05 0.07 0.08 0.08 0 0.56 0.15 0 0.21 0.34
L1 -1.03* -3.04*** -2.95*** -2.82*** -2.88**
L2 1.46** 0.89 1.05 1.08
L3 0.55 -0.27 -0.05
L4 0.74 0.2
L5 0.39
d.log_pr_diff_ind_ext
L0 0.97* 1.59** 1.51** 1.63***
L1 -1.45** -1.44**
L2 -0.54
L^2 5.62*
log_pr_ratio_ind_ext
L0 0.1 1.87* 3.60***
L1 -1.93* -7.12***
L2 3.78***
log_basmati_prod_ind
L0 0 0.19 0
L1 -0.37 0
L2 -0.4
log_basmati_exportq_ind_uk
L0 -0.36 0.85 0.85
L1 -1.22 -1.22
L2 *multicollinearity
log_rice_cons_uk 22.16 -31.99*
log_gdp_ind -9.97*** -14.94***
log_samples
L0 0.03 0 -0.02
L1 0 -0.01
L2 0.013
Number of Observations 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 17.00 17.00 17.00 21.00 12.00 12.00 21.00 12.00 7.00
R2 0.00 0.36 0.18 0.40 0.46 0.49 0.16 0.36 0.40 0.32 0.01 0.16 0.45 0.00 0.08 0.13 0.11 0.15 0.15 0.54 0.15 0.88 0.02 0.47 0.40
F statistic 0.08 3.25 1.23 2.63 2.61 2.22 3.65 5.15 3.72 4.22 0.15 1.70 4.62 0.04 0.49 0.57 0.89 0.75 0.75 10.61 0.81 20.24 0.15 2.36 0.34
Prob > F 0.77 0.05 0.33 0.07 0.07 0.10 0.07 0.02 0.03 0.03 0.71 0.21 0.02 0.96 0.69 0.69 0.43 0.54 0.54 0.00 0.47 0.00 0.86 0.15 0.84
Heteroskedasticity (Breusch-Pagan test), chi2(1) = 0.04 12.51*** 9.07*** 13.85*** 17.75*** 14.86*** 6.16* 12.80*** 13.64*** 17.87*** 0.19 6.71** 6.93*** 0.04 5.94* 10.81*** 1.69 2.02 2.03 1.07 5.55 0.05 0.16 0.73 0.16
Autocorrelation (Breusch-Godfrey) 0.67 0.48 0.64 0.53 0.23 0.25 0.00 0.50 0.50 0.41 0.20 0.01 1.38 0.66 0.11 0.80 0.01 0.01 0.01 3.39* 0.03 0.95 0.08 2.47 6.40
Adjusted R2 -0.05 0.25 0.03 0.25 0.29 0.27 0.12 0.29 0.29 0.24 -0.04 0.07 0.35 -0.11 -0.08 -0.09 -0.01 -0.05 -0.05 0.49 -0.04 0.84 -0.09 0.27 -0.80
Akaike Information Criterion 1.11 -4.32 1.09 -3.41 -3.93 -2.85 -2.49 -6.31 -5.41 -4.88 1.04 -0.44 -7.32 3.11 3.45 4.38 -19.99 -18.67 -18.67 -13.16 7.94 -13.87 2.86 -23.87 -6.89
Table 13.2: Key to the labels of the variables
Abbreviation Name of variable
log_pr_diff_ind_ext logarithm of the price difference between India and the world
d.log_pr_diff_ind_ext difference in the logarithm of the price difference between India and the world
log_pr_ratio_ind_ext logarithm of the price ratio of Indian price to the world price
log_basmati_prod_ind logarithm of the basmati production in India
log_basmati_exportq_ind_uk logarithm of the basmati quantity exported from India to UK
log_rice_cons_uk logarithm of UK consumption of Indian Basmati rice
log_gdp_ind logarithm of Indian GDP
log_samples logarithm of the number of samples tested by the FSA
log_pr_pak logarithm of the price of Pakistani basmati rice
d.log_pr_pak difference in the logarithm of the price of Pakistani rice
log_pr_ratio_pak logarithm of the price ratio of Pakistani price to the world price
L0 variable at current level
L1 variable lagged by one period
L2 variable lagged by two periods
L3 variable lagged by three periods
Table 13.3: Complete OLS results – Pakistan
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Constant -0.82 -1.17 -0.33 -0.33 -0.54 0.16 0.14*** 0.12** 0.12 -0.86 -0.05 -0.12 0.06 -0.61
log_pr_pak
L0 0.14 -0.77 -0.93* -0.93* -0.95 -0.86 0.11
L1 0.98* 1.94*** 1.93** 1.83* 1.59*
L2 -0.94* -0.93 -0.74 -0.39
L3 -0.01 -0.26 -0.64
L4 0.21 0.76
L5 -0.47
d.log_pr_pak
L0 -0.89* -0.97* -0.95** 0.15
L1 0.99** 0.97*
L2 0.04
L^2 -1.74
log_pr_ratio_pak
L0 0.21 -1.07 -1.6*
L1 1.40 3.57***
L2 -1.9**
log_samples
L0 0.01
L1
L2
Number of Observations 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00
R2 0.02 0.16 0.33 0.33 0.34 0.41 0.12 0.33 0.33 0.03 0.03 0.13 0.34 0.03
F statistic 0.43 1.76 2.80 1.98 1.54 1.61 2.62 4.35 2.75 0.31 0.52 1.40 2.95 0.23
Prob > F 0.52 0.20 0.07 0.15 0.24 0.22 0.12 0.03 0.07 0.74 0.48 0.27 0.06 0.80
Heteroskedasticity (Breusch-Pagan test), chi2(1) = 1.30 12.81*** 14.79*** 14.67*** 10.46*** 10.57*** 15.19*** 17.83*** 16.64*** 0.51 0.48 10.47*** 12.5*** 0.52
Autocorrelation (Breusch-Godfrey) 0.19 1.69 1.08 1.08 1.01 0.18 1.43 0.88 0.88 0.20 0.22 1.29 0.49 0.14
Adjusted R2 -0.03 0.07 0.21 0.16 0.12 0.15 0.07 0.25 0.21 -0.07 -0.02 0.04 0.23 -0.08
Akaike Information Criterion 0.73 -0.55 -3.23 -1.23 0.52 0.19 -1.51 -5.09 -3.10 2.48 0.64 0.17 -3.60 2.67
Table 13.4: Complete OLS results – Pakistan
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Constant -0.82 -1.17 -0.33 -0.33 -0.54 0.16 0.14*** 0.12** 0.12 -0.86 -0.05 -0.12 0.06 -0.61
log_pr_pak
L0 0.14 -0.77 -0.93* -0.93* -0.95 -0.86 0.11
L1 0.98* 1.94*** 1.93** 1.83* 1.59*
L2 -0.94* -0.93 -0.74 -0.39
L3 -0.01 -0.26 -0.64
L4 0.21 0.76
L5 -0.47
d.log_pr_pak
L0 -0.89* -0.97* -0.95** 0.15
L1 0.99** 0.97*
L2 0.04
L^2 -1.74
log_pr_ratio_pak
L0 0.21 -1.07 -1.6*
L1 1.40 3.57***
L2 -1.9**
log_samples
L0 0.01
L1
L2
Number of Observations 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00 21.00
R2 0.02 0.16 0.33 0.33 0.34 0.41 0.12 0.33 0.33 0.03 0.03 0.13 0.34 0.03
F statistic 0.43 1.76 2.80 1.98 1.54 1.61 2.62 4.35 2.75 0.31 0.52 1.40 2.95 0.23
Prob > F 0.52 0.20 0.07 0.15 0.24 0.22 0.12 0.03 0.07 0.74 0.48 0.27 0.06 0.80
Heteroskedasticity (Breusch-Pagan test), chi2(1) = 1.30 12.81*** 14.79*** 14.67*** 10.46*** 10.57*** 15.19*** 17.83*** 16.64*** 0.51 0.48 10.47*** 12.5*** 0.52
Autocorrelation (Breusch-Godfrey) 0.19 1.69 1.08 1.08 1.01 0.18 1.43 0.88 0.88 0.20 0.22 1.29 0.49 0.14