

Designing safe, profitable automated stock trading agents using evolutionary algorithms

Harish Subramanian, Subramanian Ramamoorthy, Peter Stone, Benjamin J. Kuipers
Artificial Intelligence Laboratory, Department of Computer Sciences
The University of Texas at Austin, Austin, TX 78712, USA
[email protected], [email protected], (pstone,kuipers)@cs.utexas.edu

    ABSTRACT

Trading rules are widely used by practitioners as an effective means to mechanize aspects of their reasoning about stock price trends. However, due to the simplicity of these rules, each rule is susceptible to poor behavior in specific types of adverse market conditions. Naive combinations of such rules are not very effective in mitigating the weaknesses of component rules. We demonstrate that sophisticated approaches to combining these trading rules enable us to overcome these problems and gainfully utilize them in autonomous agents. We achieve this combination through the use of genetic algorithms and genetic programs. Further, we show that it is possible to use qualitative characterizations of stochastic dynamics to improve the performance of these agents by delineating safe, or feasible, regions. We present the results of experiments conducted within the Penn-Lehman Automated Trading project. In this way we are able to demonstrate that autonomous agents can achieve consistent profitability in a variety of market conditions, in ways that are human competitive.

    Categories and Subject Descriptors

[Real World Applications]: finance, intelligent agents, automated trading

    General Terms

    Algorithms, Design, Experimentation, Performance

    Keywords

Genetic Algorithms, Genetic Programming, Finance, Application, Fitness Evaluation

1. INTRODUCTION

Recent developments in the automation of exchanges and stock trading mechanisms have generated substantial interest and activity within the machine learning community [13, 17], including the evolutionary algorithms community [3, 4, 5, 8].


When building autonomous agents, it is desirable to structure them in a modular fashion, as combinations of simple interpretable pieces. Such a structure could enable the identification of specific transferable features in the strategy that contribute to the observed success. This is crucial if these algorithms are to be adopted in the real world of banks and investments, where the stakes are high and practitioners need some assurance of safety and, at least statistical, bounds on profitability. Correspondingly, when a strategy is found to be unprofitable, it can often be improved if its failure can be attributed to specific causes that can then be eliminated.

One way to achieve such modular design is through the use of technical trading rules. Such rules are widely used by practitioners. For instance, a survey of foreign exchange traders in London [19] estimates that up to 90% of them use some form of trading rules in daily practice. The popularity of these rules flies in the face of theoretical objections arising from the efficient markets hypothesis [6, 12], which asserts that stock markets are random processes lacking in systematic trends that may be exploited by mechanized rules. However, with the emergence of high frequency finance [7], it has been demonstrated that the efficient market hypothesis is not entirely valid at the short time scales on which autonomous agents are able to operate. Moreover, recent evidence [1, 11] suggests that the profitability of trading rules can be rigorously characterized, and that the simplicity of these rules does not necessarily imply a lack of theoretical justification for their use. As stated in [1], there is ample evidence to suggest that trading rules are well suited to autonomous agents operating in electronic markets.

A more pressing concern, from the perspective of agent design, is that each trading rule is profitable in certain environments or market conditions, and there are corresponding market conditions under which it could be unprofitable. This dichotomy suggests that no rule can be used in isolation for any reasonable period of time. It also implies that the emphasis of design needs to be on composite rules: as noted in [4], we need to move beyond the technical trading rule and consider the technical trader. In fact, when people use trading rules, they observe several indicators and use their judgement to arrive at a composite decision. Naive combinations of trading rules are rarely, if ever, useful and prudent. So the question arises: how can the human judgement involved in combining these rules be emulated in an automated strategy acting within a computer program? We address this question in this paper.

Another key aspect of human judgement is the way we handle asymmetry between upside and downside risks. Traders might welcome the occasional windfall profit, but they certainly want to avoid bankruptcy at all costs, and they act in such a way that these preferences are satisfied. For an automated agent emulating


such behavior, we seek some assurance that worst-case losses are bounded below at an acceptable level. In an uncertain trading environment, and with limited information, are there mechanisms that enable us to arrive at such bounds? It seems unlikely that we can establish these bounds in an absolute sense; after all, this is a stochastic environment notorious for its inherent risks. However, we do seek a statistical characterization that assures us that our profits are consistent in an acceptable way. This is what we mean by the term safe agent design.

In this paper, we present an approach to autonomous agent design that explicitly addresses these issues. We show that it is possible to achieve safety by qualitatively characterizing the domain of profitability of each rule and devising multiple model strategies that attempt to navigate away from unsafe regions of the state space. We establish the soundness of this concept by discussing the properties of a simple hand-constructed agent that won an open multiagent competition. We then show that, through the use of evolutionary algorithms acting within this framework of safe agent design, we can construct fairly sophisticated agents that achieve levels of performance at least comparable to, and often superior to, what humans are usually able to achieve. We present two different designs based on evolutionary algorithms. A genetic algorithm is used to optimize a weighted combination of technical rules. Each generation of candidate solutions is optimized to increase the average returns and reduce the volatility of returns over a fixed period of training days. Simulations are run to tune the weights to trade robustly over varying market conditions. A similar design is also implemented with a genetic program that evolves complex rules from a pool of constituent rules and boolean operators to combine them. We evaluate various fitness functions and performance measures, including a simple measure of risk-adjusted return and one that penalizes only downside volatility. We compare their merits and effects on the agents, and evaluate their performance when used as a tuning parameter for the GA.

Trading on real world electronic markets, which fulfil orders in an automated fashion, is a costly exercise. To allow ourselves the freedom of experimenting in a limited but realistic electronic exchange, we use a simulated environment that is part of the Penn-Lehman Automated Trading (PLAT) project [9]. It uses real-world, real-time stock data and incorporates complete order book information, simulating the effects of matching orders in the market. It also allows participants to program their own strategies with ease within the client-server mechanism, and to trade with other agents as well as the external market.

2. THE PENN-LEHMAN AUTOMATED TRADING PROJECT

The Penn-Lehman Automated Trading (PLAT) project is a broad investigation of algorithms and strategies for automated trading in financial markets. The agents described in this paper were constructed as a part of this research effort. All agents implemented within this project trade using the Penn Exchange Simulator (PXS). PXS is a software simulator for automated trading that merges automated client limit orders with real-world, real-time order data available via modern Electronic Crossing Networks (ECNs). The simulator periodically queries the ECN to get order book information, which it then uses to match buy and sell orders. The order books from the external Island market and the internal market created by PLAT agents are merged, i.e., the simulator tries to match agent orders against actual Island trades and, if no such match is available, it tries to match against another agent's order.

Some of the agents described in this paper were initially designed to participate in an open multi-university competition. The participants in the competitions were researchers from several major research universities in the USA and UK. The contests each consisted of 10 days of live trading. As mentioned earlier, the market clearing mechanism combined the order books received from NASDAQ and an order book that was internal to the simulated environment. The rules of this competition include the following clauses that define the trading task:

(1) The task is intraday trading. Each agent begins with no money and no shares at the start of each trading day. The agent may take long or short positions during the day in an attempt to improve net worth.

(2) There are no limits on the number of shares that can be held at any given time. However, every agent must liquidate its position at the end of the trading day. Any shares held at the end of the day will be valued at 0. Any shares borrowed from the simulator will need to be purchased at a cash value of twice the current price.

(3) PXS runs in transaction cost/rebate mode for the competition. Every time PXS executes a trade, one side of the order must have already been sitting in one of the order books, and the other side must have been the incoming order. For each share executed by PXS, the party whose order was already in the books receives a rebate of $0.002, and the party whose order was the incoming order pays a transaction fee of $0.003. This is exactly the policy used by the Island ECN.

(4) The performance measure for the contests is a form of the Sharpe ratio [16]. Additionally, for our experiments, we employ a modified risk-adjusted return, reflecting the view that a volatile return structure is acceptable provided the strategy is mostly profitable. The Sortino ratio [15] is a modification of the Sharpe ratio that differentiates harmful volatility from volatility in general by using a measure of downside deviation only.

The discussion in this paper will be framed in terms of the following variables and functions:

• Price, $p_n$, is a random variable (where $n$ is a timestep).

• Volume of shares, $v_n = f(v_{n-1}, \ldots, v_0, p_{n-1}, \ldots, p_0)$.

• Cash in hand, $c_n = g(c_{n-1}, v_n, p_n)$.

• Net worth, or value, $V_n = p_n v_n + c_n$.

• The goal, for a single day of trading, is to maximize $V_n$.

• The performance of the agent is evaluated by Sharpe and Modified Sortino ratios based on $V_n$ at the end of several days of trading.

• The goal of agent design is to design the function $f$ to maximize the performance measure.

3. QUALITATIVE BEHAVIORS OF TRADING RULES

Composite rules need to be constructed from simpler rules. What should the basis set of simpler rules consist of, from which we compose the more complex rules? The intuitive answer is that they must complement each other. For instance, given a particular set of trends during a trading day, if one rule were unprofitable then another rule (or rules), adapted to the specific conditions of the day, should be able to take over. The space of trends is infinite, however, so we need a more compact language with which to characterize rules in a basis set. In this section, we present one way to approach this issue. Trading rules behave as predictive filters [14]. Profitability is a function of the dynamic properties of the predictive filter and the dynamics of the stock price. In this section, we consider the qualitative behavior


of one rule, the contrarian strategy, but this analysis can be extended to other rules as well.

Consider the Contrarian strategy, defined in terms of sign relationships. As long as an upward or downward trend is detected, the volume is corrected downward or upward respectively:

$$v_n = \begin{cases} v_{n-1} + \delta & : \; p_n - \bar{p} < -\epsilon \\ v_{n-1} & : \; -\epsilon < p_n - \bar{p} < \epsilon \\ v_{n-1} - \delta & : \; p_n - \bar{p} > \epsilon \end{cases} \qquad (1)$$

Often, the decision $p_n - \bar{p} > \epsilon$ cannot be made instantaneously [14]. In practice, this is approximated by using moving averages. An arithmetic moving average rule, based on a moving window, can be used. A more useful form of this approximation can be achieved by comparing two windows of different lengths,

$$\sum_{j=0}^{m_s - 1} \frac{p_{n-j}}{m_s} \; - \; \sum_{j=0}^{m_l - 1} \frac{p_{n-j}}{m_l} \; > \; \epsilon \qquad (2)$$

where $m_l > m_s$.

In order to understand the behavior of this rule, we need a characterization of the dynamics of the price. Consider a price process that is a mean-reverting log random walk (at any given time, the returns can be assumed to be randomly chosen from a lognormal distribution) with drift,

$$x_n = \ln(p_n) - \ln(p_{n-1}), \qquad x_n = \mu + \phi\, x_{n-1} + \eta_n \qquad (3)$$

where $\eta_n \sim \mathrm{iid}(0, \sigma^2)$.
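A minimal sketch of equations (1)-(3) follows; the parameter names (delta, eps, mu, phi, sigma) mirror our notation above, and this is illustrative code rather than the authors' implementation.

    # Sketch of equations (1)-(3): the dual moving-average contrarian
    # update, and one step of the mean-reverting log random walk.
    import math, random

    def contrarian_update(prices, volume, m_s, m_l, delta, eps):
        # Equation (2): short-window average minus long-window average
        # serves as the trend estimate for equation (1).
        trend = sum(prices[-m_s:]) / m_s - sum(prices[-m_l:]) / m_l
        if trend > eps:
            return volume - delta   # upward trend: correct volume downward
        if trend < -eps:
            return volume + delta   # downward trend: correct volume upward
        return volume               # no significant trend: hold

    def next_price(p_prev, x_prev, mu, phi, sigma):
        # Equation (3): AR(1) dynamics on log returns, with drift mu.
        x_n = mu + phi * x_prev + random.gauss(0.0, sigma)
        return p_prev * math.exp(x_n), x_n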

We may characterize the conditions required for profitable operation (with the trading rule of equations 1 and 2, window sizes $m_s$, $m_l$, and the price process of equation 3) in terms of the probability distribution of daily returns. In specific cases, the conditions can be derived analytically. For instance, [2] presents analytical conditions for profitability with $m_s = 1$, $m_l = 2$, and [1] includes some generalizations. In general, however, this becomes somewhat involved. An alternate approach to gaining a similar understanding is through the use of Monte Carlo simulation. We adopt the following procedure to perform a simulation study:

    1. Select models for the price process and trading rules

2. Generate a large number of sample paths of the price process and apply the trading rule. A typical setting might consist of 3000 sample paths, each path corresponding to a day's trading, with trading opportunities separated by 1-minute intervals.

3. Collect sample statistics of the value at the end of the sample paths.
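A sketch of this three-step procedure, reusing the hypothetical contrarian_update and next_price functions from the sketch above, with 390 one-minute steps approximating a trading day:

    # Sketch of the Monte Carlo study: simulate many one-day price paths,
    # apply the trading rule, and collect end-of-day value statistics.
    # Assumes contrarian_update and next_price from the sketch above.
    import statistics

    def run_study(mu, phi, sigma, n_paths=3000, n_steps=390):
        end_values = []
        for _ in range(n_paths):
            prices, x, cash, vol = [100.0], 0.0, 0.0, 0
            for _ in range(n_steps):       # one trading opportunity per minute
                p, x = next_price(prices[-1], x, mu, phi, sigma)
                prices.append(p)
                if len(prices) >= 30:      # wait until the long window fills
                    new_vol = contrarian_update(prices, vol, 5, 30, 100, 0.01)
                    cash -= p * (new_vol - vol)
                    vol = new_vol
            end_values.append(prices[-1] * vol + cash)   # V_n at day's end
        return statistics.mean(end_values), statistics.pstdev(end_values)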

Performing such an experiment for the Contrarian strategy yields the following characterization:

• The rule is profitable, in a statistically significant sense, in a mean-reverting log random walk market.

• The rule results in zero expected profits in a simple log random walk market.

• The rule is potentially loss-making in a log random walk market with drift, i.e., $\mu \neq 0$.

If this rule were to serve as a component of a composite strategy, then the primary requirement is that whenever $\mu \neq 0$, we should avoid this rule and prefer another rule that is profitable under that condition. Many trading rules can be characterized in this way. Even when such a formal characterization is hard to come by, it stands to reason that if we have a sufficiently diverse bag of rules, we can compensate for the weaknesses of each rule by substituting another rule of a complementary nature. In this way, the composition is significantly more robust than each individual rule. The above characterization is qualitative to the extent that we are able to make statements based just on the sign of key quantities, e.g., $\mu = 0$ or $\mu \neq 0$. This ensures that a composition based on these rules can be expressed compactly and without depending on detailed identification of market conditions. In situations where the rules do not admit such a qualitative representation, we can still use the same approach to partitioning the state space, but we will need more detailed quantitative information about the price process.

4. COMPOSING MULTIPLE TRADING RULES

In the previous section, we arrived at a rationale for when a particular rule should be invoked. How do we make this decision in real time? Characterizing statistical distributions in real time is hard. In this section, we propose an alternate, surrogate test that allows us to make this decision. The state of a trading agent can be represented by two important variables: cash in hand, $c_n$, and volume of shares, $v_n$. In a phase space defined by these two variables, depicted in Figure 1, the result of a day's trading defines a trajectory. Ideally, the agent follows a path along the direction of increasing $c_n$ while staying close to $v_n = 0$, thus minimizing risk.

Now, we wish to identify, without fitting statistical distributions, when the agent is in one of the unprofitable regions of the state space. Intuitively, the agent is in an unprofitable region when its performance falls below that of a simple random walk market. In the $c_n$-$v_n$ phase space, the random walk market yields a line with the property that a neutrally profitable trading rule would stay on this line, exchanging cash for shares and vice versa, but not making any profit. A profitable agent will follow a trajectory that lies in one of the half planes defined by this line. An unprofitable agent enters the other half plane. By observing the trajectory of the agent in real time, we can detect unprofitability by monitoring entry into this undesirable half plane.

In addition, a day trading agent, such as the agents participating in the PLAT competitions, is required to unwind its position by the end of the day or risk the possibility of great losses overnight (due to market activity outside trading hours). This acts as a boundary value constraint that implicitly restricts the maximum position taken during the day. Combining these two requirements, we specify that a trading rule, e.g., the Contrarian strategy, should be invoked only as long as $(V_n \equiv p_0 v_n + c_n > 0) \wedge (c_n > c_{min}) \wedge (v_{min} < v_n < v_{max})$. In the notation introduced in Section 2 for net value and cash/share positions, this test encodes the intuitive idea of falling below the performance of the simple random walk market and hence acts as a surrogate test for safety.

Our approach to safe composition consists of ensuring that we have a collection of trading rules with complementary directional properties, and of monitoring their progress using the mechanism described above in order to make changes. A simple agent that embodies this approach is the following: if $(V_n \equiv p_0 v_n + c_n > 0) \wedge (c_n > c_{min}) \wedge (v_{min} < v_n < v_{max})$, then invoke the Contrarian strategy; else divest (i.e., reduce the share position) using a Safe Contrarian strategy (a modified version of the original rule). This rule is kept deliberately simple to illustrate the ideas. In later sections, we will discuss more complex combinations.
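The surrogate test and the two-mode agent above translate directly into code; the threshold names c_min, v_min, v_max are ours, and the two rule callables stand in for the strategies described above.

    # Sketch of the safety-gated agent: run the Contrarian rule only while
    # the surrogate safety test holds; otherwise divest toward v_n = 0.
    def is_safe(p0, cash, volume, c_min, v_min, v_max):
        # (p0*v_n + c_n > 0): above the random-walk break-even line;
        # the remaining terms bound the cash and share positions.
        return (p0 * volume + cash > 0) and (cash > c_min) and (v_min < volume < v_max)

    def decide(p0, cash, volume, bounds, contrarian_rule, safe_divest):
        if is_safe(p0, cash, volume, *bounds):
            return contrarian_rule()    # normal operation in the safe region
        return safe_divest()            # reduce the share position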


    Figure 1: Agent behavior represented along the cash-share axes

However, we note that even such a simple rule, constructed using the principles outlined above, is capable of impressive results. While the safety condition holds, the Contrarian strategy is invoked and the agent is guaranteed non-negative returns. In a drifting market, the Contrarian rule is unprofitable and eventually the divestment strategy is invoked, which brings the agent back towards the desirable region.

This agent was submitted to the April 2004 PLAT competition. The experiment consisted of an economy of six agents: the proposed agent, called SMM; four other competing agents with the same trading objective as ours; and a VWAP agent intended to provide liquidity. We used the Sharpe ratio, a measure of risk-adjusted returns, as a means of evaluating the performance of these agents. The results of the live competition show that SMM achieved consistent profitability, as reflected by a Sharpe ratio several multiples higher than that of the other agents.

Test Set       SMM     #1      #2      #3      #4
Avg. Profit    2519    239     4725    1057    148
Std. Dev.      1009    316     6551    1829    3413
Sharpe ratio   2.49    0.76    0.72    0.58    0.04

The primary benefit of the proposed monitoring mechanism is that the trading rule is able to adjust its risk by divestment, when market conditions are unfavorable, in order to avoid excessive losses. In later sections, we will discuss combinations of several rules, some of them more sophisticated than the ones described above. However, the foregoing discussion gives us reason to believe that our approach and monitoring mechanism will allow us to achieve a measure of robustness. With this encouraging evidence, we may use evolutionary optimization to aggressively search for higher levels of performance. This is the focus of the remainder of this paper.

    5. EVOLUTIONARY AGENT DESIGN

Having understood the basic principles of composite agent design, we apply them to the design of an evolutionary learning algorithm that uses optimization to determine the best weighting over a set of rules. If each component rule yields a decision $D_i \in \{-1, 0, 1\}$, corresponding to {Sell, Do Nothing, Buy}, then a combined decision would be

$$D_{comb} = \sum_i \omega_i D_i.$$

The goal of the evolutionary optimization algorithm is to determine the best set of weight vectors, $\omega_i$, that optimizes the fitness function in an effort to maximize the performance of the trading agent. The aim of our design is to maximize profitability. In addition, we incorporate our test for safety to bias the search algorithm towards safe and feasible regions. The evolutionary algorithm learns both the trading decision and the trade volume simultaneously.
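A sketch of the weighted composition and a Sharpe-like fitness evaluation over training days follows; the weight encoding and the exact fitness function used by GAA are simplified here, and `backtest` is a hypothetical helper that runs the composite rule over one day and returns its profit.

    # Sketch of the GAA composition: component decisions D_i in {-1, 0, +1}
    # are combined with evolved weights w_i; the sign of D_comb gives the
    # trading decision and its magnitude the trade volume.
    def combined_decision(weights, decisions):
        return sum(w * d for w, d in zip(weights, decisions))

    def fitness(weights, training_days, backtest):
        # Reward high average daily profit, penalize volatility of profits.
        profits = [backtest(weights, day) for day in training_days]
        mean = sum(profits) / len(profits)
        std = (sum((p - mean) ** 2 for p in profits) / len(profits)) ** 0.5
        return mean / (std + 1e-9)       # Sharpe-like risk-adjusted return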

The evolutionary learning process identifies the best strategy by searching among the weights to identify the right composite trading rule. However, in practice, the competing objectives of maximizing profits and managing risk to avoid penalties make this learning problem hard. To evaluate the benefit of bootstrapping such learning agents by providing the qualitative characterization and allowing the exploration to focus on identifying quantitative optima, we performed several experiments. These, in turn, isolate the various elements of the agent design. We use two variations of the general agent architecture. The first experiment is based on a genetic algorithm agent, called GAA. In this agent, each $\omega_i$ is encoded as a binary string and hence quantized to discrete values of volume. A variation of this agent, called GAAMM, is defined as: if $(V_n \equiv p_0 v_n + c_n > 0) \wedge (c_n > c_{min}) \wedge (v_{min} < v_n < v_{max})$