Group 4 FIN 556 High Frequency Trading Final Report
Teammates
Takumi Li - feiyang3@illinois.edu (Team Leader)
My name is Feiyang (Takumi) Li and I am a master's student in CS at the University of Illinois at Urbana-Champaign, graduating in December 2022. I also have dual bachelor's degrees in CS and Statistics from UIUC. I am a resourceful student skilled in software engineering, data science, and quantitative trading, with a strong foundation in object-oriented programming, app development, math, and problem-solving. I am also a team player with a dynamic personality and a willingness to learn. In addition, I have earned the Bloomberg Market Concepts certificate to strengthen my skill set in finance.

Feel free to reach me at feiyang.li@outlook.com or via LinkedIn: https://www.linkedin.com/in/takumi-feiyang-li/
Eric Liro - eliro2@illinois.edu
I am Eric Liro, a master's student in Computer Science at the University of Illinois at Urbana-Champaign. I am graduating in December 2022, and have taken courses in distributed systems, security, machine learning, databases, and communication networks. I have industry experience as a Software Development Intern at Amazon and Panasonic. My skills include multiple high-level languages such as Python, C++, Java, JavaScript, and Kotlin. I am also experienced with frontend development and cloud platforms. My interests include infrastructure, finance, and cloud.

My LinkedIn is located here: https://www.linkedin.com/in/eric-liro/
Victor Fan - victorf4@illinois.edu
Victor is a master's student studying Financial Engineering at the University of Illinois at Urbana-Champaign, graduating in December 2022. Victor intends to become a quantitative researcher and trader and is currently on the job hunt. Victor is proficient in Python, C++, and R, and worked on a practicum project in Fall 2022 with JPMorgan Chase on pricing conditionally callable convertible bonds.

My LinkedIn is https://www.linkedin.com/in/victor-fan-274a4a170/
Sirisha Pullabhatla - sirisha3@illinois.edu
I am pursuing my master's at the University of Illinois Urbana-Champaign in Financial Mathematics, focusing on quantitative finance and the mathematical computation of derivatives, graduating in May 2023. I have a bachelor's degree majoring in Finance and Mathematics with a minor in Statistics. I have 7 years of work experience: 4 years with UBS as a Foreign Exchange Analyst and 3 years with Franklin Templeton as a simple-derivatives pricing analyst. I love solving puzzles and programming in Python, C++, and MATLAB. I also hold two other bachelor's degrees, in Indian Classical Music and Violin.

Please feel free to reach out to me at sirisha3@illinois.edu; my LinkedIn is www.linkedin.com/in/sirishapullabhatla .
Introduction
Project Description:
We have completed a semester-long project for FIN 556 – "Algorithmic Market MicroStructure", instructed by Professor David Lariviere.
Main objective of project:
- Develop a high frequency trading strategy using nanosecond data.
- Develop the quantitative strategy, back test it on the data, and visualize the performance of the strategy.
Tools and various sources for developing the data, strategy and back testing:
Data source:
The main data source for this project is IEX. We have adapted Professor Lariviere's IEX data downloader and parser for the project.
Strategy Development:
We have used Strategy Studio as the base to develop and back test the trading strategies. It is proprietary software from RCM used for trading strategy development and testing. We created a new trading strategy inside Strategy Studio by implementing its interface (for example, inheriting the Strategy class). We then back tested it in Strategy Studio using the data from the data retrieval and parsing section; the back test creates three files - orders, fills, and profit-and-loss - from which we can measure the performance of the strategy.
Visualizing:
We have adapted the Strategy Studio back testing output and utilized plotting packages to visualize the data and assess the performance of the strategy from the resulting charts.
Technologies
Programming Languages:
- Git Bash:
  - Git Bash is an application for Microsoft Windows environments which provides an emulation layer for a Git command line experience.
  - We have used it for feeding the data into Git and for automation.
- Python:
  - We have used Python for developing our strategy initially and for debugging the issues we faced in the strategy.
  - For visualizing the back test output, we have used the Python Matplotlib package to plot the data.
- C++:
  - We then developed our strategy in C++; Strategy Studio provides its interface in C++, which allows us to implement various strategies in C++.
Software:
- Strategy Studio:
  - RCM has sponsored their Strategy Studio for implementing and back testing our strategy with market data.
- Eigen:
  - Eigen is an open-source C++ linear algebra library that supports the header-only Kalman Filter library.
- Jupyter Notebook:
  - It is a lightweight dev tool for Python; we developed both the Kalman filter and the Hidden Markov Model in Jupyter notebooks.
- GitLab:
  - We have used GitLab to store all our strategy code and data for the project.
- VirtualBox/Vagrant:
  - We have used VirtualBox to set up the virtual machine that runs Strategy Studio. The Vagrant box contained Strategy Studio and the necessary environment.
- VS Code:
  - We have used VS Code to SSH into Vagrant for development and to develop our C++ code.
- Operating system:
  - We have used Ubuntu 20.04 as the operating system.
Packages:
Below are the Python packages we have used to implement both the Kalman Filter and the HMM:
- numpy
- pandas
- matplotlib
- pykalman
- seaborn
- hmmlearn.hmm (GaussianHMM)
- datetime
- warnings
- argparse
- glob
Reference:
Special thanks to Hayk Martiros for the Kalman Filter implementation.
https://github.com/hmartiro/kalman-cpp
Git repo layout:
├── HMM
│ ├── CvHMM.h
│ ├── HMM.py
│ ├── HMMStates.csv
│ └── Trades
│ └── ...
│
├── Kalman
│ ├── Kalman.py
│ └── Trades
│ └── ...
│
├── images
│ └── ...
├── Strategy
│ ├── Group4Strat.cpp
│ ├── Group4Strat.h
│ ├── HMMStates.csv
│ ├── Makefile
│ ├── automateBacktesting.sh
│ ├── kalman_filter.cpp
│ ├── kalman_filter.h
│ ├── HMM_Code
│ │ └── main.cpp
│ └── README.md
├── strategy_changes
│ ├── Makefile
│ ├── SimpleMomentumStrategy.cpp
│ ├── SimpleMomentumStrategy.h
│ ├── SimpleMomentumStrategy.o
│ ├── SimpleMomentumStrategy.so
│ ├── automateBacktesting.sh
│ ├── build_group4_strategy.sh
│ ├── build_momentum_strategy.sh
│ ├── go.sh
│ ├── run_group4_backtest.sh
│ ├── run_momentum_backtest.sh
│ └── README.md
├── visualization code
│ ├── FIN 556 visualization AAPL.ipynb
│ ├── FIN 556 visualization QQQ.ipynb
│ ├── FIN 556 visualization SPY.ipynb
│ ├── AAPL_run
│ │ └── ...
│ ├── SPY_run
│ │ └── ...
│ └── QQQ_run
│ └── ...
└── README.md
Instructions for using the project:
To run our strategy, follow the steps below:
- Download and install VirtualBox and Vagrant.
- Clone the professor's Vagrant setup (https://gitlab.engr.illinois.edu/shared_code/strategystudioubuntu2004).
- cd strategystudioubuntu2004 and clone this repo.
- Using the recursively cloned repo iexdownloadparser, download and parse market data for the desired back testing period and symbols. Directions for using the IEX downloader/parser can be found in the README.md under the project root directory.
- Boot up the Vagrant machine, cd /vagrant/fin556/strategy, and run make.
- Run ../backtest.sh to run the back test. Modify the script for the desired back testing period and symbols.
- For visualization, Python code is posted in the visualization code folder. Simply run the Jupyter notebooks to generate the visualizations.
Detailed Description of project:
Kalman Filter:
The Kalman filter is one of the most famous and widely used algorithms for accurately estimating a variable which cannot be measured directly. It was first used by NASA in calculating the position of the Apollo space rockets to make sure they stayed on the right path.
How is it used in trading?
We can say that the Kalman filter is used to find hidden arbitrage opportunities in many futures markets, and also to predict future stock prices.
Mathematical terms of Kalman Filter:
It uses the concept of the normal distribution and gives us an idea about the accuracy of the estimate, building on the familiar concepts of the mean μ and the variance σ².
The normal distribution generates a bell-shaped curve, through which we can visualize the data: one standard deviation contains 68.26% of the population, two standard deviations contain 95.44%, while three contain 99.74%. Statistically speaking, the bell curve gives us the probability of a given observation falling between two points. Hence, using distribution functions, also called probability density functions (PDF), we can 'predict' with a certain 'confidence'. The formula for the PDF is:
f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
The Kalman Filter is a type of prediction algorithm. Its success depends on the estimated values and their variance from the actual values. In the Kalman Filter, we assume that depending on the previous state, we can predict the next state. It does not derive the equations; rather, it predicts the values of something which cannot be directly measured. Thus, there will obviously be some error between the predicted value and the actual value. If the measurement system itself contains some errors, this is called measurement noise. If the process being measured has certain factors which are not taken into account, this is called process noise. For example, if we are predicting the Apollo rocket's position and we do not know the wind conditions during the initial phase, then we will encounter some error between the actual location and the predicted location. The Kalman Filter is used to reduce these errors and successfully predict the next state. Rudolf Kalman developed the status update equation taking into account three values, i.e.:
- True value
- The estimated or predicted value
- Measured value
The equation would be:
Current state estimate = Predicted value of current state + Kalman Gain × (Measured value − Predicted value of the state)
The cycle of prediction goes as below
Let us understand this equation: given the measured values, we take the average of the values to estimate the true value. To apply this equation, we take one measurement, which becomes the measured value. In the initial step, we guess the predicted value. Since an average is being computed, in this example the Kalman gain would be (1/N): with each successive iteration, the second part of the equation decreases, giving us a better estimated value. We should note that the current estimated value becomes the predicted value of the current state in the next iteration. So far, we knew that the actual weight is constant, and hence it was easy to predict the estimated value. But what if we had to take into account that the state of the system (the weight in this case) changes? For that we now move on to the next equation in the Kalman Filter tutorial, i.e. state extrapolation.
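The averaging update described above can be sketched in a few lines of Python. This is a minimal illustration we add here; the measurement values and the initial guess are assumed, not taken from the report:

```python
# Minimal sketch of the iterative state update with Kalman gain 1/N:
# estimating a constant true value (e.g., a weight) from noisy measurements.
measurements = [50.45, 50.967, 51.6, 52.106, 52.492, 52.819, 53.0]

estimate = 48.0  # initial guess (the predicted value for the first step)
for n, measured in enumerate(measurements, start=1):
    gain = 1.0 / n  # the Kalman gain shrinks with each successive iteration
    estimate = estimate + gain * (measured - estimate)

print(round(estimate, 3))  # -> 51.919
```

Note that with gain 1/N the first iteration (gain = 1) discards the initial guess entirely, so the final estimate is exactly the running average of the measurements.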
State extrapolation equation
The state extrapolation equation helps us find the relation between the current state and the next state, i.e. predict the next state of the system. Until now, we understood that the Kalman filter is recursive in nature and uses the previous values to predict the next value in a system. While we could simply give the formula and be done with it, we want to understand exactly why it is used. In that respect, we will take another example to illustrate the state extrapolation equation. Let's take the example of a company trying to develop a robotic bike. If you think about it, when someone is riding a bike, they have to balance the bike, control the accelerator, turn, etc. Let's say that we have a straight road and we have to control the bike's velocity. For this, we would have to know the bike's position. As a simple case, we measure the wheels' rotation to predict how much the bike has moved. We recall that the distance travelled by an object is equal to the velocity of the object multiplied by the time travelled. Now, let's suppose we measure the rotation after a certain time interval, i.e. Δt. If we say that the bike has a constant velocity v, then we can say the following:
The predicted position of the bike is equal to the current estimated position of the bike + the distance covered by the bike in time Δt. Here the distance covered by the bike will be Δt multiplied by the velocity of the bike. Suppose that the velocity is kept constant at 2 m/s, and the time Δt is 5 seconds. That means the bike moves 10 metres between every successive measurement. But what if we check the next time and find out the bike moved 12 metres? This gives us an error of 2 metres. This could mean two things:
- The device used to measure the velocity has error (measurement error)
- The bike is moving with different velocities; in this instance maybe it is on a downhill slope (process error)
We try to find out how to minimise this error by applying different gains to the state update equation. Now, we will introduce a new concept to the Kalman filter tutorial, i.e. the α-β filter. If we recall, the status update equation was given as:
Current state estimate = Predicted value of current state + Kalman Gain × (Measured value − Predicted value of the state)
We will say that α is used to reduce the error in the measurement, and thus it will be used to predict the value of the position of the object. Now if we put α in place of the Kalman gain, you can deduce that a high value of α gives more importance to the measured value and a low value of α gives less weight to the measured value. In this way, we can reduce the error while predicting the position. If we assume that the bike is moving with different velocities, we have to use another equation to compute the velocity, which in turn leads to a better prediction of the position of the bike. Here we use β in place of the Kalman gain to estimate the velocity of the bike. We have seen how α and β impact the predicted value. But how do we know the correct values of α and β that bring the predicted value closer to the actual value? That is the role of the Kalman Gain equation.
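The bike example above can be sketched as a small α-β filter. This is an assumed illustration we add for clarity (the function name, measurements, and gain values are hypothetical), not the report's production code:

```python
# Minimal sketch of an alpha-beta filter: track the bike's position and
# velocity from noisy position measurements taken every dt seconds.
def alpha_beta_filter(measurements, dt, x0, v0, alpha, beta):
    """Return the position estimate after each measurement."""
    x, v = x0, v0
    estimates = []
    for z in measurements:
        # Predict: extrapolate the position with the current velocity estimate
        x_pred = x + v * dt
        residual = z - x_pred
        # Update: alpha corrects the position, beta corrects the velocity
        x = x_pred + alpha * residual
        v = v + beta * (residual / dt)
        estimates.append(x)
    return estimates

# Bike moving at a true 2 m/s, measured every 5 s with small errors.
measurements = [10.2, 19.8, 30.5, 39.9, 50.1]
est = alpha_beta_filter(measurements, dt=5.0, x0=0.0, v0=2.0, alpha=0.2, beta=0.1)
```

With a small α the estimates stay close to the extrapolated track while still absorbing part of each measurement residual, which is exactly the trade-off discussed above.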
Kalman Gain equation
The errors, whether measurement or process, are random and normally distributed in nature. In fact, taking it further, there is a higher chance that the estimated values will be within one standard deviation of the actual value. Now, the Kalman gain is a term which captures the uncertainty of the error in the estimate. Put simply, we denote the estimate uncertainty as p, and since we use σ as the standard deviation, we denote the measurement uncertainty (the variance σ² of the measurement) as r. Thus, we can write the Kalman gain as:
Kalman gain = (Uncertainty in estimate) ÷ (Uncertainty in estimate + Uncertainty in measurement) = p ÷ (p + r)
In the Kalman filter, the Kalman gain is used to adjust the estimate depending on the measurement. Having seen the computation of the Kalman gain, in the next equation we will understand how to update the estimate uncertainty. Before we move to the next equation in the Kalman filter tutorial, let us review the concepts we have gone through so far. We first looked at the state update equation, which is the main equation of the Kalman filter. We then understood how we extrapolate the current estimated value to the predicted value, which becomes the current estimate in the next step. The third equation is the Kalman gain equation, which tells us how the uncertainty in the error plays a role in calculating the Kalman gain.
Estimate uncertainty update
As we know, with every successive step the Kalman Filter continuously updates the predicted value so that we get the estimated value as close as possible to the actual value of a variable; thus, we have to see how this uncertainty in the error can be reduced. While the derivation of the equation is lengthy, we are only concerned with the equation itself. The estimate uncertainty update equation tells us that the estimate uncertainty of the current state differs from the previous estimate uncertainty by a factor of (1 − Kalman gain), i.e. p_current = (1 − K) × p_previous. We can also call this the covariance update equation.
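The four equations of this section (state update, extrapolation, Kalman gain, and uncertainty update) can be combined into a one-dimensional filter loop. The sketch below is an assumed illustration with made-up values, not the C++ implementation used in the strategy:

```python
# One-dimensional Kalman filter sketch combining the four equations above.
def kalman_1d(measurements, x0, p0, q, r):
    """Estimate a (roughly constant) state from noisy measurements.

    x0: initial state estimate      p0: initial estimate uncertainty
    q:  process noise variance      r:  measurement noise variance
    """
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict (static model): the state carries over, uncertainty grows
        p = p + q
        # Kalman gain: p / (p + r)
        k = p / (p + r)
        # State update: estimate = prediction + K * (measured - prediction)
        x = x + k * (z - x)
        # Estimate uncertainty update: p = (1 - K) * p
        p = (1 - k) * p
        history.append(x)
    return history

prices = [100.2, 100.0, 100.4, 100.1, 100.3]
estimates = kalman_1d(prices, x0=100.0, p0=1.0, q=1e-4, r=0.05)
```

This mirrors the 1-state, 1-measurement structure used later in the Strategy Studio implementation, where the filtered value serves as a fair-price estimate.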
Hidden Markov model:
Markov model:
A Markov Model is a stochastic state space model involving random transitions between states where the probability of the jump is only dependent upon the current state, rather than any of the previous states. The model is said to possess the Markov Property and is "memoryless". Random Walk models are another familiar example of a Markov Model. Markov Models can be categorised into four broad classes of models depending upon the autonomy of the system and whether all or part of the information about the system can be observed at each state.
If the model is still fully autonomous but only partially observable then it is known as a Hidden Markov Model. In such a model there are underlying latent states (and probability transitions between them) but they are not directly observable and instead influence the "observations". An important point is that while the latent states do possess the Markov Property there is no need for the observation states to do so. The most common use of HMM outside of quantitative finance is in the field of speech recognition.
Markov Model Mathematical Specification
In quantitative finance the analysis of a time series is often of primary interest. Such a time series generally consists of a sequence of T discrete observations X1, …, XT. An important assumption about Markov Chain models is that at any time t, the observation Xt captures all of the necessary information required to make predictions about future states. This assumption will be utilised in the following specification. Formulating the Markov Chain into a probabilistic framework allows the joint density function for the probability of seeing the observations to be written as:
p(X1:T) = p(X1) ∏t=2..T p(Xt∣Xt−1)
This states that the probability of seeing a sequence of observations is given by the probability of the initial observation multiplied T−1 times by the conditional probability of seeing the subsequent observation, given that the previous observation has occurred. It will be assumed that the latter term, known as the transition function, p(Xt∣Xt−1), is itself time-independent. In addition, since the market regime models considered here consist of a small, discrete number of regimes (or "states"), say K, the type of model under consideration is known as a Discrete-State Markov Chain (DSMC). Thus if there are K separate possible states, or regimes, for the model to be in at any time t, then the transition function can be written as a transition matrix that describes the probability of transitioning from state i to state j at any time-step t. Mathematically, the elements of the transition matrix A are given by:
Aij = p(Xt = j ∣ Xt−1 = i)
As an example it is possible to consider a simple two-state Markov Chain Model. The following diagram represents the numbered states as circles while the arcs represent the probability of jumping from state to state:
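A two-state DSMC like the one described above can be simulated directly from its transition matrix. The sketch below is an added illustration with an assumed transition matrix (the probabilities are not from the report); it checks that the empirical jump frequency matches the corresponding matrix entry:

```python
import numpy as np

# Hypothetical two-state Discrete-State Markov Chain; the transition
# probabilities below are assumed for illustration.
rng = np.random.default_rng(42)
A = np.array([[0.9, 0.1],   # A[i, j] = p(state j at t | state i at t-1)
              [0.2, 0.8]])

def simulate(A, n_steps, start=0):
    """Draw a state path of length n_steps from the chain."""
    states = [start]
    for _ in range(n_steps - 1):
        # The next state depends only on the current one (Markov property)
        states.append(rng.choice(2, p=A[states[-1]]))
    return np.array(states)

chain = simulate(A, 10000)
# Empirical frequency of 0 -> 1 jumps should be close to A[0, 1] = 0.1
from_zero = chain[:-1] == 0
emp = np.mean(chain[1:][from_zero] == 1)
```

Because the jump distribution depends only on the current state, long-run jump frequencies recover the rows of A, which is what makes the transition matrix a complete specification of the chain.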
Hidden Markov Model Mathematical Specification:
The corresponding joint density function for the HMM is given by:
p(z1:T, x1:T) = p(z1:T) p(x1:T∣z1:T) = [p(z1) ∏t=2..T p(zt∣zt−1)] [∏t=1..T p(xt∣zt)]
In the first line this states that the joint probability of seeing the full set of hidden states and observations is equal to the probability of simply seeing the hidden states multiplied by the probability of seeing the observations, conditional on the states. This makes sense as the observations cannot affect the states, but the hidden states do indirectly affect the observations. The second line splits these two distributions into transition functions. The transition function for the states is given by p(zt∣zt−1) while that for the observations (which depend upon the states) is given by p(xt∣zt). As with the Markov Model description above it will be assumed that both the state and observation transition functions are time-invariant. This means that it is possible to utilise the K×K state transition matrix A as before with the Markov Model for that component of the model. However, for the application considered here, namely observations of asset returns, the values are in fact continuous. This means the model choice for the observation transition function is more complex. The common choice is to make use of a conditional multivariate Gaussian distribution with mean μk and covariance Σk. This is formalised below:
p(xt∣zt = k, θ) = N(xt∣μk, Σk)
That is, if the state zt is currently equal to k, then the probability of seeing observation xt, given the parameters of the model θ, is distributed as a multivariate Gaussian.
Filtering of Hidden Markov Models
With the joint density function specified it remains to consider how the model will be utilised. In general state-space modelling there are often three main tasks of interest: filtering, smoothing and prediction. They are repeated here for completeness:
- Prediction
- Filtering
- Smoothing
Filtering and smoothing are similar, but not identical. Smoothing is concerned with understanding what has happened to states in the past given current knowledge, whereas filtering is concerned with what is happening with the state right now. Mathematically, the conditional probability of the state at time t given the sequence of observations up to time t is the object of interest. This involves determining p(zt∣x1:t). We have used the Kalman filter for the processes of prediction, filtering and smoothing of the data.
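HMM filtering, i.e. computing p(zt∣x1:t) recursively, can be sketched with the forward algorithm. The two-state parameters below (transition matrix, per-state return mean and volatility) are assumed for illustration; this is not the hmmlearn-based code the project actually used:

```python
import numpy as np

# Assumed two-state (low/high volatility) HMM parameters for illustration.
A = np.array([[0.95, 0.05],    # A[i, j] = p(z_t = j | z_{t-1} = i)
              [0.10, 0.90]])
pi = np.array([0.5, 0.5])        # initial state distribution
mu = np.array([0.0005, -0.001])  # per-state mean of observed returns
sigma = np.array([0.005, 0.02])  # per-state std dev of observed returns

def gaussian_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def forward_filter(returns):
    """Return p(z_t | x_{1:t}) for each t (the filtered state probabilities)."""
    belief = pi.copy()
    filtered = []
    for x in returns:
        belief = belief @ A                           # predict: propagate through A
        belief = belief * gaussian_pdf(x, mu, sigma)  # update: weight by likelihood
        belief = belief / belief.sum()                # normalise to a distribution
        filtered.append(belief.copy())
    return np.array(filtered)

probs = forward_filter([0.0004, -0.03, 0.025, -0.028])
```

After the large-magnitude returns, the filtered probability of the high-volatility state dominates, which is the regime-detection signal described in the Strategy Studio section.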
Strategy Studio Implementation and Result:
- For Strategy Studio, due to the constraints of running in C++, we needed to convert a lot of our previous code, which was primarily written in Python. To do this, we first utilized a Kalman Filter implementation in C++ by Hayk Martirosyan to run the Kalman filter natively in the C++ code. Using this, we were able to implement the signal derived from using trade prices to calculate the Kalman mean. This was done by feeding the trades data natively into Strategy Studio through the OnTrades() function. Through this we were able to use a 1-state, 1-measurement Kalman filter on trade prices. This way we could calculate the Kalman mean immediately and use it as a predictor of the fair price of the instrument. We compared this Kalman mean with resting orders on the order book: if the Kalman mean is greater than the current best ask (the best price we are able to buy at), we think the fair price is higher than the best price we can buy at, so we should buy, and we place a round lot order of 100 at that price. The same is done for the best bid, with us selling if the Kalman mean is less than the best bid. When the Kalman mean lies inside the bid-ask spread, we seek to empty our positions and will send an order to either sell if we are long or buy if we are short until we are at a position of 0. We keep track of orders by adding the order_id to a map; we can use this to check and alter our order if we choose to take the other side of the trade or want to go neutral after the fact. We keep track of our positions in the market and cap our exposure to any one instrument at a position of 500; due to odd lots from partial fills, we actually see slight overages above 500 at select points. We seek to have at most one order in the book for each instrument that we trade, and we change the order depending upon our new view.
- For the HMM, based on our research and Python analysis, we planned on using the Hidden Markov Model to do regime detection to augment our Kalman strategy, with the Hidden Markov Model seeking to categorize high volatility (potential for a downturn) as state 1 and a much more stable (normal) regime as state 0. We planned on using this signal as an on/off switch for the Kalman filter, since state 1 represents unpredictability in which the Kalman filter will likely not do well, letting us avoid the downturns. We struggled to implement the HMM in C++. We originally planned on using an OpenCV implementation of HMM that we had found online; however, after trying for a while to get it to work within the VM, we realized that the OpenCV implementation was not actually compatible with what we needed to do anyway, because it only takes discrete values. We also ran into strange issues integrating other potential C++ implementations of HMM into the VM and running them within Strategy Studio. We finally settled on running all of the data through the HMM code in Python, printing the state decisions from the HMM into a .csv file, and reading that file as the states at runtime in Strategy Studio.
Initial result
We initially tested both the Kalman filter prediction and the Hidden Markov Model on Yahoo! Finance data to check whether our strategy works well; the results below helped us decide whether to adopt this strategy for the project.
Kalman Filter:
Hidden Markov Model:
Back testing results Visualisation:
- This is the cumulative Profit and Loss (PnL) for QQQ data from 2021/01/04 to 2022/07/01. Unfortunately, our strategy did not do well, and we lost $4,467,352.45 by the end. However, we noticed a jump in profits around 2022/02/14.
- We zoomed in to the month around 2022/02/14, and found that there was a profit of roughly $100,000 made on 2022/02/16, which is our highest profit in the entire back testing period.
- To understand whether our strategy is actually trading, we also plotted the cumulative position over the entire back testing period. However, the plot for the entire period looks like a solid block from which we cannot observe much information.
- We then plotted the cumulative position for only one month, 2022/02/01 - 2022/02/28. We can observe that the position is cleared overnight, and the position always stays between 0 and 600.
- We then plotted the cumulative position for just one day, 2022/02/07, which confirms the above observation that the position always stays between 0 and 600. No other particularly interesting pattern seems to exist.
- We finally plotted 3 hours of cumulative position between 14:30 and 17:00 on 2022/02/07. We can observe that there are partially filled orders (not whole hundreds), and the following order always tries to bring the position back to the nearest whole hundred.
- We plotted the traded price during the entire back testing period.
- We noticed that there are some trades executed at a much lower price, indicated by the red arrows.
- We assume these may be caused by data issues or edge cases of the back testing framework.
- We also plotted the trade counts (summed over 15 trading days), and we observe that the strategy had an obvious increase in the number of trades during the period from 2021/11/15 to 2022/04/15.
- We took a year and a half of data, from 01 Jan 2021 to 25 May 2022, for testing our results, and we implemented the Kalman filter strategy for back testing; we can see that our strategy performed badly. We find that the difference between discrete and continuous data may have made the strategy behave differently.
- We zoomed in to the month around 2021/11/06, and found that there was a profit of roughly $100,000 made on 2021/11/10, which is our highest profit in the entire back testing period.
- We finally plotted 3 hours of cumulative position between 14:30 and 17:00 on 2022/02/07. We can observe that there are partially filled orders (not whole hundreds), and the following order always tries to bring the position back to the nearest whole hundred.
- We plotted the cumulative position for only one week, 2021/02/22 - 2021/02/26. We can observe that the position is cleared overnight, and the position always stays between 0 and 600.
- As we can see, SPY has a downward trend from Jan 2021 to May 2022; this has affected our strategy and potentially pulled us into losses.
- We plotted the tick data of IEX with a gap of 15 days, including all the prices of the index, and we can see that there is a large difference in the amount of data available for back testing; these fluctuations and discontinuities in the data may have contributed to the strategy failing.
- This is the cumulative Profit and Loss (PnL) for AAPL data from 2021/01/04 to 2022/07/01. Unfortunately, our strategy did not do well, and we lost around $2,600,000 by the end.
- This is a smooth down-trending line compared to that of QQQ and SPY, without a jump in profits during some periods.
- We plotted the cumulative position for only one day, 2022/02/07. We can also observe that the position is cleared overnight, and the position always stays between 0 and 600.
- We finally plotted 3 hours of cumulative position between 14:30 and 17:00 on 2022/02/07. No other particularly interesting pattern seems to exist.
-
We plottedthe traded price during the entire back testing period.
-
We noticed that there are some trades executed at a much lower price, similar to QQQ and SPY.
-
We assume these may be caused by data issues or edge cages of the back testing frameworks.
- We also plotted the trades counts (sum by 15 trading days), and we observe that the strategy had an obervious increase in the number of trades during the period from 2021/11/15 to 2022/04/15 as well.
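The round-lot "topping up" behavior visible in the position plots can be sketched as a small helper (a minimal illustration only; the function name and interface are hypothetical, not taken from our strategy code):

```python
def round_lot_topup(current_position: int, lot_size: int = 100) -> int:
    """Return the order quantity needed to bring a partially filled
    position up to the nearest whole lot (e.g. whole hundreds)."""
    remainder = current_position % lot_size
    return 0 if remainder == 0 else lot_size - remainder

# A partial fill left the position at 470 shares; top up to 500.
print(round_lot_topup(470))  # 30
print(round_lot_topup(600))  # 0 (already a whole lot)
```

This matches what we observed on 2022/02/07: after a partial fill, the next order's size is the difference to the nearest whole-hundred position.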
Final conclusion on the implementation of strategy:
- The general procedure for our project was as follows: we started with research into algorithms we had heard about and initially believed were viable for trading. We then compiled the research into the mathematical algorithms and thought about how to approach the trading problem. We tested various basic strategies to get a feel for trading strategies and how they are implemented in general. Afterwards, we began implementing the chosen algorithms in a Python file for basic testing. We started this process with Yahoo! Finance data and tuned parameters for that data to some extent. We later downloaded large amounts of data from IEX and converted some of the algorithms to parse that dataset. Additionally, we looked into the VM setup to see what would be necessary for testing in the Vagrant environment. Once we got the IEX data working with our algorithms, we added functionality to allow various parameters to be changed. Next, we ported our algorithms into actual strategies in C++ and started running tests. We evaluated the PnL of the test runs and kept tuning parameters to see which changes would yield a better and more reasonable PnL. Finally, we worked on visualizing the data and presenting our results.
- Overall, our strategy runs well mechanically, but we do not believe the chosen algorithms provide a good way to minimize losses. The actual trades still need tuning to adjust how much we trade and to keep losses from growing too large. For instance, we only trade in round-lot increments, which can cause overly large swings in profit and loss; we should consider odd lots for more granular trades. Additionally, we should look into other algorithms, because these may be fundamentally unsuitable for trading. For example, our Kalman implementation tracks data backwards, which may not be the best approach for fast trading. Put hyperbolically, it monitors yesterday's trades to determine prices, which is not particularly useful for today's trading. Our implementation of the HMM suffers from the same drawback, which means we ought to change both the implementations and the choice of algorithms for the trading strategies. As a whole, it would be more beneficial to look into longer-term trading strategies rather than trying to maximize short-term profits, because in the long run we fall significantly short of our initial capital.
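The lag criticism above can be illustrated with a minimal one-dimensional Kalman filter (a sketch with made-up noise parameters, not our production code): because each estimate blends the previous state with the new observation, the filtered value always trails a trending price.

```python
def kalman_1d(prices, q=1e-4, r=0.5):
    """Minimal 1D Kalman filter with a random-walk state model.

    q: process noise variance, r: measurement noise variance.
    Returns the filtered price estimates; note how the estimate
    lags the raw series when prices trend in one direction.
    """
    x, p = prices[0], 1.0          # initial state estimate and uncertainty
    estimates = []
    for z in prices:
        p = p + q                  # state extrapolation (uncertainty grows)
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update estimate toward the observation
        p = (1 - k) * p            # estimate uncertainty update
        estimates.append(x)
    return estimates

# On a steadily rising price, the filtered value stays below the latest
# observation: the "tracking backwards" lag we observed in backtesting.
prices = [100 + i for i in range(20)]
est = kalman_1d(prices)
print(est[-1] < prices[-1])  # True
```

The update steps mirror the state extrapolation, Kalman gain, and estimate uncertainty equations described earlier in this report; the persistent gap between `est` and `prices` is the lag that made the signal stale for fast trading.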
Postmortem Summary:
Sirisha Pullabhatla:
1. What did you specifically do individually for this project?
- I worked on the Kalman filter research. I read many papers to understand the mathematics behind the Kalman filter, how it works in the real world, and how it applies to predicting stock prices.
- I built the Kalman filter strategy in Python and tested that it worked for stock price prediction, using Yahoo! Finance data for the initial testing.
- I developed the initial visualisation for the Kalman filter.
- I also tried another strategy for predicting the stock price using the Stochastic Oscillator and MACD.
- I downloaded part of the tick data we used for backtesting (Jan to July 2021).
- For documentation, I wrote the whole project description and polished its presentation.
2. What did you learn as a result of doing your project?
- I learned how market microstructure works, the core high-frequency trading concepts, and how they are applied in real trading firms.
- I gained experience developing our own strategy, choosing its parameters, and implementing it.
- I learned a lot about the Kalman filter and Hidden Markov Model, and I am fascinated by how they solve seemingly impossible problems and how useful they are across many fields.
- I learned about more unique Python packages and how to use them in different scenarios.
- While searching for the right strategy for the product, I learned about many other strategies that can be adapted to make predictions more easily.
- I also learned that it is not easy to come up with a strategy for predicting stock prices and building a good portfolio.
- I gained experience using Strategy Studio, GitLab, Git Bash, Linux, and the many other tools that were part of this project.
- I learned how backtesting is done with tick data and order books.
- I also learned how to convert the models implemented in Python into C++, and how to convert a Word document into a Markdown file.
3. If you had a time machine and could go back to the beginning, what would you have done differently?
- I would have researched new strategies that are more quantitative in nature and produce more accurate results.
- I would have developed the Kalman Filter and HMM in Python and C++ a little earlier, so that we would have had much more time for backtesting the strategies.
- I would have engaged more with the team so we could help each other figure things out.
4. If you were to continue working on this project, what would you continue to do to improve it, how, and why?
I would help my team members backtest the strategy further and produce more meaningful visualizations. Although our strategy is backed by many years of research on prediction, when we tried it with continuous data it showed a negative PnL and lost money. Due to lack of time, we were not able to get the HMM strategy working exactly as intended in C++; with more time I could have helped more with building it and backtesting further.
5. What advice do you offer to future students taking this course and working on their semester long project. Providing detailed thoughtful advice to future students will be weighed heavily in evaluating your responses.
I would advise future students to select a strategy that can handle continuous data and is more accurate at predicting prices. Start the project well in advance, at the beginning of the semester, and work on it almost every week, meeting with teammates and the professor to discuss next steps and ways to improve the strategy. Get help with the VM setup and backtesting from the professor well in advance, so that there is adequate time to check the results of your strategies and, if needed, change them or implement additional ones that help you reach your goal. I would also advise all teams to divide the implementation work according to each member's area of expertise, set deadlines, and review the strategy weekly so everyone knows where things stand and can help others meet the deadlines.
Victor Fan
1. What did you specifically do individually for this project?
I worked on the HMM side of the project. I researched HMMs and their potential uses in finance, and I implemented an HMM in Python, first on Yahoo! data and then on IEX trades data, as a way of detecting regime switches in the market. I was also deeply involved with the Strategy Studio C++ implementation and did most of the C++ programming for the strategy.
2. What did you learn as a result of doing your project?
I gained experience with the pitfalls of strategy development and learned a lot about project management (version control). I also gained substantial experience working with high-frequency financial data and high-frequency backtesting software (Strategy Studio). I now understand Hidden Markov Models and Kalman Filters much better and can implement both.
3. If you had a time machine and could go back to the beginning, what would you have done differently?
I would have spent more time looking into the functionality built into Strategy Studio; there are many features and built-in analytics that I did not have time to explore. I also would have been more proactive about debugging and about getting as many people as possible able to run the backtesting software. Another key change would be to validate early (before the C++ implementation): check whether the signals are actually accurate in Python before implementing them in C++.
4. If you were to continue working on this project, what would you continue to do to improve it, how, and why?
I would continue trying to get a continuous-HMM package implemented in C++ so that we could run the HMM in real time throughout the backtest. I would also work to optimize the strategy, or investigate other strategies, in an attempt to achieve a positive PnL.
5. What advice do you offer to future students taking this course and working on their semester long project (be sides “start earlier”… everyone ALWAYS says that). Providing detailed thoughtful advice to future students will be weighed heavily in evaluating your responses.
Look into Strategy Studio's capabilities early (study the example strategies and included files). Get Strategy Studio running for as many people as you can (Murphy's Law is real, and in this case it is much more likely to end in computer failure or obscure errors). Validate signals and strategies early: "fail fast" is real, so figure out whether your signals work or make sense before implementing them in Strategy Studio, rather than discovering afterwards that they are terrible.
Eric Liro
1. What did you specifically do individually for this project?
- I researched the Kalman filter during the early stages of the project to understand the algorithm behind its smoothing, any necessary parameter changes, and how we could use it for our strategy.
- I also implemented various trading strategies to visualize how well they work, based on Yahoo! Finance data. The strategies included MACD, Exponential Moving Average, Money Flow Index, Moving Average, and the Relative Strength Index.
- I compiled my information and research on the Kalman filter approach into a document for later reference.
- I got a Kalman filter up and running using Yahoo! Finance data for testing.
- I got the VM and environment running for myself and was the go-to person for debugging VM issues. This took a while to fully understand, because it included what to change in the scripts, what needed to be done to run a strategy, and how to interact with the Strategy Studio code.
- I modified the data parsing for our code and introduced a way to bin the data, providing a variety of options for reading it in. Additionally, I implemented a normalization algorithm to prevent issues with the data itself.
- I got the strategy compiled and fixed issues during its development. I then retrieved the results data and got the code running to push the changes for visualization.
- I wrote an interactive way to test our HMM and Kalman filter strategies in Python, allowing a more convenient method of testing and checking our code.
2. What did you learn as a result of doing your project?
- I learned that creating a viable trading strategy is not a trivial endeavor by any means. In particular, the strategy seemed fine at making profits during bullish periods, but its losses during bearish periods were proportionally greater. This is a problem because it is genuinely difficult to weather difficult periods automatically and effectively.
- Furthermore, I learned how to work with the shell scripts and the command line, including debugging issues and dealing with active problems during development.
- I also learned quite a lot about various trading algorithms and got an in-depth understanding of both Kalman filters and HMMs.
3. If you had a time machine and could go back to the beginning, what would you have done differently?
- If I could go back to the beginning, I would kick the tires on the project earlier. I spent a lot of time researching the Kalman filter strategy, but such an in-depth understanding of the algorithm was unnecessary given that we did not write our own implementation and instead used a package. Starting sooner would have left more time to try out different strategies and play with the parameters.
- Additionally, I would have started learning about the VM and its various parameters earlier to fully understand what was necessary to get things running. This would have been useful, as a significant portion of my time went to debugging issues and getting to know the environment.
4. If you were to continue working on this project, what would you continue to do to improve it, how, and why?
If I were to continue working on the project, I would try to get OpenCV working in the VM environment so we could use its packages, particularly its HMM support, for real-time computation of the states as trades occur and orders get filled. This would hopefully enable a better strategy, since there would be states indicating when our orders should be placed. Additionally, I would tune some of the Kalman filter parameters to try to achieve better smoothing of the data. Finally, I would write a shell script to automate generating a new test, because there are currently too many manual steps and a simple script would streamline running.
5. What advice do you offer to future students taking this course and working on their semester long project (be sides “start earlier”… everyone ALWAYS says that). Providing detailed thoughtful advice to future students will be weighed heavily in evaluating your responses.
- I would highly recommend having a clear idea of what exactly the project will be. For instance, once you choose an algorithm or implementation to test, it helps to understand how it relates to trading. In our group, once we got Kalman and HMM working on the data, we were unsure for a day what exactly to do with them. This would have been alleviated had we understood better what they are capable of and what we wanted to achieve.
- I would also recommend keeping the research stage to a minimum and understanding that in almost all cases you will use an existing package rather than write your own implementation of an esoteric algorithm.
- While I would still recommend "starting earlier," I would specifically recommend downloading the data as early as possible and learning the ins and outs of the VM as early as possible. This will save a tremendous amount of time later and will certainly pay off when implementing strategies.
Takumi Li
1. What did you specifically do individually for this project?
- I was the group leader: I led the group in brainstorming the goals of the project, set up weekly meetings, reminded everyone about deadlines, recorded and uploaded group meetings, and submitted group assignments on everyone's behalf.
- During the first 4 weeks, I discussed various potential project topics and shared resources with teammates to explore ideas. I also wrote parts of the project proposal.
- I worked on both the HMM and Kalman filter research. I was not familiar with these two algorithms, so I learned about them by reading relevant research papers and watching YouTube videos.
- During weeks 5 - 6, to better understand how the Kalman filter predicts stock moves, I implemented test code for myself using Python and Project Jupyter. I did not have the IEX data at that time, so I used data from Yahoo! Finance, and I implemented visualizations of the actual stock price, predicted stock price, and posterior estimation. The code is in the "Takumi-Kalman-test" branch; since it was not used in production for this project, it was not merged into the main branch.
- I was responsible for downloading, parsing, and cleaning the IEX data. I initially provided the data the group needed in the "data-ready" branch, but later switched to sharing the parsed data through a Box folder for convenience. I parsed a year and a half of data (20210104 - 20220701) for SPY, QQQ, and AAPL.
- I worked on the C++ side because I had some previous experience. Understanding how the VM works and fixing bugs took me about two weeks, and I would like to thank Eric for his help with the process. However, the strategy our group implemented failed to run on my machine because of a "Caught signal 1" alert that stopped the code. After days of debugging, Eric and I could not resolve the issue, so we decided to run our strategy only on Eric's PC.
- I implemented the visualization based on the strategy results data provided by Eric; the code is pushed to the "visualization code" folder in the main branch. The images were sent to Victor to write descriptions and conclusions, which I then merged into the report.
- For the project report in Markdown, I cleaned up the document sent by Sirisha, merged the summaries sent by everyone, and inserted the images and Git repo layout. I also merged the needed code from different branches into the main branch.
2. What did you learn as a result of doing your project?
- I gained a decent understanding of using the Kalman filter and Hidden Markov Model to solve complex problems, and found them to be valuable tools in a variety of fields.
- I learned how to work with Strategy Studio and how to backtest strategies. I also learned how to set up VirtualBox and Vagrant.
- I gained knowledge about a range of other strategies that can be used to make predictions more easily, along with a comprehensive understanding of market microstructure and high-frequency trading concepts.
- However, I also learned that developing a successful strategy for predicting stock prices and building a good portfolio is not easy; a newly developed strategy is likely to fail (lose money) without extensive backtesting and live testing.
- I learned to write Markdown files for READMEs and how to merge branches.
- As group leader, I learned how to manage and run technical project meetings and monitor the development process.
3. If you had a time machine and could go back to the beginning, what would you have done differently?
- I would have started getting the VM working on my machine early, because it really slowed down the project timeline.
- I would probably spend more time brainstorming project topics and try to find more interesting strategies.
- I would set up the VM on an SSD and keep the 2 TB of data on an SSD, or store most of the data on an HDD and move it to the SSD in parts. I assume this would speed up parsing.
- I would communicate with teammates more frequently and set stricter deadlines.
4. If you were to continue working on this project, what would you continue to do to improve it, how, and why?
- If given more time, I would explore other strategies that could complement the HMM and Kalman filter strategy.
- I would also seek the professor's help to debug the runtime issue on my local machine, so that I could better help with tuning the parameters and improving the model.
5. What advice do you offer to future students taking this course and working on their semester long project (be sides “start earlier”… everyone ALWAYS says that). Providing detailed thoughtful advice to future students will be weighed heavily in evaluating your responses.
- Spend more time exploring interesting strategies and brainstorming trading ideas.
- Make sure the VM works on your local machine early.
- Do not be afraid to share your thoughts with your teammates or with classmates from other groups.