How to convert a JSON file into an HDF5 file

You scraped a bunch of data from a cryptocurrency exchange API into JSON, but you figured it’s taking too much disk space? Switching to HDF5 will save you some space and make access very fast, since the format is optimized for I/O operations. HDF5 is also supported by major tools like Pandas, Numpy and Keras, so integrating the data will be smooth if you want to do some analysis.

Flattening the JSON

Most of the time, JSON data is a giant dictionary with many nested levels, and the issue is that the HDF5 table we want to write doesn’t understand that: each column must hold a single value per row. Take the JSON below:

json_dict = {'Name':'John', 'Location':{'City':'Los Angeles','State':'CA'}, 'hobbies':['Music', 'Running']}

The result will look like this in a DataFrame:

Nested DataFrame

We need to flatten the JSON to make it look like a classic table:

Flatten DataFrame

We’re going to use the following flatten_json() function:

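def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Nested object: prefix each key with its parent's name
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # Array: use each element's position as its key
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated name
            out[name[:-1]] = x

    flatten(y)
    return out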
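Applied to the json_dict example from the beginning, each nested key is joined to its parent with an underscore and each list item gets its position as a suffix. A copy of the function is included so the snippet runs on its own:

```python
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Nested object: prefix each key with its parent's name
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # Array: use each element's position as its key
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing '_' from the accumulated name
            out[name[:-1]] = x

    flatten(y)
    return out


json_dict = {'Name': 'John', 'Location': {'City': 'Los Angeles', 'State': 'CA'},
             'hobbies': ['Music', 'Running']}
flat = flatten_json(json_dict)
print(flat)
# {'Name': 'John', 'Location_City': 'Los Angeles', 'Location_State': 'CA', 'hobbies_0': 'Music', 'hobbies_1': 'Running'}
```

Every value is now a scalar under its own key, which is exactly one row of a classic table.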

Loading into a HDF5 file

Now the idea is to load the flattened JSON dictionary into a DataFrame that we’re going to save in an HDF5 file.

I’m assuming that during scraping we appended each record to the JSON file as its own line, so we have one JSON dictionary per line (the JSON Lines format):

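import pandas as pd
import ujson

def json_to_hdf(input_file, output_file):
    with pd.HDFStore(output_file) as store:
        with open(input_file, "r") as json_file:
            for line in json_file:
                try:
                    flat_data = flatten_json(ujson.loads(line))
                    df = pd.DataFrame.from_dict([flat_data])
                    store.append('observations', df)
                except ValueError:
                    # Skip lines that aren't valid JSON or don't fit the table
                    pass

Let’s break this down.

Line 5: we open the HDFStore, which is the HDF5 file; it handles all the file writing. HDFStore relies on the PyTables package (pip install tables).

Lines 6 & 7: we open the input file and read it line by line.

Line 9: we parse the line into a Python dictionary and then flatten it.

Line 10: we turn the flattened dictionary into a one-row Pandas DataFrame.

Line 11: we append this DataFrame to the 'observations' table of the HDFStore.

Lines 12 to 14: a line that can’t be parsed or appended raises a ValueError and is skipped. Keep in mind that the first append fixes the table’s columns and string widths, so a later record with a new key or a longer string value is skipped too, unless you reserve room with the min_itemsize argument of store.append.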
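Appending one row per store.append call gets slow on large files, since every call writes to the table separately. A possible variant, sketched under the assumptions that every record has the same keys and that PyTables is installed, reads the file in batches and appends one DataFrame per batch. The read_flat_records and json_to_hdf_batched names are hypothetical, and the standard json module stands in for ujson:

```python
import json
from itertools import islice

import pandas as pd


def flatten_json(y):
    # Same flattening as above: nested keys joined with '_', list items indexed
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


def read_flat_records(path, batch_size=10_000):
    """Yield lists of flattened records, reading batch_size lines at a time."""
    with open(path, "r") as json_file:
        while True:
            lines = list(islice(json_file, batch_size))
            if not lines:
                break
            batch = []
            for line in lines:
                try:
                    batch.append(flatten_json(json.loads(line)))
                except ValueError:
                    # Skip lines that are not valid JSON
                    continue
            yield batch


def json_to_hdf_batched(input_file, output_file, batch_size=10_000):
    """Append one DataFrame per batch instead of one per line (needs PyTables)."""
    with pd.HDFStore(output_file) as store:
        for batch in read_flat_records(input_file, batch_size):
            if batch:
                store.append("observations", pd.DataFrame(batch))
```

If records can have different keys, normalize them to a common set of columns before appending; otherwise store.append will reject the mismatched batch.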

Et voilà, you now have your data in a single HDF5 file, ready to be loaded for your statistical analysis or to generate trading signals. Since the format is well supported by Pandas and Numpy, loading it back (for example with pd.read_hdf) should be faster than parsing the original JSON file again.

Trading with Coinbase Pro (GDAX) API in Python

Coinbase Pro (formerly known as GDAX) is one of the biggest cryptocurrency exchanges; you can trade a large range of cryptocurrencies against USD, EUR and GBP. I chose to trade on Coinbase Pro because it supports a lot of pairs and liquidity is usually very good, so we can easily implement an algorithmic trading strategy on this exchange.

The most traded currencies are:
– Bitcoin (BTC)
– Ethereum (ETH)
– yearn.finance (YFI)
– Litecoin (LTC)

The Setup

Fortunately for us, Coinbase Pro provides an API to get market data, retrieve the balance of each currency and send buy/sell orders to the market. The API is documented on the Coinbase Pro website.

I found a Python wrapper for their API on GitHub, cbpro, which is very easy to use.
You can install the package like this:

pip install cbpro

Once it’s installed, you need to insert the appropriate import in your code:

import cbpro

Now you need an API key to retrieve your account balances and send orders to the market. If you just want market data, you can skip this part.
Go to https://pro.coinbase.com/profile/api and click on Create new key. You now have the API key; you may need to validate by email to see the secret key, which you also need, along with the passphrase used by the client below. Select the permissions you want: if you want to trade through the API, tick the corresponding check box, and the same goes for withdrawals.

Using the API

In your code, you need to set up the connection so that you can get authenticated:

auth_client = cbpro.AuthenticatedClient(key, b64secret, passphrase)

To get market data for a product, for example the BTC-USD order book (note that authentication is not required for this method):

auth_client.get_product_order_book('BTC-USD')

Now to send an order, it’s pretty simple:

# Buy 0.01 BTC @ 100 USD
auth_client.buy(price='100.00',   # USD
                size='0.01',      # BTC
                order_type='limit',
                product_id='BTC-USD')

You’ll get back a JSON object containing an id for the order, which you can use to track it, for example with auth_client.get_fills(order_id="d0c4560b-4e6d-41d9-e568-48c4bfca13e6"):

{
"id": "d0c4560b-4e6d-41d9-e568-48c4bfca13e6",
"price": "100.00000000",
"size": "0.01000000",
"product_id": "BTC-USD",
"side": "buy",
"stp": "dc",
"type": "limit",
"time_in_force": "GTC",
"post_only": false,
"created_at": "2020-11-20T10:12:45.12345Z",
"fill_fees": "0.0000000000000000",
"filled_size": "0.00000000",
"executed_value": "0.0000000000000000",
"status": "pending",
"settled": false
}
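Numeric fields such as size and filled_size come back as strings, so a sketch that checks how much of an order is still open can convert them with Decimal, which avoids binary floating-point rounding. remaining_size is a hypothetical helper, and the response is trimmed to the fields it uses:

```python
import json
from decimal import Decimal

# Order response trimmed to the fields used below (values from the sample above)
order = json.loads("""{
    "id": "d0c4560b-4e6d-41d9-e568-48c4bfca13e6",
    "size": "0.01000000",
    "filled_size": "0.00000000",
    "status": "pending"
}""")


def remaining_size(order):
    """Hypothetical helper: quantity of the order that has not been filled yet."""
    return Decimal(order["size"]) - Decimal(order["filled_size"])


print(order["status"], remaining_size(order))
```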

To manage your risks, you’ll need to retrieve your balances:

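accounts = auth_client.get_accounts()
# get_accounts() returns one entry per currency: look ETH up by name instead of relying on the list order
eth = next(a for a in accounts if a["currency"] == "ETH")
print("ETH=" + eth["balance"])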
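For risk checks it is handy to turn the account list into a currency-to-balance mapping and verify funds before sending an order. A minimal sketch: the account entries below are made up and only mimic the fields (currency, balance, available, hold) of the objects returned by get_accounts(), and balances_by_currency and can_afford are hypothetical helpers that ignore fees:

```python
from decimal import Decimal


def balances_by_currency(accounts):
    """Map each currency to its available balance (funds not on hold)."""
    return {a["currency"]: Decimal(a["available"]) for a in accounts}


def can_afford(balances, quote_currency, price, size):
    """True if the available quote balance covers price * size (fees ignored)."""
    return balances.get(quote_currency, Decimal("0")) >= Decimal(price) * Decimal(size)


# Illustrative data shaped like get_accounts() output (values are made up)
accounts = [
    {"currency": "ETH", "balance": "2.5", "available": "2.0", "hold": "0.5"},
    {"currency": "USD", "balance": "1500.00", "available": "1500.00", "hold": "0.00"},
]
balances = balances_by_currency(accounts)
print(can_afford(balances, "USD", "100.00", "0.01"))
```

Using available rather than balance matters because funds reserved by open orders are on hold and can’t be spent again.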

With this basic API you can code any algorithmic strategy in Python for Coinbase Pro; for example, you can try to predict the value of a cryptocurrency using our previous tutorials.

5 Mistakes To Avoid In Your Trading Strategy

#1 Not learning to code

This one is the most important: before starting anything, you should learn programming. Coding trains you in a logic close to mathematical formulas and helps you formalize your trading process. It’s essential to understand everything that’s “under the hood”: what if your strategy starts to slow down after a few months and you’re not able to improve it yourself?

You won’t learn programming in a day, so take your time to learn and understand the process. Fortunately, there are many free ways to learn Python, for example on websites like edX, Coursera and Udacity.

#2 Backtesting and training on the same period

Let’s say you found the perfect strategy that makes +300% over 2014. Before trusting it, backtest it on a different period: the strategy may work in that specific period but make you lose a lot in another one. This beginner mistake has a name: overfitting. Ideally, you want to split your data set into at least two parts, train and test. For a more robust estimate of performance, you can try K-Fold cross-validation: it splits your data set into K parts, trains on K-1 of them and tests on the remaining one, and repeats this so that each part is used once as the test set.
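With market data, plain K-Fold lets a model train on data that comes after its test fold, which can flatter a strategy. A swapped-in alternative for time-ordered data, shown here as a sketch on toy data, is scikit-learn’s TimeSeriesSplit: every fold trains on an expanding window of the past and tests on the block right after it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 20 consecutive daily observations, oldest first (toy data)
X = np.arange(20).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    # Training rows always precede the test rows, so no future data leaks into training
    print("train:", train_idx.min(), "-", train_idx.max(),
          "| test:", test_idx.min(), "-", test_idx.max())
```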

#3 Not backtesting enough

Backtest, backtest and backtest. Use different time periods and adjust the trade size: the strategy could work when buying $100 worth of stock at a time, but what if you want to scale it? You should also account for slippage and, of course, broker fees.
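To make a backtest more realistic, each simulated trade can be charged slippage and fees. A minimal sketch, assuming a fixed slippage in basis points and a proportional broker fee; the default values (5 bps and 0.1%) and the function names are illustrative assumptions, not figures from any broker:

```python
def fill_price(quoted_price, side, slippage_bps=5.0):
    """Assumed slippage model: fill slippage_bps basis points worse than the quote."""
    adjustment = quoted_price * slippage_bps / 10_000
    return quoted_price + adjustment if side == "buy" else quoted_price - adjustment


def trade_cash_flow(quoted_price, quantity, side, fee_rate=0.001, slippage_bps=5.0):
    """Cash flow of a trade: negative when buying (cash out), positive when selling.

    fee_rate is an assumed proportional broker fee (0.1% by default).
    """
    notional = fill_price(quoted_price, side, slippage_bps) * quantity
    fee = notional * fee_rate
    return -(notional + fee) if side == "buy" else notional - fee


buy = trade_cash_flow(100.0, 10, "buy")
sell = trade_cash_flow(100.0, 10, "sell")
print(buy, sell)
```

A round trip at an unchanged quote now loses money, which is exactly the drag a frictionless backtest hides; the larger the trade size, the more the slippage assumption matters.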

Backtesting is good, but paper trading is better: you run the strategy in real time but without any broker connection, so you can simulate how it behaves in current market conditions.

#4 Not having a risk management strategy

Risk management is what makes the difference during bear markets or high-volatility periods. You can cap the maximum exposure and ignore any buy signal once you hit the cap, or automatically close any position older than a few days. These are only suggestions; what matters is making sure you won’t get stuck with a growing loss over time.
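The two rules above can be sketched as plain functions. The exposure cap, the five-day holding limit, the function names and the position format (a dict with a symbol and an opened date) are all hypothetical choices for illustration:

```python
from datetime import date, timedelta

MAX_EXPOSURE = 50_000             # hypothetical cap on total position value
MAX_HOLDING = timedelta(days=5)   # hypothetical maximum holding period


def allow_buy(current_exposure, order_value, max_exposure=MAX_EXPOSURE):
    """Ignore a buy signal if it would push total exposure above the cap."""
    return current_exposure + order_value <= max_exposure


def positions_to_close(positions, today, max_holding=MAX_HOLDING):
    """Return the symbols of positions opened more than max_holding ago."""
    return [p["symbol"] for p in positions if today - p["opened"] > max_holding]


positions = [
    {"symbol": "A", "opened": date(2020, 1, 1)},
    {"symbol": "B", "opened": date(2020, 1, 8)},
]
print(allow_buy(45_000, 4_000), positions_to_close(positions, date(2020, 1, 10)))
```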

#5 Having unreliable data

Your strategy will be based on financial data, whether real-time, minute or daily, and a single bad data point can destroy your profits. Make sure the data comes from a reliable source and not some random website; a good source is Quandl, some of whose datasets are free.
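A cheap first line of defense is to scan each price series for values that can’t be right before feeding it to the strategy. A minimal sketch, assuming a pandas Series of daily prices; the 20% daily-move threshold and the suspicious_rows name are arbitrary assumptions to tune per asset:

```python
import pandas as pd


def suspicious_rows(prices, max_abs_return=0.2):
    """Flag missing or non-positive prices and day-over-day moves above max_abs_return."""
    returns = prices / prices.shift(1) - 1
    bad = prices.isna() | (prices <= 0) | (returns.abs() > max_abs_return)
    return prices[bad]


# Toy series with a bad print (1010) and a zero price
s = pd.Series([100, 101, 1010, 102, 0], index=pd.date_range("2020-01-01", periods=5))
print(suspicious_rows(s))
```

Note that a single bad print flags two days: the spike itself and the move back to normal.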

Simple strategy backtesting using Zipline

Zipline is a backtesting engine for Python. If you’re a Quantopian member, you should be familiar with it, since it’s the engine Quantopian uses. It provides metrics about the strategy such as returns, standard deviation and Sharpe ratio: basically everything you need to decide whether a strategy is ready to go live.

Zipline can be installed using pip:

pip install zipline

If you’re on Windows I suggest using Conda:

conda install -c Quantopian zipline

Here is the basic structure of a strategy in Zipline:

from zipline.api import order, record, symbol

def initialize(context):
    pass

def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))

In initialize, you can set the global variables used by the strategy, such as a list of stocks, certain parameters or the maximum percentage of the portfolio to invest.
Then handle_data is called at every bar (every day or minute of the backtest); that’s where your strategy logic goes. You can take the strategies from previous articles and incorporate them into your code.

Let’s break down the handle_data() code.

The order() function lets you create an order; here we specify the AAPL ticker (Apple stock) with a quantity of 10. A positive value means you’re buying 10 shares; a negative value would mean you’re selling them.

Then, the record() function saves the value of a variable at each iteration. Here, we save the current stock price under the name AAPL; you can then retrieve it in the backtest results and compare your strategy’s performance with the stock price.

Finally, let’s backtest the strategy and see if it’s profitable. To do that, run the following command:

zipline run -f your_strategy.py --start 2015-1-1 --end 2020-1-1 -o your_strategy.pickle

This command runs the backtest between 2015-01-01 and 2020-01-01 and writes the result to a pickle file for later analysis. The pickle is simply a Pandas DataFrame with one row per day and (a lot of) columns describing your strategy, such as the returns, the number of orders, the portfolio size and so on.
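Once the backtest has run, the pickle can be loaded with pd.read_pickle and summarized. The sketch below assumes the DataFrame has a daily returns column named returns, as Zipline’s performance output does, and a risk-free rate of 0 for the Sharpe ratio; Zipline also stores running metrics of its own in the same DataFrame. The summarize name and the toy returns are made up:

```python
import numpy as np
import pandas as pd


def summarize(perf, periods_per_year=252):
    """Total return, annualized volatility and Sharpe ratio from daily returns.

    The Sharpe ratio assumes a risk-free rate of 0.
    """
    daily = perf["returns"]
    total_return = (1 + daily).prod() - 1
    ann_vol = daily.std() * np.sqrt(periods_per_year)
    sharpe = daily.mean() / daily.std() * np.sqrt(periods_per_year)
    return {"total_return": total_return, "ann_vol": ann_vol, "sharpe": sharpe}


# With a real backtest, load the file written by `zipline run -o`:
# perf = pd.read_pickle("your_strategy.pickle")
# Toy daily returns, made up for illustration:
perf = pd.DataFrame({"returns": [0.01, -0.005, 0.02, 0.0]})
print(summarize(perf))
```

Compounding (the product of 1 + r) rather than summing the daily returns gives the actual growth of the portfolio over the period.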

 

Will Bitcoin Ever Be Regulated?

This article by Vlad Andrei was originally published at Albaron Ventures

As Bitcoin and other digital assets continue to grow in adoption and popularity, a common topic for discussion is whether the U.S. government, or any government for that matter, can exert control over their use.

There are two core issues that lay the foundation of the Bitcoin regulation debate:

The digital assets pose a macro-economic risk. Bitcoin and other cryptocurrencies can act as surrogates for an international currency, which throws global economics a curveball. For example, countries such as Russia, China, Venezuela, and Iran have all explored using digital currency to circumvent United States sanctions, which puts the US government at risk of losing its global authority.

International politics and economics are a very delicate issue, and often sanctions are used in place of military boots on the ground, arguably making the world a safer place.

The micro risks enabled by cryptocurrency weigh heavily in aggregate. One of the most attractive features of Bitcoin and other digital assets is that one can send anywhere from a few pennies’ worth to billions of dollars of Bitcoin anywhere in the world at any time for a negligible fee (currently around $0.04 to $0.20, depending on urgency).

However, in the hands of malicious parties, this could be very dangerous. The illicit activities inherently supported by a global decentralized currency run the gamut: terrorist funding, selling and buying illegal drugs, ordering assassinations, dodging taxes, laundering money, and so on.

Can Bitcoin Even Be Regulated?

Before diving deeper, it’s worth asking whether Bitcoin can be regulated in the first place.

The cryptocurrency was built with the primary purpose of being decentralized and distributed, two very important qualities that could make or break Bitcoin’s regulation.

By being decentralized, Bitcoin doesn’t have a single controlling entity. The control of Bitcoin is shared among several independent entities all over the world, making it nearly impossible for a single entity to wrangle full control over the network and manipulate it as they please.

By being distributed, Bitcoin exists at many different locations at the same time. This makes it very difficult for a single regulatory power to enforce its will across borders. This means that a government or other third party can’t technically raid an office and shut anything down.

That being said, there are several chokepoints that could severely hinder Bitcoin’s adoption and use.

1. Targeting centralized entities: exchanges and wallets

A logical first move is to regulate the fiat onramps (exchanges), which the United States government has finally been getting around to. In cryptocurrency’s nascent years, cryptocurrency exchanges didn’t require much input or approval from regulatory authorities to run. However, the government started stepping in when cryptocurrency started hitting the mainstream.

The SEC, FinCEN (Financial Crimes Enforcement Network), and CFTC have all played a role in pushing Know Your Customer (KYC) protocols and Anti-Money Laundering (AML) policies across all exchanges operating within U.S. borders.

Cryptocurrency exchanges have no option but to adhere to whatever the U.S. government wants. The vast majority of cryptocurrency users rely on an exchange to use their cryptocurrency, so they automatically bend to exchange-imposed regulation.

Regulators might not be able to shut down the underlying technology that powers Bitcoin, but they can completely wreck the user experience for the great majority of cryptocurrency users, which serves as enough of an impediment to diminish the use of cryptocurrency for most.

2. Targeting users

The government can also target individual cryptocurrency users. Contrary to popular opinion, Bitcoin (and even some privacy coins) aren’t anonymous. An argument can be made that Bitcoin is even easier to track than fiat because of its public, transparent ledger.

Combined with every cryptocurrency exchange’s willingness to work with U.S. authorities, a federal task force could easily track money sent and received from certain addresses and pinpoint the actual individual behind it. Companies such as Elliptic and Chainalysis have already created solid partnerships with law enforcement in many countries to track down illicit cryptocurrency uses and reveal the identities behind the transactions.

Beyond that, there’s the dark web and more professional illicit cryptocurrency usage. Although trickier, the government likely has enough cyber firepower to snipe out the majority of cryptocurrency-related cybercrime. In fact, coin mixers (cryptoMixer.io), coin swap services (ShapeShift) and P2P bitcoin transactions (localbitcoins.com) have been investigated for several years now, and most of them have had to add KYC and adhere to strict AML laws.

Final Thoughts

Ultimately, it’s going to take a lot to enforce any sort of significant global regulation on Bitcoin, the most important factor being a centralization and consensus of opinion. The majority of the U.S. regulatory alphabet agencies fall into the same camp of “protect the good guys, stop the bad guys”, but there isn’t really a single piece of guidance to follow. Currently, cryptocurrencies are regulated in the US by several institutions (CFTC, SEC, IRS), making it difficult to create overarching regulatory guidelines.

In short, yes: Bitcoin can be regulated. In fact, its regulation has already started with the fiat onramps and adherence to strict KYC & AML laws. While Bitcoin ownership is illegal in countries such as Ecuador, Bolivia, Egypt and Morocco, in the US it would take some bending of the moral fabric of the Constitution for cryptocurrency ownership rights to be infringed.

However, it cannot be shut down. There are still ways to buy, sell, and trade Bitcoin P2P, without a centralized exchange. It would take an enormous effort by any government to completely uproot something as decentralized as Bitcoin, but that future seems more dystopian than tangible.

Three Strategies for Choosing What Cryptocurrency to Invest in Next

This article by Steven Buchko was originally published at CoinCentral.com

Finding the Next Bitcoin

You’ve probably thought it at one point or another: “I missed the Bitcoin payday. How do I decide what cryptocurrency to invest in now that I know about the market?”

The bad news: It’s unlikely that any other cryptocurrency will see the same astronomical growth that Bitcoin experienced over the last few years, and impossible to predict it.

The good news: There are still plenty of opportunities to invest in up-and-coming cryptocurrencies that could potentially bring you 10-100x returns. This comes with a heavy note of caution because, as you may know, cryptocurrencies are incredibly volatile. This is not investment advice; your gains or losses should rest on your own research and intuition.

In this article, we’ll go over some basic strategies you can follow when searching for what cryptocurrency to invest in next. We’re focusing on high risk, high reward options here. If you’re looking for general investment tips, you should check out our article on how to build a proper cryptocurrency portfolio instead.

Scour Initial Coin Offerings (ICOs)

Initial Coin Offerings (ICOs) have quickly become the standard for blockchain startups to raise funding for their project. In an ICO, the team hosts a crowdsale in which you purchase tokens that you can use on their platform. You can also trade these tokens in the secondary market (exchanges) after the ICO.

For example, Golem held an ICO to distribute the first GNT tokens. The purpose of these tokens is to purchase computing power in the Golem network, but traders also buy and sell them on exchanges.

Participating in ICOs can be a lucrative trading strategy. If you invested in the NEO crowdsale (at the time the project was called AntShares), your return on investment (ROI) would be ~160,000% currently. Populous, about 5,000%. OmiseGo, around 4,000%. You get the picture.
ICO ROIs (source: ICOBench)

ICO gains do come with the highest amount of risk, though. The majority of ICOs will fail, and almost half already have.

ICO Research

It’s important that you do your due diligence when picking what cryptocurrency to invest in pre-ICO. There are a ton of things to look at when evaluating a cryptocurrency, but the most important attributes are:

Team and advisors – The team should have experience in blockchain technology or at least the industry that they’re targeting. Preferably both. Having reputable advisors is also a strong sign that the ICO could succeed.
Clear problem/solution – The project’s white paper should clearly define what problem the project is aiming to solve and how the cryptocurrency solves it. Make sure it’s not just a document full of marketing BS.
Token distribution – The team should be distributing over fifty percent of the tokens to crowdsale participants if not much, much more. Be hesitant about projects in which the team and advisors keep a significant proportion of tokens.

Other things to take note of are: any notable partnerships, whether the team has already created a product, and the size of the industry they’re targeting. All of these things could lead to a favorable investment.

Check Lesser Known Exchanges

Even if you missed your chance to participate in an interesting ICO, you can still invest once the coin hits exchanges. At this point, there’s often a brief spike followed by an immediate dump as ICO investors look to cash in on short-term gains. This is a prime opportunity to get coins you’re interested in at ICO-level (or even lower) prices.

Beyond the short post-ICO period, you still have time to invest in a coin before major exchanges begin to list it. Cryptopia and decentralized exchanges such as IDEX are goldmines for these types of coins. The same research strategies mentioned above apply to coins in this category as well.


Search through coins with a small market cap (<$100 million) that haven’t been listed on a large exchange like Binance yet. You can check CoinMarketCap to see which exchanges coins are on. Make sure you research appropriately and find coins that you believe to have solid fundamentals.

Once you’ve found a coin you’re confident in, purchase it, and (this is the hardest part) wait. It could take days, weeks, or even months for your coin to reach a respectable amount of awareness. If you truly believe in the fundamentals of the coin, though, this timeframe shouldn’t matter. Once the coin joins a major exchange, feel free to trade it accordingly.

Time Important Events

Another popular strategy in selecting what cryptocurrency to invest in is to choose coins based on project roadmaps and event calendars. This is a short-term strategy and usually much harder to execute than the other ones that we’ve covered.

The price of cryptocurrency tends to rise after an important partnership announcement or development milestone. If you follow certain projects on Twitter or are active in their Telegram channel, you usually find out about these announcements ahead of the less involved general public.

With that information, you can sometimes buy into a project early and ride the wave up following the announcement. This has some potential downsides, though. Correct timing is incredibly difficult to accomplish. And, in a bear market, even the most impressive announcements can get crushed under the negative sentiment.

Additionally, the rest of the market may not react to the news the way that you expect. A recent example of this is Verge’s PornHub partnership announcement. While some supporters saw this as positive news, the majority of the market didn’t, and the price crashed accordingly.

Stay Vigilant

Most importantly, you just need to stay vigilant when looking for what cryptocurrency to invest in. New investment opportunities appear every day when you’re actively looking for them. Join subreddits, follow crypto traders on Twitter, constantly research new projects: in essence, immerse yourself in the blockchain space. You never know what gems you’ll stumble upon.

How to use a Random Forest classifier in Python using Scikit-Learn

Random Forest is a powerful machine learning algorithm that can be used as a regressor or as a classifier. It’s a meta estimator, meaning it fits a specified number of decision trees and combines their predictions.

We’re going to use the Scikit-Learn package in Python, a very useful library that contains a lot of machine learning algorithms and related tools.

Data preparation

To see how Random Forest can be applied, we’re going to try to predict the S&P 500 futures (E-Mini); you can get the data for free on Quandl. Here is what it looks like:

Date        Open     High     Low      Last     Change  Settle   Volume     Previous Day Open Interest
2016-12-30  2246.25  2252.75  2228.0   2233.5   8.75    2236.25  1252004.0  2752438.0
2016-12-29  2245.5   2250.0   2239.5   2246.25  0.25    2245.0   883279.0   2758174.0
2016-12-28  2261.25  2267.5   2243.5   2244.75  15.75   2245.25  976944.0   2744092.0

The Change column needs to be removed, since it has missing data and the same information can be recovered by subtracting the D-1 close from the D close.

Since it’s a classifier, we need to create classes for each line: 1 if the future went up today, -1 if it went down or stayed the same.

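import pandas as pd

def computeClassification(actual):
    # 1 = the future went up, -1 = it went down or stayed the same
    if actual > 0:
        return 1
    else:
        return -1

# Quandl exports are newest-first: sort oldest-first so returns and the split follow time
data = pd.read_csv('EMini.csv', sep=',', index_col=0, parse_dates=True).sort_index()

# Remove the Change column (missing data, and it can be derived from Settle)
data = data.drop(columns=['Change'])

# Compute the daily returns (in %) from consecutive settlement prices
data['Return'] = (data['Settle'] / data['Settle'].shift(1) - 1) * 100

# Delete rows with missing values, including the first one whose return is NaN
data = data.dropna()

# Compute the last column (Y): -1 = down, 1 = up
data['Return'] = data['Return'].apply(computeClassification)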
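computeClassification is applied row by row; the same labelling rule can also be written as a vectorized expression with np.where, which scales better on long histories. A sketch on toy returns:

```python
import numpy as np
import pandas as pd

# Toy daily returns in %, standing in for data['Return'] before labelling
returns = pd.Series([0.5, -0.2, 0.0, 1.3])

# Same rule as computeClassification: 1 if the return is positive, -1 otherwise (including 0)
labels = pd.Series(np.where(returns > 0, 1, -1), index=returns.index)
print(labels.tolist())  # [1, -1, -1, 1]
```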

Now that we have a complete dataset with a value to predict, the last column “Return”, which is either -1 or 1, let’s create the train and test datasets.

half = len(data) // 2  # integer division: slice bounds must be ints in Python 3
testData = data.iloc[half:]   # 2nd half
trainData = data.iloc[:half]  # 1st half

# X is the list of features: every column except the last one (Return)
data_X_train = trainData.iloc[:, :-1]
# Y is the value to be predicted: the last column
data_Y_train = trainData.iloc[:, -1]

# Same thing for the test dataset
data_X_test = testData.iloc[:, :-1]
data_Y_test = testData.iloc[:, -1]

Using the algorithm

Once we have everything ready we can start fitting the Random Forest classifier against our train dataset:

from sklearn import ensemble

# I picked 100 randomly, we'll see in another post how to find the optimal value for the number of estimators
clf = ensemble.RandomForestClassifier(n_estimators = 100, n_jobs = -1)
clf.fit(data_X_train, data_Y_train)

predictions = clf.predict(data_X_test)

predictions is an array containing the predicted values (-1 or 1) for the features in data_X_test.
You can measure the prediction accuracy with accuracy_score, which compares the predicted values with the expected ones.

from sklearn.metrics import accuracy_score

print("Score: " + str(accuracy_score(data_Y_test, predictions)))
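An accuracy score only means something next to a baseline: if the market went up on most days, always predicting 1 already scores well. A minimal sketch of that comparison, where baseline_accuracy is a hypothetical helper that always predicts the most frequent class of the training labels:

```python
import pandas as pd
from sklearn.metrics import accuracy_score


def baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent class seen in training."""
    majority = pd.Series(y_train).mode()[0]
    return accuracy_score(y_test, [majority] * len(y_test))


# Toy labels: the training set is mostly "up", so the baseline always predicts 1
print(baseline_accuracy([1, 1, -1], [1, -1, 1, 1]))
```

With the variables above, compare baseline_accuracy(data_Y_train, data_Y_test) with the Random Forest score; a model that doesn’t beat it hasn’t learned anything useful.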

What’s next?

For example, you can now create a trading strategy that goes long the future when the predicted value is 1 and goes short when it’s -1. This can easily be backtested with a backtest engine such as Zipline in Python.
Based on your backtest results, you could add or remove features: maybe the volatility or the 5-day moving average can improve the prediction accuracy?
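One caution when adding features: the features used above are same-day prices, which already carry part of the information about the move being predicted. A sketch of the two suggested features computed so that day t only sees data up to day t-1, assuming data is sorted oldest-first with a Settle column; add_lagged_features, Mavg5 and Vol5 are illustrative names:

```python
import pandas as pd


def add_lagged_features(data):
    """Add a 5-day moving average and 5-day volatility of Settle, lagged by one day."""
    out = data.copy()
    daily_ret = out["Settle"] / out["Settle"].shift(1) - 1
    # shift(1) makes the value on day t depend only on days up to t-1
    out["Mavg5"] = out["Settle"].rolling(window=5).mean().shift(1)
    out["Vol5"] = daily_ret.rolling(window=5).std().shift(1)
    # The first rows don't have enough history yet
    return out.dropna()


# Toy prices, oldest first
toy = pd.DataFrame({"Settle": [float(x) for x in range(1, 11)]},
                   index=pd.date_range("2020-01-01", periods=10))
print(add_lagged_features(toy))
```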

Using matplotlib to identify trading signals

Finding trading signals is one of the core problems of algorithmic trading: without good signals, your strategy will be useless. It’s a very abstract process, because you cannot intuitively guess which signals will make your strategy profitable. That’s why I’m going to show you how to at least visualize the signals, so that you can check whether they make sense before introducing them in your algorithm.

We’re going to use matplotlib to plot the asset price and add buy/sell signals on the same graph; this way you can see whether the signals are generated at the right moment: buy low, sell high.

Data preparation

For this tutorial I picked a very simple strategy, a moving average crossover: the idea is to buy when the “short” moving average (say 5-day) crosses the “long” moving average (say 20-day), and to sell when they cross the other way.

First of all, we need to install matplotlib via the usual pip:

pip install matplotlib

This example requires pandas and matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

I’m using the E-mini future dataset from Quandl, the same one as in the Random Forest tutorial.

Loading data and computing the moving averages is pretty trivial thanks to Pandas:

data = pd.read_csv('EMini.csv', sep=',', index_col=0, parse_dates=True)

# Generate moving averages
data = data.sort_index()  # Quandl data is newest-first: sort oldest-first for the rolling computation
data['Mavg5'] = data['Settle'].rolling(window=5).mean()
data['Mavg20'] = data['Settle'].rolling(window=20).mean()

Now the actual signal generation part is a bit more tricky:

# Save moving averages for the day before
prev_short_mavg = data['Mavg5'].shift(1)
prev_long_mavg = data['Mavg20'].shift(1)

# Select buying and selling signals: where the moving averages cross
# Buy: the 5-day average crosses below the 20-day one; sell: it crosses back above
buys = data.loc[(data['Mavg5'] <= data['Mavg20']) & (prev_short_mavg >= prev_long_mavg)]
sells = data.loc[(data['Mavg5'] >= data['Mavg20']) & (prev_short_mavg <= prev_long_mavg)]

buys and sells now contain all the dates where we have a signal.
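To see what the two conditions select, here they are on toy values: buys are the rows where the short average ends up at or below the long one after being at or above it the day before (a downward cross), and sells are the opposite (an upward cross). The crossovers function is a hypothetical wrapper around the same expressions:

```python
import pandas as pd


def crossovers(short, long):
    """Return (crossed_down, crossed_up) boolean Series for two moving averages."""
    prev_short, prev_long = short.shift(1), long.shift(1)
    # Comparisons with the NaN produced by shift(1) are False, so the first row never fires
    crossed_down = (short <= long) & (prev_short >= prev_long)
    crossed_up = (short >= long) & (prev_short <= prev_long)
    return crossed_down, crossed_up


short = pd.Series([1.0, 3.0, 4.0, 1.0, 0.0])
long = pd.Series([2.0] * 5)
down, up = crossovers(short, long)
print(down.tolist(), up.tolist())
```

Because both conditions use non-strict comparisons, a day where the two averages are exactly equal can fire twice; switching one side to a strict comparison avoids that.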

Plotting the signals

The interesting part is the plotting, and the syntax is simple:

plt.plot(X, Y)

Displaying the E-Mini price and the moving averages is straightforward; we use data.index for the x-axis because the dates are stored in the DataFrame’s index:

# The label parameter is useful for the legend
plt.plot(data.index, data['Settle'], label='E-Mini future price')
plt.plot(data.index, data['Mavg5'], label='5-day moving average')
plt.plot(data.index, data['Mavg20'], label='20-day moving average')

For the signals, we want to put each marker at the signal’s date, which is in the index, and at the E-Mini price level, so that the chart stays readable:

plt.plot(buys.index, data.loc[buys.index, 'Settle'], '^', markersize=10, color='g')
plt.plot(sells.index, data.loc[sells.index, 'Settle'], 'v', markersize=10, color='r')

data.loc[buys.index, 'Settle'] selects the ‘Settle’ price of the data DataFrame on each buy date, so the markers sit on the price curve.

plt.ylabel('E-Mini future price')
plt.xlabel('Date')
plt.legend(loc=0)
plt.show()

Here is the final result:

Conclusion

In conclusion, most buy signals appear at dips in the curve and sell signals at local maxima, so our signal generation looks promising. However, without a real backtest we cannot be sure that the strategy will be profitable; this method only lets us check whether a signal makes sense.
The main advantage of this method is that we can instantly see whether the signals are “right”: for example, you can play with the short and long moving averages, try 10-day versus 30-day and so on, and in the end pick the right parameters for this signal.

Create a trading strategy from scratch in Python

To show you the full process of creating a trading strategy, I’m going to work on a super simple strategy based on the VIX and its futures. I’m skipping the data download from Quandl: I’m using the VIX index and the VIX futures datasets, only the VX1 and VX2 continuous contracts.

Data loading

First we need all the necessary imports; the backtest import will be used later:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from backtest import backtest
from datetime import datetime

For the sake of simplicity, I’m going to put all values in one DataFrame, in separate columns. We have the VIX index, VX1 and VX2, which gives us this code:

VIX = "VIX.csv"
VIX1 = "VX1.csv"
VIX2 = "VX2.csv"

fileList = [VIX, VIX1, VIX2]

# Create the base DataFrame
data = pd.DataFrame()

# Iterate through all files
for file in fileList:
    # Only keep the Close column
    tmp = pd.read_csv(file, sep=',', index_col=0, parse_dates=True)[['Close']]

    # Rename the Close column to the index/future name (VIX, VX1, VX2)
    tmp = tmp.rename(columns={'Close': file.replace(".csv", "")})

    # Merge with the data already loaded
    # It's like a SQL join on the dates
    data = data.join(tmp, how='right')

# Re-sort by date, in case the join messed up the order
data = data.sort_index()

And here’s the result:

Date VIX VX1 VX2
02/01/2008 23.17 23.83 24.42
03/01/2008 22.49 23.30 24.60
04/01/2008 23.94 24.65 25.37
07/01/2008 23.79 24.07 24.79
08/01/2008 25.43 25.53 26.10

Signals

For this tutorial I’m going to use a very basic signal; the structure stays the same, and you can replace the logic with whatever strategy you want, from very complex machine learning algorithms to simple moving average crossovers.

The VIX is a mean-reverting asset, at least in theory: it goes up and down, but in the end its value moves around an average. Our strategy will be to go short when it’s way higher than its mean value and to go long when it’s very low, based on absolute thresholds to keep it simple.

high = 65
low = 12

# By default, set everything to 0
data['Signal'] = 0

# For each day where the VIX is higher than 65, we set the signal to -1 which means: go short
data.loc[data['VIX'] > high, 'Signal'] = -1

# Go long when the VIX is lower than 12
data.loc[data['VIX'] < low, 'Signal'] = 1

# We store only days where we go long/short, so that we can display them on the graph
buys = data.loc[data['Signal'] == 1]
sells = data.loc[data['Signal'] == -1]
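The same thresholds can be written as a single vectorized step with np.select, which makes it easy to add more conditions later. This is a minimal sketch rather than the code used in the rest of the post; make_signal and its defaults are hypothetical, mirroring high = 65 and low = 12 above.

```python
import numpy as np
import pandas as pd


def make_signal(vix, high=65, low=12):
    """Hypothetical helper: -1 (short) above `high`, 1 (long) below `low`, else 0."""
    conditions = [(vix > high).to_numpy(), (vix < low).to_numpy()]
    choices = [-1, 1]
    return pd.Series(np.select(conditions, choices, default=0), index=vix.index)


vix = pd.Series([70.0, 30.0, 11.5, 12.0])
print(make_signal(vix).tolist())  # [-1, 0, 1, 0]
```

With the post’s data this would be data['Signal'] = make_signal(data['VIX']). np.select evaluates the conditions in order, so if you ever make them overlap, the first matching condition wins.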

Now let’s visualize the signals to check whether the strategy at least looks profitable:

# Plot the VX1, not the VIX since we're going to trade the future and not the index directly
plt.plot(data.index, data['VX1'], label='VX1')
# Plot the sell (red, down) and buy (green, up) signals on the same plot
plt.plot(sells.index, sells['VX1'], 'v', markersize=10, color='r')
plt.plot(buys.index, buys['VX1'], '^', markersize=10, color='g')
plt.ylabel('Price')
plt.xlabel('Date')
plt.legend(loc=0)
# Display everything
plt.show()

The result is quite good, even though there’s no trade between 2009 and 2013; we could improve that later.

Backtesting

Let’s check whether the strategy is profitable and compute some metrics. We’re going to compare our strategy returns with the “Buy and Hold” strategy, which means we just buy the VX1 future and wait (rolling it at each expiry), so we can see whether our strategy beats a passive one.
I put the backtest function in a separate file, backtest.py (matching the from backtest import backtest line above), to keep the main code lighter, but you can keep it in the same file:

import numpy as np
import pandas as pd

# data = DataFrame indexed by date, with at least the 'VX1' price and 'Signal' columns
def backtest(data):
    cash = 100000.0
    position = 0

    # Use floats so that assigning portfolio values later keeps the column dtype
    data['Total'] = 100000.0
    data['BuyHold'] = 100000.0
    # To compute the Buy and Hold value, I invest all of my cash in the VX1 on the first day of the backtest
    positionBeginning = int(100000 / float(data.iloc[0]['VX1']))
    # Number of units bought or sold on each signal
    increment = 1000

    for date, row in data.iterrows():
        price = float(row['VX1'])
        signal = float(row['Signal'])

        if signal > 0 and cash - increment * price > 0:
            # Buy
            cash = cash - increment * price
            position = position + increment
            print(date.strftime('%d %b %Y') + " Position = " + str(position) + " Cash = " + str(cash) + " // Total = {:,}".format(int(position * price + cash)))

        elif signal < 0 and abs(position * price) < cash:
            # Sell
            cash = cash + increment * price
            position = position - increment
            print(date.strftime('%d %b %Y') + " Position = " + str(position) + " Cash = " + str(cash) + " // Total = {:,}".format(int(position * price + cash)))

        # Mark the portfolio to market at today's price
        data.loc[date, 'Total'] = position * price + cash
        data.loc[date, 'BuyHold'] = price * positionBeginning

    return position * price + cash
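If you’d rather not have backtest() modify data in place, the same buy/sell rules can be written as a pure function that returns the portfolio value as a new Series. This is a minimal sketch; equity_curve is a hypothetical name, and the three prices in the example are made-up toy values, not VX1 data.

```python
import pandas as pd


def equity_curve(prices, signals, cash=100000.0, increment=1000):
    """Hypothetical pure version of backtest(): returns the daily portfolio value."""
    position = 0
    totals = []
    for price, signal in zip(prices, signals):
        if signal > 0 and cash - increment * price > 0:
            # Buy `increment` units if we can afford them
            cash -= increment * price
            position += increment
        elif signal < 0 and abs(position * price) < cash:
            # Sell `increment` units, same condition as in backtest()
            cash += increment * price
            position -= increment
        # Mark to market: position value plus remaining cash
        totals.append(position * price + cash)
    return pd.Series(totals, index=prices.index)


# Toy example with made-up prices: buy on day 1, hold, sell on day 3
dates = pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-04"])
prices = pd.Series([10.0, 12.0, 11.0], index=dates)
signals = pd.Series([1, 0, -1], index=dates)
print(equity_curve(prices, signals).tolist())  # [100000.0, 102000.0, 101000.0]
```

With the post’s data you’d write data['Total'] = equity_curve(data['VX1'], data['Signal']), which keeps the trading logic separate from the DataFrame bookkeeping.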

In the main code I’m going to use the backtest function like this:

# Backtest
backtestResult = int(backtest(data))
print("Backtest => {:,} USD".format(backtestResult))
perf = (float(backtestResult) / 100000 - 1) * 100
daysDiff = (data.index[-1] - data.index[0]).days
# Simple (non-compounded) annualization on a 360-day year
perf = (perf / daysDiff) * 360
print("Annual return => " + str(perf) + "%")
print()

# Buy and Hold
perfBuyAndHold = float(data['VX1'].iloc[-1]) / float(data['VX1'].iloc[0]) - 1
print("Buy and Hold => {:,} USD".format(int((1 + perfBuyAndHold) * 100000)))
perfBuyAndHold = (perfBuyAndHold / daysDiff) * 360
print("Annual return => " + str(perfBuyAndHold * 100) + "%")
print()

# Compute Sharpe ratio (risk-free rate assumed to be 0%)
data["Return"] = data["Total"] / data["Total"].shift(1) - 1
# Annualize the daily standard deviation with sqrt(252) and express it in %, like perf
volatility = data["Return"].std() * np.sqrt(252) * 100
sharpe = perf / volatility
print("Volatility => " + str(volatility) + "%")
print("Sharpe => " + str(sharpe))
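The metric calculations can also be isolated in a small helper. This is a sketch under my own assumptions rather than the post’s code: it keeps the simple 360-day annualization of the total return and a 0% risk-free rate, annualizes the volatility of daily returns with the usual √252 factor, and expresses both in percent so that the Sharpe ratio divides like by like. annualized_metrics is a hypothetical name and the three-day series is a toy example.

```python
import numpy as np
import pandas as pd


def annualized_metrics(total, initial=100000.0):
    """Hypothetical helper: (annual return %, annual volatility %, Sharpe) from portfolio values."""
    days = (total.index[-1] - total.index[0]).days
    # Simple (non-compounded) annualization on a 360-day year, as in the post
    annual_return = (total.iloc[-1] / initial - 1) * 100 / days * 360
    # Daily returns; the first one is NaN and is skipped by std()
    daily = total / total.shift(1) - 1
    # Annualize the daily standard deviation with sqrt(252) and convert to %
    volatility = daily.std() * np.sqrt(252) * 100
    # Risk-free rate assumed to be 0%
    sharpe = annual_return / volatility
    return annual_return, volatility, sharpe


# Toy portfolio: +10% then -10% over two days
dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])
total = pd.Series([100000.0, 110000.0, 99000.0], index=dates)
print(annualized_metrics(total))
```

After running backtest(data), you’d call annualized_metrics(data["Total"]).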

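It’s important to display the annualized return: a 20% return over 10 years is very different from a 20% return over 2 months, and annualizing everything lets us compare strategies easily. The Sharpe ratio is also a useful metric, as it tells us whether the return is worth the risk. In this example I assume a 0% risk-free rate, so the ratio is simply the annualized return divided by the annualized volatility. As a rule of thumb, a ratio above 1 means the risk-adjusted return is interesting, and the higher it is, the more return you get per unit of volatility.
The output below shows a Sharpe ratio of 4.6, but it was computed by scaling the daily standard deviation by 252 instead of √252 and dividing a percentage return by a volatility that isn’t in percent. Annualized correctly, the same daily returns give a volatility of about 52.6% and a Sharpe ratio of about 0.73, which is positive but below 1: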

Backtest => 453,251 USD
Annual return => 38.3968478261%

Buy and Hold => 53,294 USD
Annual return => -5.07672097648%

Volatility => 8.34645515332%
Sharpe => 4.60037789945

Finally, we want to plot the strategy PnL vs the “Buy and hold” PnL:

plt.plot(data.index, data['Total'], label='Total', color='g')
plt.plot(data.index, data['BuyHold'], label='BuyHold', color='r')
plt.xlabel('Date')
plt.legend(loc=0)
plt.show()

The strategy performed very well until 2010, but from 2013 onwards the PnL starts to stagnate:

Backtest

Conclusion

I showed you a basic structure for creating a strategy, which you can adapt to your needs. For example, you can implement your strategy using zipline instead of a custom backtesting module. With zipline you’ll get many more metrics and you’ll easily be able to run your strategy on different assets, since zipline manages the market data.
I didn’t include any transaction fees or bid-ask spread in this post. The backtest ignores them, so once they are included the strategy might well lose money!
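A simple way to start accounting for these costs is to trade at a worse price than the quoted one. In the sketch below, which is illustrative and not part of the backtest above, buys fill half a spread above the quoted price, sells half a spread below, and a fixed fee is charged per unit on both sides. execute_trade, half_spread and fee_per_unit are hypothetical names, and the default values are placeholders, not real VX costs.

```python
def execute_trade(cash, position, price, units, side, half_spread=0.05, fee_per_unit=0.01):
    """Hypothetical helper: return (cash, position) after a trade including costs."""
    if side == "buy":
        # Pay the ask (quoted price + half spread) plus the fee
        fill = price + half_spread
        return cash - units * (fill + fee_per_unit), position + units
    if side == "sell":
        # Receive the bid (quoted price - half spread) minus the fee
        fill = price - half_spread
        return cash + units * (fill - fee_per_unit), position - units
    raise ValueError("side must be 'buy' or 'sell'")


# Round trip at an unchanged quoted price of 20: costs are the only PnL
cash, position = execute_trade(100000.0, 0, 20.0, 1000, "buy")
cash, position = execute_trade(cash, position, 20.0, 1000, "sell")
print(round(cash, 2), position)  # 99880.0 0
```

With these placeholder values, a round trip of 1,000 units costs 1,000 × (2 × 0.05 + 2 × 0.01) = 120 USD. Plugging such a function into the buy and sell branches of backtest() would show how quickly costs eat into the PnL.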

Load market data from Quandl

In the previous articles, we loaded market data from CSV files. The drawback is that we’d need to redownload the CSV files every day to get the latest data. Why not get the data directly from the source? Quandl is a website aggregating market data from various sources, including Yahoo Finance, CBOE and LIFFE.

Fortunately for us, Quandl has a Python API that lets you access its data. First of all, you’ll need to get your personal API key here. Here is a basic code snippet:

import quandl

quandl.ApiConfig.api_key = 'YOUR_API_KEY'
VIXCode = "CHRIS/CBOE_VX1"

VX1 = quandl.get(VIXCode)

quandl.get() returns a pandas DataFrame with the dates in the index and the open/high/low/close data. The exact columns depend on the data source: you may get more information, such as volume.

Now you can work directly with that DataFrame: merge it with other data, apply some calculations, or use it as an input to a machine learning algorithm. The main advantage is that you’ll always get the latest data, with no need to redownload a file.
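As an example of merging, here’s a minimal sketch that rebuilds the one-column-per-series DataFrame from Quandl instead of CSV files. It assumes each dataset has a Close column (like the CSV files used earlier). load_from_quandl is a hypothetical helper, the code CHRIS/CBOE_VX2 is an assumption by analogy with CHRIS/CBOE_VX1, and fetch is passed in so you can call it with quandl.get, while a fake fetcher with dummy values lets the example run offline.

```python
import pandas as pd


def load_from_quandl(codes, fetch, column="Close"):
    """Hypothetical helper: `codes` maps a column name to a Quandl code, `fetch` is e.g. quandl.get."""
    closes = {name: fetch(code)[column] for name, code in codes.items()}
    # Join on the dates, keeping only days present in every dataset
    return pd.concat(closes, axis=1, join="inner").sort_index()


def fake_get(code):
    """Offline stand-in for quandl.get: a tiny date-indexed DataFrame with dummy values."""
    index = pd.to_datetime(["2008-01-02", "2008-01-03"])
    return pd.DataFrame({"Close": [1.0, 2.0], "Settle": [1.5, 2.5]}, index=index)


# CHRIS/CBOE_VX2 is an assumed code, by analogy with CHRIS/CBOE_VX1
codes = {"VX1": "CHRIS/CBOE_VX1", "VX2": "CHRIS/CBOE_VX2"}
data = load_from_quandl(codes, fake_get)
print(data)
# With a real API key: data = load_from_quandl(codes, quandl.get)
```

Passing the fetch function as a parameter keeps the helper testable without network access or an API key.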