Analysis of Bitcoin Volatility
Here I study whether Bitcoin’s volatility has been decreasing. It is widely reported that it has, and indeed a glance at Eli Dourado’s btcVol.info suggests as much. But is it actually so?
There is a previous analysis by a friend of mine, which applies a statistical test (Augmented Dickey-Fuller) for whether a series is stationary and finds that the data is compatible with a stationary process: that is, the underlying distribution from which daily returns are drawn is constant across time.
In what follows, I try to replicate his study and show some limitations of the ADF test.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
sns.set()
%matplotlib inline

from forex_python.bitcoin import BtcConverter
b = BtcConverter()
_ = b.get_latest_price('USD')  # quick call to check the price API responds
# Fetch daily BTC-USD prices between 2011 and 2017 and tidy them into a DataFrame
start_date = datetime.datetime(2011, 1, 1)
end_date = datetime.datetime(2017, 1, 1)
r = b.get_previous_price_list('USD', start_date, end_date)
data = pd.DataFrame.from_dict(r, orient='index')
data.columns = ['Price']
data.index = pd.to_datetime(data.index)  # parse the date strings before sorting
data.sort_index(inplace=True)
data.reset_index(inplace=True)
data.rename(columns={'index': 'Date'}, inplace=True)
data.head()
| | Date | Price |
|---|---|---|
| 0 | 2011-01-01 | 0.300 |
| 1 | 2011-01-02 | 0.300 |
| 2 | 2011-01-03 | 0.295 |
| 3 | 2011-01-04 | 0.299 |
| 4 | 2011-01-05 | 0.299 |
_=data.plot(x="Date",y="Price")
data['Daily change'] = data['Price'].pct_change(1).multiply(100)  # daily return in %
_ = data.plot(title="Bitcoin daily change (%)", x='Date', y='Daily change')
data['Daily rolling 10'] = data['Daily change'].rolling(window=10, center=False).std()
data['Daily rolling 30'] = data['Daily change'].rolling(window=30, center=False).std()
data['Daily rolling 60'] = data['Daily change'].rolling(window=60, center=False).std()
# Plot the three rolling volatilities on shared axes; pass figsize to data.plot
# directly (a bare plt.figure() here would just create a stray empty figure)
ax = data.plot(title="Bitcoin daily change (%)", y='Daily rolling 10', x='Date', figsize=(12, 6))
_ = data.plot(y='Daily rolling 30', x='Date', ax=ax)
_ = data.plot(y='Daily rolling 60', x='Date', ax=ax)
# First, some curve fitting: convert dates to days elapsed since the first observation
data['Date_n'] = (data['Date'] - data['Date'][0]) / np.timedelta64(1, 'D')
fig=plt.figure(figsize=(10,8))
sns.regplot(data=data,x='Date_n',y='Daily rolling 10',scatter_kws={"s": 10,"alpha":0.5},label='Daily rolling 10')
sns.regplot(data=data,x='Date_n',y='Daily rolling 30',scatter_kws={"s": 10,"alpha":0.5},label='Daily rolling 30')
sns.regplot(data=data,x='Date_n',y='Daily rolling 60',scatter_kws={"s": 10,"alpha":0.5},label='Daily rolling 60')
plt.ylabel('Daily change (%)')
plt.legend()
import statsmodels.api as sm
# Fit volatility against time, dropping the NaNs left by the rolling window
y = data['Daily rolling 10']
X = data['Date_n']
mask = ~np.isnan(y)
X = X[mask]
y = y[mask]
model = sm.OLS(y, sm.add_constant(X), missing='drop').fit()
model.summary()
| Dep. Variable: | Daily rolling 10 | R-squared: | 0.178 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.178 |
| Method: | Least Squares | F-statistic: | 472.5 |
| Date: | Sun, 08 Jan 2017 | Prob (F-statistic): | 5.36e-95 |
| Time: | 12:48:59 | Log-Likelihood: | -5749.0 |
| No. Observations: | 2183 | AIC: | 1.150e+04 |
| Df Residuals: | 2181 | BIC: | 1.151e+04 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| const | 6.9094 | 0.145 | 47.579 | 0.000 | 6.625 7.194 |
| Date_n | -0.0025 | 0.000 | -21.738 | 0.000 | -0.003 -0.002 |

| Omnibus: | 812.011 | Durbin-Watson: | 0.072 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 3098.355 |
| Skew: | 1.818 | Prob(JB): | 0.00 |
| Kurtosis: | 7.565 | Cond. No. | 2.55e+03 |
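To put the fitted slope in perspective (a back-of-the-envelope addition of mine, not part of the original analysis), the Date_n coefficient can be scaled from per-day to per-year units:

# The slope is in percentage points of rolling volatility per day;
# multiplying by 365 gives roughly -0.9 points per year
print(model.params['Date_n'] * 365)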
Perhaps it would be better to use a nonlinear function, such as a decaying exponential $f(x,a,b,c)=ae^{-bx}+c$. For optimisation purposes, I normalise the series first.
from scipy.optimize import curve_fit

def normalize(X, Y):
    # Drop the NaNs from the rolling window, then standardise both series
    # to zero mean and unit variance
    mask = ~np.isnan(Y)
    X2 = X[mask]
    y2 = Y[mask]
    return (X2 - np.mean(X2)) / np.std(X2), (y2 - np.mean(y2)) / np.std(y2)

def func(x, a, b, c):
    # Decaying exponential: f(x) = a * exp(-b * x) + c
    return a * np.exp(-x * b) + c

def plot(X2, y2, popt, n):
    # Draw the fitted curve and the data in the n-th of three subplots
    plt.subplot(1, 3, n)
    plt.plot(X2, func(X2, popt[0], popt[1], popt[2]))
    plt.plot(X2, y2)
y=data['Daily rolling 10']
X=data['Date_n']
X1,y1=normalize(X,y)
popt1, pcov =curve_fit(func,X1,y1)
fig=plt.figure(figsize=(12,6))
plot(X1,y1,popt1,1)
plt.title('Exponential fit for Daily rolling 10')
print("Parameters:",popt1)
y=data['Daily rolling 30']
X=data['Date_n']
X2,y2=normalize(X,y)
popt2, pcov =curve_fit(func,X2,y2)
plot(X2,y2,popt2,2)
plt.title('Exponential fit for Daily rolling 30')
print("Parameters:",popt2)
y=data['Daily rolling 60']
X=data['Date_n']
X3,y3=normalize(X,y)
popt3, pcov =curve_fit(func,X3,y3)
plot(X3,y3,popt3,3)
plt.title('Exponential fit for Daily rolling 60')
print("Parameters:",popt3)
plt.figure(figsize=(12,6))
plt.plot(X1,func(X1,popt1[0],popt1[1],popt1[2])-y1,label='Daily rolling 10')
plt.plot(X2,func(X2,popt2[0],popt2[1],popt2[2])-y2,label='Daily rolling 30')
plt.plot(X3,func(X3,popt3[0],popt3[1],popt3[2])-y3,label='Daily rolling 60')
plt.title('Residual plot')
plt.legend()
Parameters: [ 0.26753345 1.07601779 -0.4516436 ]
Parameters: [ 0.36183029 1.01534461 -0.57939952]
Parameters: [ 0.42689696 0.98383072 -0.66565455]
I would say the fit looks reasonable (the series does appear to decay exponentially, with decreasing variance), especially if we mentally discount some of the ‘big’ events such as bubbles.
Now we bring in the ADF test (which I have no experience using), checking the results both for a series with and without a trend. Here is the documentation for the function, statsmodels.tsa.stattools.adfuller. First I run some tests to see how it behaves. The quantity of interest is the second element of the returned tuple, an approximate p-value. The null hypothesis of the test is that the series has a unit root, i.e. is non-stationary; I’ll read it so that if p-value < 0.05, we reject the null and treat the series as stationary.
from statsmodels.tsa.stattools import adfuller
# Here there *is* a trend. The null hypothesis is a unit root; with the default
# regression='c' the alternative is stationarity around a constant.
# Because of the trend, the test fails to reject the null (large p-value below).
plt.plot(np.linspace(0,10, 100)+np.random.randn(100)/10)
adfuller(np.linspace(0,10, 100)+np.random.randn(100)/10)
(0.030371082537879741,
0.96097283593236305,
5,
94,
{'1%': -3.5019123847798657,
'10%': -2.5834538614757809,
'5%': -2.8928152554828892},
-130.36360151635336)
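For readability, the returned tuple can be unpacked by name. Per the statsmodels documentation, the fields (with the default autolag setting) are the test statistic, the p-value, the number of lags used, the number of observations, the critical values, and the information criterion of the best fit:

series = np.linspace(0, 10, 100) + np.random.randn(100) / 10
stat, pvalue, usedlag, nobs, crit_values, icbest = adfuller(series)
print(pvalue)  # the value we care about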
Here there is a trend, so we can tell the test about it and ask whether the series is stationary around a linear trend rather than a constant, using the regression='ct' option:
plt.plot(np.linspace(0,10, 100)+np.random.randn(100)/10)
adfuller(np.linspace(0,10, 100)+np.random.randn(100)/10,regression='ct')
(-6.0891054304079377,
1.4245907097885184e-06,
3,
96,
{'1%': -4.0563093932201246,
'10%': -3.1544345187717013,
'5%': -3.4572550874385124},
-161.30436800688932)
In the next example the local trend changes over time, but it comes from a fixed underlying function, sin(x). The test still says the process doesn’t change, and the ‘ct’ option agrees, since the changing trend itself (its derivative, a cosine) is also stationary around zero.
np.random.seed(42)
n = 1000
space = np.linspace(0, 100, n)
# sin(x) plus a little uniform noise: the local trend keeps changing sign
var = np.sin(space) + np.random.rand(n) / 100
plt.plot(var)
adfuller(var,regression='ct')
(-14.936838558057197,
2.3543437151057883e-22,
22,
977,
{'1%': -3.9680661492141187,
'10%': -3.1297006047557043,
'5%': -3.4149932715005722},
-8471.9770767955488)
We can now see what the test would say if it were handed the function we just fitted, plus a little noise.
plt.plot(func(X3,popt3[0],popt3[1],popt3[2]))
adfuller(func(X3,popt3[0],popt3[1],popt3[2])+np.random.rand(len(X3))/100)
(-16.794102051514088,
1.2352461745010984e-29,
25,
2107,
{'1%': -3.4334573966160153,
'10%': -2.5675007703726296,
'5%': -2.8629127187998606},
-18524.80171562676)
So even if the variance were decreasing according to a very clean equation, the test would still say the process is stationary; for our question, the test is not that useful after all.
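For contrast (a quick sanity check of my own, not part of the original analysis), the test does behave as expected on a genuine unit-root process such as a random walk, failing to reject its null:

# A random walk has a unit root by construction, so the p-value should be large
np.random.seed(0)
walk = np.cumsum(np.random.randn(1000))
print(adfuller(walk)[1])  # typically well above 0.05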
OK, let’s go back to our Bitcoin data and run the actual test:
def nice(data, regtype="c"):
    # Return just the p-value of the ADF test, dropping rolling-window NaNs
    return adfuller(data.dropna(), regression=regtype)[1]
print("P-value for 10 days averaged volatility: ",nice(data['Daily rolling 10']),'***')
print("P-value for 30 days averaged volatility: ",nice(data['Daily rolling 30']),'***')
print("P-value for 60 days averaged volatility: ",nice(data['Daily rolling 60']),'**')
P-value for 10 days averaged volatility: 8.59786631742e-05 ***
P-value for 30 days averaged volatility: 0.000113952000692 ***
P-value for 60 days averaged volatility: 0.034705270592 **
Thus we find that for the 10-day averaged volatility the test rejects a unit root, i.e. the data is consistent with a stationary, constant-variance process, and the same holds for the wider averaging windows.
Let’s now examine what happens if instead we allow the series to be stationary around a linear trend ($y = ax + b$), using the ‘ct’ option:
print("P-value for 10 days averaged volatility: ",nice(data['Daily rolling 10'],'ct'),'***')
print("P-value for 30 days averaged volatility: ",nice(data['Daily rolling 30'],'ct'),'***')
print("P-value for 60 days averaged volatility: ",nice(data['Daily rolling 60'],'ct'),'**')
P-value for 10 days averaged volatility: 4.36675365203e-05 ***
P-value for 30 days averaged volatility: 5.24203107756e-05 ***
P-value for 60 days averaged volatility: 0.0309130608266 **
Surprise, surprise: the test also rejects its null here! **So we cannot rule out that the series is generated by a process with decreasing variance either!**
For completeness, below I show the ADF test for a range of possible lags. By default the test chooses the lag automatically using the Akaike Information Criterion.
def makeplot(name, pos):
    # Plot ADF p-values as a function of the maximum lag allowed,
    # both assuming a constant ('c') and a linear trend ('ct')
    plt.subplot(1, 3, pos)
    for i in range(n_lags):
        pvalues[i, 0] = adfuller(data[name].dropna(), regression='c', maxlag=i)[1]
        pvalues[i, 1] = adfuller(data[name].dropna(), regression='ct', maxlag=i)[1]
    plt.plot(range(n_lags), pvalues)
    plt.yscale('log', nonposy='clip')
    plt.ylabel('P-value')
    plt.xlabel('Number of lags for the ADF test')
    plt.legend(['Assuming constant', 'Assuming trend'])
    plt.title('ADF test for ' + name)
n_lags = 30
pvalues = np.zeros((n_lags, 2))
fig = plt.figure(figsize=(12, 6))
makeplot('Daily rolling 10', 1)
makeplot('Daily rolling 30', 2)
makeplot('Daily rolling 60', 3)
Conclusion
The ADF test is not very useful for a time series like the BTC-USD exchange rate. Here I would go with plain OLS, which says the coefficient on the ‘date’ feature is negative at the 95% confidence level. This trend persists even if we consider only, say, post-2014 or post-2015 data. A nonlinear exponential fit does even better.
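As a quick sketch of that subsample check (my own illustration; the exact numbers will depend on the data), the regression can be rerun on post-2015 data only:

# Refit the linear trend using only post-2015 observations
sub = data[data['Date'] >= '2015-01-01'].dropna(subset=['Daily rolling 10'])
sub_model = sm.OLS(sub['Daily rolling 10'], sm.add_constant(sub['Date_n'])).fit()
print(sub_model.params['Date_n'], sub_model.pvalues['Date_n'])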
Thus the data warrants the conclusion that Bitcoin is, after all, becoming less volatile.