…ke 45%) by PCA factors.
|
Add an option to use a variable value of explained variance(like 45%, 55%, 65%) by PCA factors. |
PanPip
left a comment
Good progress 👌
I left some comments regarding the code that we can discuss.
Now we should polish the docstrings and start writing the sphinx docs.
|
|
||
|
|
||
| # pylint: disable=invalid-name | ||
| # pylint: disable=R0913 |
Let's rather do
# pylint: disable=invalid-name, too-many-arguments
| :param matrix: (pd.DataFrame) DataFrame with returns that need to be standardized. | ||
| :param vol_matrix: (pd.DataFrame) DataFrame with historical trading volume data. | ||
| :param k: (int) Look-back window used for volume moving average. | ||
| :return: (pd.DataFrame) a volume-adjusted returns dataFrame |
:return: (pd.DataFrame) A volume-adjusted returns dataFrame.
| # Fill missing data with preceding values | ||
| returns = matrix.dropna(axis=0) |
Should we rather fill values?
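To make the mismatch concrete: the code comment says missing data is filled with preceding values, while `dropna` removes the rows instead. A minimal sketch of the difference, using a made-up returns frame (the values are hypothetical and only the tickers come from the tests in this PR). Whether forward-filling returns rather than prices is appropriate is a separate question.

```python
import numpy as np
import pandas as pd

# Hypothetical returns with gaps, for illustration only.
matrix = pd.DataFrame(
    {
        "EEM": [0.01, np.nan, 0.02, -0.01],
        "SPY": [0.00, 0.01, np.nan, 0.02],
    }
)

# Current behaviour in the diff: every row containing a NaN is removed.
dropped = matrix.dropna(axis=0)

# What the code comment describes: carry the last known value forward.
filled = matrix.ffill()

print(dropped.shape)  # fewer rows than the input
print(filled.shape)   # same number of rows as the input
```

Dropping rows shortens the sample that the look-back windows operate on, while filling keeps the index intact, so the two choices can lead to different signals.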
| # Standardized: fill nan with zero / std: fill nan with 1 | ||
|
|
This can probably be removed now.
| So the output is a dataframe containing the weight for each asset in a portfolio for each eigen vector. | ||
|
|
||
| :param matrix: (pd.DataFrame) Dataframe with index and columns containing asset returns. | ||
| :param explained_var: (float) The user-defined explained variance criteria. |
We should add that if this parameter is given, it will override the n_components parameter. We should also mention that it should be in the range from 0 to 1.
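A possible shape for that wording and the range check, as a sketch only. The helper name `use_explained_var` is hypothetical and not part of the PR, and treating the range as 0 exclusive to 1 inclusive is an assumption.

```python
from typing import Optional


def use_explained_var(explained_var: Optional[float] = None) -> bool:
    """
    Hypothetical helper, not part of the PR.

    :param explained_var: (float) The user-defined explained variance criteria,
        in the range from 0 to 1. If given, it overrides the n_components parameter.
    :return: (bool) True if explained_var should be used instead of n_components.
    """

    if explained_var is None:
        return False

    # Assumption: 0 is excluded (no components) and 1 is allowed (all components).
    if not 0 < explained_var <= 1:
        raise ValueError("explained_var should be in the range from 0 to 1.")

    return True
```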
| Tests the PCA Strategy from the Other Approaches module. | ||
| """ | ||
|
|
||
| import unittest | ||
| import os | ||
| import pandas as pd | ||
| import numpy as np | ||
| from arbitragelab.other_approaches import ETFStrategy | ||
|
|
||
|
|
||
| class TestPCAStrategy(unittest.TestCase): | ||
| """ | ||
| Tests PCAStrategy class. |
The naming should be fixed.
| # Check target weights | ||
| self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5) | ||
| self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5) | ||
| self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5) | ||
|
|
||
| # Check drift argument | ||
| target_weights = self.etf_strategy.get_signals(smaller_etf, smaller_dataset, k=1, corr_window=252, | ||
| residual_window=60, sbo=1.25, sso=1.25, ssc=0.5, | ||
| sbc=0.75, size=1, drift=True) | ||
|
|
||
| # Check target weights | ||
| self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5) | ||
| self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5) | ||
| self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5) |
It's interesting that these test values are the same.
| # Check target weights | ||
| self.assertAlmostEqual(target_weights.mean()['EEM'], 0.333333, delta=1e-5) | ||
| self.assertAlmostEqual(target_weights.mean()['BND'], -0.5, delta=1e-5) | ||
| self.assertAlmostEqual(target_weights.mean()['SPY'], -0.38888, delta=1e-5) |
And these too. Can we pick the values of the parameters so the outputs are different?
|
|
||
| def __init__(self, n_components: int = 15): | ||
| """ | ||
| Initialize PCA StatArb Strategy. |
Docstrings in this class should be fixed.
| First, the correlation matrix to get PCA components is calculated using a | ||
| corr_window parameter. From this, we get weights to calculate PCA factor returns. | ||
| These weights are being recalculated each time we generate (residual_window) number | ||
| of signals. |
All these descriptions should be updated to match the ETF Approach.
PanPip
left a comment
Made some code fixes to this PR.
| condition = min(np.cumsum(expl_variance), key=lambda x: abs(x - explained_var)) | ||
| # The number of components to use | ||
| num_pc = np.where(np.cumsum(expl_variance) == condition)[0][0] + 1 |
This part is not working as expected, I'll show an example.
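The reviewer's own example is not included in this thread. One way the snippet can fall short, shown here as an assumption about what was meant: picking the cumulative value closest to the target can select fewer components than are needed to reach it. The sketch below uses made-up variance ratios and compares the rule from the diff with a threshold rule based on `np.searchsorted`. The function names are hypothetical.

```python
import numpy as np

# Hypothetical explained-variance ratios, sorted from largest to smallest.
expl_variance = np.array([0.40, 0.20, 0.15, 0.15, 0.10])


def closest_rule(expl_variance, explained_var):
    """Rule from the diff: use the cumulative value closest to the target."""
    cumulative = np.cumsum(expl_variance)
    condition = min(cumulative, key=lambda x: abs(x - explained_var))
    return int(np.where(cumulative == condition)[0][0] + 1)


def threshold_rule(expl_variance, explained_var):
    """Alternative (an assumption): fewest components that reach the target."""
    cumulative = np.cumsum(expl_variance)
    # Index of the first cumulative value >= explained_var, converted to a count.
    num_pc = int(np.searchsorted(cumulative, explained_var, side="left")) + 1
    # Guard against a target slightly above the final cumulative sum.
    return min(num_pc, len(cumulative))


print(closest_rule(expl_variance, 0.45))
print(threshold_rule(expl_variance, 0.45))
```

With a 45% target, the closest-value rule selects one component, which explains only 40%, while the threshold rule selects two components, which explain about 60%.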
| A function to calculate weights (scaled eigen vectors) to use for factor return calculation with | ||
| asymptotic PCA. | ||
|
|
||
| Weights are calculated from PCA components as: | ||
|
|
||
| Weight = Eigen vector / std.(R) | ||
|
|
||
| So the output is a dataframe containing the weight for each asset in a portfolio for each eigen vector. |
Please adjust this docstring to reflect the idea behind the asymptotic PCA.
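For reference, the formula quoted in the docstring above, Weight = Eigen vector / std.(R), can be sketched as follows. This illustrates the plain PCA weights only, not the asymptotic variant the comment asks for. The random returns, the layout with assets as rows and eigenvectors as columns, and the column labels are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical returns, for illustration only.
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(250, 3)), columns=["EEM", "BND", "SPY"]
)

n_components = 2

# Eigen-decomposition of the correlation matrix (eigh returns ascending eigenvalues).
corr = returns.corr().to_numpy()
eigvals, eigvecs = np.linalg.eigh(corr)

# Keep the eigenvectors with the largest eigenvalues.
order = np.argsort(eigvals)[::-1][:n_components]
top_vectors = eigvecs[:, order]

# Weight = eigenvector / std(R), applied per asset.
std = returns.std().to_numpy()
weights = pd.DataFrame(
    top_vectors / std[:, None],
    index=returns.columns,
    columns=[f"PC{i + 1}" for i in range(n_components)],
)

# Factor returns are the asset returns projected on the weights.
factor_returns = returns @ weights
print(weights)
```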
Purpose
Describe the problem or feature in addition to a link to the issues.
Approach
How does this change address the problem?
Tests for New Behavior
What new tests were added to cover new features or behaviors?
Checklist
Make sure you did the following (if applicable):
./pylint to make sure code style is consistent.
Learning
Describe the research stage
Links to blog posts, patterns, libraries or addons used to solve this problem