Introducing: Pre-engineered Stock Fundamental Features
Jul 8, 2021

By Ernest Chan

We are pleased to announce the launch of our new pre-engineered stock fundamental features. These features are ratios and indicators constructed from the quarterly and annual financial statements of public companies. In the US, these are the 10-Q and 10-K filings to the SEC. Multiple independent studies have shown that merging fundamental and technical features can predict returns more accurately than using technical features alone (for example, Cao and You, 2020).

We have created 40 stock fundamental features, free to all our subscribers. These features are sometimes called “cross-sectional” factors (Ruppert and Matteson, 2015, chapter 18), and are sourced from the Sharadar stock fundamental dataset. This dataset contains more than 6,000 US-listed companies¹ and nearly 10,000 delisted companies. We have carefully vetted this list to ensure it is survivorship-bias-free and that all the data is captured point-in-time, without restatements. This data is available quarterly, yearly, and for the trailing twelve months (these are called the 3 “dimensions” in Sharadar’s terminology). The original Sharadar dataset has around 140 raw data fields. We have narrowed this down to 40 essential features, as many of the features in the original set were redundant or highly correlated.

To use this data for machine learning, the time series data has been made stationary. For indicators that are denominated in dollars, such as total assets or capex (capital expenditure), we compute the percentage change from one filing to the next. For indicators that are already in percentages, like gross margin or net margin, we simply take the difference between the values for successive filings. These conversions are done for each of the 3 dimensions: quarterly, yearly, and trailing twelve months.
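As a rough illustration of the two transformations described above, here is a minimal pandas sketch. It is not our production preprocessing code: the column names, the sample values, and the helper `make_stationary` are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical quarterly filings for one company (values are made up).
filings = pd.DataFrame(
    {
        "total_assets": [100.0, 110.0, 121.0],  # denominated in dollars
        "gross_margin": [0.40, 0.42, 0.39],     # already a percentage (as a fraction)
    },
    index=pd.to_datetime(["2020-03-31", "2020-06-30", "2020-09-30"]),
)


def make_stationary(df, dollar_cols, pct_cols):
    """Percentage change for dollar columns, simple difference for percentage columns."""
    out = pd.DataFrame(index=df.index)
    for col in dollar_cols:
        # fill_method=None: do not forward fill missing values before computing the change
        out[col + "_pct_chg"] = df[col].pct_change(fill_method=None)
    for col in pct_cols:
        out[col + "_diff"] = df[col].diff()
    return out


stationary = make_stationary(filings, ["total_assets"], ["gross_margin"])
print(stationary)
```

The first row of each output column is NaN, because there is no earlier filing to compare against. In this sketch the same function would simply be applied separately to each dimension.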

We have also normalized this data cross-sectionally to facilitate easy comparisons across different stocks. This allows users to merge data from different stocks into a single training data set for easy input into our no-code machine learning web interface or API. We multiply per-share ratios like “earnings-per-share” or “debt-per-share” by the total number of common shares outstanding, divide by the enterprise value, then finally take the difference between consecutive filings. To reiterate, this normalization is done for each of the 3 dimensions.

We have also ensured that a reused ticker symbol is assigned to the last company that uses that ticker, not to a delisted company that previously had that ticker. This avoids survivorship bias. Earlier holders of a reused ticker symbol are available with a numeric counter appended as a suffix. For example, Australia Acquisition Corporation and Ares Acquisition Corporation both have the ticker symbol AAC. Australia Acquisition Corporation was delisted in 2012, so it is available as AAC1 in our database. In contrast, Ares Acquisition Corporation is still actively listed as AAC. Conversely, in the case where the ticker symbol for a company has changed over time, all its data is tied to its last valid ticker.
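The cross-sectional normalization of per-share ratios described above (multiply by shares outstanding, divide by enterprise value, then difference) can be sketched as follows. This is an illustrative example only: the column names, sample values, and the function `normalize_per_share` are assumptions for the sketch, not the actual field names or code used in our pipeline.

```python
import pandas as pd

# Hypothetical filings for one company (values are made up).
filings = pd.DataFrame(
    {
        "eps": [1.0, 1.2, 1.5],                           # earnings per share
        "shares_outstanding": [1000.0, 1000.0, 1100.0],   # common shares
        "enterprise_value": [20000.0, 20000.0, 33000.0],
    },
    index=pd.to_datetime(["2020-03-31", "2020-06-30", "2020-09-30"]),
)


def normalize_per_share(df, per_share_col,
                        shares_col="shares_outstanding",
                        ev_col="enterprise_value"):
    """Scale a per-share ratio by enterprise value, then difference across filings."""
    total = df[per_share_col] * df[shares_col]  # back to a company-level dollar amount
    scaled = total / df[ev_col]                 # unitless, comparable across companies
    return scaled.diff()                        # change between consecutive filings


normalized = normalize_per_share(filings, "eps")
print(normalized)
```

Dividing by enterprise value removes the effect of company size, which is what makes it reasonable to stack rows from different stocks into one training set.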

As SEC filings generally happen at arbitrary times on any given day, the date of the associated data has been shifted to the next trading day to avoid look-ahead bias during backtests. This guarantees that the data is available long before the market opens. We enable automatic merging of this fundamental data with your own technical or higher frequency data as one single input dataset for machine learning. When merging the Sharadar fundamental features with higher frequency data, the fundamental features are forward filled until the next valid filing date. For example, gross margin for quarterly filings will be forward filled for all dates until the gross margin for the next quarter becomes available on the date of its filing.
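For readers who want to reproduce this kind of alignment on their own data, the date shift and forward fill described above might look like the following pandas sketch. The data, the column names, and the use of weekdays as a stand-in for trading days are all assumptions made for the example; a real trading calendar would also need to skip exchange holidays.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# Hypothetical fundamental feature, indexed by SEC filing date (both are Fridays).
fundamentals = pd.DataFrame(
    {"gross_margin_diff": [0.02, -0.03]},
    index=pd.to_datetime(["2021-01-08", "2021-04-09"]),
)

# Hypothetical daily data on weekdays (holidays ignored in this sketch).
days = pd.bdate_range("2021-01-04", "2021-04-16")
prices = pd.DataFrame({"close": range(len(days))}, index=days)

# Step 1: shift each filing date to the next business day to avoid look-ahead bias.
shifted = fundamentals.copy()
shifted.index = pd.DatetimeIndex([d + BDay(1) for d in fundamentals.index])

# Step 2: align to the daily index, forward filling until the next filing arrives.
aligned = shifted.reindex(prices.index, method="ffill")

# Step 3: merge into one input dataset.
merged = prices.join(aligned)
print(merged.loc["2021-01-07":"2021-01-12"])
```

Days before the first shifted filing date have no fundamental value and remain NaN, so a feature never appears in the merged data earlier than the trading day after its filing.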

In our manual, you can view a detailed list of all the new features, as well as a description on how to use them. The specific preprocessing functions applied are also described there. All data is updated every trading day by 5am New York Time.

If you have any questions, please do not hesitate to reach out to us at

For more information about our work, visit

Further reading: “Fundamentals Analysis Via Machine learning” and “Sharadar Fundamental data doc”.


¹ This includes all ADRs but not OTC stocks.


