Methodology Upgrade: Improving Data Quality and Factor Calculations

At Analytical Platform, we continuously refine our methodology and research infrastructure to ensure that our factor models are built on the highest-quality data possible.

In June 2026, we introduced a major upgrade to how we process, store, and calculate characteristics across our investment universes. These changes improve data integrity, historical consistency, transparency, and the interpretability of our factors.

Readers less familiar with factor investing may find Investopedia's overview a useful starting point. We have adapted the approach in line with our Stock Characteristics theory, focusing primarily on long-only portfolios.

Why We Changed Our Data Methodology

Data quality is one of the most important components of systematic investing. While our previous methodology produced robust results, years of research and platform development highlighted several areas where we could improve the relationship between raw data and the characteristics derived from them.

Our goal was simple: make every characteristic reflect the underlying data more accurately while preserving historical consistency and transparency.

How Our Data Methodology Worked Until May 2026

1. Some characteristics used the entire 13-year history

Certain indicators, especially volume-based factors, incorporated data from the full historical dataset, even when calculating current values.

This occasionally caused historical observations to influence present-day characteristics more than intended.

2. Historical values for non-saved portfolios were recalculated every month

For portfolios that were not explicitly saved by users, we recalculated historical values each month using the most recently available data.

Saved portfolios were unaffected because their historical states were preserved.

While this approach ensured consistency with newly acquired data, it meant that historical values could change over time.

3. Annual fundamental values were used

Many characteristics relied on annual financial statements.

This helped reduce seasonality but also significantly reduced the responsiveness of certain factors.

Some characteristics changed very slowly, limiting their ability to capture more recent developments.

4. Missing values (NaN) were treated as zero

Historically, missing observations were converted into zeros.

While practical, this occasionally introduced unintended distortions into factor calculations, particularly when a missing value carried a very different meaning than an actual zero.

5. Front-filling was applied to selected indicators

For some datasets, such as CapEx-related indicators, we used front-filling.

This meant that the last available value continued to be used indefinitely, even if a company had not reported an updated value for several years.

6. Most universes were survivorship biased

With the exception of the SP500 Historical universe, most universes were constructed using current constituents.

As a result, companies that had been delisted, acquired, or dropped from an index were missing from the history. The resulting survivorship bias often made historical performance look stronger than would have been achievable in real time.

New Data Methodology from June 2026 Onward

1. A Maximum 3-Year Data Window

We now use a maximum rolling window of 3 years (756 trading days).

This corresponds to the longest indicators currently available in the platform.

Current values are now influenced only by observations that fall inside this window.

As a result, characteristics are more aligned with their intended definitions, and long-term historical observations no longer have unnecessary influence on present-day factor values.
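To make the effect of the cap concrete, here is a minimal sketch in Python. The function name `windowed_mean` and the sample volumes are hypothetical and chosen only for illustration; this is not the platform's actual code. It compares an average over the full history with the same average restricted to the last 756 trading days.

```python
MAX_WINDOW_DAYS = 756  # 3 years x 252 trading days, as stated above


def windowed_mean(values, lookback, max_window=MAX_WINDOW_DAYS):
    """Mean of the most recent `lookback` observations, capped at `max_window`."""
    if lookback <= 0 or not values:
        raise ValueError("need a positive lookback and at least one observation")
    effective = min(lookback, max_window)
    recent = values[-effective:]
    return sum(recent) / len(recent)


# Hypothetical daily volumes: 2,000 older days at 100, then 756 recent days at 10.
volumes = [100.0] * 2000 + [10.0] * 756

full_history_mean = sum(volumes) / len(volumes)              # dominated by old data
capped_mean = windowed_mean(volumes, lookback=len(volumes))  # only the last 756 days

print(round(full_history_mean, 2), capped_mean)
```

In this made-up series the full-history average is pulled far above recent levels by observations more than three years old, while the capped average reflects only the recent regime.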

2. Historical Values Become Fixed Once Stored

Only the most recently stored month can be recalculated.

All previous monthly observations become fixed and immutable.

Historical results now reflect the information that was actually available at that specific point in time.

This provides a more accurate representation of historical decision-making conditions.
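The rule can be pictured as a store that accepts writes only for the latest month. The sketch below is an illustration under assumptions: the class `MonthlyStore`, its methods, and the "YYYY-MM" keys are hypothetical and do not describe how the platform stores data.

```python
class MonthlyStore:
    """Keeps one value per month; only the latest stored month may be rewritten."""

    def __init__(self):
        self._values = {}  # "YYYY-MM" -> value (these keys sort chronologically)

    def store(self, month, value):
        latest = max(self._values) if self._values else None
        if latest is not None and month < latest:
            raise ValueError(f"{month} is fixed; only {latest} or later can be written")
        self._values[month] = value

    def get(self, month):
        return self._values[month]


history = MonthlyStore()
history.store("2026-06", 1.00)
history.store("2026-07", 2.00)
history.store("2026-07", 2.50)  # allowed: the most recent month can be recalculated

try:
    history.store("2026-06", 9.99)  # rejected: an earlier month is immutable
    rejected = False
except ValueError:
    rejected = True

print(history.get("2026-06"), history.get("2026-07"), rejected)
```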

3. Quarterly Instead of Annual Fundamentals

We migrated from annual reporting data to quarterly reporting data.

Benefits include faster reactions to changing fundamentals, more dynamic factor behavior, and improved responsiveness to company developments.

The trade-off is higher turnover and increased seasonality.

We believe the quarterly approach nonetheless better reflects how markets process information in practice.

4. NaN Values Are No Longer Converted to Zero

Missing values are now preserved as NaN.

When calculating characteristics, NaN values are considered only after all available valid observations have been exhausted.

This eliminates situations where artificial zeros could negatively influence calculations and portfolio construction.

As a secondary effect, this also reduces the unintended influence of Market Capitalization as a tie-breaker in certain ranking processes.
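A small example shows why the distinction matters for rankings. The sketch below assumes a characteristic where higher is better and reads "considered only after all valid observations" as "ranked last"; both the reading and the helper names `rank_zero_filled` and `rank_nan_last` are assumptions made for illustration.

```python
import math


def rank_zero_filled(values):
    """Old-style ranking: NaN becomes 0.0, then sort best to worst."""
    filled = {t: (0.0 if math.isnan(v) else v) for t, v in values.items()}
    return sorted(filled, key=lambda t: (-filled[t], t))


def rank_nan_last(values):
    """New-style ranking: valid values first (best to worst), NaN at the end."""
    valid = {t: v for t, v in values.items() if not math.isnan(v)}
    missing = sorted(t for t, v in values.items() if math.isnan(v))
    return sorted(valid, key=lambda t: (-valid[t], t)) + missing


# Hypothetical tickers and values; CCC has no reported figure.
characteristic = {"AAA": 0.05, "BBB": -0.02, "CCC": math.nan}

old_order = rank_zero_filled(characteristic)
new_order = rank_nan_last(characteristic)

print(old_order)
print(new_order)
```

With zero-filling, the stock with no data (CCC) ranks ahead of a stock with a genuinely negative value (BBB). Preserving NaN puts it behind every stock with a valid observation. Zero-filling also makes every missing stock tie at exactly zero, which is the kind of tie a secondary key such as Market Capitalization would then have to break.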

5. Reduced Front-Filling

For selected indicators, front-filling has been removed entirely.

When a value is unavailable, it is now handled according to the NaN methodology described above.

This prevents stale information from remaining active in the system for extended periods.
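The difference between the two behaviours can be sketched in a few lines of Python. The `front_fill` helper and the CapEx figures are hypothetical; the example only illustrates how a single old report can persist under front-filling.

```python
import math


def front_fill(series):
    """Carry the last valid value forward over every later gap."""
    filled, last = [], math.nan
    for value in series:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


# Hypothetical CapEx reports: one value, then three periods with nothing reported.
capex = [120.0, math.nan, math.nan, math.nan]

old_view = front_fill(capex)  # the stale 120.0 stays active in every period
new_view = list(capex)        # gaps remain NaN and follow the NaN handling above

print(old_view)
print(new_view)
```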

6. Historical Universes Rebuilt

We have rebuilt nearly all universes using historical constituent information.

The current exceptions are:

  • STOXX 600
  • ESG
  • SP100

For these universes, historical constituent data is not yet available. However, all future observations will be constructed using the new methodology.
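As an illustration of what building a universe from historical constituents means, the sketch below looks up membership as of a given date. The `MEMBERSHIP` table, its tickers and dates, and the function `constituents_on` are invented for this example and do not reflect the platform's data model.

```python
from datetime import date

# Hypothetical membership intervals: (ticker, date added, date removed or None).
MEMBERSHIP = [
    ("AAA", date(2015, 1, 1), None),
    ("BBB", date(2015, 1, 1), date(2020, 6, 30)),  # later dropped from the universe
    ("CCC", date(2021, 3, 1), None),               # joined only in 2021
]


def constituents_on(as_of, membership=MEMBERSHIP):
    """Tickers that were members of the universe on the given date."""
    return sorted(
        ticker
        for ticker, added, removed in membership
        if added <= as_of and (removed is None or as_of < removed)
    )


point_in_time_2018 = constituents_on(date(2018, 1, 1))
current_members = constituents_on(date(2026, 6, 1))

print(point_in_time_2018)
print(current_members)
```

A backtest for 2018 that used the current list would leave out BBB, which was later removed, and include CCC, which had not yet joined. That is the survivorship bias the rebuilt universes are meant to avoid.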

Example: Return on Assets (ROA)

One of the clearest examples of the impact of these changes can be seen in our Return on Assets family of characteristics.

Previous Approach

Historically, we calculated ROA using annual Net Income and average Total Assets over the previous two years.

Derived characteristics such as Return on Assets Change 66 and Return on Assets Change 252 were then calculated from those annual values.

Because annual ROA changes slowly, shorter-term change indicators often provided limited additional information.
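For reference, the previous annual calculation can be written out as a short worked example. It assumes that "average Total Assets over the previous two years" means the simple mean of the two most recent year-end figures; the function name `annual_roa` and the numbers are hypothetical.

```python
def annual_roa(net_income, total_assets_current, total_assets_prior):
    """Annual Net Income divided by the mean of two year-end Total Assets figures."""
    average_assets = (total_assets_current + total_assets_prior) / 2
    if average_assets == 0:
        raise ValueError("average Total Assets is zero")
    return net_income / average_assets


# Hypothetical company: Net Income of 50, Total Assets of 1,000 now and 900 a year ago.
roa = annual_roa(net_income=50.0, total_assets_current=1000.0, total_assets_prior=900.0)

print(round(roa, 4))
```

Because both inputs update only once per fiscal year, a value computed this way stays flat between annual reports, which is why shorter-horizon change indicators had little to measure.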

New Approach

We now calculate ROA using quarterly data.

This significantly increases the informational content of derived characteristics.

What previously behaved like a fiscal-year change now behaves much more like a fiscal-quarter change.

Return on Assets Change 252 no longer resembles a yearly change metric. Instead, it reacts to quarterly changes in company fundamentals and provides a much more dynamic view of business development.

Roadmap

Current development priorities:
  • Improving dividend-related characteristics
  • Building advanced data quality monitoring systems
  • Developing portfolio reasoning and explainability features

We are actively redesigning several dividend-related indicators where we believe further improvements are possible.

At the same time, we are building automated systems that identify potential data issues and track their impact on characteristics and portfolio construction.

We are also investing heavily in portfolio reasoning capabilities that help explain why individual stocks and portfolios receive their rankings.

Our Philosophy

Quality data is not a feature of factor investing. It is its foundation.

Quality data and rigorous methodology are the cornerstones of factor investing.

At Analytical Platform, we believe transparency matters just as much as performance.

Better investment decisions start with better data.

These methodology improvements are part of our long-term commitment to providing investors with:

  • Better data
  • More realistic historical results
  • Higher-quality factor research
  • Greater transparency into how portfolios are constructed

As we continue to expand the platform, data quality and methodological rigor will remain at the center of everything we build.