DPS Lab – Professional Data and Text Analysis Tools

DPS Lab

State-of-the-art data and text analysis tools

About DPS Lab

Pipeline

Create your own customized pipelines

The software is designed for building large text- and data-processing tasks composed of data imputation, preprocessing, processing, and interpretation of outputs. A results-sharing feature streamlines team communication.
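The idea of composing a task from chained processing steps can be sketched in plain Python. The node functions and the `compose` helper below are illustrative, not the DPS Lab API:

```python
# Minimal sketch of a pipeline built from processing nodes.
# Each node takes the previous node's output as its input.

def lowercase(text: str) -> str:
    """Pre-processing node: normalize case."""
    return text.lower()

def tokenize(text: str) -> list[str]:
    """Processing node: split text into tokens."""
    return text.split()

def compose(*nodes):
    """Chain nodes so the output of one feeds the next."""
    def pipeline(data):
        for node in nodes:
            data = node(data)
        return data
    return pipeline

pipeline = compose(lowercase, tokenize)
print(pipeline("Analyze This Text"))  # ['analyze', 'this', 'text']
```

In the platform this chaining is done visually by connecting nodes; the sketch only shows the underlying data flow.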
Combination

Unique combination of objects

Text datasets, Mongo DB, Postgres DB, External API, YouTube, Scrapers, Tickers, Factor groups, Models, Parquet DS, NLP tasks, Pre-Processing tasks, Data quality check, Financial features, Models training, Summarization, Sentiment, Charts.
Versatility

Versatility

The modules accept input data from many areas characterized by stochastic behavior, such as biology or meteorology. The application is ready to integrate several types of data and deliver a solution in the desired domain.

How does DPS Lab work?

Inputs

The input objects are functions for importing data in various formats.

The platform can connect to public and private sector information systems, as well as to applications that publish data at regular intervals.

The data used ranges from open public-sector data (US Statements/EDI), through data downloaded by web crawlers (for example from Yahoo Finance or Seeking Alpha) and paid APIs (for example RapidAPI), to private data that users upload to the platform themselves.

Text datasets
Node for textual dataset selection.
Mongo DB
Node for loading data from an external Mongo database.
Postgres DB
Node for loading data from an external PostgreSQL database.
External API
Node for loading data from an external API.
YouTube
Node for loading data from YouTube videos.
Scraper
Node for loading data from a scraper.
Tickers
Node for ticker selection.
Factor groups
Node for factor group selection.
Models
Node for model selection.
Parquet DS
Node for parquet dataset selection.
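All of these input nodes hand their data to the rest of the pipeline in a uniform way. A minimal sketch of that pattern, with hypothetical node classes that only stand in for the real connectors:

```python
# Hedged sketch: input nodes exposing a common load() interface.
# The class names and constructors are illustrative, not the DPS Lab API.
import json
from abc import ABC, abstractmethod

class InputNode(ABC):
    """Common interface every input node implements."""
    @abstractmethod
    def load(self):
        ...

class TextDatasetNode(InputNode):
    def __init__(self, records):
        # A real node would read the dataset from storage.
        self.records = records
    def load(self):
        return list(self.records)

class ExternalAPINode(InputNode):
    def __init__(self, raw_json: str):
        # Stand-in for the body of an HTTP response.
        self.raw_json = raw_json
    def load(self):
        return json.loads(self.raw_json)

docs = TextDatasetNode(["first document", "second document"]).load()
api_data = ExternalAPINode('{"ticker": "AAPL", "price": 170.5}').load()
```

Because every node returns plain Python data from `load()`, downstream preprocessing nodes do not need to know which source the data came from.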

Preprocessing and Processing

The data collected by the platform is then available for preprocessing and processing. The platform provides new data processing functions, such as summarization and sentiment identification tools.

These functions were developed using the latest knowledge in machine learning, deep learning, and artificial intelligence (AI), and cognitive technologies in general.

Tokenize
Tokenizes string input into an array output. Czech or English can be selected.
Remove stop words
Removes basic stop words. Czech or English can be selected.
Check data quality
The input can be a ticker (stock) or a parquet file.
Lemmatize
Lemmatizes text. Czech or English can be selected.
POS Tagger
Part-of-speech recognition. Czech or English can be selected.
Financial features
The input is a ticker (stock) and the selected factor group. The performance factors for the selected ticker are calculated according to the group.
Drop columns
Drops selected columns.
Rename columns
Renames selected columns.
Train model
The input is the parquet file and the selected model.
Pre-Processing
The most common preprocessing techniques: removing numbers, emoticons, accents, etc.
Summarize
Summarizes the given text.
Sentiment
Gets the sentiment of the given text.
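To make the tokenize, stop-word, and sentiment steps concrete, here is a toy sketch in plain Python. The tiny stop-word set and sentiment lexicon are invented for illustration; the platform's real nodes rely on trained language models:

```python
# Illustrative versions of three processing nodes.
# STOPWORDS / POSITIVE / NEGATIVE are toy word lists, not real resources.
import re

STOPWORDS = {"the", "is", "a", "and"}
POSITIVE = {"great", "good"}
NEGATIVE = {"bad", "poor"}

def tokenize(text: str) -> list[str]:
    """Split lowercased text into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens found in the stop-word set."""
    return [t for t in tokens if t not in STOPWORDS]

def sentiment(tokens: list[str]) -> str:
    """Count positive vs. negative lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tokens = remove_stop_words(tokenize("The stock outlook is great"))
print(tokens, sentiment(tokens))  # ['stock', 'outlook', 'great'] positive
```

Each function maps onto one node in the pipeline, which is why the nodes can be freely reordered or combined.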

Results displaying and saving

The data processing stage is followed by the data storage stage (data repository).

Data can be exported from the platform into machine-readable formats, which enable further editing and import into other systems. Communication with third-party systems or applications is ensured both by the export option and by an API output, which enables automated data transfer to other systems.
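As a sketch of such a machine-readable export, the snippet below writes processed results to CSV with the Python standard library. The field names are invented for illustration:

```python
# Hedged sketch: exporting pipeline results to CSV (a machine-readable format).
import csv
import io

# Hypothetical results from a sentiment pipeline run.
rows = [
    {"ticker": "AAPL", "sentiment": "positive"},
    {"ticker": "MSFT", "sentiment": "neutral"},
]

buffer = io.StringIO()  # in-memory stand-in for an exported file
writer = csv.DictWriter(buffer, fieldnames=["ticker", "sentiment"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

A CSV (or JSON) export like this is what lets third-party systems consume the results without any knowledge of the platform itself.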

Console
Writes data to the developer console in the browser and to the log at the bottom right.
Data quality report
Displays a clear report of the data quality check.
Chart
Displays data in a chart.