Team & Story

TEXTA was founded in 2017 by Silver Traat and Raul Sirel. Our ambition was to build a language technology company that uses state-of-the-art AI to serve customers in every conceivable text analytics vertical.

Our founders contributed to text analytics research before founding TEXTA, and the team continues that work every day. This gives TEXTA deep expertise in language technology.

As an industry leader, we aim to empower everyone by solving their unorganised textual data problems with data-centric AI.

Our team's core competence is in text analytics, natural language processing, machine learning, and artificial intelligence.

Our AI-based, language-independent products enable our customers to improve customer experience, increase efficiency, extract value from unstructured data, manage compliance risks, automate processes, and build online trust and safety.

Silver Traat is the co-founder and CEO of TEXTA. He previously worked as Head of Business Development at STACC and as International Projects Manager at the Eurecat Competence Center in Spain.

Merilin-Ingrid Kaalep is the CMO of TEXTA. She previously worked as Marketing & Brand Manager at Tallink and continues to advise start-up companies on growth marketing and strategic planning.

Raul Sirel is the co-founder and CTO of TEXTA. He holds an MA in Computational Linguistics from the University of Tartu. Raul has worked as a visiting researcher at the University of Western Sydney and the NICTA Canberra Research Lab, and as a researcher and project leader at STACC.

Publications

Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification

This paper presents an industry-driven solution for extreme multi-label classification with a massive label collection. The proposed approach incorporates a large number of binary classification models with label pre-filtering and employs methods and technologies shown to be applicable in industrial scenarios where high-end computational hardware is limited. The system is evaluated on an Estonian newspaper article dataset which contains almost 2000 unique labels and has shown to perform over 80 times faster than applying all the binary models of the entire label set without negative impact on prediction scores.

https://zenodo.org/record/4306169#.YqnY0NJBxhE
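
The idea in the Hybrid Tagger abstract above, many binary classifiers combined with label pre-filtering, can be illustrated with a small sketch. This is not the implementation from the paper. The token-overlap similarity, the keyword-based stand-in models, the toy corpus, and all function names below are hypothetical and chosen only to show why pre-filtering reduces the number of binary models that have to run.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy corpus of (text, labels) pairs standing in for a labelled archive.
corpus: List[Tuple[str, Set[str]]] = [
    ("parliament passes new budget law", {"politics", "economy"}),
    ("football team wins league title", {"sports"}),
    ("central bank raises interest rates", {"economy"}),
    ("tennis star wins final match", {"sports"}),
]


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenisation (an assumption, kept deliberately simple)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keyword_model(keywords: Set[str]) -> Callable[[Set[str]], bool]:
    """Stand-in for a trained binary classifier: fires if any keyword is present."""
    return lambda tokens: bool(tokens & keywords)


# One binary model per label (hypothetical; real models would be trained).
models: Dict[str, Callable[[Set[str]], bool]] = {
    "politics": keyword_model({"parliament", "election", "law"}),
    "economy": keyword_model({"budget", "bank", "rates", "tax"}),
    "sports": keyword_model({"football", "tennis", "match", "league"}),
}


def candidate_labels(text: str, docs: List[Tuple[str, Set[str]]], k: int = 2) -> Set[str]:
    """Pre-filtering: collect the labels of the k most similar documents."""
    tokens = tokenize(text)
    ranked = sorted(docs, key=lambda d: jaccard(tokens, tokenize(d[0])), reverse=True)
    candidates: Set[str] = set()
    for _, labels in ranked[:k]:
        candidates |= labels
    return candidates


def hybrid_tag(text, docs, label_models, k: int = 2) -> List[str]:
    """Run only the binary models whose labels survived pre-filtering."""
    tokens = tokenize(text)
    candidates = candidate_labels(text, docs, k)
    return sorted(l for l in candidates if l in label_models and label_models[l](tokens))


print(hybrid_tag("parliament debates budget and tax law", corpus, models, k=1))
```

With `k=1`, only the labels of the single most similar document are considered, so the "sports" model is never evaluated for the example text. With thousands of labels, skipping most models in this way is where a speed-up would come from.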

Kratt: Developing an Automatic Subject Indexing Tool for The National Library of Estonia

Manual subject indexing in libraries is a time-consuming and costly process and the quality of the assigned subjects is affected by the cataloguer's knowledge on the specific topics contained in the book. Trying to solve these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book independent of its extent and genre with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately 1 minute to subject index a book, outperforming humans 10-15 times. Although the resulting keywords were not considered satisfactory by the cataloguers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.

https://arxiv.org/abs/2203.12998v1

EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions

This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program. The collected resources were offered to participants of a hackathon organized as part of the EACL Hackashop on News Media Content Analysis and Automated Report Generation in February 2021. The hackathon had six participating teams who addressed different challenges, either from the list of proposed challenges or their own news-industry-related tasks. This paper goes beyond the scope of the hackathon, as it brings together in a coherent and compact form most of the resources developed, collected and released by the EMBEDDIA project. Moreover, it constitutes a handy source for news media industry and researchers in the fields of Natural Language Processing and Social Science.

https://aclanthology.org/2021.hackashop-1.14/

Projects

EMBEDDIA

EMBEDDIA is a Horizon 2020 project, Grant ID 825153. The EMBEDDIA project sought to use cross-lingual embeddings coupled with deep neural networks to allow existing monolingual resources to be used across languages, leveraging their high speed of operation for near real-time applications, without the need for large computational resources. It resulted in the creation of the EMBEDDIA Media Assistant (EMA), a collection of AI tools for the media sector and text-based industry that supports a range of tasks and languages.
You can find out more about the tools here: https://embeddia.texta.ee/

TIM - Texta Intelligent Moderator

Texta Intelligent Moderator is supported by the European Union through the European Regional Development Fund and Enterprise Estonia (project number RE.5.04.22-0056). The project period is 01.09.2022 - 31.08.2024. The grant amounts to 341 366,51 euros, and the total cost of the project is 539 952,16 euros.

Scalable data processing pipelines, smart text annotation, and domain-specific entity recognition in TEXTA Investigator

The "Scalable data processing pipelines, smart text annotation, and domain-specific entity recognition in TEXTA Investigator" project was co-funded by the European Union and Enterprise Estonia (project number EU48684) and executed in collaboration with STACC.

Deep neural models and cross-lingual embeddings

The "Deep neural models and cross-lingual embeddings" project was co-funded by Enterprise Estonia and the European Regional Development Fund (project number EU48684) and executed in collaboration with STACC.