Search Results for "java text mining preprocessing"

Showing 54 open source projects for "java text mining preprocessing"

View related business solutions
  • Apify is a full-stack web scraping and automation platform helping anyone get value from the web. Icon
    Apify is a full-stack web scraping and automation platform helping anyone get value from the web.

    Get web data. Build automations.

    Actors are serverless cloud programs that extract data, automate web tasks, and run AI agents. Developers build them using JavaScript, Python, or Crawlee, Apify's open-source library. Build once, publish to Store, and earn when others use it. Thousands of developers do this - Apify handles infrastructure, billing, and monthly payouts.
    Learn More
  • Windocks - Docker Oracle and SQL Server Containers Icon
    Windocks - Docker Oracle and SQL Server Containers

    Deliver faster. Provision data for AI/ML. Enhance data privacy. Improve quality.

    Windocks is a leader in cloud native database DevOps, recognized by Gartner as a Cool Vendor, and as an innovator by Bloor research in Test Data Management. Novartis, DriveTime, American Family Insurance, and other enterprises rely on Windocks for on-demand database environments for development, testing, and DevOps. Windocks software is easily downloaded for evaluation on standard Linux and Windows servers, for use on-premises or cloud, and for data delivery of SQL Server, Oracle, PostgreSQL, and MySQL to Docker containers or conventional database instances.
    Learn More
  • 1
    Dawarich

    Dawarich

    Self-hostable alternative to Google Timeline

    Dawarich is a command-line tool (likely Ruby-based) for transforming and analyzing Arabic text data with normalization, diacritic handling, segmentation, and morphological tokenization. Designed for text mining and NLP workflows in Arabic-language contexts.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Weka

    Weka

    Machine learning software to solve data mining problems

    Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
    Leader badge
    Downloads: 11,387 This Week
    Last Update:
    See Project
  • 3
    ant4docbook

    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Dominate AI Search Results Icon
    Dominate AI Search Results

    Generative Al is shaping brand discovery. AthenaHQ ensures your brand leads the conversation.

    AthenaHQ is a cutting-edge platform for Generative Engine Optimization (GEO), designed to help brands optimize their visibility and performance across AI-driven search platforms like ChatGPT, Google AI, and more.
    Learn More
  • 5
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    DataMelt

    DataMelt

    Computation and Visualization environment

    DataMelt (or "DMelt") is an environment for numeric computation, data analysis, computational statistics, and data visualization. This Java multiplatform program is integrated with several scripting languages such as Jython (Python), Groovy, JRuby, BeanShell. DMelt can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization, linear algebra, solving systems of linear and differential equations. Linear, non-linear...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Lingua

    Lingua

    The most accurate natural language detection library for Java

    Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    The Lemur Project

    The Lemur Project

    Search engine and data mining applications and ClueWeb datasets.

    The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 9
    libpostal

    libpostal

    A C library for parsing/normalizing street addresses around the world

    A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data. libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere. Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Effortlessly Manage Product Information Icon
    Effortlessly Manage Product Information

    OneTimePIM is a comprehensive Product Information Management System designed to streamline the import and distribution of product data.

    A single source of truth for all of your product information with easy ways to distribute that data to wherever it needs to go, including the most powerful e-commerce connectors in the industry.
    Learn More
  • 10
    DynaQ

    DynaQ

    Innovative text document search. http://dynaq.opendfki.de for details.

    The goal of DynaQ is to develop an inquiry system to explore the personal information space, supporting you with the searching paradigm 'orienteering'. DynaQ is a (desktop)search engine with enhanced functionality for file, email and blog search. Look at our GitLab homepage for sourcecode and documentation: http://dynaq.opendfki.de
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    RapidMiner -- Data Mining, ETL, OLAP, BI
    ETL, data warehousing, data mining, OLAP, business intelligence (BI) in Java. 500+ modules: extract, transform, load (ETL), data mining, data analysis + Weka, statistical forecasting, preprocessing, validation, visualization, OLAP, business intelligence.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 12
    @Note2

    @Note2

    @Note2 - A workbench for Biomedical Text Mining

    Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    DSTK - Data Science TooKit 3

    DSTK - Data Science TooKit 3

    Data and Text Mining Software for Everyone

    DSTK - Data Science Toolkit 3 is a set of data and text mining softwares, following the CRISP DM model. DSTK offers data understanding using statistical and text analysis, data preparation using normalization and text processing, modeling and evaluation for machine learning and algorithms. It is based on the old version DSTK at https://sourceforge.net/projects/dstk2/ DSTK Engine is like R. DSTK ScriptWriter offers GUI to write DSTK script. DSTK Studio offers SPSS Statistics like GUI...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    JSentiWordNet

    A wrapper for the famous SentiWordNet, a resource for opinion mining

    This project aims to provide a wrapper around the SentiWrodnet, a lexical resource for opinion mining. As defined by the authors : SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity. You can find additional information about the creation of SentiWordnet here : http://nmis.isti.cnr.it/sebastiani/Publications/LREC06.pdf sentiWordnet (avilable here : https://drive.google.com/open?id=0B0ChLbwT19XcOVZFdm5wNXA5ODg) is a text file with a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    GNAT

    GNAT

    GNAT recognizes gene names in text and maps them to NCBI Entrez Gene

    GNAT is a BioNLP/text mining tool to recognize and identify gene/protein names in natural language text. It will detect mentions of genes in text, such as PubMed/Medline abstracts, and disambiguate them to remove false positives and map them to the correct entry in the NCBI Entrez Gene database by gene ID. March 2017: We started to upload GNAT output on Medline. See files/results/medline/.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is an opensource free software for statistical analysis, data visualization, text analysis, and predictive analytics. Newer version and smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straight forward and easy to use, and familar to SPSS user. While JASP offers more statistical features, DSTK tends to be a broad solution workbench, including text analysis and predictive analytics features. Of course you may specify...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17

    sgmweka

    Weka wrapper for the SGM toolkit for text classification and modeling.

    Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. ...
    Leader badge
    Downloads: 18 This Week
    Last Update:
    See Project
  • 18
    The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Jbowl is a Java library intended to provide an API for development of text mining applications. It provides facilities for text analysis, as well as for building, evaluating and applying of various supervised and unsupervised text mining models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Stemmer Gujarati

    Stemmer Gujarati

    Offline stemmer for Gujarati , which is one of 22 Indian languages.

    This is a Gujarati stemmer in Java. Stemming is a process in which affixes are removed form the root word (stem). It relates morphological variant words to corresponding common root. For example "પ્રતિઉપયોગી" is word which has stem " ઉપયોગ". Stemmers are language specific tools. The design of a stemming algorithm requires a significant level of linguistic expertise. There has been lot of significant work in the development and evaluation of stemmer for non-Indian languages, but very less...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Cenobi

    Cenobi

    cost estimation and management accounting, using neural networks

    Cenobi is designed for management accountants, not (only) for statisticians and data mining experts. Carefully arranged default settings make sure you can concentrate on Cenobi's many accounting features rather than worrying about setting up artificial neural networks or genetic algorithms, which are the main machine learning tools under Cenobi's hood. Cenobi's main benefits are: - ease of use - Utilizing artificial neural networks to estimate cost relationships, Cenobi is able to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    webtextanalysis

    Mining knowledge from text data

    This project aims to implement in java the following text mining techniques: Text Language Detection, Keywords and keyphrases extraction, Text Classification, Text Clustering, Single or multiple documents Summarization, Plagiarism Detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    TML - Text Mining Library for LSA & CMM

    TML is a Java Library for LSA and extracting Concept Maps from text

    TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    TextProcessor

    A Java package to preprocess text datasets for posterior text analysis

    The TextProcessor Java package is a text processing toolkit, which provides some frequently used text processing functions such as stemming, removing stop-words, generating a term vocabulary, and calculating the term-doc frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset with LDA and LIBSVM format for posterior procedures such as classification or clustering. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB