Apify is a full-stack web scraping and automation platform helping anyone get value from the web.
Get web data. Build automations.
Actors are serverless cloud programs that extract data, automate web tasks, and run AI agents. Developers build them using JavaScript, Python, or Crawlee, Apify's open-source library. Build once, publish to Store, and earn when others use it. Thousands of developers do this - Apify handles infrastructure, billing, and monthly payouts.
Learn More
The Cloud Sales Acceleration Platform
For businesses wanting a platform to list, manage, and co-sell on cloud marketplaces with minimal engineering effort
Streamline and automate your cloud sales cycle, enhance operational efficiency, and capitalize on marketplace opportunities with the Clazar Cloud Sales Acceleration Platform.
osDQ dedicated to create apache spark based data pipeline using JSON
This is an offshoot project of open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/
This sub project will create apache spark based data pipeline where JSON based metadata (file) will be used to run data processing , data pipeline , data quality and data preparation and data modeling features for big data. This uses java API of apache spark. It can run in local mode also.
Get json example at https://github.com/arrahtech/osdq-spark
How to run
Unzip the zip file
Windows : java -cp ....