Showing 8 open source projects for "java html parser"

View related business solutions
  • Premier Construction Software Icon
    Premier Construction Software

    Premier is a global leader in financial construction ERP software.

    Rated #1 Construction Accounting Software by Forbes Advisor in 2022 & 2023. Our modern SAAS solution is designed to meet the needs of General Contractors, Developers/Owners, Homebuilders & Specialty Contractors.
    Learn More
  • Skillfully - The future of skills based hiring Icon
    Skillfully - The future of skills based hiring

    Realistic Workplace Simulations that Show Applicant Skills in Action

    Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
    Learn More
  • 1
    html-to-markdown

    html-to-markdown

    Convert HTML to Markdown. Even works with entire websites

    Convert HTML into Markdown with Go. It is using an HTML Parser to avoid the use of regexp as much as possible. That should prevent some weird cases and allows it to be used for cases where the input is totally unknown.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    goquery

    goquery

    A little like that j-thing, only in Go

    goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/HTML package and the CSS Selector library Cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), and detach()) have been left off. Also, because the net/HTML parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    crawley

    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    bluemonday

    bluemonday

    Fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer

    bluemonday is an HTML sanitizer implemented in Go. It is fast and highly configurable. bluemonday takes untrusted user-generated content as an input, and will return HTML that has been sanitized against an allowlist of approved HTML elements and attributes so that you can safely include the content in your web page. If you accept user-generated content, and your server uses Go, you need bluemonday. It protects sites from XSS attacks. There are many vectors for an XSS attack and the best way...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Collect! is a highly configurable debt collection software Icon
    Collect! is a highly configurable debt collection software

    Everything that matters to debt collection, all in one solution.

    The flexible & scalable debt collection software built to automate your workflow. From startup to enterprise, we have the solution for you.
    Learn More
  • 5
    Horusec

    Horusec

    Open source tool that improves identification of vulnerabilities

    Horusec is an open source tool that performs a static code analysis to identify security flaws during the development process. Currently, the languages for analysis are C#, Java, Kotlin, Python, Ruby, Golang, Terraform, Javascript, Typescript, Kubernetes, PHP, C, HTML, JSON, Dart, Elixir, Shell, Nginx. The tool has options to search for key leaks and security flaws in all your project's files, as well as in Git history. Horusec can be used by the developer through the CLI and by the DevSecOps team on CI /CD mats.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    protoc-gen-doc

    protoc-gen-doc

    Documentation generator plugin for Google Protocol Buffers

    This is a documentation generator plugin for the Google Protocol Buffers compiler (protoc). The plugin can generate HTML, JSON, DocBook, and Markdown documentation from comments in your .proto files. There is a Docker image available (docker pull pseudomuto/protoc-gen-doc) that has everything you need to generate documentation from your protos. The plugin is invoked by passing the --doc_out, and --doc_opt options to the protoc compiler. The docker image has two volumes: /out and /protos...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    the hotdog web browser

    the hotdog web browser

    The hotdog web browser and browser engine

    the hotdog web browser project is a hobbyist web browser and layout engine written entirely from scratch in Go to explore how browsers work under the hood, implementing core components like an HTML parser, CSS rendering, UI toolkit, networking, and layout logic without relying on heavy external dependencies. It’s far from being a complete or spec-compliant browser, but it’s designed to be a learning platform and experimental codebase for anyone curious about browser internals and rendering architecture. The repository includes custom named modules such as ketchup for HTML parsing, mayo for CSS rendering, and a minimal OpenGL/GLFW-based UI toolkit termed mustard, among others. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    go_spider

    go_spider

    An awesome Go concurrent Crawler(spider) framework

    ...It can be expanded to an Individualized crawler easily or you can use the default crawl components only. Spider gets a Request in Scheduler that has url to be crawled. Then Downloader downloads the result(html, json, jsonp, text) of the Request. The result is saved in Page for parsing in PageProcesser. Html parsing is based on goquery package. Json parsing is based on simple JSON package. Jsonp will converse to json. Text form represents plain text content without a parser. The PageProcesser moduler only parse results. The moduler gets results(key-value pairs) and URLs to be crawled next step. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB