Astro - Hacker News

31 comments

smithkl42 a few seconds ago

We've been using Aspose.PDF for the last 10 years or so in our C# platform, and paying for the license. It's expensive and buggy and has shite support, so a year or so back I decided to see if there was some other library or combination of libraries that could meet our needs. Basically, we needed:
  * HTML to PDF
  * Compress PDF
  * Manual PDF generation
  * Text extraction
  * No browser engine or other weird dependencies
I researched every library I could find, and downloaded, integrated and tested anything that looked remotely promising.
At the end of all that, I reluctantly handed my company credit card back to Aspose. There simply wasn't any open-source or even just cheaper PDF library that I could actually make work, and all the other paid ones that did work were even more expensive.
klysm a minute ago

If you are looking for a solution to generate PDF reports, I highly recommend using Typst.
tom_alexander 2 hours ago

> obviously needs to be a PDF
I've been making my reports in self-contained HTML files[0] and it works out so much better than PDF. It is not constrained by paper sizes, and it lets me add some nifty features. For example, I recently added support for hiding columns in a table using exclusively CSS. The only downside is browsers can render things slightly differently, but for my use cases I don't need pixel-perfect identical rendering.
[0] Images are inlined base64-encoded, CSS/JS embedded with style and script tags. No external assets / no http requests.
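A minimal sketch of the inlining described in [0], written in Python as an assumption (the comment does not say which tooling is used). The helper names `data_uri` and `build_report` are hypothetical, and the placeholder bytes stand in for a real image file.

```python
import base64
from html import escape


def data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI so the browser needs no extra request."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_report(title: str, css: str, image_bytes: bytes, image_mime: str) -> str:
    """Return one HTML document with the stylesheet and the image inlined."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title>"
        f"<style>{css}</style></head>"
        f"<body><h1>{escape(title)}</h1>"
        f'<img alt="chart" src="{data_uri(image_bytes, image_mime)}">'
        "</body></html>"
    )


# Placeholder bytes only; a real report would read the image from disk.
html_doc = build_report(
    "Q3 report",
    "h1 { font-family: sans-serif; }",
    b"\x89PNG placeholder",
    "image/png",
)
```

The result is a single string that can be written to one `.html` file and opened without any network access.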
- giancarlostoro 2 hours ago
  
  You can also use media queries for print-specific styling, so you can remove things that a user doesn't need to print out:
  https://developer.mozilla.org/en-US/docs/Web/CSS/Guides/Medi...
- dmboyd an hour ago
  
  Being constrained by page sizes is “a feature, not a bug” in most contexts. If I’m calling out numbers on the 3rd line of page 38 of a report, it helps if that’s consistent.
- dwroberts 2 hours ago
  
  Unless you can embed fonts [into the page itself] you aren’t beating PDF
  - giancarlostoro 39 minutes ago
    
    Not only can you embed the fonts, but you can make it interactive and output a PDF if you really wanted to. The HTML might grow if you embed enough JS, but on the other hand... some PDFs are insanely large.
  - fuzzy2 an hour ago
    
    Not a problem with data: URIs. But then, a report may not need fancy fonts if HTML is acceptable.
  - gnomewascool an hour ago
    
    You can embed fonts into an HTML page. For example, place an @font-face with the src:url being a base64-encoded blob, in a style element.
- kgwxd 39 minutes ago
  
  The only reason PDFs still have a job is: pixel perfect consistency; the built-in validity stuff (ensuring the document wasn't altered, etc.); or the customer doesn't need the other things, but isn't open to alternatives. Otherwise, PDF is just a major headache.
  - wongarsu 5 minutes ago
    
    Also page-level consistency, and generally layouting in a printable format
    Even with the same Word document opened only in various MS Word versions (web, desktop, etc.) you won't get consistent page numbers. And HTML tables work great on screen but don't print very well if they span more than what fits on a single sheet of paper.
mythz 2 hours ago

The wider .NET ecosystem is lacking when you try to step outside the mainline. I don't bother hunting for unused, partially implemented .NET libraries anymore and just call out to a process or an API when I need to get something done.
It's not ideal, but when a good option isn't available in .NET, one is usually available in Python/npm. Typically I'll use background jobs when calling out of process for added resiliency/replayability and observability.
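A rough sketch of the "call out to a process" pattern with simple retries, shown in Python for illustration. The function name `run_external`, the retry count, and the timeout are assumptions made up for this example, and no background-job framework is shown; a real setup would likely run this inside whatever job runner is already in place.

```python
import subprocess
import sys
import time
from typing import Sequence


def run_external(command: Sequence[str], retries: int = 3, timeout_s: float = 60.0) -> str:
    """Run an external tool, retry when it fails, and return its stdout."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            result = subprocess.run(
                list(command),
                capture_output=True,
                text=True,
                timeout=timeout_s,
                check=True,  # raise CalledProcessError on a non-zero exit code
            )
            return result.stdout
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
            last_error = exc
            time.sleep(0.1 * attempt)  # short linear backoff before the next try
    raise RuntimeError(f"{command[0]} failed after {retries} attempts") from last_error


# Stand-in command: the current Python interpreter plays the external tool.
output = run_external([sys.executable, "-c", "print('ok')"])
```

Keeping the retry loop next to the process call means a failed run can be replayed with the same arguments, and the captured stdout or the chained exception can be logged for observability.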
- cm2187 2 hours ago
  
  Not sure I agree. It also depends on the domain. The Python ecosystem is of course a lot richer for anything AI. But try to open, manipulate and export spreadsheets. In Python you pretty much need a different library for every Excel file format (xls, xlsx, etc.), and usually the more file formats a library can handle, the less capable it is (e.g. pandas). In .NET you have libraries like SpreadsheetGear that are super powerful, including their own Excel calculation engine. I see nothing remotely close in Python.
- pjmlp 35 minutes ago
  
  There is hardly anything that isn't available in .NET; the main problem is being willing to pay for tooling.
- thiago_fm 2 hours ago
  
  This looks like ChatGPT. There are PLENTY of alternatives on the post.
  Python and others have similar issues and limitations of their own.
  - mythz 2 hours ago
    
    It wouldn't be a quest if there were lots of good options; a few good options are better than lots of unused/unmaintained ones.
gpvos 35 minutes ago

When I used PdfSharp about 9 years ago, it wasn't really designed to import arbitrary PDFs; it crashed or hung on many less common constructs or invalid PDF files. It was really only designed to either create PDFs or edit PDFs created by itself (or MigraDoc, which used it); that it could also import some other PDFs was considered a bonus by its maintainers. I submitted some patches back then to fix the most egregious problems. Hopefully it has improved.
We needed a library to read arbitrary PDF files (although I forgot what exactly we needed to read from them; it wasn't for full rendering) and ended up using PdfSharp, because iText did not respond to our pricing request.
bob1029 24 minutes ago

My favorite approach for PDF rasterization was to interop with a simple, custom Java console application that leveraged Apache PDFBox.
This lasted until the log4j exploit, at which point we had to abandon it altogether due to our customers (banks) having a complete meltdown over it at the time.
It's probably still a really good option. I would definitely go back to it in a different context.
Archelaos 2 hours ago

I create PDF files from C# using LaTeX as an intermediate format. This works very reliably but sometimes takes a bit of tinkering until everything fits.
People here on HN recently recommended Typst as a replacement for LaTeX, but I haven't tried it myself yet.
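A small sketch of the "LaTeX as an intermediate format" idea, in Python rather than C# purely for illustration. The template, the `@@TITLE@@` and `@@BODY@@` placeholder tokens, and the function names are all hypothetical; the commenter's actual setup is not described. The key step is escaping user data before it reaches the `.tex` source, which is often where the tinkering starts.

```python
# Map each LaTeX special character to its escaped form.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape user data one character at a time, so nothing is escaped twice."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


# Hypothetical template; the @@...@@ tokens are invented for this sketch.
TEMPLATE = r"""\documentclass{article}
\begin{document}
\section*{@@TITLE@@}
@@BODY@@
\end{document}
"""


def render_report(title: str, body: str) -> str:
    """Fill the template with escaped values and return the .tex source."""
    return (
        TEMPLATE.replace("@@TITLE@@", escape_latex(title))
        .replace("@@BODY@@", escape_latex(body))
    )


tex_source = render_report("Q3 & Q4", "Margin grew by 5% year over year.")
```

The generated source would then be written to a file and handed to a LaTeX engine as an external process; that step is left out here because it depends on the local TeX installation.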
tonyedgecombe 2 hours ago

>Naturally, I first started looking for permissively licensed libraries, which could be used free of charge and without additional license requirements.
There is a lot of work in a good PDF library; expecting to get it for free feels unreasonable to me.
actionfromafar 13 days ago

In my eyes, PdfSharpCore¹ is now the "canonical" version of pdfcore.
IMHO the list is incomplete without it.
1: https://github.com/ststeiger/PdfSharpCore
- eXpl0it3r 13 days ago
  
  It seems the PDFSharp rabbit hole goes even deeper than I've realized!
  The latest MigraDoc & PDFSharp seem to have been updated and ported to .NET 6 after a lot of the forks happened, so it was unclear to me whether there's merit in looking at other, mostly abandoned forks.
  I might add PdfSharpCore, though the use of SixLabors.ImageSharp and SixLabors.Fonts leads to a disqualification from the "quest", given their custom split license [1].
  Edit: Actually, the license seems to turn into an Apache 2.0 license when used with an open source licensed project, and also as a transitive dependency. Certainly a confusing license.
  [1] https://github.com/SixLabors/ImageSharp/blob/main/LICENSE
  - actionfromafar 13 days ago
    
    Edit: PSA - PdfSharpCore uses older releases of SixLabors.ImageSharp v1.0.4 and Fonts-1.0.0-beta17 which both were (and are still) distributed under plain Apache-2.0.
    https://web.archive.org/web/20251104163604/https://codeload....
    - eXpl0it3r 13 days ago
      
      Good to know, thank you!
      Though, makes me wonder how much "old code" this is then collecting...
flanbiscuit 2 hours ago

I needed this post a year ago when I was looking for this exact thing. I did end up going with Puppeteer because I needed it for something else that I couldn't avoid. I use a large list of flags with it to launch the most minimal version of headless Chrome that I can.
I am going to look into switching to MigraDoc and see if I can drop Puppeteer.
Thanks for this great research!

fuzzy2 2 hours ago

Oh yeah, PDF. In a past project I created a monster solution:

  * Scriban to fill in templates (LaTeX)
  * Custom Angular SSR to reuse frontend components (charts etc)
  * Playwright to convert SSR output to PDF
  * LuaLaTeX to convert LaTeX document + stuff to PDF

Super slow, but very high quality results. Do not try this at home!

Scriban is totally awesome though.

sander1095 2 hours ago

Thanks for this post! I've wanted to create such a post for a long while but never got around to it. Yours is fantastic!
giancarlostoro 2 hours ago

At work we were using what I think was GDPicture, which is now called Nutrient. They started out with a flat fee, royalty free, then their pricing scheme became more hostile over time (per developer, per application licensing, and I don't recall if they wanted to know how many users, which is crazy unless it's a SaaS). I have friends (former coworkers) and family who ask me for advice on which software libraries to use, since they know I'm a hyper nerd for that sort of thing. The last time a former coworker asked what PDF library to use, I told them to avoid Nutrient like the plague. There's wanting to be sustainable and then there's greed.
So yeah, I too was looking for permissive licensing. The worst part is that now it's drastically harder for me to suggest any paid alternatives, because we don't know that the alternative won't hike up prices on us. It's a really awful spot to be in.
- davsti4 an hour ago
  
  PDFlib - I've used it since 2001. Their pricing is stable, and they've been flexible over the years as computing models have shifted.
thiago_fm 2 hours ago

The wrappers around wkhtmltopdf look to me like the best candidates.
For which use cases is needing Qt WebKit an issue?
- pabs3 2 hours ago
  
  wkhtmltopdf is unmaintained and deprecated though.