Hacker News | gettalong's comments

Regarding "Important Notice on Third-Party Components": You are including AGPL components in your project which itself is AGPL.

Even if you distribute your part of the code under the commercial license, the components make - depending on how they are integrated into BentoPDF - the whole project AGPL again. So it might be possible that all your commercial customers are breaking the AGPL of the included components. For example, if your code directly calls code in the components, the integration is tight and AGPL would most certainly apply. If all you do is call the components as external binaries, then it is fine.


In case you don't know it: PyHanko provides a full and up-to-date implementation for digital signatures in PDF, including time stamping.

https://github.com/MatthiasValvekens/pyHanko


Thank you for the reply and for mentioning pyHanko!

Yes, it is in Python and much more powerful/comprehensive.

I personally really prefer Python too, but unfortunately it is too heavy for Cloudflare Workers and similar edge environments (due to the WebAssembly overhead etc). It is survivable, but consumes significantly more resources.

That is exactly why I created this tool - it stays small and lightweight, and definitely would not grow into full signing support.


The library is dual-licensed as AGPL plus a commercial license. So everything is in the open and can be tested and tried out under the AGPL. Once the library is used in a commercial context, you nearly always need to buy the commercial licenses to stay compliant. This is how it generally works.

What the commercial license does is a different thing. You could charge once OR once and for every upgrade to an (arbitrarily defined) new major version OR each year via a subscription OR ... It is really up to you and how you want to handle this.


The AGPL is pretty easy to comply with in a commercial context even if you are using it as part of a SaaSS product.

Just either use the code unmodified, or release your modifications to customers, or to the public in general.

Do the businesses buying commercial licenses just not understand the AGPL? Or are their development processes not rigorous enough to ensure compliance? The AGPL includes some easy ways to be forgiven for accidental violations, so that should not be a problem in almost all cases. Only deliberate non-compliance should be an issue.


I'm not a lawyer, but I think you are mistaken in this regard. One indication for this is that otherwise some major companies would have problems.

For example, the GPL FAQ has the following part in the FAQ item titled "What is the difference between an 'aggregate' and other kinds of 'modified versions'?" (https://www.gnu.org/licenses/gpl-faq.en.html#MereAggregation):

> If the modules are included in the same executable file, they are definitely combined in one program. If modules are designed to run linked together in a shared address space, that almost surely means combining them into one program.

A combined work needs to be distributed under the AGPL, an aggregated work does not. Since Ruby is interpreted, the code of HexaPDF loaded by the application would run in the same address space and thus the result would be a combined work.

The following two links are also relevant: https://opensource.stackexchange.com/questions/5003/agplv3-s... and https://opensource.stackexchange.com/questions/5010/can-i-us...


Just add an AGPL command-line interface, or a daemon wrapping the library and you have a process boundary. That doesn't necessarily create a derivative work boundary, but it probably would if it is generic enough to be useful to everyone.
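
As a purely technical illustration of such a process boundary, here is a minimal Python sketch that calls a tool as a separate process instead of loading it as a library. The `run_cli` helper is hypothetical, and the Python interpreter only stands in for a real command-line tool. Whether a boundary like this changes the licensing situation is a legal question that the code does not answer.

```python
import subprocess
import sys


def run_cli(command: list[str], input_bytes: bytes = b"") -> bytes:
    """Run an external tool in its own process and return what it wrote to stdout.

    The caller and the tool share no address space; they only exchange
    bytes via stdin/stdout and the exit status.
    """
    result = subprocess.run(
        command,
        input=input_bytes,
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real tool: a child Python process that upper-cases stdin.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_cli(child, b"hello from the parent process"))
```

Every call pays for process start-up and for serializing the data across the boundary, which is the extra complexity and slowdown such a setup brings.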


Yes, creating a binary and calling that would circumvent the AGPL. But then everything will be more complex and slower.

Also, doing this extra work and developing the binary is probably more expensive than just buying a commercial license.


Thanks and corrected!


Thanks!

I agree that laying out PDFs could be made easier by using a declarative mechanism instead of coding. However, I'm still not sure what the best way would be to do that. Using HTML/CSS for this and doing it right would entail implementing something like PrinceXML...

With the current layout model you have the possibility to implement precise layouts, like the one needed by Swiss QR bills (see https://x.com/_gettalong/status/1748823670368117154), or just define the general layout and let HexaPDF decide the final position (see https://hexapdf.gettalong.org/examples/pdfa.html).

If you have any ideas on how laying out PDFs could be made simpler, I'm all ears!


You can get a long way with only implementing the most basic things of the PDF specification, like section 7. And even there you don't need everything. For example, there is no need to implement the CCITTFaxDecode, JBIG2Decode, DCTDecode or JPXDecode filters if you don't want to get at the raw pixels of the images.

Once you have parsing and writing of a simple PDF file going (sections 7.2, 7.3, 7.4, 7.5, 7.7), add in support for encryption (section 7.6). Now you are able to at least parse and write nearly all PDF files.

Then implement all the things you need gradually. For example:
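
To give an idea of how small the writing side of the file structure (section 7.5) can be, here is a rough Python sketch that assembles a PDF with one empty page, a classic cross-reference table and a trailer. The function name `build_minimal_pdf` and the A4-sized media box are just choices for this example; a real writer would also need streams, string escaping and so on.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a one-page, empty PDF with a classic cross-reference table."""
    # Object bodies for objects 1, 2 and 3 (catalog, page tree, single page).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Resources << >> >>",
    ]

    out = bytearray(b"%PDF-1.7\n")  # file header with the version identifier
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for object 0, head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # every entry is exactly 20 bytes long

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


if __name__ == "__main__":
    with open("minimal.pdf", "wb") as file:
        file.write(build_minimal_pdf())
```

The important detail is that the cross-reference table stores byte offsets, so the output has to be assembled as bytes and the offsets recorded while writing.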

* Need support for parsing or creating the contents of a page? -> sections 7.8, 8, and 9. Mind you, start out with only supporting the built-in PDF fonts for creating text and later add support for TrueType (easier) and OpenType (harder if you need to implement the font parser yourself).

* Need support for annotations? -> section 12.5

And so on.

If you just need to store the metadata in the PDF, you only need support for parsing and writing a PDF because this usually also entails that you can modify the PDF object tree which is needed for storing the metadata. However, if you need to store that metadata in a way that is usable for other PDF processors, you would need to store it as an XMP file and creating that is yet another deep dive if you don't have an XMP library available. See section 14.3.2 in the PDF spec for this (btw. the latest PDF spec is available at no cost at https://pdfa.org/resource/iso-32000-2/).
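
For a rough idea of what such an XMP packet looks like, here is a small Python sketch that fills in a minimal packet with a title and a creator tool. The helper name `build_xmp_packet` and the choice of these two properties are assumptions for this example. Inside the PDF, the resulting bytes would go into a metadata stream (`/Type /Metadata /Subtype /XML`) that is referenced from the document catalog.

```python
from xml.sax.saxutils import escape

# Minimal XMP packet: xpacket wrapper, x:xmpmeta, and one rdf:Description
# holding a Dublin Core title and the XMP creator tool.
XMP_TEMPLATE = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt>
      </dc:title>
      <xmp:CreatorTool>{tool}</xmp:CreatorTool>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def build_xmp_packet(title: str, creator_tool: str) -> bytes:
    """Return a UTF-8 encoded XMP packet with the given title and creator tool."""
    # Escape &, < and > so that arbitrary text cannot break the XML.
    packet = XMP_TEMPLATE.format(title=escape(title), tool=escape(creator_tool))
    return packet.encode("utf-8")


if __name__ == "__main__":
    print(build_xmp_packet("Quarterly report", "example-tool").decode("utf-8"))
```

This only covers the simplest case; dates, multiple authors, extension schemas and keeping the packet in sync with the Info dictionary are where the deep dive starts.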


Many PDF viewers and libraries do not fully follow the PDF standard or have subtle bugs. This leads to problems later on.

My guess is this all started many years ago when Adobe Reader was the standard PDF reader and it was (and still is) very lenient when it comes to PDFs that aren't exactly following the specification.

So what did everyone else do? They followed Adobe's lead because "But, but, ... it works in Adobe Reader!"


What are you using to generate the PDFs? Are you doing something like PrinceXML (or WeasyPrint), which directly converts HTML+CSS to PDF? Or are you converting HTML+CSS to something else?

Are there any demo pages where it is possible to view the generated PDFs? I would imagine this would be easier to do since this is all based on Javascript.


Good question!

For the generation we have an API, available via the npm package @onedoc/client, which allows unlimited rendering with a watermark and rendering without the watermark up to a certain limit. But you are also free to use any other software or API that allows this (PrinceXML is one of them). In our case we directly convert the bundled HTML/CSS to PDF and have 100% accuracy in the layout conversion (compared to the Chrome solution, for example), as the open-source library react-print-pdf has been designed to do that.

About the demo page, you have two choices if you use the Onedoc API:

* You can render your PDF and write it directly to your local machine, so you can open it in your editor (VS Code or whatever) and have the code and the preview side by side.

* You can render the PDF and host it on our cloud platform. You'll get a link to share it with others and you will also be able to preview it directly from our webapp.

Here are some links:

* API: https://www.npmjs.com/package/@onedoc/client

* Onedoc App: https://app.onedoclabs.com/login

* Documentation for react-print-pdf: https://react.onedoclabs.com/introduction

* Documentation for the Onedoc API: https://docs.onedoclabs.com/introduction

I hope it helps. Let me know!


Thanks - that helps!


We are using a wrapper around PrinceXML until we can move into a separate rendering engine. We want to be able to offer the same layout features of such an engine, hopefully much faster.

You can have a look at our (WIP) set of templates at https://react.onedoclabs.com/ui/templates where the images are automatically built from the PDFs themselves.

We are trying things out to see how we can make a live preview for development purposes, but the challenges of pagination are quite hard to solve in an elegant way at the moment. We are experimenting with Taffy to see how it could fit our use case, but this is still a very early attempt.


Thanks for your answer! I imagined you would be using PrinceXML behind the scenes since that is probably the gold standard in HTML+CSS rendering.

The only open source alternative I know of is WeasyPrint at https://weasyprint.org/. I'm not sure how well it fares against PrinceXML, though.

And thanks for the pointer to Taffy - I didn't know it before!


It may be valid HTML and a valid JPEG (didn't check) but it is definitely not a valid PDF file. For example, it is missing the version identifier in the PDF header, there is no cross-reference table and there is no PDF file trailer.
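
These three things can be checked mechanically. The following Python sketch only looks at the outer file structure and assumes the whole file fits into memory. The function name `check_pdf_structure` is made up for this example, and passing the check does not mean that a file is a valid PDF.

```python
import re


def check_pdf_structure(data: bytes) -> list[str]:
    """Return a list of problems with the outer structure of a PDF file."""
    problems = []

    # The header has to start with "%PDF-" followed by the version, e.g. "%PDF-1.7".
    if re.match(rb"%PDF-\d\.\d", data) is None:
        problems.append("no %PDF-x.y version header at the start of the file")

    tail = data[-1024:]  # the end-of-file structure lives at the very end
    if b"%%EOF" not in tail:
        problems.append("no %%EOF marker near the end of the file")

    # "startxref" is followed by the byte offset of the cross-reference data.
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    if not offsets:
        problems.append("no startxref entry pointing at cross-reference data")
    else:
        target = data[int(offsets[-1]):][:32]
        # Either a classic table ("xref") or a cross-reference stream object.
        if not (target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target)):
            problems.append("startxref does not point at a cross-reference section")

    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as file:
        for problem in check_pdf_structure(file.read()):
            print(problem)
```

Lenient viewers skip checks like these and try to reconstruct the missing information instead, which is why such files often still open.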


There is no reason to abandon PDF. As was already stated, the main purpose of a PDF is viewing something as the author has intended; it should not be dynamic like a website. And editing is possible but not the intended use-case (and therefore harder to get right).

As for merging PDFs, you first need to know what the user wants. Just merging the pages? Or should AcroForm forms be merged? Should the outline be merged, too? Depending on the answers you can then proceed with the merge. Simple page merging can be done with any tool; merging AcroForm forms or outlines may require more advanced tools.

PDF is actively worked on on multiple fronts, for example, support for HDR, newer image formats, maybe variable width font embedding, better re-flowing capabilities when this is needed (e.g. for small-screen devices)...

And with the ISO PDF 2.0 specification now freely available, anyone can join in :)

