Does this work well on KVPs and tables? That is where I typically have the most trouble with Tesseract and where the cloud-provider OCR systems really shine.
tl;dr: NER models work best when they process data similar in structure to their training data. This notebook shows a few examples of how one can use an NER model not trained on JSON to identify and redact both structured and unstructured values within a JSON blob. The NER model used is freely available but closed source, but one can expect similar results with open source NER models such as spaCy.
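For anyone curious what the naive starting point looks like, here is a minimal sketch that walks a JSON blob and runs NER over every string value using spaCy. It is not the notebook's code; the model name, entity labels, and sample data are assumptions for illustration, and the notebook's trick of adapting to JSON structure (which is the whole point above) is not reproduced here.

    import json
    import spacy

    # Assumes spaCy's small English model is installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    REDACT_LABELS = {"PERSON", "ORG", "GPE"}  # illustrative choice of entity types

    def redact_text(text):
        # Replace recognized entities with [LABEL] placeholders.
        doc = nlp(text)
        out, last = [], 0
        for ent in doc.ents:
            if ent.label_ in REDACT_LABELS:
                out.append(text[last:ent.start_char])
                out.append("[" + ent.label_ + "]")
                last = ent.end_char
        out.append(text[last:])
        return "".join(out)

    def redact_json(node):
        # Walk the parsed JSON and redact every string value, whether it is a
        # short key/value pair or a long free-text field.
        if isinstance(node, dict):
            return {k: redact_json(v) for k, v in node.items()}
        if isinstance(node, list):
            return [redact_json(v) for v in node]
        if isinstance(node, str):
            return redact_text(node)
        return node  # numbers, booleans, null pass through untouched

    blob = '{"name": "Jane Smith", "notes": "Met Jane Smith at Acme Corp in Atlanta."}'
    print(json.dumps(redact_json(json.loads(blob)), indent=2))

Short KVP values with little surrounding context (like the "name" field above) are exactly where a context-trained NER model tends to struggle, which is what the notebook's examples try to work around.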
There are also way too many people with access to non-anonymized data, e.g., the development team that has read privileges on the production database, or employees at Uber spying on customers (https://www.theguardian.com/technology/2016/dec/13/uber-empl...).
edit:
Shameless plug: check out tonic.ai for a solution to the above problem.
Tonic AI | Atlanta or SF | Software Engineer | ONSITE
At Tonic we are building tools to help people create synthetic data that looks, feels, and acts like their real data, without compromising security, privacy, or regulatory compliance.
Looking for a full-stack engineer, with a preference for someone stronger in back-end technologies.
Hi nagarjun. My company (https://tonic.ai) builds dev tools, and we've noticed recently that companies with remote teams have been using our product for an unexpected use case. We are trying to investigate it further. Would you be willing to chat with me? If so, I'll drop you a way to get in touch.
Thanks. Our landing page is currently in a constant state of flux; I'll make sure that gets fixed. I'm a bit surprised, but my previous reply actually generated a not-insignificant amount of traffic to our site, so we also opened up app.tonic.ai for anyone who wants to give the product a whirl.
Eddie, I think it would be neat if we could build vendor images by just supplying Docker containers, with maybe some type of config.
At Tonic (https://tonic.ai) we do on-prem deploys with Docker containers and docker-compose. It's seamless, and it would be great to use that same flow for the DigitalOcean Marketplace.
We (Redash) have a similar setup (Docker Compose based) and we used Packer to build the DigitalOcean image. Our setup is public on GitHub, in case you want to copy:
We have some vendors building images with a variety of methods, Packer for example (blog post coming soon). I _want_ to say there is someone building out of a container. We've got a repo [0] with our current process, but we're definitely looking for ways to improve it. You should fill out the vendor form and we'll be in touch [1]!
As others have said, we've found a lot of smaller companies will test with production data because of their need/desire to move quickly. But we've also seen much, much larger companies use production data in their dev/staging environments. Sometimes there will be production-like safeguards and security measures in place, but not always. People shy away from practices that slow down development and testing.
We think synthetic data is the right solution for a few reasons. Most importantly, we believe it provides the right level of security while still allowing your team to be productive, i.e., your business logic and test cases still work. It also allows you to scale really easily, since you effectively have a ruleset for generating data of any size. Finally, it's a great way to share data throughout your organization and can help facilitate sales and partnerships. If you're curious about scaling, check this post out: https://www.tonic.ai/blog/condenser-a-database-subsetting-to...
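To make the "ruleset" point concrete, here is a hypothetical sketch in plain Python, not our actual product: each column maps to a generator function, so producing more data is just producing more rows. Every name and rule in it is made up for illustration.

    import random

    # Hypothetical ruleset, purely for illustration (not Tonic's implementation).
    FIRST_NAMES = ["Ada", "Grace", "Alan", "Barbara"]

    ruleset = {
        "user_id": lambda i: i,                                        # stable primary key
        "name":    lambda i: random.choice(FIRST_NAMES),
        "email":   lambda i: "user{}@example.com".format(i),
        "balance": lambda i: round(random.uniform(0, 10000), 2),
        "api_key": lambda i: "".join(random.choices("0123456789abcdef", k=32)),
    }

    def generate_rows(ruleset, n):
        # Yield n synthetic rows that match the schema but contain no real data.
        for i in range(1, n + 1):
            yield {col: gen(i) for col, gen in ruleset.items()}

    # The same ruleset works for 10 rows or 10 million; only n changes.
    for row in generate_rows(ruleset, 3):
        print(row)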