
I work on the tooling layer for zkapps on Mina. Great questions!

I think in the future the overhead will improve significantly, and we can already see incremental improvements in some proof systems. For Kimchi (the proof system we’ve built for zkapps/Mina), we've instead focused on features — like infinite recursion. To really see the performance overhead reduce, we’ll need to push more of the compute into silicon (think opcodes designed for zk similar to the ones we have now to accelerate RSA). In the meantime, modern processors are fast enough that this overhead doesn’t hold back many sorts of applications.

On fixed inputs: Kimchi is “universal” in that it can recursively compose arbitrary circuits efficiently — including ones with different public and private inputs, and including ones that haven’t been defined yet. This is what powers zkapps and their ability to recursively compose circuits with themselves (think application-specific rollups).

Randomness: Modern blockchains have mechanisms for trustlessly agreeing on securely generated random seeds. We can tap into that and use mechanisms like verifiable random functions (which Mina uses internally during consensus) to securely manage randomness (caveat: we haven’t built out the randomness APIs for zkapps yet so there may be some nuance here that we haven’t run into yet). If you want to learn more about how this works in Mina, read up on Ouroboros Samasika, Mina Protocol’s consensus mechanism.

Preventing replay attacks: Blockchains are designed for this :). Basically: keep a per-account nonce, feed the current nonce in as one of the public inputs to the zkapp, and increment it on every successful update.
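As a minimal sketch of the idea (plain Python, not a real zkapp API — the names here are made up for illustration), nonce-based replay protection looks roughly like this:

```python
# Hypothetical sketch of nonce-based replay protection: each update
# commits to the account's current nonce, so applying the same
# update twice fails the nonce check.

class Account:
    def __init__(self) -> None:
        self.nonce = 0

def apply_update(account: Account, proof_nonce: int) -> bool:
    """Accept an update only if it was built against the current nonce."""
    if proof_nonce != account.nonce:
        return False  # stale or replayed update: reject
    account.nonce += 1  # bump so the same update can't apply twice
    return True

acct = Account()
assert apply_update(acct, 0)      # first application succeeds
assert not apply_update(acct, 0)  # replaying the same nonce is rejected
assert apply_update(acct, 1)      # the next nonce succeeds
```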

Remote attestation: This requires nuance to define properly, as there are different solutions to different versions of this problem. I’ll outline a few here:

1. Consider the case where you trust the entity that is attesting to something, and they can upgrade their backend: the entity adds a snark-friendly signature to their endpoint, and you check the signature within the zkapp (remember, since zkapps run locally, your execution can just make async network requests).

2. Consider the case where you trust the entity but can’t convince them to change their API: here you can verify the contents of the response against the TLS signature that already exists on the modern web. Caveat: this is still a work in progress; we’re currently calling it zkOracles.

3. Consider the case where you don’t trust the entity — in this case we’re talking about reading information from other zkapps or blockchains. Luckily, these all come with signatures in different shapes and sizes, so the problem becomes efficiently verifying those signatures from within the zkapp. You can solve this in modern plonkish zk proof systems with custom gates. Caveat: this is also a work in progress for zkapps on Mina.


One of the authors here: Happy to answer any questions. Keep in mind this was a school project from 2012 (Kayvon's 15-418 at CMU, a wonderful class), so it's been a while.


Is the code available? The links are dead.


Hmm, I'll take a look at that. Yes, the code is here: https://github.com/bkase/CUDA-grep


Can you run your benchmarks against ripgrep and report back?


Unfortunately, I don't actually have access to NVIDIA hardware anymore -- but I would be happy to accept a PR from someone who can get this to run!


Thanks for the reply. I'm also without NVIDIA hardware, so I can't do it either.


Did grep treat the input file as ASCII or UTF-8?

I remember another story where someone claimed to be faster than grep. It turned out the comparison was unfair because of this distinction. I believe LANG=C is enough to switch grep into ASCII mode.


The references on the project site lead to 404. Are they CMU internal links?


Which links specifically are broken?


Both links in the "REFERENCES" section.


Oh good, I'm glad we did mention this!


One of the authors here:

Yes, you are right: you should not replace grep for one-off regexes, due to the latency of the IO operations. As far as I recall, however, memory throughput is much higher on GPUs than on CPUs (or at least this was true in 2012). Our intended use cases for this sort of grep were finding many needles in enormous haystacks, such as looking for malicious machine code in executables that you download off the internet (virus scanning), or searching for certain genetic sequences in DNA. I believe we discussed this in our final presentation, but perhaps we left it out of the written report.

Yes, we probably should have included the numbers to show why you wouldn't want to use this instead of grep for simple tasks. I'm sorry we missed it.


But getting any of that data off disk will typically be the bottleneck.

(Also, you'd want BLAST for DNA instead of grep, but perhaps this was an intentional simplification)


Well, if you're looking for certain motifs or k-mers (https://en.wikipedia.org/wiki/K-mer), it's mostly scanning through pre-processed FASTA files.

The last k-mer counter I wrote was CPU/memory bound, not IO bound (big enough strings mean big hash tables to store everything in, or something like a Bloom filter, which is Jellyfish's approach).

Anyway, for anyone who wants to write something fun, fasta parsers / k-mer counters are a good weekend project.
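For anyone curious what the weekend project looks like: here's an illustrative toy k-mer counter in a few lines of Python (real counters like dna-utils use flat arrays or sparse maps in C/C++ for speed — this is just a sketch of the idea):

```python
from collections import Counter

def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = count_kmers("ACGTACGT", 3)
assert counts["ACG"] == 2  # "ACG" occurs at positions 0 and 4
assert counts["GTA"] == 1
```

A dict (like Counter) plays the role of the sparse map; for small k you'd switch to a dense 4^k array.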

Here's my work in the area (shameless plug, though I'm not in the field anymore): https://github.com/mutantturkey/dna-utils

Just tested again against a ~400 MB genome FASTA file from NCBI, and it's still clearly a CPU/memory issue with huge hash tables. See below (4^k is the number of possible k-mers). K=9 is done with a 4^9 array (sane in memory); K=12 uses a sparse array (stored as a std::unordered_map, because it's too large to allocate that much space in memory). One is obviously way slower:

calvin@bison:~/src/dna-utils$ time ./kmer_total_count -k 9 -i 3816_ref_Abrus_2018_chrUn.fa.2 >/dev/null
real 0m4.235s
user 0m4.116s
sys 0m0.108s

calvin@bison:~/src/dna-utils$ time ./kmer_total_count -k 12 -i 3816_ref_Abrus_2018_chrUn.fa.2 >/dev/null
real 0m25.524s
user 0m25.292s
sys 0m0.152s


I'm quite certain Windows Defender uses (or used, at some point) GPUs in the machine to accelerate malware pattern matching. I remember a coworker talking about a funny bug that resulted when the graphics driver executable itself was infected.


There's an article on the announcement last year: https://arstechnica.com/gadgets/2018/04/intel-microsoft-to-u...


That is for their enterprise "Microsoft Defender Advanced Threat Protection" endpoint security product, which scans live memory for hidden malware that never writes to disk. That scanning would otherwise consume far more CPU than the typical default Windows security software, which is why they needed the GPU in the first place. It's an interesting solution nonetheless.

https://www.microsoft.com/en-us/microsoft-365/windows/micros...


I’ve run into this for simple tasks on a GPU, like merging sorted lists. The massive parallelism can’t make up for the transfer costs unless the operations performed are expensive enough. The “roofline” model is usually the perspective used for this kind of accelerator application.
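The roofline bound is simple to state: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity. A back-of-envelope sketch (the hardware numbers below are made up purely for illustration):

```python
def roofline(peak_gflops: float, bandwidth_gbps: float,
             flops_per_byte: float) -> float:
    """Attainable GFLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbps * flops_per_byte)

# Merging sorted lists does roughly one comparison per several bytes
# moved, so it sits on the bandwidth-limited slope of the roofline.
assert roofline(10000.0, 500.0, 0.25) == 125.0     # memory-bound
assert roofline(10000.0, 500.0, 100.0) == 10000.0  # compute-bound
```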


Reminded me of a 12 year old paper on file carving by Marziale, Richard and Roussev[0].

[0] https://www.dfrws.org/sites/default/files/session-files/pres...


In nearly all the domains I've worked in, discrete packets of data are matched with regexps directly.

If we are using something more 'grep'-like, it's for large quantities of data, and in many cases we could treat these as a stream. So as long as the bandwidth to the GPU is sufficient, a little stall at the beginning wouldn't represent an undue hardship, provided the results come back faster.


I enjoyed the post, but there's something that bothers me about using the name CRDT.

A CRDT is just what mathematicians (and functional programmers) call a semilattice, right? In general, I find it frustrating when people make up new names for existing mathematical concepts, because it deprives others of the chance to learn and see the big picture. Does this resonate with anyone here?


CRDTs and semilattices are definitely connected, but they're not quite the same thing. A CRDT is a data structure that can be merged across the network in a way that's commutative, associative, and idempotent. That much is exactly the same as a semilattice.

But when people talk about "string CRDTs", the underlying semilattice, the data structure that gets merged, isn't a string; it's usually something rather more complicated. Then there's a _function_ which interprets that more complex data structure as the string that the user or application really cares about.

So a CRDT is a semilattice equipped with an interpretation function.

But it gets even more complicated, because there are many ways to do CRDTs in practice. Rather than gossiping your _entire_ complex data structure across the network ("state-based CRDTs"), usually CRDTs try to only send what's necessary. This leads to optimisations like delta-based and operation-based CRDTs. These optimisations are crucial for real-world use of CRDTs, but their connection to semilattice theory is not immediately clear to me. (That doesn't mean there isn't one, though!)

In any case, the story is a little more complicated than "CRDTS are just a semilattice". I do wish more people knew about the connection, though.
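To make the connection concrete, here's the simplest state-based CRDT, a grow-only set (G-Set), sketched in Python. Its merge is set union, which is exactly a semilattice join; the interpretation function in this case happens to be the identity:

```python
def merge(a: frozenset, b: frozenset) -> frozenset:
    """G-Set merge: union of the two replicas' states (semilattice join)."""
    return a | b

x = frozenset({"alice-edit"})
y = frozenset({"bob-edit"})
z = frozenset({"carol-edit"})

assert merge(x, y) == merge(y, x)                      # commutative
assert merge(merge(x, y), z) == merge(x, merge(y, z))  # associative
assert merge(x, x) == x                                # idempotent
```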


Another problem, slightly more interesting than a quine, is writing an auto-cannibal machine (this was a homework assignment in my semester of 15-251 at CMU):

Write a function `ACM : (String -> a) -> a`, i.e. a function that takes a string consumer as input (such as `flip_upper_and_lower_cases : String -> String`, or `word_count : String -> Int`) and outputs the result of applying that transform to its own source code.

For example if the contents of the ACM file were: `ACM(f) = print f("foo")` (note that this is clearly not a correct answer in any language)

`ACM(flip_upper_and_lower_cases)` invoked from another file should yield: 'acm(F) = PRINT F("FOO")'

(please don't post a solution publicly just in case they're still using this as homework)


I found your post very compelling. I have tried messing around with the APL derivatives (J and K and Q), but never created anything substantial. I didn't know you could view the AST like that, and it did really help in your explanation. I'm definitely going to check out an APL again! Thank you for the great post!


Don't miss the presentation linked at the bottom of the readme[1], and the browser-based IDE, Golem, linked in the middle[2].

[1] https://www.youtube.com/watch?v=HnZipJOan54 [2] http://shem.io/


Certified Programming with Dependent Types (available online as well http://adam.chlipala.net/cpdt/) is also a good resource for learning Coq (note: I just started working through the material this weekend).


I've been using scm_breeze for a couple years, and it really makes me more productive with Git.

My favorite feature is that after a `gs` all the files listed are bound to sh variables $e1, $e2, etc.

