I'll bet the "Wizard of Oz" strategy would work well for this idea.
Start off by NOT building any of the fancy infrastructure—the crawler, crunchbase joiner, LDA algorithm to cluster investors, etc.
But set up a form where somebody can enter their company idea, URL, and traction metrics and then you'll do the research and suggest 5-10 VC firms and partners.
You (the maker of the site) will learn a lot from being the wizard of oz behind the curtain.
And when you're done, you have two routes: 1) the "Google" approach: build algorithms to automate what you're doing already, 2) the "eBay" approach: let the VCs themselves onto the site, show them a stream of (anonymized?) pitches, and let them choose which entrepreneurs to meet. No need for an algorithm. Then the site becomes a marketplace to match startups and investors, a sort of an online, always-running demo day.
I think an "anonymized" pitch would be hard at best and self-defeating at worst. If a company has tangible traction in the space, the way they describe themselves, the industry, and their products will quickly show who they are.
I'm more interested in your "Wizard of Oz" idea. It's basically LendingTree for the VC community. If you could pair it with good warm introductions to VCs, that could be really useful. And realistically, if you analyzed someone's LinkedIn (probably cross ref with FB), you may be able to figure out who they might approach for the introductions.
(I don't think LinkedIn alone is enough... you may have just traded cards. Being connected on Facebook is a stronger indicator of some sort of relationship or possible engagement. Bi-directional following on Twitter is probably even stronger.)
Isn't the second version extremely similar to AngelList? Just includes much more proprietary information about the startups (so I'd imagine it wouldn't be as public-facing). Still an interesting idea.
"So an automated tool that crawls VC websites, pulls the links to each and every portfolio company, categorizes these investments by stage, sector, geography, and ideally a host of other things, would be incredibly useful."
In theory, not hard. It would be fun to run an LDA model and see what topics come up.
In practice, hard. Every VC website probably presents this info in slightly different ways, not to mention the variety in the portfolio company websites, not to mention the survivorship bias (where is the info on the portfolio companies which failed?)...
But if we hired a person to gather and structure all of this data... seriously, mechanical turks have their uses.
In practice it wasn't that hard to set up. It's a scraper in Python with a DSL to tell it how each VC site sets up its portfolio page.
The pain is maintaining it. Every week a few of the scrapers break because the VC changes their site. And recently, with so many seed funds starting up, it's missing a lot of the interesting new VCs.
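For anyone curious what "a scraper with a DSL" might look like, here is a minimal sketch. It is not the commenter's actual code: the site names, the rule format (`tag` plus `css_class`), and the sample HTML are all made up for illustration, and it parses an inline string rather than fetching anything.

```python
from html.parser import HTMLParser

# Hypothetical per-site rules: which tag/class marks a portfolio company link.
# When a VC redesigns their site, only the matching entry here needs to change.
SITE_RULES = {
    "examplevc.test": {"tag": "a", "css_class": "portfolio-item"},
    "othervc.test": {"tag": "a", "css_class": "company"},
}


class PortfolioLinkParser(HTMLParser):
    """Collects href values from tags that match one site's rule."""

    def __init__(self, rule):
        super().__init__()
        self.rule = rule
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != self.rule["tag"]:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.rule["css_class"] in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_portfolio(site, html):
    """Apply the rule registered for `site` to a page of HTML."""
    parser = PortfolioLinkParser(SITE_RULES[site])
    parser.feed(html)
    return parser.links


# Made-up portfolio page used only to exercise the sketch.
SAMPLE_HTML = """
<ul>
  <li><a class="portfolio-item" href="https://acme.test">Acme</a></li>
  <li><a class="nav" href="/about">About</a></li>
  <li><a class="portfolio-item featured" href="https://widgets.test">Widgets</a></li>
</ul>
"""

print(extract_portfolio("examplevc.test", SAMPLE_HTML))
```

The maintenance pain described above shows up here as rule drift: when a site renames its CSS class, the rule silently returns an empty list, so a real version would probably want to alert on empty results.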
[I once scraped Crunchbase for each VC firm, and the descriptions of every company they invested in, ran them all through a naive bayes model, and set it up so that you could type in your company description and it would tell you which firm was most likely to invest in you. It didn't work at all, giving me new appreciation for people who actually know how to make ML models work.]
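A rough sketch of that kind of classifier, assuming a multinomial naive Bayes with add-one smoothing written from scratch. The firm names and training descriptions are invented toy data, not Crunchbase data, and a toy set this small says nothing about whether the approach works on real portfolios.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """examples: list of (firm, portfolio company description) pairs."""
    word_counts = defaultdict(Counter)  # firm -> word -> count
    doc_counts = Counter()              # firm -> number of descriptions
    vocab = set()
    for firm, description in examples:
        tokens = tokenize(description)
        word_counts[firm].update(tokens)
        doc_counts[firm] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, description):
    """Return the firm with the highest log-posterior for the description."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(description) if t in vocab]
    scores = {}
    for firm in doc_counts:
        total_words = sum(word_counts[firm].values())
        score = math.log(doc_counts[firm] / total_docs)  # class prior
        for tok in tokens:
            # Add-one (Laplace) smoothing so unseen words don't zero out a firm.
            score += math.log(
                (word_counts[firm][tok] + 1) / (total_words + len(vocab))
            )
        scores[firm] = score
    return max(scores, key=scores.get)


# Invented toy data standing in for scraped portfolio descriptions.
EXAMPLES = [
    ("Firm A", "mobile payments app for consumers"),
    ("Firm A", "consumer social photo app"),
    ("Firm B", "enterprise database infrastructure"),
    ("Firm B", "cloud infrastructure monitoring for enterprise"),
]

model = train(EXAMPLES)
print(predict(model, "consumer photo sharing app"))
print(predict(model, "enterprise cloud database"))
```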
You may want to give Funderbeam (http://funderbeam.com) a shot -- we have data on 30k+ investors of various kinds, with their portfolio investments, target geographies, and funding sizes. More filters on the way.
It sounds more like a feature for AngelList/Gust/Crunchbase/CBInsights than a viable business by itself.
Hundreds of small one-off sales to make $150k with an information product that needs updating all the time, can easily develop a [possibly undeserved] reputation for being "wrong", and a realistic prospect of competent bigger names developing a similar service to give away for free sounds like the absolute worst business case for a B2B play.
It's a more attractive opportunity for someone wanting to establish a name for themselves in the startup/VC ecosystem for other reasons though.
Exactly. There is no biz here (not even a "lifestyle" one). A small, highly fragmented market with high churn and a low price is not a good recipe to build a business.
Plus, Fred is grossly underestimating the effort required. Just tracking & taxonomizing transactions is hard to do at scale.
I'm a co-founder of CB Insights. We do everything (and more) Fred outlines but our price point is more institutionally oriented.
Cool, but I worry about defensibility. What happens if VCs decide that they should be more transparent to avoid wasting time sifting through applicants? If they publish information on their own site, I'd think that as the primary source, people would use it instead of the proposed project.
However, it might be useful for there to be some sort of VC search engine, where "investment strategy" would be one of the parameters.
There are a decent number of tools doing parts, or even most, of what he is suggesting (Crunchbase, Mattermark, AngelList, some of the paid tools like Pitchbook, CapIQ, etc). The key here seems to be that, despite presumably knowing and using all of these tools, Fred Wilson (my dad's name too, always weird to type it) still feels like a unified product would be valuable.
brandonb has an accurate comment that you could take pg's advice to do things that don't scale and do this manually to start, without building the tech Wilson talks about. I agree with him, that's probably good advice. At the same time, building the technology to do this seems like an interesting project to work on, regardless of its potential as a product, so it also qualifies under sama's startup advice to start with an interesting project. Maybe the right set of founders are a) someone with exposure and knowledge in the VC and investing space and b) someone who is interested in building some web crawlers and algorithms. The finance founder runs the web form and gets the business end going, as well as helps design the inputs to the algorithm, while the technical founder builds from the bottom up - first the web crawler, then the crunchbase joiner, etc.
I built this a couple years ago using Crunchbase data. I found the data to be helpful in weeding out the walking dead VCs so I could focus on the investors who were actively investing in bay area companies at the right stage and sector. My plan was to add a network component to show both first and second degree connections to the VC partners and executives at companies they had funded, but we closed our funding before I finished it.
Unfortunately, after a redesign, CrunchBase data is no longer open and requires you to be a part of an organization to use the API.
I had used the data myself. However, I found a few holes in the data, so it's fine that I cannot use it anymore. "Triangulate to figure out when the initial investment was made" is flat out impossible to do with any accuracy.
There's an extra step, which involves pulling out parameters and running through them in an ordered way, which would definitely be an interesting read.
Had a similar idea this morning, when I realized that I've likely been wasting my time on trying to pitch YC and a few other investors on a non-B2B app. Looking at their portfolios now, that's clearly not where their interests lie. See also my recent HN post.
"Venture capital firms don't do a great job on their websites of explaining what they invest in..."
Solution:
1 Entrepreneurs petition VC firms to all post a CSV file (Excel file saved as .csv) listing what they have invested in, dates, etc. Is there a single VC firm that does not have this data? Most likely the data has at some time been put into a spreadsheet or some form of report.
2 VC firms locate the spreadsheet, delete any confidential columns (Alt+Space, Alt+H,D,C), save as Comma Separated Values (.csv) and give the file to their web developers.
3 Web developers upload the file to web servers.
4 Entrepreneurs download CSV file.
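If firms did publish such a file, the entrepreneur's side of step 4 could be as small as the sketch below. The column names (`company`, `sector`, `stage`, `date`) and the rows are assumptions made up for illustration; no firm is known to publish this format.

```python
import csv
import io

# Hypothetical contents of a firm's published investments file.
SAMPLE_CSV = """company,sector,stage,date
Acme,fintech,seed,2013-04-01
Widgets,enterprise,series_a,2012-11-15
Gizmo,fintech,series_a,2014-02-20
"""


def load_investments(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def matching(rows, sector=None, stage=None):
    """Keep rows matching the given sector and/or stage (None means any)."""
    return [
        row
        for row in rows
        if (sector is None or row["sector"] == sector)
        and (stage is None or row["stage"] == stage)
    ]


rows = load_investments(SAMPLE_CSV)
fintech_names = [row["company"] for row in matching(rows, sector="fintech")]
print(fintech_names)
```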