It works well! I remember being pleasantly surprised by this feature when I was searching for an unrelated document last time I logged in, and it found a bunch of images. I was like, wait, how did it know that...? It's a cool stealth feature.
Any chance Dropbox might want to add more DAMS-like features in the future, so that Dropbox can be used as a central image repository not just for photo originals but also for public websites, etc.? Kinda like Imgix or Cloudinary? I know our org already uses Dropbox to store asset originals, and it would be lovely to be able to hook into a powerful API to serve them at different responsive sizes, formats, crops, etc.
Thanks! I can't talk about potential future releases, but we do have some nice thumbnailing capabilities that could be used for this sort of thing if we decided to go that way. Some of these capabilities are exposed in our public API: https://dropbox.tech/developers/scaling-down-large-image-fil...
Very cool writeup! There is a lot of progress in multi-modal AI/image-text/video matching. Are you guys hiring researchers (who like to build things) in this area?
I'm more and more motivated to just cobble together my own dropbox (I know that several folks here have done just that) for my simple needs, to be what Dropbox originally was. Their UI has become ridiculous, and 'features' like image spy serve them more than me.
Take a server with SFTP access, so you can mount it locally and browse with WinSCP if you need to. Put Filestash on top and you have a nice UI. That's probably 90% of the original features.
For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
Isolated from everything else, that comment was true at the time and is still true today. You can't propose that as a general alternative to Dropbox, but this is a specific context. eplanit specifically said "cobble together my own dropbox for my simple needs", so:
- I'm assuming they have the minimal technical skills to make this trivial and to accommodate the rough edges that come with it: no easy "signup" flow, but creating an SSH key and putting it on the server, for example. Not that it's much harder, though
- I'm assuming they don't care about business viability because it's just for them
- I'm assuming they don't need a lot of features: automatic syncing, a simple ui for listing/downloading/uploading
Dropbox has become bloated and has tons of useless features I and most other HN users probably don't care about and find annoying. But the original core functionality still works really well and is a big quality of life improvement over more hacky personal backup implementations, IMO.
So I agree with the criticisms, but I think it's still a useful service at heart and worth the money. It just - very predictably - got way too big for its britches and tried to expand a useful app into a multi-domain conglomerated platform. Steve Jobs's comment that Dropbox is "a feature, not a product" (to which he might now add "and not a company/ecosystem") was pretty much right, I think. It's still a really good and well-executed feature, though.
That said, would I switch to something that was cheaper and gave me all the same core features with the same level of reliability? Probably.
Is this a joke about how the top comment on the HN post for Dropbox was just "hmm don't see the point it's just FTP"? Are people really doing this again?
Nextcloud looks great, but the effort of hardening the VPS is too much (for me at least). Hackers love an IPv4 address that has nothing but a login prompt to a Nextcloud instance.
You can run it behind a reverse proxy. It won't serve anything on the naked IP, the correct host is required in the http request headers.
I've been very happy self-hosting Nextcloud (and many others, including Vaultwarden). There are very few hits that even land on the login page, and essentially all of them only probe for /wp-admin or similar paths, then promptly leave me alone once all those probes return 404.
And then there's 2FA if an actually targeted attack ever materializes. Since it's entirely unknown what's inside the Nextcloud instance, there's no clear economic benefit (i.e., the potential benefits are entirely uncertain; the instance might be vanilla). So I'm certain there's very little reason for anyone to actually try hard enough to achieve anything at all. Keep your system updated through the normal means and you're golden.
I've been running it on my server for a few years and haven't been hacked yet (well, at least as far as I can tell). I just update it every few months, and followed the recommended security settings when I set it up (the management page in the UI will list some issues).
I love this! Honestly, if I had heard the name "cloudwrap" on its own without your description, I would have thought: the Uber of sandwich delivery... but I very much like your idea better! :-)
Image search is great.
But in terms of photo backups, Dropbox is inferior to Apple's iPhoto/iCloud (I'm aware Apple has the advantage of better integration), and Google Photos is still much more functional.
I'm not asking for much but:
- Better caching. Their thumbnails slow down badly when I try to go back a few years.
- Better organization. Search is great, but I'd also like the basic ability to visually search and organize content myself.
- Live Photos support...
I really think this is a missed opportunity in the photo space. There are plenty of editors and tools for photos out there. They can provide something that helps with organization.
Dropbox would do wonders if they had a Picasa-like client that simply looked into the Dropbox folders and created albums based on the folder and sub-folder structure.
I didn't know that they do that, and now that I know, I'm not sure if I ever wanted to use that feature and feel a bit uneasy about Dropbox scanning all my private photos.
It really feels like a "your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should" situation.
The overwhelming sentiment of the comments on this is negative, but as a fairly non-technical reader far removed from ML, I find this pretty fascinating.
The hilarious part of this, to me, is that it feels like this thread is just copying the (now-famous) HN thread announcing Dropbox.
As a technical user, I get why people are concerned. As someone that's seen this movie before, I tend to think that Dropbox has a better handle on what nontechnical users want than HN does.
I'm also non-technical and I do see the value for the customer, BUT it's very obvious there are issues here. Dropbox claims their service is secure and privacy preserving. Quote from their website: "All files you store in Dropbox are private. Other people can't see and open those files unless you purposely share links to files or share folders with others." Funnily enough, in the very next sentence they already water this down somewhat, saying Dropbox employees do access your files "on rare occasions", for example "when necessary to ensure that our systems and features are working as designed (e.g., debugging performance issues, making sure that our search functionality is returning relevant results, developing image search functionality, refining content suggestions, etc.)"
Yes, it's possible to do such indexing. They could also start running facial recognition software like Facebook does. What's going to happen next with all the data they collect? It doesn't take a pessimistic coder to know; any layperson should be able to figure this out.
Sure, it's fair to be concerned about that, but is it necessary to focus so much on that?
A company builds a new feature, it's useful, and even more they write a detailed article about how they built it. Isn't it fair to appreciate that and discuss the approach?
I get it, there are some issues with Dropbox, etc., but must every article about some company feature have a majority of comments talking about how they liked them better in the past, how they wanna stop using it, how environmentally problematic some of the company's products are, and the like? I think there's also a time and place to discuss those (e.g., in comments on an article about the business practices, or some medium opinion piece), but I'd prefer that on submissions like this one we focus on the content at hand, and maybe have a single comment thread dedicated to tangential issues about the company.
EDIT: TBF, I don't actually see that much unrelated negativity in this comment section.
Granted, I appreciate them explaining and properly disclosing it. To be clear, my criticism isn't Dropbox-specific; it also goes for other cloud storage providers. That said, just because others do the same (or even worse) doesn't make it any better. The loss of privacy and rights is one of the biggest issues in tech these days. I wouldn't even be that opposed to it if the business model at least recognized users as content creators and paid them for the value they add, but the way it is now, we provide all the value and the shareholders reap the rewards. Humans have become a commodity, like cattle. Our private data is the milk, and they're milking us dry like there's no tomorrow.
> Sure, it's fair to be concerned about that, but is it necessary to focus so much on that?
Yes.
The third-party doctrine (in US law) says that people have no reasonable expectation of privacy in information they voluntarily share with third parties. Technically, law enforcement can request a person's data without that person's consent or even knowledge.
Combine that with the fact that most people's phones automatically upload the pictures and videos they take.
Combine those with this analysis and law enforcement can start fishing expeditions with little to no effort.
The only thing standing between individuals and law enforcement having deep access to much of our information is the kindness of the tech companies holding the information. Great.
Why build straw men in defense of user-hostile policies? Users don't even know that they are giving up privacy. How is that remotely acceptable? How is criticizing such a glaring issue being "negative about tech"?
At some point it's just tiring to be a user who is continuously jerked around. I love tech, but every day I understand the libre hermit POV more.
> Users don't even know that they are giving up privacy.
lol, do you think users care about privacy on Dropbox? The tech boffins might, but not the average user.
How have you been spied on as a Dropbox user? Can you do a better service than them? What alternative do you suggest that is in the same league as Dropbox?
And I'm stating facts about Dropbox. Users are not told that their data will be looked at by others. Having the keys to do it and actually doing it are two very different things.
Why are you asking all of these questions about the current market? Are you just trying to point out how bad the current situation is? If so, then I agree. That's why I and other users here paying attention are doling out criticism.
> And I'm stating facts about Dropbox. Users are not told that their data will be looked at by others. Having the keys to do it and actually doing it are two very different things.
OK, but so what?
Again, the tech boffins would care about this enough to take action on this 'issue'.
You can always move your files away from Dropbox if you don't like it. I'm sure companies that collect information on you (name, email, browser, file_id) are doing it to improve the service whose TOS you've agreed to.
> I love tech, but every day I understand the libre hermit POV more.
Then don't use Dropbox or similar services; it's that simple. There is always an SFTP/rsync server waiting for you to upload your files to, or, better yet for your use case, an encrypted USB drive.
I think it's shameful how negative these comments are. I'm not here to make a value judgment on Dropbox, but the content of the post was fascinating and very well presented.
This is pretty basic keyword-category-vector stuff, and the results sadly match.
Try searching your dropbox for "the day I took a bunch of photos on the subway on the way to work", and you'll see none...
Yet dropbox probably has all the info to answer that search - they can parse the natural language query, they can detect when multiple photos were uploaded on the same day, they can look at location and time tags and see which ones might be 'on the way to work'. They can see which photos might visually look like they were taken on a subway.
How about "Me on dress-silly day". Again, no matches. Or "My broken arm". No matches. But I totally have that image.
Dropbox needs to take a step back and consider that for each query there probably is a correct answer. And they need to track what the user types, and which image they eventually view, as training data to refine their algorithm.
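A minimal sketch of that logging idea, assuming a hypothetical event stream where each event is either a search (carrying the query text) or a view (carrying an image id), tagged with a session id. It pairs each query with the last image viewed before the user's next search in the same session; treating that image as the "correct answer" is an assumption, not something Dropbox is known to do.

```python
from dataclasses import dataclass

@dataclass
class Event:
    session_id: str
    kind: str      # hypothetical event types: "search" or "view"
    payload: str   # query text for a search, image id for a view

def training_pairs(events):
    """Turn a time-ordered event log into (query, eventually_viewed_image) pairs."""
    last_query = {}   # session_id -> query currently being answered
    last_view = {}    # session_id -> last image viewed since that query
    pairs = []

    def flush(sid):
        # Emit a pair only if the query was followed by at least one view.
        if sid in last_query and sid in last_view:
            pairs.append((last_query[sid], last_view[sid]))
        last_view.pop(sid, None)

    for ev in events:
        if ev.kind == "search":
            flush(ev.session_id)              # the previous query is finished
            last_query[ev.session_id] = ev.payload
        elif ev.kind == "view" and ev.session_id in last_query:
            last_view[ev.session_id] = ev.payload
    for sid in list(last_query):              # close out queries still open at the end
        flush(sid)
    return pairs
```

Views that happen before any search in a session are ignored, since there is no query to attribute them to.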
Do these complex queries work on Google Photos, Apple Photos, or whatever the state-of-the-art image search is? With search, performance is key, which makes anything O(n^2) unusable.
For personal data search, the number of images that need to be searched is pretty small.
Typically, rather than building an indexing system, it's best to just do as much precomputation as possible so that a linear scan is fast. That scales up to 1M+ images/user.
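To make the "precompute, then linear scan" idea concrete, here is a brute-force top-k search in NumPy, assuming each image's embedding was computed offline and stacked into one matrix. The function name and the plain dot-product scoring are illustrative choices, not a description of any production system.

```python
import numpy as np

def top_k_linear_scan(image_vecs, query_vec, k=10):
    """Score every precomputed image embedding against the query, return the best k indices.

    image_vecs: (n_images, dim) array computed offline, one row per image.
    query_vec:  (dim,) array computed when the query arrives.
    """
    scores = image_vecs @ query_vec              # one dot product per image: O(n * dim)
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]    # the k best, in no particular order, in O(n)
    return top[np.argsort(-scores[top])]         # sort only those k by descending score
```

There is no index structure to build or update: adding an image just appends a row, and the query cost is a single matrix-vector product.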
Normally the approach taken is to preprocess all the images with a neural net (taking as input the image, its metadata, some info from other images taken at the same location or on the same day, text from the web looked up from the location coordinates, and any other input that might help answer a user's query), outputting an embedding vector of, say, 8192 elements.
Then, when the query comes in, feed it to some big pretrained language model with a fine-tuned embedding layer to get another vector.
Then, for each image in the user's account (1 million plus), run a tiny neural net to see whether that image might be relevant. Such a network might only have a few thousand weights, and may only operate on part of the image and query vectors. You'll probably want to use a GPU for this step, but it should just about work on a CPU too.
Take the top scoring few thousand images, and run a bigger comparison net for the final ranking.
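The cheap-filter-then-rerank pipeline described above can be sketched as follows. The network sizes, the random weights (standing in for trained ones), the embedding size, and the shortlist size are all hypothetical; the point is only that the tiny net touches every image while the bigger net sees just the shortlist.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # hypothetical embedding size

def init(sizes):
    # Random weights stand in for trained ones; illustration only.
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def mlp(x, layers):
    """Small ReLU MLP; the last layer has one output unit, returned as one score per row."""
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = layers[-1]
    return (x @ W + b)[:, 0]

tiny_net = init([2 * DIM, 16, 1])        # ~2k weights: cheap enough to run on every image
big_net = init([2 * DIM, 256, 256, 1])   # much larger: run only on the shortlist

def two_stage_search(image_vecs, query_vec, shortlist=1000, k=10):
    # Pair every image embedding with the same query embedding.
    q = np.broadcast_to(query_vec, image_vecs.shape)
    pairs = np.concatenate([image_vecs, q], axis=1)
    # Stage 1: the tiny net scores all images; keep the best `shortlist`.
    cheap = mlp(pairs, tiny_net)
    shortlist = min(shortlist, len(cheap))
    cand = np.argpartition(-cheap, shortlist - 1)[:shortlist]
    # Stage 2: the big net reranks just those candidates.
    fine = mlp(pairs[cand], big_net)
    return cand[np.argsort(-fine)[:k]]
```

The total cost is dominated by the tiny net over all n images plus the big net over a fixed-size shortlist, which is what keeps the expensive model affordable.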
You might want an extra input to the comparison net to promote result diversity, i.e., to try to avoid 50 very similar images all being returned at the top of the rankings.
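The comment suggests feeding diversity into the comparison net itself. A simpler, swapped-in technique aimed at the same problem is Maximal Marginal Relevance (MMR) reranking, sketched here: greedily pick the result that is relevant but not too similar to anything already picked. The `lam` weight is an assumed tuning knob (1.0 means pure relevance).

```python
import numpy as np

def mmr_rerank(relevance, image_vecs, k=10, lam=0.7):
    """Greedy MMR: trade each candidate's relevance against its similarity to picks so far."""
    vecs = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)  # unit vectors
    rel = np.asarray(relevance, dtype=float)
    chosen, remaining = [], list(range(len(rel)))
    while remaining and len(chosen) < k:
        if chosen:
            # Highest cosine similarity between each candidate and any already-chosen image.
            redundancy = (vecs[remaining] @ vecs[chosen].T).max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        mmr = lam * rel[remaining] - (1 - lam) * redundancy
        best = remaining[int(np.argmax(mmr))]
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With two near-duplicate top hits, a lower `lam` lets a less relevant but visually different image take the second slot.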
Then all networks should be trained end to end on user behaviour, i.e., the images users actually found that answered their query.
I think this is a nice approach. You may even be able to take it further; if you're training end to end based on users' queries, you can probably have the query and image representations in the same space and use a simple similarity measure in place of the tiny neural net (something like OpenAI's CLIP model).
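A sketch of the shared-space idea, assuming query and image encoders already produce vectors of the same dimension (the encoders themselves are omitted): the CLIP-style symmetric contrastive loss over a batch of (query, clicked image) pairs. Minimizing it pulls each query toward its own image and away from the other images in the batch, so that at search time cosine similarity alone can rank images. The temperature value is an assumption.

```python
import numpy as np

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_style_loss(query_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss: row i of each matrix is a matching (query, image) pair,
    and every other row in the batch serves as a negative."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = q @ v.T / temperature                           # scaled cosine similarities
    idx = np.arange(len(q))
    loss_q2i = -log_softmax(logits, axis=1)[idx, idx].mean()  # each query picks its image
    loss_i2q = -log_softmax(logits, axis=0)[idx, idx].mean()  # each image picks its query
    return (loss_q2i + loss_i2q) / 2
```

In a real setup the gradients of this loss would flow back into both encoders; here it only shows what "same space, simple similarity" means as a training objective.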
The tricky part will be scaling it -- not just for speed, but keeping the index size down. Also, you'll need to already have some version of image search to collect the training data.
If you have concerns about unencrypted storage of personal photos, I'd encourage you to check out my project: ente.io[1]. It is essentially Google Photos, minus search capabilities, plus end-to-end encryption.
Looks good, but unfortunately my first thought is: this looks like a 1 or 2 person shop, how big is the risk of them calling it quits within 5 years from now?
For that reason, I use https://www.photosync-app.com/ and back up my photos to my webhost and to B2: both the app and the storage locations are interchangeable in case any of them stops working.
> this looks like a 1 or 2 person shop, how big is the risk of them calling it quits within 5 years from now?
I'm not sure I agree with you. I would argue that a 1-2 person shop actually has a much greater incentive to maintain a profitable small-scale business than a large group. For the former, it may be a comfortable addition to their income; for the latter, it might be an nth project that could turn out to not be profitable enough or be too much of a hassle to keep maintaining, the gold-standard example being of course the Google Cemetery (https://gcemetery.co/).
An MRR of a few hundred bucks is quite nice and worth cherishing for a single person, but it's more hassle than it's worth for a big company.
In our specific case, the business is set up such that it is self-sustaining. There's no free plan, so for as long as you are paying for your storage, we'll be profitable.
Outside that, I would say trust snowballs in the long run.
Also, I don't know if the size of a company is a metric that should warrant additional trust. The mission could be diluted in a larger organization, and hard pivots would hurt them less.
We can also change the fees for our services (other than those you have already contracted and paid for) at any time if we give you notice.
10.2 make you pay, on demand, default interest on any amount you owe us at 10% per annum calculated on a daily basis, from the date when payment was due until the date when payment is actually made by you. You will also need to pay all expenses and costs (including our full legal costs) in connection with us trying to recover any unpaid amount from you.
In circumstances where we cease providing our services for other reasons, we will, if we consider it appropriate, it is reasonably practicable and we are not prevented by law or likely to incur any liability in doing so, give you 30 days' notice to retrieve your data.
Hey, thanks for bringing this up. This was a part of the templated ToS, which at the point of framing did not sound unfair.
That said, I now realize that this better applies to a B2B SaaS, wherein a defaulter could have consumed a large amount of resources, resulting in non-trivial financial damage.
Given the context of ente.io, this is not a situation we have to be worried about, and the clause has now been removed.
> Looks good, but unfortunately my first thought is: this looks like a 1 or 2 person shop, how big is the risk of them calling it quits within 5 years from now?
As opposed to something like Google that constantly shuts their services down?
It's totally possible to implement client side indexing and search. If you have the time/budget to do so, I think you would have a much more compelling product.
You should see if you can use Squoosh to improve the loading time of the image a little. It loads a bit slow on my subpar connection. Awesome landing page though!
Ohh, this looks pretty cool. I was actually considering building such a thing myself, but I hadn't gotten around to it.
It looks quite a bit more expensive than Google Photos, but I could probably live with that. What I do want to know is where you are storing the photos, and what precautions you have taken to ensure data is not lost (assuming I still have access to my encryption key).
I haven't used the feature, but the way it's implemented feels overly complicated, especially for something like keyword search (and not similar-image-search).
If they only use the top 10 categories in their feature vector for the documents, why don't they store these categories as tags on each document and use standard inverted-index searching and scoring? I know the vector will express how much "beach" a certain image is, but the user-supplied query doesn't have a notion of how "much" beach the user expects, so the output can be a simple list ranked using standard term search mechanisms. What am I missing?
Hi, I wrote this post. The retrieval stage amounts to storing the top 50 categories in the inverted index and searching for the top 10 from the query. The harder part is the ranking. There are approaches to the ranking that are more akin to the techniques you'd use for text document search -- think of the classifier scores as TF-IDF values, treat query-to-category matching as something like synonyms -- but to my mind they're more complicated than our approach, not less. We did experiment a bit with some of these ideas, but the results were worse.
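A toy version of the retrieval stage described above, with category scores held as plain dicts: each image is indexed under its top 50 categories, and a query looks up its top 10. The scoring rule (a sparse dot product over shared categories) and the class and method names are assumptions for illustration; the author notes the ranking is the harder part, and this sketch does not reproduce it.

```python
from collections import defaultdict
import heapq

def top_n(vec, n):
    """Keep the n highest-scoring entries of a {category: score} dict."""
    return dict(heapq.nlargest(n, vec.items(), key=lambda kv: kv[1]))

class CategoryIndex:
    def __init__(self, image_terms=50):
        self.image_terms = image_terms
        self.postings = defaultdict(dict)   # category -> {image_id: classifier score}

    def add(self, image_id, category_scores):
        # Index the image only under its top categories.
        for cat, score in top_n(category_scores, self.image_terms).items():
            self.postings[cat][image_id] = score

    def search(self, query_vec, query_terms=10, k=10):
        # Retrieval: walk the posting lists of the query's top categories.
        # Ranking (assumed): accumulate query_weight * image_score per image.
        scores = defaultdict(float)
        for cat, q_weight in top_n(query_vec, query_terms).items():
            for image_id, img_score in self.postings.get(cat, {}).items():
                scores[image_id] += q_weight * img_score
        return heapq.nlargest(k, scores, key=scores.get)
```

Only images that share at least one category with the query are ever scored, which is what makes the inverted index cheap.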
I have a Dropbox with directories for all my projects. Let’s assume I also want to upload all my personal photos. Then I wonder:
- how do I keep the photos separate from the projects (which also include images)?
- how will my photos get from my family's camera phones to Dropbox?
- how do I go look at photos, scrolling through Dec 2018 on my phone for the next 2 minutes?
To me, Google Photos and Dropbox are conceptually different (photo albums vs. files in directories), and I can’t wrap my head around how albums could work in Dropbox.
Cosine similarity is a great little tool to have in your box. When you have a map[dict|obj] with key => val(float), it is an easy way to compare a bunch of them. Years ago, I threw together an example in JS
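The JS example itself isn't included here, so this is a separate Python sketch of the same idea: cosine similarity between two sparse vectors stored as dicts mapping keys to floats, where a missing key counts as zero.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {key: float} dicts.

    Keys missing from one dict count as 0, so only shared keys contribute to the dot product.
    """
    if len(a) > len(b):
        a, b = b, a                     # iterate over the smaller dict
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                      # define similarity with an empty/zero vector as 0
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity({"beach": 3.0, "dog": 4.0}, {"dog": 1.0})` is 4 / (5 * 1) = 0.8.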
I’m happy with iTunes Match, which gives you access to anything you own on Apple’s servers for streaming, plus uploads whatever you have and they don’t have.
Maybe someone from Dropbox can add more color to this and explain what other options they considered.
As it stands, I still can't find what I need in Dropbox. And never could. From reading this article I'd think searching for a basic keyword like "dog" or "ship" or "runner" would yield some results from my tens of thousands of photos, yet I get nothing (nothing relevant, at least).
Edit: On second reading, this is only available to Dropbox Pro and Business users. I hope they roll this out to other paying users soon.
Search by image is conceptually easier because you don't have to map between text and images, but it's a very different product. It is something we've considered.
Encoding words and images into the same space and doing ANN is kind of what the current system is, if you look at it right. The ANN is framed in terms of similarity rather than distance -- and is approximate because of the sparseness approximation. But the big difference from the papers you linked is what we use as the encodings: not the traditional penultimate layer of a network, but classifier scores for images and projected word vectors for text. This gives us a space with semantically meaningful dimensions, which lets us build the system without a large multimodal training set; our text and image models are independently trained on different datasets.
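A rough sketch of such a space, under stated assumptions: image vectors are per-category classifier scores, and the text side is a query word vector "projected" into category space by taking its cosine similarity to each category label's word vector (that projection, the clipping at zero, and all function names are assumptions). Both vectors are then truncated to their largest entries, mirroring the sparseness approximation, and compared with a dot product.

```python
import numpy as np

def project_query(word_vec, category_word_vecs):
    """Map a query word vector into category space (assumed projection).

    Output dimension i is the cosine similarity between the query word and the
    label word of category i, clipped at 0, so each dimension still means "category i".
    """
    w = word_vec / np.linalg.norm(word_vec)
    c = category_word_vecs / np.linalg.norm(category_word_vecs, axis=1, keepdims=True)
    return np.clip(c @ w, 0.0, None)

def sparsify(vec, k):
    """Zero out all but the k largest entries (the sparseness approximation)."""
    out = np.zeros_like(vec)
    keep = np.argsort(vec)[-k:]
    out[keep] = vec[keep]
    return out

def similarity(query_cat_vec, image_classifier_scores, k_query=10, k_image=50):
    # Both vectors live in the same category space, so a dot product compares them directly.
    return float(sparsify(query_cat_vec, k_query) @ sparsify(image_classifier_scores, k_image))
```

Because every dimension is a named category, neither model needs paired text-image training data: the text model only has to know how close a word is to each category label.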
If you're searching for a photo do you always have a similar photo to use as a reference? Text allows for people to simply search their photos based on what they think is in them. As a search feature that's going to have a lot more use for customers than image similarity. Image similarity has other uses but this is for search specifically.
I have Dropbox Plus. Image search only works on Professional? I am pretty happy with iPhone/iCloud image search. I sync to Dropbox and Google Photos as a backup. But my older photos are only in Dropbox. I am considering copying it all to iCloud, to have better management of it all.
I'm with you. I don't get the reasoning for giving it to paying Pro users but not paying personal (Plus and Family) users. I deal with far more photos in my personal account than in any work-related account.
They've fully committed to being a B2B company, and that apparently means putting ordinary B2C features at B2B price tiers even when it makes no sense.
One thing that annoys me about a lot of these tech blogs is that they don’t support RSS. I would like to subscribe and read them, as a lot of this information is generally a good read and worth learning from.
To be fair, they have a neat feature to subscribe for their new posts by email but somehow RSS is harder?
RSS is essentially dead, sadly. Can't push ads or gather analytics through it so it's been "optimised out" of the web.
Even those rare sites that DO support RSS quite often only show the first paragraph or even just the title of the page, which I suppose is an acceptable compromise.
Image search is one of the pain points when you want to deploy your own cloud, and it's what I'd miss compared to Google Photos or that image search in Dropbox.