Lighthouseapp attachments indexed and searchable in bulk from google

Mike K

03 Apr, 2017 09:51 PM

Are you guys aware that if you google "inurl:activereload-lighthouse.s3.amazonaws.com" and a file extension like .log, .jpg, or .png you can turn up a lot of documents? Probably got indexed from public threads, but as people are uploading logfiles and such, it's probably not the best idea from a security standpoint. If you can prevent google from spidering attachments, or find some way to keep direct URLs to your S3 bucket from being exposed, it might be something to consider implementing.

  1. Posted by brandi (Support Staff) on 04 Apr, 2017 07:45 PM

    Hi Mike,

    Those are all on public projects.

  2. Posted by Mike K on 13 Apr, 2017 12:32 AM

    Yes, but there's absolutely nothing in place to ensure that end users know that.

    I uploaded some mail logs containing sensitive identifying information to a lighthouseapp forum as part of a support request. I was totally fine sharing them with the developer I was seeking support from, or with others on the board, but that doesn't mean I knew about, or would have approved of, them being indexed and retrievable in Google by a simple global search for one of my email addresses. Needless to say, I was unpleasantly surprised to find them there.

    Now that I know, I'll only upload attachments in password-protected archive files. But I shouldn't have had to learn that the hard way, and neither should the next person who unknowingly uploads sensitive info, only to have it propagated in minute detail into search engine indices far and wide.

    Besides which, publicly exposing the files and the URL of your S3 bucket is sloppy and an invitation to hackers. You should at least put it behind some sort of CDN, or even a CGI script or something. A malicious script kiddie could run up thousands of dollars in AWS bandwidth charges on your account. You don't even have a referrer check on it. Anyone could host their website resources on it at your expense.

    Anyway, that's it, you have the information, do what you will with it. But just washing your hands of any responsibility for the user security risks you're enabling is a really callous way to treat your users.

  3. Posted by Tiger Team (Support Staff) on 13 Apr, 2017 03:10 AM

    Thanks for the advice. I'll investigate further. I'm sorry for the shock of
    finding your sensitive data on Google. They really do jump through hoops to
    find and index everything. You should probably go back and delete the
    files you uploaded so they can't be viewed later. Incidentally, would you
    have uploaded the same file to somewhere like Stack Overflow or GitHub?
    Why or why not?

    I've added some restrictions on Google indexing the files, so they should
    all drop out of the index soon. We do actually enforce authenticated reads
    on files (i.e. you can't just link to a file; the request has to be
    HMAC-signed by Lighthouse). The expiry on the generated signed links is a
    bit longer than it needs to be, but some of our users do host and link to
    their public files out of their attachments, so I expect I'll hear from
    them when I start enforcing referrers and shortening the signature's
    lifetime. We don't put a CDN in front of the files because that would mean
    paying twice for the bandwidth, which is already in the high hundreds of
    GB per month.

    --courtenay
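
The mechanisms discussed in this thread (HMAC-signed read requests that expire after a while, plus an optional Referer check against hotlinking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Lighthouse's actual implementation: the secret key, the allowed host `lighthouseapp.com`, the `expires`/`sig` query parameter names, and all function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical server-side secret; it never leaves the server.
SECRET_KEY = b"replace-with-a-real-secret"
# Hypothetical hosts allowed to link to attachments (subdomains included).
ALLOWED_REFERER_HOSTS = {"lighthouseapp.com"}


def _signature(path: str, expires: int, key: bytes) -> str:
    """HMAC-SHA256 over the attachment path and its expiry timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int = 300,
             now: Optional[float] = None, key: bytes = SECRET_KEY) -> str:
    """Append ?expires=...&sig=... to a URL-safe path that has no query string."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires, key)})
    return f"{path}?{query}"


def referer_allowed(referer: Optional[str], allow_missing: bool = True) -> bool:
    """True if the Referer host is an allowed host or one of its subdomains."""
    if not referer:
        # Browsers and privacy tools often omit Referer; treat that as a policy choice.
        return allow_missing
    host = urlparse(referer).hostname or ""
    return any(host == h or host.endswith("." + h) for h in ALLOWED_REFERER_HOSTS)


def verify_request(url: str, referer: Optional[str] = None,
                   now: Optional[float] = None, key: bytes = SECRET_KEY) -> bool:
    """Reject requests that are unsigned, expired, tampered with, or hotlinked."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # missing or malformed signature parameters
    current = time.time() if now is None else now
    if current > expires:
        return False  # the signed link has expired
    expected = _signature(parsed.path, expires, key)
    # Compare as bytes in constant time; also safe for non-ASCII input.
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return False  # path or expiry was altered after signing
    return referer_allowed(referer)
```

For example, `sign_url("/attachments/42/mail.log", ttl_seconds=300, now=1_000_000)` produces a link carrying `expires=1000300`; `verify_request` accepts it until that timestamp and rejects it afterwards, or if the path is edited. Lowering `ttl_seconds` corresponds to shortening the signature's lifetime. A Referer check only deters hotlinking from other websites, since non-browser clients can send any Referer they like. For files stored directly in S3, boto3's `generate_presigned_url` (with its `ExpiresIn` argument) produces expiring signed URLs using AWS's own signing scheme rather than a hand-rolled HMAC.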

Discussions are closed to public comments.
If you need help with Lighthouse please start a new discussion.
