Hacker News

I thought the idea was interesting, so here is a little PoC in Ruby:

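    require 'facets' # adds Array#entropy (Shannon entropy of the array's elements)
    # One array of characters per line, across every Ruby, Python and C++ file
    lines = Dir['**/*.rb', '**/*.py', '**/*.cpp'].flat_map { |f| File.read(f).lines.map(&:chars) }
    # Print the 10 highest-entropy lines, highest first
    puts lines.max_by(10, &:entropy).map(&:join)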
This uses `Array#entropy` from the facets gem: http://www.rubydoc.info/github/rubyworks/facets/Array%3Aentr...
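
For anyone who would rather not pull in facets, here is a minimal sketch of the same ranking in plain Ruby. It assumes the score is the per-character Shannon entropy of each line (bits per character), which is what facets documents for `Array#entropy`; `line_entropy` and `top_entropy_lines` are hypothetical helper names, not part of facets.

```ruby
# Shannon entropy of one line in bits per character:
#   H = -sum(p * log2(p)), summed over the distinct characters of the line,
#   where p is the fraction of the line made up of that character.
def line_entropy(line)
  chars = line.chomp.chars
  return 0.0 if chars.empty?

  counts = chars.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
  counts.values.sum do |count|
    prob = count.fdiv(chars.size)
    -prob * Math.log2(prob)
  end
end

# The n highest-entropy lines across the given files, highest first.
def top_entropy_lines(paths, n = 10)
  lines = paths.flat_map { |path| File.readlines(path, chomp: true) }
  lines.max_by(n) { |line| line_entropy(line) }
end
```

Usage: `puts top_entropy_lines(Dir['**/*.rb', '**/*.py', '**/*.cpp'])`. Reading with `chomp: true` keeps the trailing newline from being counted as a character, which the one-liner above does count.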


Is it easy to modify this script to run over all lines that have ever existed in the repository history?

For example, could you pipe the output of `git log -p --all` through this and filter out all the commit hashes somehow?


Yup, just use:

    lines = `git log -p --all`.lines.map(&:chars)
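
That one-liner feeds the whole patch text through, so the `commit <sha>` and `index <sha>..<sha>` lines the question asks about are still in there and would likely crowd the top of an entropy ranking. A minimal sketch of one way to filter them: keep only the lines a commit added (`+...` in the patch). `added_lines` is a hypothetical helper, and it assumes git's default `a/` and `b/` diff prefixes for the `+++` file headers.

```ruby
# Keep only lines that a commit added ("+..." in the patch). This drops the
# "commit <sha>" headers, "index <sha>..<sha>" lines and author/date metadata.
# The "+++ b/path" and "+++ /dev/null" file headers are rejected explicitly.
def added_lines(log_output)
  # scrub replaces invalid UTF-8 bytes so string methods never raise
  log_output.scrub
            .each_line
            .select { |line| line.start_with?('+') }
            .reject { |line| line.start_with?('+++ b/', '+++ /dev/null') }
            .map { |line| line[1..-1].chomp }
end
```

With facets loaded as in the first snippet, `lines = added_lines(%x(git log -p --all)).map(&:chars)` can stand in for the line above and the entropy ranking works unchanged. The header check matches `+++ ` followed by a path rather than any `+++`, so an added C++ line such as `++i;` (patch line `+++i;`) is kept.
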

So I found that `git grep -E '.+' $(git rev-list --all)` is a better way to get the contents of all the files: https://gist.github.com/Dorian/e1514535c3c5036cf327ce61eb34a...
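
When given revisions, `git grep` prints each match as `<rev>:<path>:<content>`, and because every commit is searched, an unchanged line is reported once per commit. A minimal sketch of stripping that prefix and deduplicating before ranking; `grep_contents` is a hypothetical helper, and it assumes file paths contain no `:` characters. Stripping the prefix also matters for scoring, since the 40-character commit id at the start of every line would otherwise skew the entropy.

```ruby
# Parse `git grep PATTERN REV...` output ("<rev>:<path>:<content>" per line)
# into the distinct line contents, dropping the revision and path prefix.
# Assumes paths contain no ':'; colons inside the content are preserved
# because split is limited to 3 fields.
def grep_contents(grep_output)
  grep_output.scrub
             .each_line
             .map { |line| line.chomp.split(':', 3)[2] }
             .compact
             .uniq
end
```

Passing the captured `git grep` output through `grep_contents` yields each distinct line once, ready to be ranked by entropy as before.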

But actually, a hex number regexp might be far more accurate than entropy (e.g. secrets are often long hex numbers).
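
A minimal sketch of that regexp approach, assuming "long" means at least 32 hex digits (a hypothetical cutoff: a 128-bit value is 32 hex digits). `HEX_SECRET` and `hex_candidates` are illustrative names. Note that 40-character git commit ids match too, so this is best run on file contents with any `rev:path:` prefix already stripped.

```ruby
# Hypothetical threshold: runs of 32+ hex digits (\h is Ruby's hex-digit class)
# bounded by word boundaries, so hex inside a longer identifier is ignored.
HEX_SECRET = /\b\h{32,}\b/

# Return [line, match] pairs for every long hex run found in the given lines.
def hex_candidates(lines)
  lines.flat_map { |line| line.scan(HEX_SECRET).map { |hex| [line, hex] } }
end
```

Short runs such as CSS colors (`#ff00ff`) fall below the cutoff, which is the main reason to prefer a length threshold over matching any hex number.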

I tried it and it yields interesting results: https://gist.github.com/28110f0b8105db11e8973d1d0be85259



