updated readme, better logging

zricethezav, 8 years ago
Parent
Commit
1cecd5e090
7 changed files with 131 additions and 88 deletions
  1. CHANGELOG.md (+4, -6)
  2. README.md (+65, -25)
  3. checks.go (+12, -13)
  4. checks_test.go (+5, -1)
  5. leaks.go (+21, -22)
  6. main.go (+11, -9)
  7. options.go (+13, -12)

+ 4 - 6
CHANGELOG.md

@@ -6,16 +6,14 @@ CHANGELOG
 Version 0.2.0 of Gitleaks is the first version update since this got relatively popular. Based on the issues raised it seems that folks want better support for integration into their pipelines. I hear ya. This is what this update tries to provide. So... what are the changes?
 
 * Additionally regex checking
-* $HOME/.gitleaks/ directory
+* $HOME/.gitleaks/ directory for clones and reports
 * Clone into temp dir option
-* Persistant repos for Orgs and Users (no more re-cloning)
-* Pagination for Org/User list... no more partial repos
+* Persistent repos for Orgs and Users (no more re-cloning)
+* Pagination for Org/User list... no more partial repo lists
 * Since commit option
 * Updated README
 * Multi-staged Docker build
-* Travis tests
-* More tests
-
+* Travis CI
 
 
 0.1.0

+ 65 - 25
README.md

@@ -5,11 +5,6 @@
 
 ## Check git repos for secrets and keys
 
-### Features
-
-* Search all commits on all branches in topological order
-* Regex/Entropy checks
-
 #### Installing
 
 ```bash
@@ -24,34 +19,80 @@ go get -u github.com/zricethezav/gitleaks
 ./gitleaks {git url}
 ```
 
-This example will clone the target `{git url}` and run a diff on all commits. A report will be outputted to `{repo_name}_leaks.json`
-Gitleaks scans all lines of all commits and checks if there are any regular expression matches. The regexs are defined in `main.go`. Work largely based on  [https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf](https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf) and regexes from https://github.com/dxa4481/truffleHog and https://github.com/anshumanbh/git-all-secrets.
-
-##### gitLeaks User
-```bash
-./gitleaks -u {user git url}
+Gitleaks will clone the target `<git url>` to `$HOME/.gitleaks/clones/<repo name>` and run a regex check against all diffs of all commits on all remotes in topological order. If any leaks are found gitleaks will output the leak in json, Ex:
 ```
-##### gitLeaks Org
+{
+   "line": "-const AWS_KEY = \"AKIALALEMEL33243OLIAE\"",
+   "commit": "eaeffdc65b4c73ccb67e75d96bd8743be2c85973",
+   "string": "AKIALALEMEL33243OLIA",
+   "reason": "AWS",
+   "commitMsg": "remove fake key",
+   "time": "2018-02-04 19:43:28 -0600",
+   "author": "Zachary Rice",
+   "file": "main.go",
+   "repoURL": "https://github.com/zricethezav/gronit"
+}
+``` 
+Gitleaks will not re-clone repos unless the temporary flag is set (see Options section), instead gitleaks will `fetch` all new changes before the scan. This works for users and organization repos as well. Regex's for the scan are defined in `main.go`, feel free to open a PR and contribute if you have additional regex you want included. Work largely based on  [https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf](https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leaks-msr15.pdf) and regexes from https://github.com/dxa4481/truffleHog and https://github.com/anshumanbh/git-all-secrets.
+
+#### Example with Report
 ```bash
-./gitleaks -o {org git url}
+gitleaks --json https://github.com/zricethezav/gronit
+```
+This will run gitleaks on one of my projects, gronit and create the following structure in `$HOME/.gitleaks`:
 ```
+.
+├── clones
+│   └── zricethezav
+│       └── gronit
+│           ├── README.md
+│           ├── main.go
+│           ├── options.go
+│           ├── server.go
+│           └── utils.go
+└── report
+    └── zricethezav
+        └── gronit_leaks.json
+```
+The clones directory contains the repo owner (me) and any repos gitleaks has scanned. Next time we run gitleaks on gronit again we will `fetch` gronit rather than `clone`. Reports are written out to `$HOME/.gitleaks/report/<owner>/<repo>_leaks.json`
 
-#### Help
+#### Options
 ```
 usage: gitleaks [options] <url>
 
 Options:
- -c                     Concurrency factor (default is 10)
- -u --user              Git user url
- -r --repo              Git repo url
- -o --org               Git organization url
- -s --since             Scan until this commit (SHA)
- -b --b64Entropy        Base64 entropy cutoff (default is 70)
- -x --hexEntropy        Hex entropy cutoff (default is 40)
- -e --entropy           Enable entropy
- --strict               Enables stopwords
- -h --help              Display this message
+ -c --concurrency 	Upper bound on concurrent diffs
+ -u --user 		    Git user url
+ -r --repo 		    Git repo url
+ -o --org 		    Git organization url
+ -s --since 		Commit to stop at
+ -b --b64Entropy 	Base64 entropy cutoff (default is 70)
+ -x --hexEntropy  	Hex entropy cutoff (default is 40)
+ -e --entropy		Enable entropy		
+ -j --json 		    Output gitleaks report
+ --token    		Github API token
+ --strict 		    Enables stopwords
+ -h --help 		    Display this message
+
 ```
+
+##### Options Explained
+
+| Option | Explanation |
+| ------------- | ------------- |
+| -c --concurrency | Set the limit on the number of concurrent diffs. If unbounded, your system would throw a `too many open files` error. Tweak `ulimit` for quicker scans at your own risk. Ex: `gitleaks -c 100 <repo_url>` |
+| -u --user | Target git user. Reports and clones are dumped to `$HOME/.gitleaks/clones/<user>/<user_repos>` and `$HOME/.gitleaks/reports/<user>/<gitleaks_reports>`. Ex: `gitleaks -u <user_git_url>`.
+| -o --org | Target git organization. Reports and clones are dumped to `$HOME/.gitleaks/clones/<org>/<org_repos>` and `$HOME/.gitleaks/reports/<org>/<gitleaks_reports>`. Ex: `gitleaks -o <org_git_url>`
+| -r --repo | Default behavior is to have gitleaks target a specific repo, so this option is unecessary, but... Target git repo. Reports and clones are dumped to `$HOME/.gitleaks/clones/<owner>/<repos>` and `$HOME/.gitleaks/reports/<owner>/<gitleaks_reports>`
+| -s --since  | Since argument accepts a commit hash and will scan the repo history up to and including this hash. Ex: `gitleaks -s <HASH> <repo_url>`
+| -b --b64Entropy | Entropy cutoff for base 64 characters. Ex: `gitleaks -e -b 70 <repo_url>` |
+| -x --hexEntropy | Entropy cutoff for hex characters. Ex: `gitleaks -e -x 70 <repo_url>` |
+| -e --entroy | Enable entropy checks. Ex: `gitleaks -e <repo_url>` |
+| -j --json | Enable report generation. Ex: `gitleaks --json <repo_url>` | 
+| -t --temporary | Cloned repos will be cloned into a temp directory and removed after gitleaks exits. Ex: `gitleaks -t <repo_url>` |
+| --token | NOTE: you should use env var `GITHUB_TOKEN` instead of this flag. Github API token needed for scanning private repos and pagination on repo fetching from github's api. |
+| -- strict | Enable stopwords. Ex: `gitleaks --strict <repo_url>` |
+
 NOTE: your mileage may vary so if you aren't getting the results you expected try updating the regexes to fit your needs or try tweaking the entropy cutoffs and stopwords. Entropy cutoff for base64 alphabets seemed to give good results around 70 and hex alphabets seemed to give good results around 40. Entropy is calculated using [Shannon entropy](http://www.bearcave.com/misl/misl_tech/wavelets/compression/shannon.html).
 
 
@@ -69,4 +110,3 @@ docker build -t gitleaks .
 docker run --rm --name=gitleaks gitleaks https://github.com/zricethezav/gitleaks
 ```
 
-

+ 12 - 13
checks.go

@@ -1,13 +1,11 @@
 package main
 
 import (
+	_ "fmt"
 	"math"
 	"strings"
-	_"fmt"
-	"regexp"
 )
 
-
 // TODO LOCAL REPO!!!!
 
 // checks Regex and if enabled, entropy and stopwords
@@ -19,12 +17,13 @@ func doChecks(diff string, commit Commit, opts *Options, repo RepoDesc) []LeakEl
 	)
 
 	lines := strings.Split(diff, "\n")
-	file := ""
+	file := "unable to determine file"
 	for _, line := range lines {
-		if strings.Contains(line, "diff --git a"){
-			re := regexp.MustCompile("diff --git a.+b/")
-			idx := re.FindStringIndex(line)
-			file = line[idx[1]:]
+		if strings.Contains(line, "diff --git a") {
+			idx := fileDiffRegex.FindStringIndex(line)
+			if len(idx) == 2 {
+				file = line[idx[1]:]
+			}
 		}
 
 		for leakType, re := range regexes {
@@ -40,11 +39,11 @@ func doChecks(diff string, commit Commit, opts *Options, repo RepoDesc) []LeakEl
 				Commit:   commit.Hash,
 				Offender: match,
 				Reason:   leakType,
-				Msg: commit.Msg,
-				Time: commit.Time,
-				Author: commit.Author,
-				File: file,
-				RepoURL: repo.url,
+				Msg:      commit.Msg,
+				Time:     commit.Time,
+				Author:   commit.Author,
+				File:     file,
+				RepoURL:  repo.url,
 			}
 			leaks = append(leaks, leak)
 		}
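
Review note on the checks.go change above: the file name is now taken from the `diff --git` header with a regex compiled once (`fileDiffRegex`), and the slice is only taken when a match was found. Below is a small standalone sketch of that logic. The wrapper `fileFromDiffHeader` and its `fallback` parameter are hypothetical names introduced here for illustration; the pattern and the `len(idx) == 2` guard come from the diff.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as in the diff, compiled once at package level.
var fileDiffRegex = regexp.MustCompile("diff --git a.+b/")

// fileFromDiffHeader is a hypothetical wrapper around the logic in doChecks.
// It returns the text after the matched header prefix, or fallback when the
// line is not a diff header or the regex does not match.
func fileFromDiffHeader(line, fallback string) string {
	if !strings.Contains(line, "diff --git a") {
		return fallback
	}
	idx := fileDiffRegex.FindStringIndex(line)
	if len(idx) == 2 {
		return line[idx[1]:]
	}
	return fallback
}

func main() {
	fmt.Println(fileFromDiffHeader("diff --git a/main.go b/main.go", "unable to determine file"))
	// prints "main.go"
}
```

Because `.+` is greedy, the match runs to the last `b/` in the line, so a path that itself contains a `b/` segment (for example `lib/b/x.go`) would be reported as `x.go` only. That may be worth a follow-up.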

+ 5 - 1
checks_test.go

@@ -12,13 +12,17 @@ func TestCheckRegex(t *testing.T) {
 		HexEntropyCutoff: 40,
 		Entropy:          false,
 	}
+	repo := RepoDesc{
+		url: "someurl",
+	}
+	commit := Commit{}
 	checks := map[string]int{
 		"aws=\"AKIALALEMEL33243OLIAE": 1,
 		"aws\"afewafewafewafewaf\"":   0,
 	}
 
 	for k, v := range checks {
-		results = doChecks(k, "commit", opts)
+		results = doChecks(k, commit, opts, repo)
 		if v != len(results) {
 			t.Errorf("regexCheck failed on string %s", k)
 		}

+ 21 - 22
leaks.go

@@ -10,9 +10,9 @@ import (
 	"os/exec"
 	"os/signal"
 	"path/filepath"
+	"strings"
 	"sync"
 	"syscall"
-	"strings"
 )
 
 // LeakElem contains the line and commit of a leak
@@ -21,24 +21,24 @@ type LeakElem struct {
 	Commit   string `json:"commit"`
 	Offender string `json:"string"`
 	Reason   string `json:"reason"`
-	Msg 	 string `json:"commitMsg"`
-	Time 	 string `json:"time"`
+	Msg      string `json:"commitMsg"`
+	Time     string `json:"time"`
 	Author   string `json:"author"`
 	File     string `json:"file"`
 	RepoURL  string `json:"repoURL"`
 }
 
 type Commit struct {
-	Hash string
+	Hash   string
 	Author string
-	Time string
-	Msg string
+	Time   string
+	Msg    string
 }
 
-func rmTmp(owner *Owner){
+func rmTmp(owner *Owner) {
 	if _, err := os.Stat(owner.path); err == nil {
 		err := os.RemoveAll(owner.path)
-		log.Printf("Cleaning up tmp repos in %s\n", owner.path)
+		log.Printf("\nCleaning up tmp repos in %s\n", owner.path)
 		if err != nil {
 			log.Printf("failed to properly remove tmp gitleaks dir: %v", err)
 		}
@@ -49,7 +49,7 @@ func rmTmp(owner *Owner){
 // start
 func start(repos []RepoDesc, owner *Owner, opts *Options) {
 	var report []LeakElem
-	if opts.Tmp{
+	if opts.Tmp {
 		defer rmTmp(owner)
 	}
 
@@ -87,7 +87,7 @@ func start(repos []RepoDesc, owner *Owner, opts *Options) {
 			fmt.Printf("Cloning \x1b[37;1m%s\x1b[0m...\n", repo.url)
 			err := exec.Command("git", "clone", repo.url).Run()
 			if err != nil {
-				log.Printf("failed to clone repo %v", err)
+				fmt.Printf("failed to clone repo %v", err)
 				return
 			}
 			report = getLeaks(repo, owner, opts)
@@ -116,6 +116,7 @@ func outputGitLeaksReport(report []LeakElem, repo RepoDesc, opts *Options) {
 	if err != nil {
 		log.Fatalf("Can't write to file: %s", err)
 	}
+	fmt.Printf("Report written to %s\n", reportFile)
 }
 
 // getLeaks will attempt to find gitleaks
@@ -129,9 +130,6 @@ func getLeaks(repo RepoDesc, owner *Owner, opts *Options) []LeakElem {
 		report            []LeakElem
 	)
 	semaphoreChan := make(chan struct{}, opts.Concurrency)
-	if opts.Tmp{
-		defer rmTmp(owner)
-	}
 
 	go func(commitWG *sync.WaitGroup, gitLeakReceiverWG *sync.WaitGroup) {
 		for gitLeak := range gitLeaks {
@@ -163,9 +161,6 @@ func getLeaks(repo RepoDesc, owner *Owner, opts *Options) []LeakElem {
 		if commit.Hash == "" {
 			continue
 		}
-		if commit.Hash == opts.SinceCommit {
-			break
-		}
 
 		commitWG.Add(1)
 		go func(currCommit Commit, repoName string, commitWG *sync.WaitGroup,
@@ -181,8 +176,8 @@ func getLeaks(repo RepoDesc, owner *Owner, opts *Options) []LeakElem {
 			<-semaphoreChan
 
 			if err != nil {
-				if strings.Contains(err.Error(), "too many files open"){
-					fmt.Printf("error retrieving diff for commit %s. Try turning concurrency down. %v\n", currCommit, err)
+				if strings.Contains(err.Error(), "too many files open") {
+					log.Printf("error retrieving diff for commit %s. Try turning concurrency down. %v\n", currCommit, err)
 				}
 				if opts.Tmp {
 					rmTmp(owner)
@@ -199,6 +194,10 @@ func getLeaks(repo RepoDesc, owner *Owner, opts *Options) []LeakElem {
 			}
 
 		}(commit, repo.name, &commitWG, &gitLeakReceiverWG, opts)
+
+		if commit.Hash == opts.SinceCommit {
+			break
+		}
 	}
 
 	commitWG.Wait()
@@ -208,12 +207,12 @@ func getLeaks(repo RepoDesc, owner *Owner, opts *Options) []LeakElem {
 
 func parseFormattedRevList(revList [][]byte) []Commit {
 	var commits []Commit
-	for i := 0; i < len(revList)-1; i=i+5 {
+	for i := 0; i < len(revList)-1; i = i + 5 {
 		commit := Commit{
-			Hash: string(revList[i+1]),
+			Hash:   string(revList[i+1]),
 			Author: string(revList[i+2]),
-			Msg: string(revList[i+3]),
-			Time: string(revList[i+4]),
+			Msg:    string(revList[i+3]),
+			Time:   string(revList[i+4]),
 		}
 		commits = append(commits, commit)
 	}

+ 11 - 9
main.go

@@ -16,13 +16,14 @@ import (
 )
 
 var (
-	regexes      map[string]*regexp.Regexp
-	stopWords    []string
-	base64Chars  string
-	hexChars     string
-	assignRegex  *regexp.Regexp
-	gitLeaksPath string
-	gitLeaksClonePath string
+	regexes            map[string]*regexp.Regexp
+	stopWords          []string
+	base64Chars        string
+	hexChars           string
+	assignRegex        *regexp.Regexp
+	fileDiffRegex      *regexp.Regexp
+	gitLeaksPath       string
+	gitLeaksClonePath  string
 	gitLeaksReportPath string
 )
 
@@ -38,7 +39,7 @@ type Owner struct {
 	url         string
 	accountType string
 	path        string
-	reportPath string
+	reportPath  string
 }
 
 func init() {
@@ -55,11 +56,12 @@ func init() {
 		"Twitter":  regexp.MustCompile("(?i)twitter.*['|\"][0-9a-zA-Z]{35,44}['|\"]"),
 		"Github":   regexp.MustCompile("(?i)github.*[['|\"]0-9a-zA-Z]{35,40}['|\"]"),
 		"Reddit":   regexp.MustCompile("(?i)reddit.*['|\"][0-9a-zA-Z]{14}['|\"]"),
-		"Heroku": regexp.MustCompile("(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}"),
+		"Heroku":   regexp.MustCompile("(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}"),
 		"AWS":      regexp.MustCompile("AKIA[0-9A-Z]{16}"),
 		// "Custom": regexp.MustCompile(".*")
 	}
 	assignRegex = regexp.MustCompile(`(=|:|:=|<-)`)
+	fileDiffRegex = regexp.MustCompile("diff --git a.+b/")
 	homeDir, err := homedir.Dir()
 	if err != nil {
 		log.Fatal("Cant find home dir")
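
Review note on the regex table in main.go above: as a quick sanity check, the `AWS` pattern can be run against the sample line from the README diff. This is a sketch only; `awsKeyRegex` and `findAWSKey` are hypothetical names, while the pattern and the sample line are copied from this commit.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the "AWS" entry in the regexes map.
var awsKeyRegex = regexp.MustCompile("AKIA[0-9A-Z]{16}")

// findAWSKey is a hypothetical helper that returns the first match in line,
// or an empty string when nothing matches.
func findAWSKey(line string) string {
	return awsKeyRegex.FindString(line)
}

func main() {
	line := `-const AWS_KEY = "AKIALALEMEL33243OLIAE"`
	fmt.Println(findAWSKey(line))
	// prints "AKIALALEMEL33243OLIA"
}
```

The match is 20 characters (`AKIA` plus 16), so the trailing `E` in the sample is not part of it, consistent with the `"string"` field in the README's example JSON.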

+ 13 - 12
options.go

@@ -9,15 +9,17 @@ import (
 const usage = `usage: gitleaks [options] <url>
 
 Options:
- -c --concurrency 	Concurrency factor (default is 10)
- -u --user 			Git user url
- -r --repo 			Git repo url
- -o --org 			Git organization url
+ -c --concurrency 	Upper bound on concurrent diffs
+ -u --user 		Git user url
+ -r --repo 		Git repo url
+ -o --org 		Git organization url
  -s --since 		Commit to stop at
  -b --b64Entropy 	Base64 entropy cutoff (default is 70)
  -x --hexEntropy  	Hex entropy cutoff (default is 40)
  -e --entropy		Enable entropy		
+ -j --json 		Output gitleaks report
  -h --help 		Display this message
+ --token    		Github API token
  --strict 		Enables stopwords
 `
 
@@ -37,7 +39,7 @@ type Options struct {
 	Tmp              bool
 	EnableJSON       bool
 	Token            string
-	Verbose 		 bool
+	Verbose          bool
 }
 
 // help prints the usage string and exits
@@ -103,20 +105,14 @@ func parseOptions(args []string) *Options {
 			opts.OrgURL = optionsNextString(args, &i)
 		case "-u", "--user":
 			opts.UserURL = optionsNextString(args, &i)
-		case "-p", "--persist":
-			opts.UserURL = optionsNextString(args, &i)
 		case "-r", "--repo":
 			opts.RepoURL = optionsNextString(args, &i)
-		case "-f", "--forks":
-			opts.IncludeForks = true
 		case "-t", "--temporary":
 			opts.Tmp = true
-		case "-gt", "--token":
+		case "--token":
 			opts.Token = optionsNextString(args, &i)
 		case "-j", "--json":
 			opts.EnableJSON = true
-		case "-v", "--verbose":
-			opts.Verbose = true
 		case "-h", "--help":
 			help()
 			return nil
@@ -131,5 +127,10 @@ func parseOptions(args []string) *Options {
 		}
 	}
 
+	// "guards"
+	if opts.Tmp && opts.EnableJSON {
+		fmt.Println("Report generation with temporary clones not supported")
+	}
+
 	return opts
 }