- Duplicate entries appear in the data.csv file (the cause still needs to be investigated).
- The search for pom.xml and build.gradle covers only the root directory (fine for now; can be improved in the future).
- The search API returns only the first 1000 repositories per query, so no more than 1000 projects can be retrieved.
- Find a way to filter out inactive projects.
- Do we need atomicity for writes to data.csv?
- Data filtering could be improved.
- More options for coverage and continuous integration tools could be added.
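
One possible way around the 1000-result cap noted above is to split a single search into several narrower ones, each restricted to a non-overlapping creation-date window, and to run them one after another. The sketch below only builds the query strings; it does not call the API. It assumes the search API in question is GitHub's and that its `created:START..END` range qualifier is available. The base query `language:java`, the function names, and the window size are hypothetical and chosen for illustration only.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def created_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive, non-overlapping (first_day, last_day) pairs covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        # Each window spans `days` calendar days, clipped to the overall end date.
        last = min(current + timedelta(days=days - 1), end)
        yield current, last
        current = last + timedelta(days=1)


def build_queries(base_query: str, start: date, end: date, days: int = 30) -> List[str]:
    """Return one search query string per creation-date window.

    Assumption: the API accepts a `created:YYYY-MM-DD..YYYY-MM-DD` qualifier.
    """
    return [
        f"{base_query} created:{first.isoformat()}..{last.isoformat()}"
        for first, last in created_windows(start, end, days)
    ]


if __name__ == "__main__":
    # Hypothetical base query; replace with whatever the crawler actually uses.
    for query in build_queries("language:java", date(2021, 1, 1), date(2021, 1, 10), days=5):
        print(query)
```

Each window still has to return fewer than 1000 results on its own, so a window that hits the cap would need to be made smaller. Because a repository's creation date does not change, each repository falls into exactly one window, which may also help with the duplicates in data.csv if they come from overlapping result pages.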