What happened
While I was working on some visualizers and checking the Django Admin panel, I noticed that IntelOwl is creating a lot of redundant rows in the Data Model tables (IPDataModel, DomainDataModel, etc.).
Basically, every time an analyzer runs successfully, it creates a brand new row in the database. If I run three different analyzers for the same IP (like 1.1.1.1) and they all find the same ASN and ISP info, I end up with three identical rows instead of one. Over time, this will make the database larger, and likely slower, than it needs to be.
Environment
What did you expect to happen
If the intelligence data (the dictionary) returned by an analyzer is exactly the same as something we already have in the database, the system should link to the existing record instead of creating a new one.
How to reproduce your issue
- Run a few scans for the same IP (e.g., 8.8.8.8) using analyzers like AbuseIPDB.
- Go to the Django Admin panel at /admin/data_model_manager/ipdatamodel/.
- You'll see multiple rows that have different IDs (PKs) but the exact same data (same ASN, same Org Name, etc.).
- Supporting evidence: the maintainers have already noted this in the code. In api_app/analyzers_manager/models.py on line 124, there's a # TODO that says "we don't need to actually crate a new object every time. if the report is the same of the previous one, we can just link it."
Error messages and logs
There aren't any errors per se because the code doesn't crash, but you can see the bloat in the database records. Here is an example of what my IPDataModel table looks like after a few scans of the same IP:
- PK 230 | ASN 13335 | ORG Cloudflare | COUNTRY US
- PK 231 | ASN 13335 | ORG Cloudflare | COUNTRY US
- PK 232 | ASN 13335 | ORG Cloudflare | COUNTRY US
As you can see, the rows with PKs 230, 231, and 232 are identical apart from their primary key. We should probably be using something like get_or_create with a fingerprint/hash of the data to handle this.
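
To make the fingerprint idea concrete, here is a minimal sketch in plain Python. It is not IntelOwl code: the `fingerprint` helper, the `InMemoryDataModelStore` class, and the field names (`asn`, `org_name`, `country_code`) are all hypothetical and only stand in for the real Data Model tables. The assumption is that the analyzer output can be serialized to canonical JSON, hashed, and used as a lookup key, so identical dictionaries map to a single record.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def fingerprint(data: Dict[str, Any]) -> str:
    """Return a stable SHA-256 hex digest for a dictionary.

    Keys are sorted so that two dicts with the same content but a
    different key order produce the same digest. Values that are not
    JSON-serializable are converted with str() (an assumption; a real
    implementation may need stricter normalization).
    """
    canonical = json.dumps(
        data, sort_keys=True, separators=(",", ":"), default=str
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryDataModelStore:
    """Hypothetical stand-in for a Data Model table with a unique
    fingerprint column. It only illustrates get-or-create semantics."""

    def __init__(self) -> None:
        self._pk_by_fingerprint: Dict[str, int] = {}
        self._rows: Dict[int, Dict[str, Any]] = {}
        self._next_pk = 1

    def get_or_create(self, data: Dict[str, Any]) -> Tuple[int, bool]:
        """Return (pk, created). Reuse the existing row when an
        identical dictionary has already been stored."""
        key = fingerprint(data)
        existing_pk = self._pk_by_fingerprint.get(key)
        if existing_pk is not None:
            return existing_pk, False

        pk = self._next_pk
        self._next_pk += 1
        self._pk_by_fingerprint[key] = pk
        self._rows[pk] = dict(data)
        return pk, True

    def __len__(self) -> int:
        return len(self._rows)
```

In a Django model, the equivalent would presumably be a unique `fingerprint` column queried with `Model.objects.get_or_create(fingerprint=..., defaults={...})`, with each analyzer report linking to the returned row. Whether that fits the existing Data Model schema and its relations is something the maintainers would need to confirm.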