Generated code, ideally, should not be committed to version control. Committing generated code can sometimes speed up testing and code generation but it is a design smell. It is better to cache generated code via CI caching. Committing generated code to version control is the worst as it is hard to even detect the difference.
However, there are a few specific circumstances where committing generated code/config/data to version control is worth it.
- The generated code is text – so that, one can see the diff and how it evolves over time. And,
- Tracking changes to generated code – a perfect example of this is committing
openapi.json
specifications to version control. So, any breaking modifications to it can be tracked via CI.
Should Machine Learning models be committed to version control? I don’t think so. These are huge binary blobs. It is best to put them in an Amazon S3 bucket/Google Cloud Storage bucket and use that as a version control.