Conversation
So excited about this! In addition to the examples I previously made regarding training your own classifier, some examples using cosine similarity could be good! (And perhaps we want to build a cosine similarity utility function into ml5.) Last year I made a demo that matches your webcam image to the most similar image. It has some code that is unfriendly to beginners, but it could be a model for an ml5 version. https://editor.p5js.org/ml_4_cc/sketches/CWE6Ox_jd I am hoping to make some videos about this in the future.
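A cosine similarity utility like the one discussed could be quite small. The sketch below is a plain-JavaScript illustration of what such a helper might look like; the function name `cosineSimilarity` is only a suggestion, not an existing ml5.js API:

```javascript
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|), in the range [-1, 1].
// 1 means the embeddings point in the same direction (most similar).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A webcam-matching demo would then compare the current frame's embedding against each stored embedding and pick the highest score.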
…sion; Add webcam support.
Two newly implemented demos have been pushed to the branch.
Feel free to check them out!
```js
trainButton.mousePressed(function () {
  classifier.train().then(function () {
    console.log("Starting classification...");
    classifyVideo();
```
Take a look at the new ml5.js startClassify() pattern in other examples. This eliminates the need for the "recursive" call to classifyVideo()!
A new classifyStart() and predictStart() method has been implemented in 17afa84.
```js
// Initialize the feature extractor
featureExtractor = ml5.featureExtractor({ epochs: 100 }, modelReady);
// Create a new classifier using those features and with a video element
classifier = featureExtractor.classification(video, videoReady);
```
This matches exactly how we implemented the API in ml5.js prior to 1.0. @gohai I think this could be up for discussion. Do we like this methodology of first loading the extractor and then "turning it into" a classifier? Another option would be to use ml5.neuralNetwork() in conjunction with the feature extractor. This might involve more code, but could be better from a pedagogical standpoint. Would love your thoughts!
For the loading methodology, I have an idea: introduce a task property that accepts two options, classification and regression. With this approach, loading a classifier or regressor would look like:

```js
feClassifier = ml5.featureExtractor({ task: 'classification' }, modelReady);
feRegressor = ml5.featureExtractor({ task: 'regression' }, modelReady);
```
The two-step process of creating an extractor and then using it to create a classifier does indeed seem quite confusing to me. I would prefer a single-line solution, similar to what @JunhaoZhu0220 is proposing.
I have implemented my design in e119244, so now

```js
feClassifier = ml5.featureExtractor({ task: 'classification' }, modelReady);
feRegressor = ml5.featureExtractor({ task: 'regression' }, modelReady);
```

is the new way to initialize the model, as shown in the two demos.
I think that's great. Users now get the feature extractor right away, and the task property mirrors the NeuralNetwork class.
(Leaving for others to chime in)
@shiffman @gohai For the cosine similarity utility, one demo that came to mind is handwritten digit recognition. Inspired by Yann LeCun et al.'s MNIST database, the feature extractor would first obtain feature embeddings for reference images of digits 0–9, then compute the embedding for a webcam frame where the user holds up a handwritten number on paper. The digit is identified by finding the reference embedding with the highest cosine similarity. Please give me some feedback on how you feel about this idea!
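The matching step in that digit demo boils down to a nearest-embedding lookup. Here is a plain-JavaScript sketch of that idea; the function name `bestMatch` is hypothetical, and the embedding arrays stand in for the feature extractor's actual output:

```javascript
// Given a query embedding and an array of reference embeddings
// (e.g. one per digit 0-9), return the index of the reference
// with the highest cosine similarity to the query.
function bestMatch(query, references) {
  const cosine = (a, b) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  let bestIndex = -1;
  let bestScore = -Infinity;
  references.forEach((ref, i) => {
    const score = cosine(query, ref);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  });
  return bestIndex;
}
```

For the digit demo, the returned index would directly be the recognized digit if the references are stored in order 0–9.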
```js
let featureExtractor;
```
Thank you for this nice example inspired by @shiffman's video, @JunhaoZhu0220! 🤗
Since "pen" and "phone" are still fairly generic classes, and one might need to know that they aren't part of MobileNet to fully understand and appreciate this functionality: I wonder if making both buttons contenteditable might be an idea worth trying out, in your opinion? (The user would be prompted to define their own classes, hit "done", and then continue in the way the example works currently.)
@gohai I do think this is a perfect idea, offering users the flexibility to edit the classes on their own! In the interface in aa53f36, the two classes default to Class 1 and Class 2. If the user enters customized class names in the text box, the label names change accordingly.
This is already much more friendly @JunhaoZhu0220 ✨
What I meant by contenteditable, which you might still want to try to see if you like it, is the HTML attribute of the same name, which allows users to type in the button directly. (It might make the p5 code shorter, but I haven't tested it myself.)
(and if the user clicks "Done", then it might remain "Class #1" and "Class #2")
… updating the training hyperparameters within the .train() function.
… .train(); initialize the model with a single line of code through the "task" property; add training visualization.
…tion to avoid recursively calling .classify()
```js
let sampleCount = 0;
let predictedValue = 0;

function modelReady() {
```
Small nitpick: defining those functions in the order we expect them to be called (e.g. preload, then setup, etc.) might make it slightly easier to read the sketch top-to-bottom.
```js
function preload() {
  // Initialize the feature extractor for regression
  feRegressor = ml5.featureExtractor({ task: 'regression', version: 2 }, modelReady);
```
Is the version: 2 here necessary for it to function? (If yes, is there a drawback to changing our default from 1 to 2?)
The reason I passed version: 2 is that the regression task differs from what MobileNet was originally designed for, so we might need a stronger (newer) model that generalizes the feature extraction better 🤗
If this introduces confusion for users, I can remove the option and use the default version: 1, while noting in the documentation that version: 2 brings better performance.
Curious if changing the default to version 2 could be an option? Does this have downsides for the classification task perhaps?
I believe version 2 also works perfectly for classification tasks. (Will change the default version to 2 in the next commit.)
```js
video.hide();
background(0);
// Set the video as the input for the Classifier
feClassifier.video = video;
```
It is probably nicer to have a dedicated method for setting the input here (which might be a video, but possibly also an image, canvas...) 🤔 Something to think through together with @shiffman at some convenient time.
For context: previously, the video input got passed as an argument to the constructor, but since the feature extractor now gets created in preload(), we typically don't have the video element yet at this point.
Our other models, such as bodyPose, work around this by taking video as an argument to e.g. detectStart(). We could do the same, and require video as an argument to both addImage() and classifyStart(). Or, pass the source to the feature extractor at one point in time only - similarly to how @JunhaoZhu0220 is doing here.
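To make the trade-off concrete, here is a minimal plain-JavaScript mock of the two API shapes under discussion. Everything here is hypothetical (the class and method names `MockExtractor`, `setInput`, and the per-call argument are illustration only, not ml5.js code): a dedicated setter stored once, versus a source passed per call in the style of bodyPose's detectStart(video).

```javascript
// Hypothetical mock contrasting the two input-source API options.
class MockExtractor {
  constructor() {
    this.input = null;
  }

  // Option A: a dedicated method sets the source once...
  setInput(source) {
    this.input = source;
  }

  // Option B: the source may also be passed per call, like
  // bodyPose's detectStart(video); if omitted, fall back to
  // the source stored by setInput().
  classifyStart(source) {
    const active = source || this.input;
    if (!active) {
      throw new Error("No input source set");
    }
    return `classifying ${active}`;
  }
}
```

Supporting both shapes at once (per-call argument with a stored fallback) would keep the one-time assignment in this PR working while matching the other models' calling convention.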
This PR implements the `FeatureExtractor` module, adapted from the ml5.js library (v0.12.2). It follows the original API design and behavior as defined in that release, enabling transfer learning on top of a pre-trained MobileNet model for both classification and regression tasks.

Changes

- Refactored model loading logic: Instead of loading the full MobileNet and truncating inference at a specific layer (which varies between MobileNet v1 and v2), the new approach uses the `FeatureVector` variant of MobileNet directly. Its output is then fed into an MLP for downstream training, making the pipeline cleaner and more version-agnostic.
- Fixed MLP input shape mismatch: The previous implementation hardcoded the MLP input shape based on MobileNet v1 with `alpha=1`, without accounting for the `alpha` hyperparameter that scales filter sizes across the network. This fix dynamically resolves the correct input shape based on the chosen `alpha` value.

TODO
- Webcam Support — Currently, the feature extractor only supports static image uploads. Add real-time webcam input so users can perform inference directly from their computer camera.
- Usage Examples — Design example demos to demonstrate the feature extractor in practice. I will draw inspiration from @shiffman's classification tutorial and regression tutorial. Open to suggestions on example design!
- Model Save & Load — Add the ability to save a trained model to disk and reload it in a later session, avoiding the need to retrain from scratch each time.
- Cosine Similarity Utility & Example — Implement a cosine similarity utility and a related demo.
- Documentation
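Regarding the input-shape fix described under Changes: MobileNet v1's final feature-vector length scales linearly with the width multiplier (1024 channels at alpha = 1.0). The sketch below illustrates the kind of dynamic resolution described; the function name and the error-handling behavior are assumptions for illustration, not the actual implementation in this PR:

```javascript
// Resolve the MLP input length from MobileNet v1's width multiplier.
// With alpha = 1.0 the feature vector has 1024 entries; smaller
// alpha values shrink every layer's filter count proportionally,
// so the hardcoded 1024 only worked for alpha = 1.
function featureLength(alpha) {
  const supported = [0.25, 0.5, 0.75, 1.0];
  if (!supported.includes(alpha)) {
    throw new Error(`Unsupported alpha: ${alpha}`);
  }
  return Math.round(1024 * alpha);
}
```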