
[WIP] Add MobileNet Feature Extractor #301

Draft
JunhaoZhu0220 wants to merge 15 commits into main from feature-extractor

Conversation

@JunhaoZhu0220 JunhaoZhu0220 commented Mar 3, 2026

This PR implements the FeatureExtractor module, adapted from the ml5.js library (v0.12.2). It follows the original API design and behavior as defined in that release, enabling transfer learning on top of a pre-trained MobileNet model for both classification and regression tasks.

Changes

  • Refactored model loading logic: Instead of loading the full MobileNet and truncating inference at a specific layer (which varies between MobileNet v1 and v2), the new approach uses the FeatureVector variant of MobileNet directly. Its output is then fed into an MLP for downstream training, making the pipeline cleaner and more version-agnostic.

  • Fixed MLP input shape mismatch: The previous implementation hardcoded the MLP input shape based on MobileNet v1 with alpha=1, without accounting for the alpha hyperparameter that scales filter sizes across the network. This fix dynamically resolves the correct input shape based on the chosen alpha value.
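The alpha-dependent shape resolution described in the second bullet can be sketched as follows. This is a minimal illustration, assuming MobileNet v1's final feature length scales as 1024 × alpha while v2 keeps a fixed 1280-wide vector for alpha ≤ 1; `resolveFeatureLength` is a hypothetical helper, not code from this PR:

```javascript
// Hypothetical helper: resolve the MLP input shape from the MobileNet
// version and alpha, instead of hardcoding it for v1 with alpha = 1.
function resolveFeatureLength(version, alpha) {
  if (version === 1) {
    // v1's last layer has 1024 filters, scaled linearly by alpha
    return Math.round(1024 * alpha);
  }
  // v2 keeps a fixed 1280-wide feature vector for alpha <= 1
  return 1280;
}

console.log(resolveFeatureLength(1, 0.25)); // 256
console.log(resolveFeatureLength(1, 1.0));  // 1024
```

The point is simply that the MLP's first layer must be sized at load time from the chosen variant, not baked in.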

TODO

  • Webcam Support — Currently, the feature extractor only supports static image uploads. Add real-time webcam input so users can perform inference directly from their computer camera.

  • Usage Examples — Design example demos to demonstrate the feature extractor in practice. I will draw inspiration from @shiffman's classification tutorial and regression tutorial. Open to suggestions on example design!

  • Model Save & Load — Add the ability to save a trained model to disk and reload it in a later session, avoiding the need to retrain from scratch each time.

  • Cosine Similarity Utility & Example — Implement a cosine similarity utility and a related demo.

  • Documentation

@JunhaoZhu0220 JunhaoZhu0220 marked this pull request as draft March 3, 2026 10:09
@shiffman
Member

shiffman commented Mar 3, 2026

So excited about this! In addition to the examples I previously made regarding training your own classifier, some examples using cosine similarity could be good! (and perhaps we want to build into ml5 a cosine similarity utility function). Last year I made a demo that matches your webcam image to the most similar image. It has some unfriendly to beginners code, but could be a model for an ml5 version.

https://editor.p5js.org/ml_4_cc/sketches/CWE6Ox_jd

I am hoping to make some videos about this in the future.

@JunhaoZhu0220
Author

Two newly implemented demos have been pushed to the feature-extractor branch:

  1. featureExtractor-webcam-classifier — Classify between two categories (pens and phones) using the webcam.

  2. featureExtractor-webcam-regressor — Control the size of a circle by moving your face closer to or farther from the camera.

Feel free to check them out!

trainButton.mousePressed(function () {
  classifier.train().then(function () {
    console.log("Starting classification...");
    classifyVideo();
  });
});
Member

Take a look at the new ml5.js startClassify() pattern in other examples. This eliminates the need for the "recursive" call to classifyVideo()!
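For illustration, the difference can be mocked without ml5. `MockClassifier` below is purely illustrative (the real pattern lives in ml5's `classifyStart()`-style methods): the library keeps the loop going internally, so the user's callback no longer re-invokes itself.

```javascript
// A minimal mock showing why a classifyStart()-style API removes the
// need for the recursive classifyVideo() call: the loop lives inside
// the library. MockClassifier is purely illustrative, not ml5 code.
class MockClassifier {
  constructor() {
    this.running = false;
  }
  classifyStart(input, callback, frames = 3) {
    this.running = true;
    let count = 0;
    const step = () => {
      if (!this.running || count >= frames) return;
      count += 1;
      callback([{ label: "pen", confidence: 0.9 }]);
      step(); // scheduled internally; user code never re-calls classify
    };
    step();
  }
  classifyStop() {
    this.running = false;
  }
}

const labels = [];
new MockClassifier().classifyStart("video", (r) => labels.push(r[0].label));
console.log(labels.length); // 3
```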

Author

New classifyStart() and predictStart() methods have been implemented in 17afa84.

// Initialize the feature extractor
featureExtractor = ml5.featureExtractor({ epochs: 100 }, modelReady);
// Create a new classifier using those features and with a video element
classifier = featureExtractor.classification(video, videoReady);
Member

This matches exactly how we implemented the API in ml5.js prior to 1.0. @gohai I think this could be up for discussion. Do we like this methodology of first loading the extractor and then "turning it into" a classifier? Another option would be to use ml5.neuralNetwork() in conjunction with the feature extractor. This might involve more code, but could be better from a pedagogical standpoint. Would love your thoughts!

Author

For the loading methodology, I have an idea of introducing a task property that accepts two options: classification and regression. With this approach, loading a classifier or regressor would look like:

feClassifier = ml5.featureExtractor({ task: 'classification' }, modelReady);
feRegressor = ml5.featureExtractor({ task: 'regression' }, modelReady);

Member

The two-step process of creating an extractor and then using it to create a classifier does indeed seem quite confusing to me. I would prefer a single-line solution, similar to what @JunhaoZhu0220 is proposing.

Author

I have implemented my design in e119244, so now

feClassifier = ml5.featureExtractor({ task: 'classification' }, modelReady);
feRegressor = ml5.featureExtractor({ task: 'regression' }, modelReady);

is the new way to initialize the model as shown in the two demos.

Member

I think that's great. Users now get the feature extractor right away, and the task property mirrors the NeuralNetwork class.

(Leaving for others to chime in)

@JunhaoZhu0220
Author

> So excited about this! In addition to the examples I previously made regarding training your own classifier, some examples using cosine similarity could be good! (and perhaps we want to build into ml5 a cosine similarity utility function). Last year I made a demo that matches your webcam image to the most similar image. It has some unfriendly to beginners code, but could be a model for an ml5 version.
>
> https://editor.p5js.org/ml_4_cc/sketches/CWE6Ox_jd
>
> I am hoping to make some videos about this in the future.

@shiffman @gohai For the cosine similarity utility, one demo that came to mind is handwritten digit recognition. Inspired by Yann LeCun et al.'s MNIST database, the feature extractor would first obtain feature embeddings for reference images of digits 0–9, then compute the embedding for a webcam frame in which the user holds up a handwritten number on paper. The digit is identified by finding the reference embedding with the highest cosine similarity.

Please give me some feedback on how you feel about this idea!
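A cosine-similarity utility of the kind discussed here could be as small as the sketch below. Both function names are suggestions, not an existing ml5 API:

```javascript
// A small cosine-similarity utility: 1 means identical direction,
// 0 means orthogonal (i.e. maximally dissimilar embeddings).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Nearest-reference lookup, as in the digit-recognition idea: return
// the index of the reference embedding most similar to the query.
function mostSimilarIndex(query, references) {
  let best = -1;
  let bestScore = -Infinity;
  references.forEach((ref, i) => {
    const score = cosineSimilarity(query, ref);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return best;
}
```

In the demo, `references` would hold the embeddings of the digit images 0–9 and `query` the embedding of the current webcam frame.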

@@ -0,0 +1,69 @@
let featureExtractor;
Member

Thank you for this nice example inspired by @shiffman's video, @JunhaoZhu0220! 🤗

Since "pen" and "phone" are still fairly generic classes, and one might need to know that they aren't part of MobileNet to fully understand and appreciate this functionality: wondering if making both buttons contenteditable might be an idea worth trying out, in your opinion? (the user would be prompted to define their own classes, hit "done", and then continue in the way the example works currently)

Author

@gohai I do think this is a great idea to give users the flexibility of editing the classes on their own! The interface in aa53f36 looks like
[image]
in which the two classes default to Class 1 and Class 2. If the user inputs customized class names in the text box, the labels will change accordingly.

Member

This is already much more friendly, @JunhaoZhu0220!

What I meant with contenteditable, and what you might still want to try to see if you like it, is the HTML attribute of the same name, which allows users to type in the button directly. (It might make the p5 code shorter, but I haven't tested it myself.)

e.g.
Screenshot 2026-03-21 at 3 45 14 PM
Screenshot 2026-03-21 at 3 45 49 PM
Screenshot 2026-03-21 at 3 47 45 PM

(and if the user clicks "Done", then it might remain "Class #1" and "Class #2")
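The fallback behavior described here (keeping the default label when the user leaves the button unedited) can be sketched as plain logic, independent of p5. `resolveLabel` is a hypothetical helper, and the `contentEditable` line is the untested idea from the comment above:

```javascript
// Hypothetical helper: use the text the user typed into the
// contenteditable button, falling back to the default class name.
function resolveLabel(buttonText, fallback) {
  const trimmed = buttonText.trim();
  return trimmed.length > 0 ? trimmed : fallback;
}

// In p5, the button could be made editable via the plain HTML
// attribute (untested, as noted above): button.elt.contentEditable = true;
console.log(resolveLabel("happy", "Class #1")); // "happy"
console.log(resolveLabel("   ", "Class #1"));   // "Class #1"
```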

let sampleCount = 0;
let predictedValue = 0;

function modelReady() {
Member

Small nitpick: defining those functions in the order we expect them to be called (e.g. preload, then setup, etc.) might make it slightly easier to read the sketch top-to-bottom.


function preload() {
  // Initialize the feature extractor for regression
  feRegressor = ml5.featureExtractor({ task: 'regression', version: 2 }, modelReady);
}
Member

Is the version: 2 here necessary for it to function? (If yes, is there a drawback to changing our default from 1 to 2?)

Author

@JunhaoZhu0220 JunhaoZhu0220 Mar 21, 2026

The reason I passed version: 2 is that the regression task differs from what MobileNet was originally designed for, so we might need a stronger (newer) model that generalizes the feature extraction better 🤗
If this introduces confusion for users, I can remove the option and use the default version: 1, while noting in the documentation that version: 2 brings better performance.

Member

Curious if changing the default to version 2 could be an option? Does this have downsides for the classification task perhaps?

Author

I believe version 2 also works perfectly for classification tasks. (Will change the default version to 2 in the next commit.)
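Internally, the proposed default change could look something like this (an assumed option shape, not the actual commit):

```javascript
// Hypothetical option resolution: version 2 becomes the default,
// while version: 1 remains available as an explicit opt-in.
function resolveVersion(options = {}) {
  return options.version === 1 ? 1 : 2;
}

console.log(resolveVersion({}));             // 2
console.log(resolveVersion({ version: 1 })); // 1
```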

video.hide();
background(0);
// Set the video as the input for the Classifier
feClassifier.video = video;
Member

It is probably nicer to have a dedicated method for setting the input here (which might be a video, but possibly also an image, canvas...) 🤔 Something to think about together with @shiffman at some convenient time.

For context: previously, the video input got passed as an argument to the constructor, but since the feature extractor now gets created in preload(), we typically don't have the video element yet at this point.
Our other models, such as bodyPose, work around this by taking video as an argument to e.g. detectStart(). We could do the same, and require video as an argument to both addImage() and classifyStart(). Or, pass the source to the feature extractor at one point in time only - similarly to how @JunhaoZhu0220 is doing here.
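A dedicated setter, one of the options outlined above, might look like the sketch below. The class and method names are hypothetical, not part of the PR:

```javascript
// Hypothetical sketch of a dedicated input setter that accepts any
// drawable source (video, image, or canvas) at a single point in time,
// instead of assigning feClassifier.video directly.
class FeatureExtractorStub {
  setInput(source) {
    this.input = source;
    return this; // allow chaining, e.g. fe.setInput(video).classifyStart(cb)
  }
}

const fe = new FeatureExtractorStub();
fe.setInput("videoElement");
console.log(fe.input); // "videoElement"
```

Returning `this` keeps the one-line initialization style the task-based API already established.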
