FastCSSTool: A No‑Code Platform for Computational Social Science Text Classification

🎯 Overview

👋🏻 Welcome to the Field of Computational Social Science!

Fast CSS Tool simplifies data analysis for social scientists. The application helps you analyze digital datasets, including social media data, without writing code.

Even with little or no coding experience, you can preprocess, filter, and classify your data. The tool automates the backend steps and uses machine learning algorithms to streamline your workflow.

Whether you're new to coding or simply looking for a more efficient way to handle your data, Fast CSS Tool provides a user-friendly interface to help you get the job done.

Features

Data Preprocessing: Import data, apply manual and AI-based filters.
Model Training: Train multiple machine learning models using SentenceTransformers, Microsoft's FLAML and scikit-learn.
Model Evaluation: Evaluate models using various metrics.
Export Results: Save models and export evaluation results.

Installation

Prerequisites

Python 3.10 or later
pip (Python package installer)

Windows Installation

Download and Install Miniconda:
- Download the Miniconda installer for Windows from the official Miniconda download page.
- Run the installer and follow the prompts to install Miniconda.
Create and Activate Virtual Environment:
- Open Command Prompt:
  - Press Win + R, type cmd, and press Enter.
- Navigate to the Project Directory:
  - Use the cd command to navigate to the directory where you have saved the Fast CSS Tool project. For example:
```
cd fastcsstool
```
- Create a Virtual Environment:
  - Run the following command to create a virtual environment named fastcsstool-env:
```
conda create -y -p %cd%\fastcsstool-env python=3.10
```
- Activate the Virtual Environment:
  - Run the following command to activate the virtual environment:
```
call fastcsstool-env\Scripts\activate
```
Install PyTorch with GPU Support:
- Run the following command to install the PyTorch build for CUDA 11.8 (GPU support requires an NVIDIA GPU with a compatible driver):
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
Install Other Dependencies:
- Run the following command to install the required Python packages:
```
pip install -r requirements.txt
```
Run the Application:
- Ensure the virtual environment is activated.
- Run the application by executing:
```
python main.py
```

Mac Installation

Download and Install Miniconda:
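- Download the Miniconda installer for Mac from the official Miniconda download page.
- Open Terminal and navigate to the directory where the installer is downloaded.
- Run the installer with the following command (the file name shown is the Intel build; use the name of the file you actually downloaded):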
```
bash Miniconda3-latest-MacOSX-x86_64.sh
```
- Follow the prompts to complete the installation.
Create and Activate Virtual Environment:
- Open Terminal.
- Navigate to the Project Directory:
  - Use the cd command to navigate to the directory where you have saved the Fast CSS Tool project. For example:
```
cd fastcsstool
```
- Create a Virtual Environment:
  - Run the following command to create a virtual environment named fastcsstool-env:
```
conda create -y -p ./fastcsstool-env python=3.10
```
- Activate the Virtual Environment:
  - Run the following command to activate the virtual environment:
```
source fastcsstool-env/bin/activate
```
Install PyTorch:
- Run the following command to install PyTorch:
```
pip install torch torchvision torchaudio
```
Install Other Dependencies:
- Run the following command to install the required Python packages:
```
pip install -r requirements.txt
```
Run the Application:
- Ensure the virtual environment is activated.
- Run the application by executing:
```
python main.py
```
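After installing on either platform, a quick check can confirm that the environment is ready before you launch the application. The sketch below is not part of the project. The module names in `REQUIRED_MODULES` are assumptions based on the libraries named under Features; `requirements.txt` remains the authoritative list.

```python
import importlib.util
import sys

# Assumed import names for the libraries mentioned in this README;
# requirements.txt is the authoritative list.
REQUIRED_MODULES = ["torch", "sentence_transformers", "flaml", "sklearn"]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


issues = check_environment()
print("Environment OK" if not issues else "\n".join(issues))
```

Run it inside the activated environment. Any "missing module" line suggests that the corresponding install step did not complete.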

Usage

Data Generation from Twitter

Bearer Token: Enter your valid Twitter bearer token.
Keywords: Set keywords to filter tweets.
Include Options: Choose to include retweets and quotes.
Geo-Location: Specify latitude, longitude, and radius for geographic targeting.
Date Range: Select the start and end dates for your data collection.
Language: Choose the language of the tweets to collect.
Search and Download: Click to begin the data collection process.
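To illustrate how the form fields above could map onto a search request, here is a minimal sketch that assembles a query string using Twitter API v2 search operators (`-is:retweet`, `-is:quote`, `lang:`, `point_radius:`). How the tool builds its requests internally is not documented here, so the function name, its parameters, and the sample values are hypothetical. Operator availability also depends on your API access level.

```python
from datetime import date


def build_search_params(keywords, include_retweets, include_quotes,
                        lang=None, geo=None, start=None, end=None):
    """Assemble a query string and date bounds from form-style inputs.

    geo is an optional (latitude, longitude, radius_km) tuple.
    """
    # Multi-word keywords are quoted; alternatives are OR-ed together.
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    parts = ["(" + " OR ".join(terms) + ")"]
    if not include_retweets:
        parts.append("-is:retweet")
    if not include_quotes:
        parts.append("-is:quote")
    if lang:
        parts.append(f"lang:{lang}")
    if geo:
        lat, lon, radius_km = geo
        # The point_radius operator expects longitude before latitude.
        parts.append(f"point_radius:[{lon} {lat} {radius_km}km]")
    params = {"query": " ".join(parts)}
    if start:
        params["start_time"] = f"{start.isoformat()}T00:00:00Z"
    if end:
        params["end_time"] = f"{end.isoformat()}T00:00:00Z"
    return params


# Hypothetical inputs: two keywords, no retweets, quotes allowed.
params = build_search_params(
    ["earthquake", "deprem"], include_retweets=False, include_quotes=True,
    lang="tr", geo=(41.01, 28.97, 25),
    start=date(2023, 2, 6), end=date(2023, 2, 13),
)
print(params["query"])
```

Keeping query construction separate from the network call makes it easy to inspect the exact query before any API quota is spent.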

Data Labeling

Import CSV: Load your data file for labeling.
Labels: Input and update the labels for categorizing data.
Navigation: Navigate through data entries and save your labeling progress.

Data Preprocessing

Import Data: Load your dataset.
Manual Filtering: Apply filters like keyword exclusion and tweet length constraints.
AI-Based Filtering: Use AI models to filter data automatically.
Export Data: Save your filtered dataset for further analysis or training.
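The manual filters described above (keyword exclusion and length constraints) amount to a simple predicate applied to each row. The following is a minimal sketch of that idea, not the tool's actual implementation. The `text` column name, the default length bounds, and the sample rows are assumptions.

```python
def passes_manual_filters(text, excluded_keywords, min_len=0, max_len=280):
    """Return True if text is within the length bounds and has no excluded keyword."""
    if not (min_len <= len(text) <= max_len):
        return False
    lowered = text.lower()
    # Case-insensitive substring match against every excluded keyword.
    return not any(kw.lower() in lowered for kw in excluded_keywords)


def filter_rows(rows, text_column="text", **filter_options):
    """Keep only the dict rows whose text column passes the filters."""
    return [r for r in rows
            if passes_manual_filters(r[text_column], **filter_options)]


# Hypothetical rows, shaped like the output of csv.DictReader.
rows = [
    {"text": "Win a FREE phone now"},
    {"text": "ok"},
    {"text": "New study on voter turnout"},
]
kept = filter_rows(rows, excluded_keywords=["free"], min_len=5)
print([r["text"] for r in kept])
```

Substring matching is the simplest choice; whole-word matching would avoid excluding words that merely contain a keyword, such as "freedom" for "free".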

Model Training & Evaluation

Training Data: Load your dataset for model training.
Start Training: Begin the training of your model.
Evaluation: Assess the performance of your model with accuracy, recall, precision, and F1-score metrics.
Save Models: Save your trained models for future use.

Analyze Data

Import Data and Model: Load your analysis model and dataset.
Graphical Analysis: Perform and visualize various analyses like time series and distribution of data points.
Export Analysis Results: Save your analysis results for reporting or documentation purposes.
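As an illustration of the time-series analysis mentioned above, the sketch below counts predicted labels per calendar day. It assumes each record is a `(timestamp, label)` pair with an ISO 8601 timestamp. The record layout and the label names are hypothetical, and the tool's own plots may be computed differently.

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_label_counts(records):
    """Map each day (YYYY-MM-DD) to a Counter of the labels seen that day."""
    counts = defaultdict(Counter)
    for created_at, label in records:
        day = datetime.fromisoformat(created_at).date().isoformat()
        counts[day][label] += 1
    return dict(counts)


# Hypothetical classified records: (timestamp, predicted label).
records = [
    ("2023-02-06T08:15:00", "support"),
    ("2023-02-06T21:40:00", "oppose"),
    ("2023-02-07T10:05:00", "support"),
]
for day, counts in sorted(daily_label_counts(records).items()):
    print(day, dict(counts))
```

The resulting per-day counts can be passed to any plotting library to draw one line per label.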

🤖 Bot Detection Model

FastCSSTool includes an account-level bot detector to reduce noise before labeling and training.

Dataset: Hydrated subset of the Twitter Bot Detection Dataset (n = 33,184 of 40,566 IDs)
Features: 17 metadata-derived features from author profiles and tweet records
Model: XGBoost selected via FLAML (macro-F1 objective), exported to ONNX
Performance (held-out test): Accuracy = 0.85, Macro-F1 = 0.83
- Human: Precision 0.87, Recall 0.92, F1 0.89
- Bot: Precision 0.82, Recall 0.72, F1 0.77

Figure: Confusion matrix on the held-out test set. 4,102 humans and 1,572 bots correctly identified; 345 false positives, 618 false negatives.
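The reported metrics can be recomputed from the confusion-matrix counts above. The sketch below assumes that "bot" is the positive class, so the 345 false positives are humans flagged as bots and the 618 false negatives are bots labeled as human. The function and variable names are illustrative.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the held-out test set, with "bot" as the positive class.
tn, tp, fp, fn = 4102, 1572, 345, 618

bot = class_metrics(tp, fp, fn)
# For the human class the roles swap: correct humans are the hits,
# missed bots count against precision, flagged humans against recall.
human = class_metrics(tp=tn, fp=fn, fn=fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)
macro_f1 = (bot[2] + human[2]) / 2

print("Human:", [round(m, 2) for m in human])
print("Bot:", [round(m, 2) for m in bot])
print("Accuracy:", round(accuracy, 2), "Macro-F1:", round(macro_f1, 2))
```

Rounded to two decimals, these values match the figures listed above. The bot class has noticeably lower recall than the human class, so some bot accounts will pass through the filter.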

License

This project is licensed under the GNU General Public License. See the LICENSE file for the full text.

Acknowledgements

This tool uses the following main packages:

Sentence Transformers library for accessing, using, and training state-of-the-art text and image embedding models.
FLAML library for Automated Machine Learning & Tuning.
scikit-learn library for machine learning algorithms.

For any issues or questions, please contact info@csstr.org.
