AuvergnHack 2026 Machine Learning

Gravitational Classifier

Date May 2026
Result 3rd Place
Category Machine Learning
Flag ZiTF{chimpanzee_love_banana}
Author Maxime Reynaud
Overview

A classic ML forensics challenge: you are handed a single serialised scikit-learn model in .pkl format and must extract a three-word passphrase hidden across its internals. No training data, no API, no server. Just you and the model weights...

The three words are concealed through three different techniques: an anomalous coefficient in a secret classification class, a custom base64-encoded model attribute, and ASCII values embedded as unusually large integers in a second class's coefficients. Each path requires a slightly different angle of inspection.

You can download the model file gravitational_model2.pkl to try the challenge yourself, then compare your approach with my Python solution script ML_AuvergnHack2026.py. Both files are available to download on the right-hand side.

Challenge Description
Challenge prompt Dr. Bonobo, a principal researcher in artificial intelligence at the Simian-7 orbital station, was urgently transferred after being suspected of spying on behalf of a dissident faction. Before his departure, he was working on a machine learning model designed to classify astronomical objects around Sagittarius Ape-Star. Simian counterintelligence recovered the trained model file (.pkl) from his workstation. Analysts believe it contains classified information regarding the upcoming secret mission! Maybe a passphrase? Load the model, inspect its inner workings, and reconstruct the hidden passphrase. The flag consists of three words scattered throughout the model. Combine them with underscores to obtain the complete flag.
Definitions Pickle: It can be used to serialize Python object structures, which refers to the process of converting an object in the memory to a byte stream that can be stored as a binary file on disk. When we load it back to a Python program, this binary file can be de-serialized back to a Python object. - Python Programming and Numerical Methods

Vectorizer: Converts text into numerical features that a machine learning model can understand. In this challenge, the vectorizer turns input words into a 238-dimensional feature vector.

Classifier: Machine learning model that assigns an input to one of several possible categories. In this challenge, the classifier uses the vectorized text features to predict one of 8 astronomical-object classes.
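To make the two definitions above concrete, here is a minimal sketch of how a count-based vectorizer and a linear classifier fit together. The vocabulary, class names and weights are invented for illustration and are not taken from the challenge model, which has 238 features and 8 classes.

```python
# Toy illustration only: vocabulary, classes and weights are invented,
# not taken from gravitational_model2.pkl.
VOCAB = {"banana": 0, "orbit": 1, "ring": 2}
CLASSES = ["gas_giant", "secret"]
COEF = [
    [-0.1, 0.4, 0.9],   # weights for gas_giant
    [5.3, -0.2, -0.3],  # weights for secret
]
INTERCEPT = [0.0, 0.0]


def vectorize(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = [0] * len(VOCAB)
    for token in text.lower().split():
        if token in VOCAB:
            counts[VOCAB[token]] += 1
    return counts


def predict(text):
    """Score every class with a dot product and return the best label."""
    features = vectorize(text)
    scores = [
        sum(w * x for w, x in zip(row, features)) + bias
        for row, bias in zip(COEF, INTERCEPT)
    ]
    return CLASSES[scores.index(max(scores))]


print(predict("orbit ring"))              # gas_giant
print(predict("banana orbit ring ring"))  # secret
```

Each class owns one row of weights, one weight per vocabulary word. A single oversized weight is enough to drag any input containing that word toward the class.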
Security warning Pickle files can execute arbitrary code on load. ALWAYS open untrusted .pkl files inside an isolated VM or container, never on your host machine!
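As a complement to isolation (not a replacement for it), a pickle stream can be inspected without executing it. The sketch below uses the standard library's pickletools.genops to list the module and name pairs a pickle would import on load. The helper name referenced_globals is my own. The handling of STACK_GLOBAL is a heuristic that assumes the two most recent string arguments are the module and the attribute name, so it can miss names that are fetched back from the memo.

```python
import pickle
import pickletools
from collections import OrderedDict


def referenced_globals(payload: bytes) -> set:
    """Statically list (module, name) pairs a pickle stream refers to.

    Nothing is unpickled here: the opcodes are only read, never executed.
    """
    found = set()
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name in ("GLOBAL", "INST"):
            # The argument is reported as "module name" in a single string
            module, _, name = arg.partition(" ")
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name were pushed as the last two strings
            if len(recent_strings) >= 2:
                found.add((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


# Harmless demo payload; for the challenge you would read the .pkl bytes instead
demo = pickle.dumps(OrderedDict(a=1), protocol=4)
print(referenced_globals(demo))
```

For a scikit-learn model you would expect to see only sklearn and numpy entries. Anything like os, subprocess or builtins in that list is a strong reason not to load the file at all.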
Initial Analysis

The first step is loading the pickle and understanding what we are dealing with. Opening the file with the pickle library shows that it also requires the sklearn library. Once loaded, we discover that it contains a dictionary with two keys: model and vectorizer. The model is a linear classifier (with a coef_ matrix) paired with a CountVectorizer that maps words to integer indices. Our first instinct is to use __dict__ to open up the scikit-learn object and observe its attributes, a quick "dump the model's internal state" step.

Python inspect.py
import pickle, sklearn, base64

path = "gravitational_model2.pkl"

with open(path, "rb") as f:
    data = pickle.load(f)

model      = data["model"]
vectorizer = data["vectorizer"]

print("Model contents:", model.__dict__.keys())
print("Classes:",       model.classes_)
print("coef_ shape:",   model.coef_.shape)
output
$ python inspect.py
Model contents: dict_keys(['penalty', 'C', 'l1_ratio', 'dual', 'tol', 'fit_intercept', 'intercept_scaling', 'class_weight', 'random_state', 'solver', 'max_iter', 'verbose', 'warm_start', 'n_jobs', 'n_features_in_', 'classes_', 'n_iter_', 'coef_', 'intercept_', 'galaxy_'])
Classes: ['black_hole' 'galaxy' 'gas_giant' 'ice_giant' 'secret' 'star' 'super_earth' 'terrestrial_planet']
coef_ shape: (8, 238)

Bingo! Two things immediately stand out from the model.classes_ and __dict__ inspection: the presence of a 'secret' classification class among the otherwise astronomical labels, and an unexpected custom attribute called galaxy_ that is not part of the standard sklearn API. Both are worth pursuing.
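Spotting galaxy_ by eye works on a 20-key dictionary, but the same check can be automated by comparing the suspect object against a clean reference of the same type. The sketch below uses a hypothetical TinyModel stand-in rather than a real scikit-learn estimator so it runs without any dependencies. With the real model, the reference would be an estimator of the same class fitted on throwaway data, which is an assumption on my part and not something done in the original solve.

```python
class TinyModel:
    """Hypothetical stand-in for a fitted estimator (not a real sklearn class)."""

    def __init__(self):
        self.penalty = "l2"
        self.classes_ = ["galaxy", "secret", "star"]
        self.coef_ = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]


def extra_attributes(suspect, reference):
    """Return attribute names present on suspect but absent from reference."""
    return sorted(set(vars(suspect)) - set(vars(reference)))


reference = TinyModel()
suspect = TinyModel()
suspect.galaxy_ = "bG92ZQ=="  # injected after construction, as in the challenge

print(extra_attributes(suspect, reference))  # ['galaxy_']
```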

Methodology

Three independent hiding techniques were used with one word per technique. We tackle each in turn.

01
Anomalous coefficient in the secret class
Isolate the coefficient vector for the secret class, map each value back to its vocabulary word via the vectorizer, and sort by descending weight. The word with the largest coefficient by a significant margin is the first hidden word.
02
Base64-encoded custom model attribute
The galaxy_ attribute is a non-standard field injected directly into the model object. Its value is a base64 string that, once decoded, reveals the second hidden word.
03
ASCII values hidden as large integers in super_earth
Iterate over every coefficient across all classes, casting each to an integer. Values greater than 2 are highly unusual in a well-trained linear classifier, so any such integer is suspect. The anomalous integers found in the super_earth class decode directly to ASCII characters forming the third word.
Implementation

Word 1: secret class coefficients

This model takes input text, converts it into 238 numerical features using a vectorizer, and then classifies it into one of 8 classes. The secret class (one of those 8) has its own row of 238 coefficients, one per vocabulary word. We look up its index (index 4) and extract the coefficient row corresponding to the secret class, then build a list of (index, word, coefficient) tuples sorted by weight. One word, banana, has a coefficient of 5.336722, dramatically higher than everything else in the vocabulary. This means that when the word/feature banana appears in the input text, it strongly pushes the classifier toward the class secret, which makes it very suspect.

Python word1.py
classes   = model.classes_
sec_index = list(classes).index("secret")
coef      = model.coef_[sec_index]

# Map each vocabulary entry to its coefficient value
sorted_vocab = dict(sorted(vectorizer.vocabulary_.items(), key=lambda item: item[1]))
comparison   = [(idx, word, float(coef[idx])) for word, idx in sorted_vocab.items()]

top10 = sorted(comparison, key=lambda x: x[2], reverse=True)[:10]
print("Top weights for 'secret' class:", top10)
output
$ python word1.py
Top weights for 'secret' class:
[(9, 'banana', 5.336722...), (15, 'bigger', -0.006756...), (65, 'entity', -0.006995...), ...]

The gap between banana at 5.3368 and the next entry at -0.0068 is a clear statistical anomaly. We assume that this word was injected deliberately. Word 1: banana.
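The "clear statistical anomaly" above was judged by eye. A minimal sketch of how the same judgement could be made numerically is a modified z-score based on the median absolute deviation (MAD), which is not what the original solve used. The first three weights are the truncated values from the output above. The last two entries (orbit, mass) are invented filler so the median has something to work with.

```python
from statistics import median


def mad_outliers(weights, threshold=3.5):
    """Return words whose weight is an outlier by modified z-score.

    The 0.6745 factor and the 3.5 cut-off are the conventional choices
    for this score; both are tunable.
    """
    values = list(weights.values())
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []  # no spread at all, nothing can be ranked
    return [
        word
        for word, value in weights.items()
        if 0.6745 * abs(value - centre) / mad > threshold
    ]


secret_row = {
    "banana": 5.336722,   # from the write-up output (truncated)
    "bigger": -0.006756,  # from the write-up output (truncated)
    "entity": -0.006995,  # from the write-up output (truncated)
    "orbit": -0.0071,     # invented filler value
    "mass": -0.0080,      # invented filler value
}

print(mad_outliers(secret_row))  # ['banana']
```

A median-based score is used rather than mean and standard deviation because one huge value inflates the standard deviation enough to partially hide itself.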

Word 2: base64 custom attribute

The galaxy_ attribute does not belong to any standard sklearn estimator. It was added directly to the model object before serialisation as a covert data carrier. When looking deeper at the value stored in galaxy_, we see something that resembles base64 (due to the trailing ==, often a sign of base64 padding). Decoding it from base64 is straightforward.

Python word2.py
import base64

raw     = model.galaxy_               # bG92ZQ==
decoded = base64.b64decode(raw).decode("utf-8")
print("galaxy_ raw:", raw)
print("galaxy_ decoded:", decoded)
output
$ python word2.py
galaxy_ raw:     bG92ZQ==
galaxy_ decoded: love

So this is also part of our flag. Word 2: love.
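Here the attribute name gave the hiding place away, but the same idea extends to scanning every string attribute of an object for base64 that decodes to readable text. This is a sketch under assumptions. SimpleNamespace stands in for the real model, and only the galaxy_ value comes from the challenge. The other attribute values are illustrative.

```python
import base64
from types import SimpleNamespace


def find_base64_attributes(obj):
    """Return {attribute: decoded text} for values that are valid base64."""
    hits = {}
    for name, value in vars(obj).items():
        if not isinstance(value, (str, bytes)):
            continue
        try:
            text = base64.b64decode(value, validate=True).decode("utf-8")
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            continue
        if text and text.isprintable():
            hits[name] = text
    return hits


# Stand-in object; only galaxy_ mirrors the real challenge model
stand_in = SimpleNamespace(penalty="l2", solver="lbfgs", C=1.0, galaxy_="bG92ZQ==")
print(find_base64_attributes(stand_in))  # {'galaxy_': 'love'}
```

Short ordinary strings whose length is a multiple of four can occasionally decode to printable text by coincidence, so treat hits as leads to check by hand rather than as proof.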

Word 3: ASCII integers in super_earth

Coefficients in a trained linear classifier are real-valued weights typically in the range [-2, 2]. Any integer greater than 2 is a red flag. Iterating over every coefficient and casting to int (because my small brain cannot process scientific number notation), then filtering for values > 2, surfaces a suspicious sequence in the super_earth class. Once we have identified the suspicious coefficients, we take a closer look and observe that the numbers fall in the ASCII lowercase letter range 97-122 (a-z). With this observation, we try converting the values into letters to see if we get a string that makes sense.

Python word3.py
ascii_values = []

for i in range(len(classes)):
    for j in range(len(model.coef_[0])):
        val = int(model.coef_[i][j])
        if val > 2:
            print(f"Class: {classes[i]}, int value: {val}")
            if classes[i] == "super_earth":
                ascii_values.append(val)

print("ASCII values:", ascii_values)
print("Decoded:", "".join(chr(v) for v in ascii_values))
output
$ python word3.py
Class: super_earth, int value: 99
Class: super_earth, int value: 104
Class: super_earth, int value: 105
Class: super_earth, int value: 109
Class: super_earth, int value: 112
Class: super_earth, int value: 97
Class: super_earth, int value: 110
Class: super_earth, int value: 122
Class: super_earth, int value: 101
Class: super_earth, int value: 101

ASCII values: [99, 104, 105, 109, 112, 97, 110, 122, 101, 101]
Decoded: chimpanzee
Tip If you are not comfortable with Python's chr(), you can paste the integer sequence directly into CyberChef → From Decimal to reach the same result instantly.

Word 3: chimpanzee.
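The "value > 2" filter can be tightened into a more targeted check: keep only coefficients that are whole numbers inside the lowercase ASCII range, then decode them in order. The sketch below assumes the payload values are stored as exact integers. The write-up only shows them after int() truncation, so that is an assumption. The small fractional weights in the sample row are invented, while the ten integers are the ones printed above.

```python
def decode_ascii_payload(row, low=97, high=122):
    """Decode whole-number weights in [low, high] as ASCII characters."""
    codes = [
        int(value)
        for value in row
        if float(value).is_integer() and low <= value <= high
    ]
    return "".join(chr(code) for code in codes)


# Ten payload integers from the write-up, mixed with invented ordinary weights
sample_row = [
    0.013, 99.0, -0.42, 104.0, 105.0, 0.0, 109.0,
    112.0, 1.7, 97.0, 110.0, 122.0, 101.0, 101.0,
]

print(decode_ascii_payload(sample_row))  # chimpanzee
```

Widening the range to 32-126 would also catch digits, punctuation and uppercase letters, at the cost of more false positives.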

Assembling the flag

The three words are chimpanzee, love, and banana. Ordered into a grammatically coherent phrase we get chimpanzee love banana. We then join them with underscores and that gives us the final flag.

Flag
ZiTF{chimpanzee_love_banana}
Conclusion

This challenge demonstrates that a serialised ML model is not just a blob of weights: it is a Python object graph. Anything that can be stored in a Python object can be hidden inside a .pkl file: custom attributes, unusual coefficient values, or encoded strings. The standard model inspection workflow (print __dict__, inspect coef_, check the vocabulary) is enough to surface all of it.

From a defensive standpoint, this is also why blindly loading pickle files from untrusted sources is dangerous: they can carry arbitrary Python objects, not just model weights.

Key takeaway When auditing an ML model, always inspect model.__dict__ for non-standard attributes, sort coefficients to catch statistical outliers, and treat any unusually large integer in a weight matrix as a potential ASCII payload.