Reinforcement of age estimation in forensic tools to detect Child Sexual Exploitation Material

Several image-based approaches for estimating the age of a person are available in the computer vision literature. However, most of them perform poorly on minors and young adults, especially when the eyes are occluded. This type of occlusion is common in Child Sexual Exploitation Materials (CSEM), where it is used to hide the identity of victims. We introduce an approach that builds Soft Stagewise Regression Network (SSR-Net) models with natural and eye-occluded facial images to estimate the age of minors and young adults. Our proposal reduces the Mean Absolute Error (MAE) from 7.26 to 6.5 and from 6.81 to 4.07 for SSR-Net models pre-trained on the IMDB and MORPH datasets, respectively.

I. INTRODUCTION
In forensic applications, accurate and fast age estimation solutions enhance the detection of victims in Child Sexual Exploitation Materials (CSEM) [1]. Forensic tools may also support Law Enforcement Agencies (LEAs) in identifying criminals through enhanced image analysis [2].
Age estimation is a challenging problem due to factors such as pose and illumination variation, which are commonly found in CSEM images [3]. It is also common for offenders to use accessories or black stripes to hide the face or eyes of the victims [4], which further degrades the performance of age estimators.
An increasing number of deep-learning-based age estimators have been proposed in recent years. However, most of these approaches are designed for the age interval between 0 and 60+ years and are trained with unbalanced data [5], [6]. Thus, many of them do not perform well for minors and young adults, i.e., people aged between 0 and 25 years.
To address this problem, we present an improved solution for the age estimation of minors and young adults [1] by training Soft Stagewise Regression Network (SSR-Net) models [5] using natural face images and faces with occluded eyes.

II. RELATED WORK
Due to the advancement of deep learning architectures, the performance of age estimators has improved significantly in recent years [5], [7], [8]. Despite this, to our knowledge, very few approaches estimate the age of minors and young adults [9] or work with eye-occluded facial images [10].
Zhang et al. [8] introduced an accurate age estimation model that combines Long Short-Term Memory (LSTM) networks, which are complex and computationally intensive. In contrast, Yang et al. proposed a lightweight age estimation model, called SSR-Net [5], based on the Deep EXpectation (DEX) model [7]. Likewise, Zhang et al. [6] introduced a compact model that uses cascaded training and multi-scale context to estimate the age from small-scale facial images. These compact models are preferable for real-time tasks because of their reduced computational cost.

III. METHODOLOGY
We introduce a two-fold solution for age estimation of minors and young adults, as presented in Fig. 1.
First, we created a balanced dataset with natural face images of minors and young adults and their corresponding eye-occluded versions. The natural facial images, in the range [0, 25] years, were collected from five well-known datasets, namely IMDB-WIKI, APPA-REAL, AgeDB, UTKFace, and IBM Diversity in Faces (DiF).
We gathered a total of 130,000 minor and young adult images by inspecting these datasets manually and removing images with an incorrect age label or without any human face. Afterwards, we created the occluded version of these images by locating the eye region using the Multi-Task Cascade Convolutional Neural Network (MTCNN) [11] and then masking it in order to simulate the occlusion conditions found in CSEM. Lastly, both image sets were merged into one.
http://doi.org/10.18239/jornadas_2021.34.25
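The masking step can be illustrated with a minimal sketch. It assumes that the two eye centres have already been located by a landmark detector such as MTCNN and are given as integer pixel coordinates. The function name `occlude_eyes`, the rectangular black stripe, and the `pad_ratio` margin are illustrative assumptions; the exact mask geometry used in this work is not specified in the text.

```python
import numpy as np


def occlude_eyes(image, left_eye, right_eye, pad_ratio=0.35):
    """Return a copy of `image` with a black stripe drawn over both eyes.

    image: H x W x 3 uint8 array.
    left_eye, right_eye: (x, y) pixel coordinates of the eye centres,
        e.g. as returned by a face landmark detector.
    pad_ratio: hypothetical margin, as a fraction of the inter-eye distance.
    """
    height, width = image.shape[:2]
    lx, ly = (int(v) for v in left_eye)
    rx, ry = (int(v) for v in right_eye)

    # Margin around the eyes, proportional to the horizontal eye distance.
    eye_distance = max(abs(rx - lx), 1)
    pad = int(round(pad_ratio * eye_distance))

    # Box enclosing both eye centres, enlarged by the margin and
    # clipped to the image borders.
    x0 = max(min(lx, rx) - pad, 0)
    x1 = min(max(lx, rx) + pad, width)
    y0 = max(min(ly, ry) - pad, 0)
    y1 = min(max(ly, ry) + pad, height)

    occluded = image.copy()
    occluded[y0:y1, x0:x1] = 0  # black stripe over the eye region
    return occluded
```

Because the function returns a copy, the natural image and its occluded counterpart can both be kept, which matches the idea of merging the two image sets afterwards.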
Using these images, we took a lightweight pre-trained SSR-Net age estimator [5] and built new, fine-tuned age estimation models focused on minors and young adults. Our images were resized to 64 × 64 pixels to fine-tune the model. Furthermore, we split the dataset into a training (80%) and a test (20%) set using stratified random sampling.
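As an illustration of the 80%/20% stratified split, the following sketch partitions sample indices so that each stratum contributes the same fraction to the test set. It assumes that the strata are the age labels, which the text does not state explicitly; the function name `stratified_split` and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test lists, stratified by label."""
    # Group sample indices by their label (assumed here to be the age).
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction of every stratum for the test set.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test
```

For example, with ten samples for each age from 0 to 25, calling `stratified_split(labels)` places two samples of every age in the test set and the remaining eight in the training set.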

IV. EXPERIMENTAL RESULTS
We evaluated age estimation performance using the Mean Absolute Error (MAE) of SSR-Net pre-trained models trained with face images in the age range [0, 25] years from four balanced datasets of varying size (from 6,500 to 130,000 images), and with two unbalanced datasets, namely MORPH and IMDB.
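For reference, the MAE is the average of the absolute differences between the real and the predicted ages. A minimal sketch follows; the function name `mean_absolute_error` and the example ages are illustrative and are not taken from our experiments.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between real and predicted ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical ages: the absolute errors are 2, 3 and 0 years.
example_mae = mean_absolute_error([10, 20, 5], [12, 17, 5])
```

A lower MAE means that, on average, the predicted age is closer to the real age, expressed in years.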
Then, we measured the improvement in MAE of the fine-tuned age estimators using our non-occluded (Org.), eye-occluded (Ocl.), and combined (Org.-Ocl.) minor and young adult facial images. Hence, eight different models were assessed for each image type: Org., Ocl., and Org.-Ocl. Our results are presented in Table I.
We observed that age estimation performance was more stable in SSR-Net models pre-trained on the IMDB dataset and fine-tuned with our merged dataset. These models achieved the best MAE of 3.58 and 4.19 for non-eye-occluded and eye-occluded images, respectively. Regarding the MAE distribution, errors are heterogeneous: the MAE for the age groups 0, 4-9, and 23-25 years is higher than the MAE for the whole 0-25 age range.
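The per-group error analysis can be sketched as follows. The age groups are given as inclusive ranges and may overlap, so the whole 0-25 range can be reported next to the narrower groups. The function name `mae_by_age_group` and the example ages are hypothetical and do not reproduce our results.

```python
from collections import defaultdict


def mae_by_age_group(true_ages, predicted_ages, groups):
    """Compute the MAE separately for each (possibly overlapping) age group.

    groups: mapping from a group name to an inclusive (low, high) age range.
    Groups without any sample are omitted from the result.
    """
    errors = defaultdict(list)
    for true_age, predicted_age in zip(true_ages, predicted_ages):
        for name, (low, high) in groups.items():
            if low <= true_age <= high:
                errors[name].append(abs(true_age - predicted_age))
    return {name: sum(values) / len(values) for name, values in errors.items()}


# Age groups mentioned in the text, plus the whole range for comparison.
groups = {"0": (0, 0), "4-9": (4, 9), "23-25": (23, 25), "0-25": (0, 25)}
example = mae_by_age_group([0, 5, 8, 24, 15], [3, 7, 8, 20, 15], groups)
```

Comparing each group's value against the entry for the whole range shows which ages contribute most to the overall error.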
Furthermore, we compared our best SSR-Net model against a state-of-the-art approach, the VGG16-based DEX model, trained with our merged dataset. The proposed models outperformed the DEX model, which obtained an MAE of 6.5 for non-eye-occluded and eye-occluded facial images. In addition, the SSR-Net-based age estimators were much smaller than the DEX age estimator, with sizes of < 1 MB and 500 MB, respectively. Moreover, the SSR-Net-based estimator predicts the age from a facial image in 0.006 seconds.
Lastly, we have successfully integrated our proposal, i.e., the SSR-Net age estimation model, into the 4NSEEK tool to support the detection of minors on CSEM.

V. CONCLUSIONS
We present an improved age estimator focused on minors and young adults, based on SSR-Net models fine-tuned using natural and eye-occluded face images. Results show that our solution performs better on minors and young adults (MAE of 4.07) than the DEX model (MAE of 6.5) and is more robust against eye occlusion. Moreover, our SSR-Net-based estimators are compact models, suitable for any hardware regardless of its memory capacity, as well as for forensic applications of child detection on CSEM. As future work, the impact of gender on the MAE for age estimation will be analyzed.

Figure 1. Steps to train age estimation models in minors and young adults.

Table I. MAE VALUES OF SSR-NET AGE ESTIMATION MODELS. THE BEST MAE [...]