Life cycle of an AI system
Lifecycle Assessment (LCA) [7] is a systematic approach to evaluating the environmental impacts of a product or system throughout its entire life cycle [1]. The life cycle of an AI system is similar to the earlier "life cycle of a data mining project" [8]. Indeed, as for data mining, the AI lifecycle encompasses the complete process of developing and deploying artificial intelligence systems. It starts with data collection and moves through stages such as data preprocessing, model training, evaluation, deployment, and ongoing monitoring and maintenance.
There is no universal method for finding the right trade-off; however, one way is to use benchmarking, which plays a crucial role in the development of frugal AI by improving efficiency and adaptability.
Across the life cycle of an AI system, many costs can prevent AI from being frugal. A non-exhaustive list is given in the table below:
| | Examples of costs that should be reduced to target frugal AI |
|---|---|
| (i) | Development costs |
| (ii) | Data costs |
| (iii) | Infrastructure costs |
| (iv) | Training or retraining costs |
| (v) | Inference costs |
| (vi) | Maintenance costs |
| (vii) | Compliance costs |
| (viii) | Deployment costs |
| (ix) | Support costs |
These costs can accumulate and impact the overall frugality of an AI system, as detailed in recent publications, e.g. [11]. The total cost to pay is the sum of these costs, and some of them are recurrent costs that must be paid at every use of a given model, such as the inference cost. The cost to pay is not, as assumed in some publications, only the cost induced in three steps: training, deployment, and production. We encourage considering the sum of all the costs of the life cycle of an AI system, not only part of them. For instance, fine-tuning an existing model only reduces one of the costs (the training cost). Even when the model must only be updated and not replaced by a new one, frugality should be considered: updating the model is an investment decision which, as in the financial markets, should be taken only if a certain return on investment is expected [12].
In summary, given a task to be solved by AI, the total costs should be minimized and the ROI should be considered.
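The cost accounting above can be sketched in a few lines. The cost categories follow the table, but all figures, and the simple ROI rule, are hypothetical placeholders for illustration:

```python
# Sketch: total lifecycle cost of an AI system and a simple ROI check.
# All amounts below are hypothetical, not measurements from the text.

ONE_OFF_COSTS = {            # paid once, e.g. items (i)-(iv) and (viii)
    "development": 50_000,
    "data": 20_000,
    "infrastructure": 15_000,
    "training": 10_000,
    "deployment": 5_000,
}

RECURRENT_COSTS_PER_USE = {  # paid at every use of the model, e.g. item (v)
    "inference": 0.002,
}

def total_cost(n_uses: int) -> float:
    """Sum of all one-off costs plus recurrent costs over n_uses predictions."""
    fixed = sum(ONE_OFF_COSTS.values())
    recurrent = n_uses * sum(RECURRENT_COSTS_PER_USE.values())
    return fixed + recurrent

def worth_updating(expected_gain: float, update_cost: float) -> bool:
    """Update the model only if the expected return exceeds its cost [12]."""
    return expected_gain > update_cost
```

Note how `total_cost` grows with usage: a model that is cheap to train but expensive at inference can end up costlier than an alternative once deployed at scale.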
Large Models are not always the best option for a given task
The recent trend in AI is the use of large models (generative AI, large deep neural networks, etc.). Many of the tasks that could be performed with AI are currently not frugally solved by large models.
Therefore, one should keep in mind that "older/non-large models" remain quite interesting in terms of performance, particularly on tabular data or time series, as illustrated below on sentiment analysis. Examples of non-large models are linear regression, k-nearest neighbors, Random Forest [4], CatBoost [10], Khiops [2], etc., or even signal-processing methods for time series (for example, exponential smoothing, ARIMA, etc. [3]).
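As an illustration of how lightweight such methods can be, here is a minimal sketch of simple exponential smoothing, one of the time-series techniques mentioned above [3]; the series values and the smoothing factor `alpha` are arbitrary:

```python
# Minimal sketch of simple exponential smoothing: each smoothed value is
# alpha * observation + (1 - alpha) * previous smoothed value.
# A few lines of standard Python, with no heavy computational footprint.

def exponential_smoothing(series, alpha=0.5):
    """Return the smoothed series, initialised with the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def forecast_next(series, alpha=0.5):
    """One-step-ahead forecast: the last smoothed value."""
    return exponential_smoothing(series, alpha)[-1]
```

For many operational forecasting tasks, such a method provides a sensible frugal baseline before any larger model is considered.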
Finding the right inflection point between performance and frugality
Finding the right inflection point between performance and frugality indicators in AI models is critical to maximizing efficiency, accessibility, and ethical considerations, while still achieving satisfactory levels of performance. Balancing these factors can lead to more sustainable and impactful AI solutions. Besides simplification gains, there are many arguments in favour of finding the right tipping point [3], the more obvious ones being the improvement of:
- Resource efficiency:
- Cost reduction: Energy-efficient models require less computing power and memory, resulting in lower operating costs.
- Environmental impact: Reducing resource consumption can reduce the carbon footprint associated with training and deploying AI models.
- Scalability:
- Broader accessibility: More efficient models can be deployed in resource-constrained environments, making AI accessible to a wider audience.
- Faster deployment: More efficient models can be trained and deployed faster, allowing rapid iteration and adaptation.
- Optimized Performance:
- Diminishing returns: At a certain point, increasing model complexity yields minimal performance gains. Identifying the tipping point helps avoid unnecessary complexity.
- Robustness: Simpler models can sometimes generalize better to unseen (test or deployment) data, reducing the risk of overfitting (learning particularities of the training data that will not be present when the model is later used on test or deployment data).
- User Experience:
- Latency reduction: Frugal models often result in faster inference times, improving the user experience in real-time applications.
- Ease of integration: Less complex models can be more easily integrated into existing systems and workflows.
- Ethical aspects:
- Fairness and transparency: Simpler models can be more interpretable, making it easier to understand the decisions made by AI systems and promoting fairness.
- Bias mitigation: Frugal models can reduce the risk of embedding biases that can result from overly complex architectures.
- Creativity from frugal innovation and experimentation: A focus on frugality can inspire innovative approaches to problem solving, leading to novel solutions that do not rely on heavy computational resources.
This list is not exhaustive, of course, and we can add costs that are sometimes 'hidden', such as upskilling teams or integrating an additional data scientist into the project team.
One way to find this trade-off is to use benchmarking [6], which plays a crucial role in the development of frugal AI by improving efficiency and adaptability.
The results of benchmarking AI methods help to develop more frugal AI in several ways. Firstly, it is possible to identify efficient methods, since benchmarks enable comparing the performance of different AI methods and highlight those that offer the best value for money in terms of the resources used. Secondly, it is possible to optimize resources: by analyzing the results, researchers (i.e. users) can identify algorithms that require less data or computing power, thus favoring lighter solutions. Benchmarks also provide a consistent framework to evaluate AI models, ensuring comparability across different approaches (standardization). They help identify the most efficient algorithms for specific tasks, guiding resource allocation (performance metrics). And they encourage the sharing of best practices and datasets, fostering innovation in frugal AI solutions (community collaboration).
The aim of benchmark results is not to systematically compare solutions (by repeating many experiments), but to build up a body of knowledge that enables an appropriate selection to be made. The question is therefore: how can companies that do not have data scientists (or whose data scientists are qualified but overloaded with work and cannot respond to all requests) build up this knowledge?
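How benchmark results can guide a frugal selection can be sketched as follows: among methods meeting a minimum accuracy, keep the cheapest one. The methods, accuracies, and costs below are hypothetical placeholders, not results from any published benchmark:

```python
# Sketch: selecting a frugal method from benchmark results.
# Each entry: (method name, accuracy, energy cost in arbitrary units).
# All numbers are invented for illustration.

BENCHMARK = [
    ("linear_regression", 0.86, 1),
    ("random_forest",     0.90, 10),
    ("large_llm",         0.91, 10_000),
]

def frugal_choice(results, min_accuracy):
    """Cheapest method whose accuracy reaches min_accuracy (None if none do)."""
    eligible = [r for r in results if r[1] >= min_accuracy]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r[2])[0]
```

With a target accuracy of 0.88, the selection returns `random_forest` rather than the large model: a small accuracy concession buys a cost three orders of magnitude lower.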
Illustration of the different possible tradeoffs
As far as we know, there is no universal method for finding the right tipping point. Modestly, however, we can mention one that makes sense at the start of a data science project:
(i) define the performance criterion for the project;
(ii) define the value of this criterion (perhaps in the form of a return on investment (ROI));
(iii) use a rule, an AI, etc., that is simple at the start and then, if the value of the criterion is not reached, make the AI more complex;
(iv) stop as soon as the value of the criterion is reached or when the sum of the costs becomes too high (or if the return on investment cannot be achieved, or if the cost of achieving it would be too high).

Figure 1: Illustration of different tradeoffs between performances and costs
This is illustrated in Figure 1 above: in the purple case, if the return on investment in terms of performance is achieved with P1, there is no reason to make the AI model more complex and pay additional costs. In the green case, the same performance can be achieved at two different costs; it is therefore preferable to start with the AI model that incurs cost C1 and then stop. The worst case is when an AI model produces a higher overall cost with poorer performance (not illustrated in the figure).
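The selection procedure in steps (i)-(iv) can be sketched as a simple loop: try models of increasing complexity and stop as soon as the performance criterion is reached or the cumulative cost exceeds a budget. The candidate models, their scores, and their costs are hypothetical values, not taken from the text:

```python
# Sketch of steps (i)-(iv): start simple, increase complexity only if needed,
# and stop when the criterion is met or the total cost exceeds the budget.

def select_model(candidates, target_score, budget):
    """candidates: list of (name, score, cost), ordered from simple to complex.

    Returns the name of the first model reaching target_score within budget,
    or None if the target or the budget cannot be met."""
    spent = 0.0
    for name, score, cost in candidates:
        spent += cost              # every attempt adds to the total bill
        if spent > budget:
            return None            # stop: sum of costs too high (iv)
        if score >= target_score:
            return name            # stop: criterion reached (iv)
    return None                    # target never reached

# Hypothetical candidates, from a simple rule to a deep model.
CANDIDATES = [("rule", 0.70, 1), ("forest", 0.85, 5), ("deep", 0.95, 100)]
```

Called with a modest target, the loop stops at a cheap model and never pays for the complex one; with an unreachable target or too small a budget, it reports that no frugal choice exists.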
This last scenario is well illustrated in [9]. In this report, a text classification task (sentiment analysis) is addressed using either a Support Vector Machine (SVM) [5] or three Large Language Model (LLM) configurations (BERT fine-tuned on the problem to solve, and Llama and BERT prompted to solve it). For this classification task, the energy consumption of the largest LLMs at inference is several orders of magnitude higher than that of a standard SVM, for a comparable (or lower) accuracy.
Sources:
[1] K. E. Bassey, A. R. Juliet, and A. O. Stephen. AI enhanced lifecycle assessment of renewable energy systems. Engineering Science & Technology Journal, 2024.
[2] M. Boullé. Khiops: outil d’apprentissage supervisé automatique pour la fouille de grandes bases de données multi-tables. Revue des Nouvelles Technologies de l’Information, Extraction et Gestion des Connaissances, RNTI-E-30:505–510, 2016. www.khiops.org.
[3] G. Box and G. M. Jenkins. “Time Series Analysis: Forecasting and Control”. Holden-Day, 1976.
[4] L. Breiman. “Random forests”. Machine Learning,45(1):5–32, 2001.
[5] C. Cortes and V. Vapnik. “Support vector networks”. Machine Learning, 20:273–297, 1995.
[6] R. Dattakumar and R. Jagadeesh. “A review of literature on benchmarking.” Benchmarking: An International Journal, 10(3):176–209, 2003.
[7] W. Klöpffer and B. Grahl. “Life cycle assessment (LCA): a guide to best practice”. John Wiley & Sons, 2014.
[8] V. Lemaire, F. Clérot, N. Voisine, C. Hue, F. Fessant, R. Trinquart, and F. Olmos Marchan. “The data mining process: a (not so) short introduction”, 2017.
https://www.researchgate.net/publication/313528093_The_Data_Mining_Process_a_not_so_short_introduction.
[9] N. E. Mbengue. Étude comparative de l’empreinte carbone de modèles de machine learning appliqués au traitement automatique de la langue (tal). Master’s thesis, TELECOM Nancy, 2024.
[10] L. O. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin. Catboost: unbiased boosting with categorical features. In S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, NeurIPS, pages 6639–6649, 2018.
[11] B. Xia, Q. Lu, L. Zhu, and Z. Xing. An ai system evaluation framework for advancing ai safety: Terminology, taxonomy, lifecycle mapping. In Proceedings of the 1st ACM International Conference on AI-Powered Software, New York, NY, USA, 2024. Association for Computing Machinery.
[12] I. Žliobaitė, M. Budka, and F. Stahl. "Towards cost sensitive adaptation: When is it worth updating your predictive model?" Neurocomputing, 150:240–249, 2015.
Glossary
Inference: Process of using a trained model to make predictions or decisions based on new, unseen data. It involves applying the learned patterns and knowledge from the training phase to interpret input data and generate outputs such as classifications, predictions, or recommendations.
Classification: A type of supervised learning task where the goal is to categorize input data into predefined classes or labels. The model learns from labeled examples during training and then assigns labels to new, unseen data based on the learned patterns. Examples include spam detection in emails, image recognition, and sentiment analysis.
Regression: A type of supervised learning task where the goal is to predict a continuous output or numerical value based on input data. The model learns from labeled examples during training and then estimates values for new data. Examples include predicting house prices, stock market trends, or temperature forecasts.
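The three notions above can be illustrated with a hypothetical 1-nearest-neighbour model on toy data: "training" is just memorising labelled points, "inference" looks up the closest one, and the same mechanism performs classification (discrete labels) or regression (numeric targets):

```python
# Toy 1-nearest-neighbour sketch on one-dimensional data (invented examples).

def predict(training_data, x):
    """Inference: return the target of the nearest memorised training example."""
    nearest = min(training_data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Classification: targets are discrete class labels.
spam_model = [(0.1, "ham"), (0.9, "spam")]

# Regression: targets are continuous numerical values.
price_model = [(50, 100_000), (120, 250_000)]
```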






