
My Learnings From First MenaML Winter School – Doha, Qatar 2025

Leaving behind Istanbul, where snow was forecast for the following days, and stepping into Qatar's spring-like weather with a suitcase full of t-shirts, I was very excited. I hadn't traveled abroad in the last five years, a period spanning the pandemic, and at this time of transformation and a new Ümit, I was turning a new corner.

I was accepted with a scholarship to the Middle East and North Africa Machine Learning Winter School (MenaML 2025), organized by the Qatar Computing Research Institute (QCRI) at Hamad bin Khalifa University (HBKU). This six-day intensive program was designed to equip the next generation of AI leaders. I spent the second week of February at this event, hosted at the Minaretein Building within Education City. In this blog post, I will share what I learned and my experiences, made possible by the dedicated university and Google DeepMind teams, along with other sponsors like PhazeRo, InstaDeep, Dell, Qeen.AI, and Apple. In addition to this technical article, which is heavily focused on artificial intelligence, my article about my Qatar experiences and the visual vlog will be added later. Don't forget to cite or share!

PS: This is a slightly technical post about my experience. Photos will be added later.

After checking into the hotel, where I arrived at night, my first task was to eat and rest, because the next day we had to leave the Premier Inn early at 07:00 AM and take the scheduled buses to the event venue.

Day 1: February 9th

On the first day, after arrival and registration, we had some time to meet and mingle in the area specially prepared for the event, followed by coffee and industry-related activities. After breakfast came the opening speeches, including remarks from the MenaML team and QCRI opening remarks by Dr. Ahmed Elmagarmid. Then, while getting used to the unfamiliar faces that would soon become quite familiar, we received a warm welcome from the organizing team and information about the event. It was obvious that the 147 people chosen from roughly 1,400 applicants, about a 10% acceptance rate, were diligent individuals already known in the field of artificial intelligence, and names we would keep hearing in the future.

The QCRI President and HBKU Vice-Rector delivered their speeches, emphasizing the importance of the event and of developing the region. Following these, Shakir Mohamed from Google DeepMind began his keynote, which was divided into three parts: "Getting Ready," "Impact From AI Research," and "Shaping AI". He spoke about arrival technologies, citing OpenAI's GPT from before NeurIPS, and questioned our readiness for AI technologies, positioning us as pioneers and makers.

For impact, he discussed the "2 Sigma Problem" in education, highlighting the importance of AI and education technology for global quality education (Sustainable Development Goal 4). He described the properties of a good AI tutor, such as broad expertise, contextual use, and respect for the user, referencing LearnLM and a paper on responsible generative AI development for education. He also touched upon human-AI interaction and digital wellbeing, viewing this as the beginning of a robust area for responsible and sociotechnical AI research. Another vision he shared was "Weather Prediction by Numerical Process," covering advances in medium-range forecasting with GraphCast and its next-generation successor, GenCast, which performs exceptionally well. He even linked weather forecasting to fish forecasting and its impact on livelihoods, noting that high-income countries have more accurate forecasts than low-income ones. Shakir also explored AI's role in comedy, art, and culture with "Plays by Bots" and Dramatron (on the Google DeepMind GitHub), noting the wide spectrum of opinions on AI's creative potential.

The final part, "Shaping AI," focused on creating grassroots initiatives like the MenaML event itself, drawing parallels with the Deep Learning Indaba, which established an ML community across 47 African countries and brought many researchers to the forefront. His three key takeaways were "Getting Ready," "Impact with AI," and the "MenaML Community". For me, the most illuminating and important part was his discussion of Shaping AI and the Indaba's story, highlighting the importance of collective groups, ownership, and participation for the AI community in the region, and echoing how African universities were underrepresented in ML papers from 2006-2016.

After a lunch break, Dr. Safa Messaoud and Dr. Ehsaneddin Asgari took the stage for the afternoon session, an introduction to Deep Learning and Transformers. Safa began with the basics of Deep Learning, mentioning Project Mariner and explaining that LSTMs don't scale to today's big data, given a current internet data volume of roughly 250,000 GB per second. Ehsaneddin then presented on Transformer language models, starting with Chomsky's definition of language and the concept of language modeling. He noted that traditional word representations don't distinguish meanings like "morning light" from "light joke," emphasizing the need for contextual embeddings and attention layers in transformers to capture the multiple senses of a word. He also discussed the shift from single-task models to finetuning a single model for all capabilities, and mentioned the "Muslims in ML Community" and the importance of community-based contributions.
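To make the "morning light" vs. "light joke" point concrete, here is a minimal sketch of contextual embeddings (my own illustration, not from the session). It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint:

```python
# A minimal sketch: the same word "light" gets a different vector
# depending on its sentence context, which static embeddings cannot do.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0, 0]
    return hidden[position]

a = embedding_of("the morning light came through the window", "light")
b = embedding_of("she told a light joke at dinner", "light")
# Well below 1.0: the two senses of "light" get distinct representations.
print(torch.cosine_similarity(a, b, dim=0).item())
```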

In the practical session after the coffee break, we had the opportunity to apply what we learned in a hands-on session on Transformers & LLMs, with a dedicated GitHub repository. After dinner, we returned to our hotel rooms. Since I was very tired, I decided to spend the rest of the evening looking at the Kaggle competition data and doing some coding instead of sightseeing. The event featured two challenges: the QCRI Challenge on "Fingerprinting Movement Disorders in Fruit Flies" and the Qeen.AI Challenge on building a revenue-forecasting model from a retail customer dataset.

Day 2: February 10th

The second day, February 10th, started with a practical challenge and then a Q&A session for the Kaggle competitions. Questions covered both the academic and the industry challenge, and the detailed, complex questions about the academic competition showed me how intense it was.

Then, Dr. Mohamed Eltabakh from QCRI took the stage to give his keynote on Arabic-centric LLMs and the Fanar team. He emphasized that "we don't have to be consumers of the technology" and walked through the ups and downs, pros, cons, and lessons learned while building Arabic LLMs. He highlighted the critical role of culture in AI: while technology might be agnostic to culture, AI inherently carries cultural perceptions. Building Arabic LLMs is hard and expensive but crucial for preserving the language, culture, and identity for future generations, because whoever owns the technology owns the control. A key takeaway for me was the usability of the model and creating public awareness. He noted that "LLMs" might give way to "GenAI" as the more accessible term. The "technology dilemma" is that one size does (not) fit all, suggesting the need for smaller, application-specific models. Fanar is presented as a framework with several components, where Fanar Prime and Stars are the LLMs and the other components form the "brain" of the model. They significantly increased the Arabic content in their model from 2% to 40%. They also incorporated preference learning and alignment during fine-tuning to address cultural differences, even implementing "ban words" to filter inappropriate prompts in their image-generation model. The performance is detailed in the "Fanar Benchmarking" paper.
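As a toy illustration of the ban-word idea, here is a tiny sketch (entirely my own; Fanar's actual filter is surely far more sophisticated, with text normalization, multilingual matching, and learned classifiers):

```python
# A toy prompt filter: reject an image-generation prompt if it
# contains any banned term. The term list below is a placeholder.
BANNED_TERMS = {"banned_word_1", "banned_word_2"}

def is_prompt_allowed(prompt: str) -> bool:
    """Allow a prompt only if no token matches a banned term."""
    tokens = set(prompt.lower().split())
    return tokens.isdisjoint(BANNED_TERMS)

print(is_prompt_allowed("a watercolor of the Doha skyline"))  # True
```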

After a coffee break, we listened to a talk on Reasoning LLMs by Mohammad Raza. He started with benchmarks for LLMs, noting that many models now saturate the existing ones. He then delved into reasoning techniques: Chain of Thought (CoT) and its variations like Tree of Thought and Graph of Thought, tool-integration approaches like ReAct, and problem-reformulation approaches. He also covered training and fine-tuning approaches, such as ToRA (learning tool use from a teacher model like GPT through imitation learning) and WizardMath (a teacher model with RL). He touched upon how reinforcement learning (RL) alone can achieve good results, and even better ones when combined with fine-tuning. The session concluded with LLMs & formal reasoning, where he discussed how LLMs can be integrated with automated reasoning tools like Lean and Z3. A key issue is incorrect formalization by LLMs, which he is researching through instantiation-guided formalization, and AlphaProof, which uses RL with MCTS to generate and verify formal proofs in Lean. His closing thoughts emphasized the need for deeper integration of formal reasoning.
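To make the CoT idea concrete, here is a minimal sketch of the prompt structure (my own illustration; ask_llm is a placeholder for whichever chat-model client you use):

```python
# Contrasting direct prompting with chain-of-thought (CoT) prompting.
# The point is the prompt structure, not the client behind it.
def ask_llm(prompt: str) -> str:
    """Placeholder: swap in a real LLM API call here."""
    print("--- prompt sent ---\n" + prompt)
    return "<model answer>"

question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompting: the model must jump straight to the final answer.
direct_answer = ask_llm(question)

# CoT prompting: eliciting intermediate steps markedly improves
# accuracy on multi-step arithmetic and logic problems.
cot_answer = ask_llm(question + "\nLet's think step by step.")
```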

After lunch, we learned about Efficient LLM Training from Pranali Yawalkar. She reviewed the Transformer architecture, including the encoder-decoder structure and multi-headed self-attention, emphasizing the need for smart and efficient training. She explained the critical challenges in LLM training, illustrating how updating weights can cause the model to "forget" previous capabilities (catastrophic forgetting). The session included a practical fine-tuning exercise on Colab, covering the generative AI lifecycle, sample queries, prompt-engineering techniques (like CoT), and the distinction between pre-training and fine-tuning.
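In the spirit of that exercise, here is a minimal parameter-efficient fine-tuning sketch (my own, not the session's actual Colab). Freezing the base weights and training only small low-rank adapters (LoRA) is one common way to adapt an LLM while limiting catastrophic forgetting; it assumes the Hugging Face transformers and peft libraries, with gpt2 purely as an illustrative model:

```python
# LoRA fine-tuning sketch: base weights stay frozen; only low-rank
# adapter matrices are trained, which also mitigates forgetting.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # adapter scaling factor
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% trainable
# `model` can now be passed to a standard Trainer / training loop.
```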

Finally, the day concluded with a productive panel session titled "AI as a Career Choice in the Region," moderated by Mohamed M. Abdallah. The panelists were Prof. Khaled Harras (Professor & Associate Dean at CMU), Dr. Dena Al-Thani (Assoc. Prof. at HBKU), Dr. Morteza Ibrahimi (CEO of Qeen.AI), and Mr. Masoud Al Rawahi (CEO of PhazeRo). The main takeaway was that there is a significant amount of talent in the Middle East, and it is crucial to discover it and to recognize AI as an increasingly important career choice. While jobs will transform, none of the panelists believed the human element would diminish.

That evening, after having coffee at a nearby cafe and taking a walk, I returned to the hotel to code. There were pickleball courts next to the cafe – interesting to see! It was my first time seeing a pickleball court in person; I assume its popularity keeps growing in the white-collar community.

Day 3: February 11th

The third day, February 11th, started with GenAI & Diffusion Models, presented by Safa Messaoud and Andrew El-Kadi. Safa began with the basics and mathematics of diffusion models, including coefficients, covariances, and Gaussian distributions. Andrew explained how diffusion models work, starting with the U-Net, and discussed the importance of evaluation and diversity in images. He pointed out that while a higher Inception Score is considered better, the InceptionV3 model used for evaluation is flawed, as good diversity doesn't always mean good images. He suggested Fréchet Inception Distance (FID) instead, though it has its own limitations with noise and distortion.

They explained that video diffusion is simply adding a frame dimension on top of image diffusion models, using temporal and spatial networks. Training starts with an initial 16 frames, then inpainting techniques are applied autoregressively to add more frames, and evaluation transitions from FID to Fréchet Video Distance (FVD).

They then connected video diffusion to weather forecasting, explaining GenCast as a diffusion model for weather. They contrasted GenCast with GraphCast, an earlier GNN-based model that was deterministic, noting that weather prediction requires multiple possible futures, which led to the probabilistic ensembles in GenCast. They highlighted that weather is approximately a Markov system, meaning it is memoryless: GenCast conditions only on the two previous 6-hour states. I asked why not use long-term patterns for prediction, and whether outliers could also be extreme events worth predicting. During encoding and decoding, the attention mechanism works over K-hop neighborhoods, similar to BFS. They also noted that not every grid square on Earth covers the same surface area in projections, making denoising expensive due to doubled denoising steps (40 steps for a nominal 20). For evaluation, they assess realistic and accurate events using CRPS (aiming for the blue line to sit near the actual event), generating 50 guesses for a year and comparing them to actual events, outperforming the ENS ensemble baseline. They also evaluated wind farms for power-generation prediction and the ability to predict extreme events, showing how GenCast flagged possible typhoon paths five days before Typhoon Hagibis. The Q&A touched on hallucinations, physics-based constraints, and the cost of training (H100-class accelerators for 8 nights), at a 0.25-degree resolution (roughly 30 km at the equator).
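For intuition on the mathematics Safa covered, here is a minimal sketch of the forward (noising) process in a DDPM-style diffusion model (my own illustration; schedules and parameterizations vary across papers):

```python
# Forward diffusion: data is progressively corrupted with Gaussian
# noise under a variance schedule; the network learns to reverse it.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear variance schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal kept

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    noise = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise

x0 = torch.randn(1, 3, 32, 32)    # a stand-in "image"
x_mid = q_sample(x0, T // 2)      # heavily noised
x_end = q_sample(x0, T - 1)       # nearly pure Gaussian noise
```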

After a coffee break, Amin Sadeghi presented on multi-modality. Although I was a little late to this part of his talk due to networking, I noted his insight on the importance of having an environment where one can make mistakes and learn from the feedback. He discussed the revolution of self-supervision: using unlabelled data for supervised-style learning via masking.
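A minimal sketch of the masking idea (my own illustration; real pipelines like BERT or MAE add special tokens and 80/10/10 corruption rules):

```python
# Masked self-supervision: hide random tokens and train the model to
# reconstruct them, turning unlabelled data into a supervised signal.
import torch

def mask_tokens(token_ids: torch.Tensor, mask_id: int, p: float = 0.15):
    """Mask a fraction p of tokens; labels are -100 elsewhere so the
    loss is computed only on the masked positions."""
    labels = token_ids.clone()
    masked = torch.rand(token_ids.shape) < p
    labels[~masked] = -100       # ignored by PyTorch cross-entropy
    inputs = token_ids.clone()
    inputs[masked] = mask_id     # replace with the [MASK] id
    return inputs, labels

ids = torch.randint(5, 1000, (1, 12))        # toy token ids
inputs, labels = mask_tokens(ids, mask_id=4)
```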

Afterwards, I visited the Al-Khater Torba Market's pop-up shop for National Sports Day. I wanted to visit the Qatar National Library, but it was closed for the holiday. I then explored Education City, a communal campus hosting many universities, easily navigable on foot or by free tram, with wide, uncrowded grounds. I also passed by Cornell Medicine (without going inside) and Georgetown University, admiring its architecture. Then I returned to the event to charge my phone and have some food, and stayed for the poster presentations, where I found interesting posters and works. After looking at people's work, I went with a friend I met there to visit some of Qatar's local spots: Souq Waqif, the Corniche, and the Falcon Souq, before heading back to our hotel. There was no tram service.

Day 4: February 12th

The fourth day, February 12th, started with an Introduction to Reinforcement Learning (RL) & Deep RL, presented by Sanjay Chawla and Mina Khan. Sanjay covered the RL fundamentals, explaining how it operates on sequences of decisions rather than static datasets, and mentioning the REINFORCE algorithm.
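Since REINFORCE came up, here is a minimal sketch of it on a two-armed bandit (my own illustration, assuming PyTorch): the log-probability of each sampled action is scaled by the reward it earned, so profitable actions become more likely.

```python
# REINFORCE on a two-armed bandit: pure policy-gradient learning.
import torch

logits = torch.zeros(2, requires_grad=True)     # policy parameters
optimizer = torch.optim.Adam([logits], lr=0.1)
true_payoffs = torch.tensor([0.2, 0.8])         # arm 1 pays off more

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = torch.bernoulli(true_payoffs[action])  # stochastic reward
    loss = -dist.log_prob(action) * reward          # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # probability mass shifts to arm 1
```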

Mina Khan's presentation focused on reinforcement learning for LLMs, one of the areas that interested me most. She began by asking "Why RL?" and pivoted from the GPT paper to explain how RL learns from both positive and negative examples. She outlined the RL process: start with an initial (supervised-learning) policy, train a reward model (RM), then optimize the policy against the RM. She discussed various techniques, including RLHF (Reinforcement Learning from Human Feedback), RLAIF (RL from AI Feedback), and RLEF (RL from Execution Feedback). Other reward framings she mentioned included process supervision (PRM) versus outcome supervision (ORM), and techniques like STaR (Self-Taught Reasoner) and WebGPT.
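Her three-stage outline maps naturally onto code. Here is a deliberately stubbed-out sketch of the pipeline's shape (my own; each helper stands in for a full training job):

```python
# The three stages of RLHF as outlined in the talk. Stubs only:
# the structure, not a working trainer.

def supervised_finetune(lm, demonstrations):
    """Stage 1: imitate human demonstrations to get an initial policy."""
    return lm  # placeholder

def train_reward_model(lm, preference_pairs):
    """Stage 2: learn to score outputs so that human-preferred
    responses receive higher rewards."""
    return lm  # placeholder

def optimize_with_rl(policy, reward_model, prompts):
    """Stage 3 (typically PPO): generate responses, score them with
    the reward model, and update the policy, usually with a KL
    penalty to stay close to the stage-1 policy."""
    return policy  # placeholder

def rlhf_pipeline(pretrained_lm, demonstrations, preference_pairs, prompts):
    policy = supervised_finetune(pretrained_lm, demonstrations)
    reward_model = train_reward_model(pretrained_lm, preference_pairs)
    return optimize_with_rl(policy, reward_model, prompts)
```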

After a coffee break, we had a practical session on RLHF, led by Yasin Abbasi Yadkori.

After lunch and the mentorship session, we had a talk on AI for Sustainability by Chris Brown. Here, a new Google product was demonstrated, and Andrew extensively discussed GenCast. The day concluded with a poster session. In the evening, I visited the National Library, which I hadn't been able to see before. I really liked it, and even though I stayed until closing time, it wasn't enough; I definitely needed more. Afterward, I had a short shopping session at the Mall of Qatar before returning to the hotel.

Day 5: February 13th

This was one of the days I had looked forward to the most, as both health and education are fields that greatly interest me. We started the fifth day, February 13th, with the AI for Education session by Avishkar Bhoopchand. The session was very insightful, covering education with technology, intrinsic evaluations, challenges in educational use, and future research, and a LearnLM demo was also presented.

After a coffee break, we moved to AI for Health, presented by David Barrett and Alan Karthikesalingam. Alan began with facts about healthcare needs, noting that nearly $9 trillion, about 10% of global GDP, is spent on healthcare annually. He highlighted the key translational challenges for medical AI: generalization, reliability, and interactivity. He explained that "medicine is a game of long tail," meaning supervised learning alone is insufficient in hospital settings, and that "a wise doctor knows what he doesn't know," which is what AI should aim for. He discussed the "narrow AI tools" currently used at the bedside, and the time-consuming process for a classification model (e.g., for breast cancer) to be approved and regulated for general use, moving at the "speed of trust". He also touched upon Transformers and LLMs in medicine, emphasizing the importance of multimodality, and mentioned using USMLE-style MCQs as a benchmark, noting the huge leap from PaLM to PaLM 2. During David's talk I had an overlapping call about personalized medicine, so I had to step out, but I caught him afterwards to discuss recent developments and insights from the Q&A.

Next came the AI for Biology talk by Alex Graves, which I had eagerly awaited. The techniques he described, especially inpainting, greatly interested me – it was wizardry-level stuff! The genomic application of the process was also interesting, though super complicated, and his co-author (who I believe was Turkish) knew more about that side. Meeting Alex was quite important for me, and I gained a lot from the talk.

After a group photo break, we returned to the conference hall for the AI for Genomics session by Natasha Latysheva, the last of the themed talks. She emphasized how AI can integrate into genomics from an evolutionary point of view, suggesting that understanding these topics might help us in more personalized ways in the future. During her talk I was also working on the Kaggle competition, and towards the end I achieved a very good result! Both the result and the productivity of the talks, along with what I learned, made me very happy.

After dinner, we had an unofficial city tour organized for the event participants. After cruising the bay by boat, we went to a newly developed area with both villas and cafes. There we enjoyed our Karak tea and spent a pleasant evening chatting with friends we had grown close to as the event slowly approached its end. At one point, I spent too much time at a store selling analog film and lost the group, but that's another story! 🙂

Day 6: February 14th

The last day of the conference, February 14th, began with the closing session and the awards ceremony. I took third place in the industry challenge! I had been in second place with the code I uploaded at 2 AM, but someone else's 4 AM submission overtook me. Still, third place is not a bad result at all! Afterwards, we discussed AI in the region at an unconference session before heading to the central mosque for Friday prayer, which was led by Imam Omar Suleyman. Then, with friends I had made, we visited the Museum of Islamic Art in Qatar. My comments on the museum and my impressions of Qatar are topics for another blog post!

Afterward, the final surprise of the week was a Macklemore concert. We learned he would be attending a YouTubers' event for "Match 4 Hope," organized (I assume) especially for children in Gaza, so we went to support and watch Macklemore, who once again earned my admiration for his support of Palestine. I ended the night exhausted and went back to the hotel, but I only got 45 minutes of sleep before preparing for my early flight back to Istanbul.

Final Remarks

This intensive six-day program was truly an enriching experience, allowing me to delve deep into various aspects of Machine Learning, from Deep Learning and Transformers to Arabic-Centric LLMs, Reasoning LLMs, Efficient LLM Training, GenAI & Diffusion Models, Multi-modality, Reinforcement Learning, and critical applications in AI for Education, Health, and Biology/Genomics. It was an incredible privilege to join and meet so many talented people from around the world, and I sincerely hope to reunite with them in the next edition of MenaML. This entire experience was overwhelmingly positive and highly valuable. Don’t forget, in addition to this technical article, my article about my Qatar experiences, the visual vlog, and the technical report will be coming soon!
