The best local LLM inference setup:
4x Mac Studio (M3 Ultra, 512 GB of unified RAM each), 2 TB of UMA RAM in total, with RDMA
EXO 1.0 tooling for clustering, now with tensor parallelism enabled!
RDMA (Remote Direct Memory Access) through Thunderbolt 5 - clustering bottleneck eliminated
MLX inference acceleration (now with RDMA support!)
And... macOS 26.2
https://www.youtube.com/watch?v=A0onppIyHEg&t=3m10s
DeepSeek v3.2 at 8-bit quantization (the original training quantization) runs at 25 tokens per second! Wow!
516 Watts at peak power usage!
Downside: a cost of 50K USD for the hardware. Still better than one or several H100/H200/B200 cards with their limited, non-unified discrete memory architecture! =)
And such a setup will also work on much cheaper Mac Minis (no RDMA or Thunderbolt 5 there yet, but they will be added in new generations of M chips; for now they are available on M4 Pro and higher and on M3 Ultra)!
Apple is way ahead of everyone again!
In a couple of years this will be a common consumer setup for local LLM inference on conventional hardware: APUs from AMD and Intel+NVidia (with an integrated CPU+GPU NVLink bus - an upcoming APU architecture), while Apple and NVidia will use Intel fabs and TSMC fabrication.
Enclaves/TEE for hardware memory encryption will be part of such setups, for confidential computing over sensitive data.
#CPU #GPU #LLM #TEE
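The figures quoted in the post can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the `Cluster` class and the `weights_gb` helper are hypothetical names made up for illustration, and the 671 billion parameter count is an assumption (the total size published for DeepSeek V3), not a number from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Figures taken from the post: 4 nodes, 512 GB each, 516 W peak, 25 tok/s, 50K USD."""
    nodes: int
    ram_gb_per_node: int
    peak_watts: float
    tokens_per_second: float
    cost_usd: float

    @property
    def total_ram_gb(self) -> int:
        # Pooled unified memory across all nodes.
        return self.nodes * self.ram_gb_per_node

    @property
    def joules_per_token(self) -> float:
        # Watts are joules per second, so W / (tokens/s) gives joules per token.
        return self.peak_watts / self.tokens_per_second

    @property
    def usd_per_gb(self) -> float:
        # Hardware cost per GB of pooled memory.
        return self.cost_usd / self.total_ram_gb


def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone (no KV cache, no activations)."""
    return params_billion * bits_per_weight / 8


cluster = Cluster(nodes=4, ram_gb_per_node=512, peak_watts=516,
                  tokens_per_second=25, cost_usd=50_000)

# ASSUMPTION: 671B total parameters, not stated in the post.
ASSUMED_PARAMS_BILLION = 671

if __name__ == "__main__":
    print(f"pooled memory: {cluster.total_ram_gb} GB")
    print(f"energy per token at peak draw: {cluster.joules_per_token:.2f} J")
    print(f"hardware cost per GB: {cluster.usd_per_gb:.2f} USD")
    print(f"8-bit weights (assumed size): {weights_gb(ASSUMED_PARAMS_BILLION, 8):.0f} GB")
```

Under that assumption the 8-bit weights alone would not fit into a single 512 GB node, which is why the pooled memory and a fast interconnect matter here.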
Technologique
@technologique
Deeply involved developers on various aspects, trends & concepts of programming technologies, FLOSS, Linux, security, cloud infrastructures & DevOps practices, distributed systems, data warehousing & analysis, DL/ML, web3, etc.
Author: @andrcmdr
Latest posts
In reply to: Python 3.14 (https://blog.miguelgrinberg.com/post/python-3-14-is-here-how-fast-is-it)
NoGIL is definitely a huge leap forward!
Since 3.13 the GIL can be disabled... but for this we need a custom interpreter build from sources. That's the point that should be refined.
Because not every major Linux distro provides prebuilt packages yet: only Fedora (python3.14-freethreading package), OpenSUSE (python314-nogil package), Ubuntu (python3.14-nogil package through an external PPA) and Nix (python314FreeThreading package); in Gentoo it takes your own ebuild, and in Arch your own PKGBUILD script.
These provide python3.14t with the GIL disabled by default, and we can re-enable the GIL with the PYTHON_GIL=1 environment variable or the -X gil=1 command-line option for CPython.
But... a free-threaded CPython build does not make your program thread safe!
Thread safety, i.e. managing shared mutable state across simultaneous threads with locks, mutexes and other synchronization primitives, is fully on the developer. Pure Python code stays memory safe. But C code (called via FFI) and the code of the Python interpreter itself, which is written in C, can allow access to the same memory through pointers from several threads, leading to data races and deadlocks. It can also leave dead/hanging objects in memory, and thus leak memory over long uptimes.
And all of this affects run time and is revealed only at run time.
(While in Rust, for example, pointers/references are typed and type-safe, so allocations/deallocations, object lifetimes, and pointers/references to the same data and memory regions are tracked at compile time via move semantics, which completely prevents dangling pointers.)
Thus memory sanitizers and thread sanitizers should be used with free-threaded CPython. And not all main/core libraries on PyPI support free-threading yet.
https://docs.python.org/3/howto/free-threading-python.html
https://py-free-threading.git
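A minimal sketch of what "thread safety is on the developer" means in practice. It first reports whether the running interpreter is a free-threaded build and whether the GIL is currently active, then protects a shared counter with an explicit lock. The names `describe_gil`, `Counter` and `run` are hypothetical helpers written for this illustration; `sys._is_gil_enabled()` only exists on CPython 3.13 and later, so the code falls back gracefully on older versions.

```python
import sys
import sysconfig
import threading


def describe_gil() -> str:
    """Report the build type and the current GIL state of this interpreter."""
    # Py_GIL_DISABLED is 1 for free-threaded builds, 0 or None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; assume the GIL is on if it is missing.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_on = is_gil_enabled() if is_gil_enabled is not None else True
    return f"free-threaded build: {free_threaded}, GIL enabled right now: {gil_on}"


class Counter:
    """Shared mutable state guarded by an explicit lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # "value += 1" is a read-modify-write sequence, so it needs the lock
        # to stay correct when several threads run truly in parallel.
        with self._lock:
            self.value += 1


def run(threads: int = 8, iterations: int = 10_000) -> int:
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work() -> None:
        for _ in range(iterations):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value


if __name__ == "__main__":
    print(describe_gil())
    print(f"counter total: {run()}")
```

With the lock in place the total always equals threads times iterations, with or without the GIL. Dropping the lock makes the result depend on thread scheduling, which is exactly the kind of bug that shows up only at run time.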
And there is even more comprehensive continuous benchmarking from TechEmpower, which measures the performance of frameworks and libraries in different languages and ecosystems (JSON serialization, web requests/responses, DB requests and updates, etc.):
https://tfb-status.techempower.com/
https://www.techempower.com/benchmarks/#section=data-r23&a=2&test=update
https://tfb-status.techempower.com/results/d27544b6-7365-4269-a4d4-f908f0d21a3e
https://www.techempower.com/benchmarks/#section=test&runid=d27544b6-7365-4269-a4d4-f908f0d21a3e&a=2&test=update
#benchmark #benchmarks #benchmarking #TechEmpower
Python 3.14
https://blog.miguelgrinberg.com/post/python-3-14-is-here-how-fast-is-it
In short - the new Python 3.14 is awesome! Worth updating immediately!
3.14 performs much better than any previous version, has an optionally enabled JIT (which doesn't give much of a performance boost, due to the highly dynamic nature of Python and its volatile run-time object lifetimes) and an optionally disabled GIL for multi-threading (installed in the system as a separately compiled binary).
But the PyPy JIT still outperforms CPython.
Much love for Python anyway! 🙌 Python is a cross-system glue now!
The comparison with Rust is just for fun here - Python will always be much slower, due to dynamic dispatch on types at run time. And due to its dynamic nature Python will always allow unexpected run-time behavior and run-time crashes (so everything should be thoroughly covered with tests), while Rust is fully static (even dyn Trait implementations are checked by the compiler at compile time) and fully type safe (at compile time, before running).
There is also a more consistent benchmarking test suite across languages:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/box-plot-summary-charts.html
(They should update the Python environment soon and we'll see 3.14 results - for now 3.13 is used.)
#Python #Rust
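To compare interpreters on your own machine, a tiny CPU-bound micro-benchmark is enough for a first impression. The sketch below is not the benchmark from the linked article; `fib` and `best_of` are hypothetical helpers, and a recursive Fibonacci is just a convenient stand-in for call-heavy pure Python code.

```python
import platform
import sys
import time


def fib(n: int) -> int:
    """Naive recursive Fibonacci: lots of function calls and integer additions."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def best_of(runs: int, n: int) -> float:
    """Return the fastest wall-clock time in seconds over several runs of fib(n)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fib(n)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Identify the interpreter so that results from different runs can be told apart.
    print(platform.python_implementation(), sys.version.split()[0])
    print(f"fib(25), best of 3: {best_of(3, 25):.3f} s")
```

Run the same file under each interpreter you have installed, for example the regular CPython binary, the free-threaded python3.14t and PyPy (the exact executable names depend on your system). On CPython builds that include the experimental JIT, it can be switched on with the PYTHON_JIT=1 environment variable.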
And the full speech of Geoffrey Hinton about AI anxiety, risks and his warning to Humanity:
https://www.youtube.com/watch?v=IkdziSLYzHw
#AI #AGI
AI anxiety
https://youtu.be/odUjxJy0YMo
Here's Geoffrey Hinton talking about the risks...
In fact, he defined and described the risks as a warning to Humanity, and the risks are as follows:
Unequal access to artificial general intelligence. AGI is the most powerful form of AI, based on various specialized agents/models that interact with each other. OpenAI GPT4o, GPT4.1, o1, o3 and o4, GPT4.5 are such models (DeepSeek R1 as well). This means that only corporations will have access to such intelligence, but not people and the community.
Since proprietary models are closed, the community is offered only a closed, restricted model.
Only the corporation, and partially the state, has the full model.
And AI is actually the Fourth Industrial Revolution - it significantly increases labor productivity through very high-level automation.
Those who have access to it are both more competitive and more efficient.
(Our startup, Sentient OpenAGI, is eager to solve this problem of unequal access to AI and to create a platform that will contribute to the development of community-driven open AGI, based on decentralized web3 technologies.)
And there are risks from bad actors - like developing viruses and bio-weapons, genetically selective weapons, etc. I.e. the conversion between the protein structure of a virion shell and its cell receptors and an RNA or DNA sequence of nucleotides (nucleic acid bases) is a task already solved by neural networks, as it is mostly a combinatorial task.
This is not a joke or a fantasy anymore! All of these are already existing technologies.
#AI #AGI
The data storage engine projects we're all waiting for!
I was expecting that data storage engines, data warehouse solutions and cloud native solutions for data lakes would be built in Rust, as a systems language, by the Rust community.
Long-awaited stuff, ever since 2015, when the Rust v1.0 compiler was stabilized and the Rust 2015 edition arrived.
https://github.com/RustFS/RustFS
#Rust #RustLang #RustFS
One technically great web call service, written in Rust, using Actix and NATS:
https://videocall.rs
https://app.videocall.rs
https://github.com/security-union/videocall-rs
AI is dangerously centralized.
Why building community-aligned AI really matters, and how web3 technologies can play the key role in resolving the current situation with centralized AI, owned by tech giant companies, and instead help to create a community-driven ecosystem for AI development.
https://x.com/oleg_golev/status/1944157582144246077
The podcast:
https://x.com/autonolas/status/1926675599172452539
#AI #AGI #OpenAGI
In reply to: AI and AGI should be fully open sourced and loyal to builders and community!
Open, Monetizable, Loyal AGI Platform
https://www.sentient.xyz
#AI #AGI #OpenAGI
AI and AGI should be fully open sourced and loyal to builders and community!
The most important thing I should say and add to Steve's blog post is that AI should be open (now we see the opposite - an AI market concentrated in big tech), free (as in freedom), monetizable and loyal, for the good of creators/builders/developers and for the community's win. This is the OML principle. And it is the target goal of the Sentient Foundation, which is building a truly open AGI future and has already developed the Dobby model (and Dobby is already free! =), Sentient Chat, Sentient OpenDeepSearch, the OML Fingerprinting library, the Agent Framework and the Enclaves Framework (proud to be a leading part of it!).
And all of these parts of a groundbreaking product portfolio, and these breakthroughs, were made in less than a year!
More good things to come! Stay tuned!
https://steveklabnik.com/writing/i-am-disappointed-in-the-ai-discourse/
https://www.sentient.xyz
#AI #AGI #OpenAGI
Whoa! We need to update our kernels!
https://hoefler.dev/articles/vsock.html
https://security-tracker.debian.org/tracker/CVE-2025-21756
#kernel #Linux #VSock
In reply to: Amazing things have been released by the Modular development team (https://www.modular.com/blog/max-25-2-unleash-the-power-of-your-h200s-without-cuda)
Modular provides the MAX platform - the MAX inference backend (engine) and the MAX inference server (MAX Serve).
Just look at this:
https://builds.modular.com/models/DeepSeek-R1-Distill-Llama/8B-Q6_K
https://builds.modular.com/models/Llama-3.3-Instruct/70B?tab=deploy
In terms of deployment it is fantastic! Just one (relatively) tiny container!
And in terms of programming - GPU programming and acceleration without CUDA, using the Mojo language (statically compiled via LLVM), which has the capabilities of Rust (static memory safety), compiles through MLIR (Multi-Level Intermediate Representation) for amazing low-level code optimization and acceleration, has the syntax of Python, and integrates (embraces) the whole Python ecosystem. I've been playing with Mojo for quite a while already (and it is the best of both worlds - Rust and Python), but I started using MAX only recently. And llama.cpp doesn't even compare with MAX!
#Mojo #MAX #AI #AGI
Amazing things have been released by the Modular development team (Mojo language and MAX inference backend):
https://www.modular.com/blog/max-25-2-unleash-the-power-of-your-h200s-without-cuda
#Mojo #MAX #AI #AGI
https://www.youtube.com/live/AyH7zoP-JOg
Great conversation!
Privacy and confidentiality should be a fundamental human right in the era of information and ubiquitous computing.
Always think about how your data will be used: what you say, what you message, and what you prompt to a search engine or an AI model can be, and will be, used, especially against your interests.
#AI #AGI #privacy #confidentiality #confidential_computing #CC #security