Invest In Crypto News

NVIDIA NVFP4 Training Delivers 1.59x Speed Boost Without Accuracy Loss

By CryptoExpert | February 23, 2026 | in Blockchain News


Rongchai Wang
Feb 23, 2026 18:39

NVIDIA’s NVFP4 4-bit training format achieves 59% faster AI model training than BF16 while matching accuracy on Llama 3 8B benchmarks, per new research.





NVIDIA’s NVFP4 low-precision training format delivers up to 1.59x the throughput of standard BF16 training while maintaining comparable model accuracy, according to new benchmarks published by the company’s research team on February 23, 2026.

The results mark a significant milestone for 4-bit AI training, demonstrating that aggressive numerical compression doesn’t require sacrificing model quality when proper techniques are applied.

The Numbers That Matter

Testing on Llama 3 8B models trained on 1 trillion tokens, NVIDIA’s team measured throughput of 1,850 TFLOP/s per GPU with NVFP4 versus 1,165 TFLOP/s for the BF16 baseline, a 59% improvement. The tests ran on GB200 NVL72 systems built on the company’s Blackwell architecture.
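The headline ratio follows directly from the two throughput figures quoted above. A minimal sketch of the arithmetic (the function and constant names are illustrative, not taken from NVIDIA’s tooling):

```python
# Per-GPU throughput figures quoted in the article (TFLOP/s).
NVFP4_TFLOPS = 1850.0
BF16_TFLOPS = 1165.0


def speedup(new: float, baseline: float) -> float:
    """Ratio of the new throughput to the baseline throughput."""
    return new / baseline


ratio = speedup(NVFP4_TFLOPS, BF16_TFLOPS)
print(f"speedup: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% higher throughput)")
```

Rounded to two decimals, the ratio matches the 1.59x figure in the headline.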

Downstream benchmark scores tell the real story. On MMLU, NVFP4-trained Llama 3 8B scored 45.64% compared to 45.98% for BF16. HellaSwag showed 75.59% versus 76.44%. These gaps of 0.34 and 0.85 percentage points fall within noise margins for practical applications.

Memory efficiency gains enabled doubling the micro-batch size from 2 to 4 during pretraining, directly improving scalability for large-scale training runs.

Why 4-Bit Training Works Now

Previous attempts at ultra-low-precision training often resulted in model divergence or significant accuracy degradation. NVIDIA’s approach sidesteps these issues through a specific recipe that’s emerged from extensive testing.

The critical insight: keeping approximately 15% of the network in higher precision prevents training collapse. Specifically, the final four transformer layers must remain in BF16. Ablation studies confirmed that fully NVFP4 models diverge during training.
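The layer rule can be pictured as a simple per-layer precision plan. The sketch below is illustrative only: `precision_plan` and its labels are hypothetical names, not part of any NVIDIA library, and the layer count of 32 is the published depth of Llama 3 8B rather than a figure from the benchmarks above. A real framework would apply the choice to the linear layers inside each transformer block.

```python
def precision_plan(num_layers: int, bf16_tail: int) -> list[str]:
    """Label each transformer layer, keeping the last `bf16_tail` layers in BF16."""
    if not 0 <= bf16_tail <= num_layers:
        raise ValueError("bf16_tail must be between 0 and num_layers")
    cutoff = num_layers - bf16_tail
    return ["nvfp4" if i < cutoff else "bf16" for i in range(num_layers)]


# Llama 3 8B has 32 transformer layers; the final four stay in BF16.
plan = precision_plan(num_layers=32, bf16_tail=4)
print(plan.count("nvfp4"), "layers in NVFP4,", plan.count("bf16"), "in BF16")
```

Note that the share of layers kept in BF16 here is a layer count, which is not necessarily the same measure as the approximately 15% of the network cited above.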

The format uses a two-level scaling strategy—micro-block scaling for groups of 16 elements combined with global FP32 scaling across full tensors. This hierarchical approach manages the limited dynamic range inherent in 4-bit representations.
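To make the two-level idea concrete, here is a simplified pure-Python sketch. It assumes the 4-bit values use the E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and that block scales are sized to fit an FP8 E4M3 range with a maximum of 448. For readability the block scales are kept as ordinary Python floats rather than rounded to FP8, so this illustrates the scaling scheme, not the exact hardware format. All function names are hypothetical.

```python
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # FP4 magnitudes
FP4_MAX = 6.0
FP8_E4M3_MAX = 448.0  # assumed ceiling for per-block scales
BLOCK = 16            # elements sharing one block scale


def round_to_grid(x: float) -> float:
    """Round x to the nearest representable FP4 value, keeping its sign."""
    mag = min(E2M1_GRID, key=lambda g: abs(g - abs(x)))
    return mag if x >= 0 else -mag


def quantize(values):
    """Return (fp4_codes, block_scales, global_scale) for a flat list."""
    amax = max(abs(v) for v in values)
    # Level 1: one global scale keeps every block scale within FP8 range.
    global_scale = amax / (FP4_MAX * FP8_E4M3_MAX) if amax > 0 else 1.0
    codes, block_scales = [], []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        # Level 2: a per-block scale maps the block maximum onto FP4_MAX.
        scale = block_amax / FP4_MAX / global_scale if block_amax > 0 else 1.0
        block_scales.append(scale)
        step = scale * global_scale
        codes.extend(round_to_grid(v / step) for v in block)
    return codes, block_scales, global_scale


def dequantize(codes, block_scales, global_scale):
    """Undo both scaling levels to recover approximate values."""
    return [c * block_scales[i // BLOCK] * global_scale
            for i, c in enumerate(codes)]


# Two blocks with very different magnitudes share one tensor.
values = [0.001 * i for i in range(16)] + [10.0 * i for i in range(16)]
codes, block_scales, global_scale = quantize(values)
restored = dequantize(codes, block_scales, global_scale)
print(block_scales)
```

Because each block carries its own scale, the small-magnitude block is not flattened to zero by the large values elsewhere in the tensor, which is the point of the hierarchy.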

Random Hadamard transforms spread outlier values across a tensor’s elements, removing the extreme values that would otherwise cause training instability. Stochastic rounding of gradients eliminates systematic quantization bias.
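Both ideas can be demonstrated on toy data. The sketch below is a simplified illustration with hypothetical function names, not NVIDIA’s implementation: it applies a random sign flip followed by an orthonormal Walsh-Hadamard transform to a 16-element vector with one outlier, and it uses a uniform rounding grid for stochastic rounding even though the real FP4 grid is non-uniform.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(n)
    return [v * scale for v in out]


def random_hadamard(vec, signs):
    """Flip signs with a random +/-1 vector, then apply the transform."""
    return fwht([v * s for v, s in zip(vec, signs)])


def stochastic_round(x, step, rng):
    """Round x to a multiple of step; round up with probability equal to the remainder."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)

# A single large outlier is spread evenly across all 16 outputs.
spiky = [0.0] * 15 + [16.0]
signs = [rng.choice((-1.0, 1.0)) for _ in spiky]
smooth = random_hadamard(spiky, signs)
print("max before:", max(abs(v) for v in spiky),
      "max after:", max(abs(v) for v in smooth))

# Round-to-nearest always maps 0.3 to 0 on a unit grid; the average of
# many stochastic roundings stays close to 0.3 instead.
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print("nearest:", round(0.3), "stochastic mean:", mean)
```

The transform is orthonormal, so it preserves the vector’s norm and can be undone after the low-precision operation; only the peak magnitude that the quantizer has to cover shrinks.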

Comparison With Other Low-Precision Formats

NVFP4 isn’t the only option. FP8 with current scaling (FP8-CS) achieved a 1.33x speedup over BF16, while MXFP8, a block-level scaling variant optimized for Blackwell, hit 1.32x. Both formats tracked convergence slightly better than NVFP4 during training, though final accuracy metrics remained comparable across all approaches.

MXFP8 demonstrated marginally better performance than standard FP8, likely due to finer-grained scaling that better captures local dynamic range within tensors.

Production Deployment

The techniques are available now through NeMo Megatron Bridge, NVIDIA’s open PyTorch-native library. Switching between precision formats requires changing a single configuration flag—no model code or optimizer logic modifications needed.

For teams running large-scale training workloads on Blackwell hardware, the throughput gains translate directly to reduced training time and compute costs. A model that previously required 10 days of training could potentially complete in under 7 days with NVFP4.
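The 10-day example can be checked with a back-of-the-envelope calculation. This sketch assumes the entire run scales with the measured throughput ratio, which ignores any portion of the job that low-precision math does not accelerate. The speedup figures are the ones quoted in this article.

```python
# Speedups over BF16 quoted in the article.
SPEEDUPS = {"NVFP4": 1.59, "FP8-CS": 1.33, "MXFP8": 1.32}


def projected_days(baseline_days: float, speedup: float) -> float:
    """Estimated wall-clock days if the whole run speeds up by `speedup`."""
    return baseline_days / speedup


for name, factor in SPEEDUPS.items():
    print(f"{name}: {projected_days(10.0, factor):.1f} days")
```

Under that assumption, only NVFP4 brings the hypothetical 10-day run below 7 days; the two FP8 variants land at roughly 7.5 days.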

The recommended recipe for NVFP4: AdamW optimizer with epsilon=1e-8, learning rate decaying from 6e-4 to 6e-6, and global batch size of 768. These parameters represent the empirical sweet spot from NVIDIA’s extensive testing across multiple architectures and datasets.
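As a rough illustration, the recipe can be written down as a small configuration plus a learning-rate schedule. The article gives only the start and end learning rates, so the cosine shape, the absence of warmup, and every name below are assumptions made for this sketch, not NVIDIA’s published settings or the NeMo Megatron Bridge API.

```python
import math

# Values quoted in the article; the key names are hypothetical.
RECIPE = {
    "optimizer": "AdamW",
    "adam_eps": 1e-8,
    "max_lr": 6e-4,
    "min_lr": 6e-6,
    "global_batch_size": 768,
}


def cosine_lr(step: int, total_steps: int,
              max_lr: float = RECIPE["max_lr"],
              min_lr: float = RECIPE["min_lr"]) -> float:
    """Cosine decay from max_lr to min_lr (assumed shape, no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


for step in (0, 500, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```

Whatever the actual decay shape, the endpoints are fixed by the recipe: the schedule starts at 6e-4 and finishes two orders of magnitude lower at 6e-6.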

Image source: Shutterstock



© Copyright 2024 InvestInCryptoNews.com
