Invest In Crypto News

NVIDIA NVFP4 Training Delivers 1.59x Speed Boost Without Accuracy Loss

by CryptoExpert
February 23, 2026
in Blockchain News

Rongchai Wang
Feb 23, 2026 18:39

NVIDIA’s NVFP4 4-bit training format achieves 59% faster AI model training than BF16 while matching accuracy on Llama 3 8B benchmarks, per new research.





NVIDIA’s NVFP4 low-precision training format delivers up to 1.59x faster throughput compared to standard BF16 training while maintaining equivalent model accuracy, according to new benchmarks published by the company’s research team on February 23, 2026.

The results mark a significant milestone for 4-bit AI training, demonstrating that aggressive numerical compression doesn’t require sacrificing model quality when proper techniques are applied.

The Numbers That Matter

In tests on Llama 3 8B models trained on 1 trillion tokens, NVIDIA’s team measured 1,850 TFLOP/s per GPU with NVFP4 versus 1,165 TFLOP/s for the BF16 baseline, a 59% improvement. The tests ran on GB200 NVL72 hardware using the company’s Blackwell architecture.
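
As a quick check, the 59% figure follows directly from the two throughput numbers. A minimal Python sketch of the arithmetic (the function name is illustrative and not part of any NVIDIA tooling):

```python
# Throughput figures quoted in the article, in TFLOP/s per GPU.
NVFP4_TFLOPS = 1850.0
BF16_TFLOPS = 1165.0


def speedup(new: float, baseline: float) -> float:
    """Return the throughput ratio new / baseline."""
    return new / baseline


ratio = speedup(NVFP4_TFLOPS, BF16_TFLOPS)
improvement_pct = (ratio - 1.0) * 100.0
print(f"speedup: {ratio:.2f}x, improvement: {improvement_pct:.0f}%")
# prints "speedup: 1.59x, improvement: 59%"
```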

Downstream benchmark scores tell the real story. On MMLU, the NVFP4-trained Llama 3 8B scored 45.64% compared to 45.98% for BF16, a gap of 0.34 percentage points. On HellaSwag it scored 75.59% versus 76.44%, a gap of 0.85 points. Differences this small fall within noise margins for practical applications.

Memory efficiency gains enabled doubling the micro-batch size from 2 to 4 during pretraining, directly improving scalability for large-scale training runs.

Why 4-Bit Training Works Now

Previous attempts at ultra-low-precision training often resulted in model divergence or significant accuracy degradation. NVIDIA’s approach sidesteps these issues through a specific recipe that’s emerged from extensive testing.

The critical insight: keeping approximately 15% of the network in higher precision prevents training collapse. Specifically, the final four transformer layers must remain in BF16. Ablation studies confirmed that fully NVFP4 models diverge during training.
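
One way to picture the rule is as a per-layer precision plan. The Python sketch below is illustrative only. It assumes a 32-layer model (the layer count usually cited for Llama 3 8B, which the benchmarks above do not state) and uses made-up labels rather than any real library setting. It encodes only the "final four layers stay in BF16" rule. The article does not say how the roughly 15% figure is counted, so the sketch does not try to reproduce it.

```python
from collections import Counter


def precision_plan(num_layers: int, high_precision_tail: int = 4) -> list[str]:
    """Assign a precision label to each transformer layer, first to last.

    The last `high_precision_tail` layers are kept in BF16; all earlier
    layers use NVFP4. The labels are hypothetical strings for illustration.
    """
    if not 0 <= high_precision_tail <= num_layers:
        raise ValueError("high_precision_tail must be between 0 and num_layers")
    low_precision_layers = num_layers - high_precision_tail
    return ["nvfp4"] * low_precision_layers + ["bf16"] * high_precision_tail


# Assumption: 32 transformer layers.
plan = precision_plan(32)
counts = Counter(plan)
print(f"NVFP4 layers: {counts['nvfp4']}, BF16 layers: {counts['bf16']}")
```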

The format uses a two-level scaling strategy—micro-block scaling for groups of 16 elements combined with global FP32 scaling across full tensors. This hierarchical approach manages the limited dynamic range inherent in 4-bit representations.
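
To make the two-level idea concrete, here is a simplified Python sketch that quantizes and immediately dequantizes a flat list of numbers. It makes several assumptions that go beyond the article. The 4-bit grid is the common E2M1 value set (0, 0.5, 1, 1.5, 2, 3, 4, 6 plus sign), rounding is plain round-to-nearest, and both scales are kept as ordinary Python floats. Because the block scales are not stored in reduced precision here, the global scale is mathematically redundant in this sketch. It is kept only to mirror the structure described.

```python
# Magnitudes assumed representable by a 4-bit E2M1 float (sign is separate).
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0
BLOCK = 16  # micro-block size from the article


def nearest_fp4(x: float) -> float:
    """Round x to the nearest value on the FP4 grid, keeping its sign."""
    magnitude = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return -magnitude if x < 0 else magnitude


def fake_quantize(values: list[float]) -> list[float]:
    """Quantize to the FP4 grid with two scaling levels, then dequantize."""
    if len(values) % BLOCK != 0:
        raise ValueError(f"length must be a multiple of {BLOCK}")
    # Level 1: one global scale normalizes the whole tensor to [-1, 1].
    global_amax = max((abs(v) for v in values), default=0.0)
    global_scale = global_amax if global_amax > 0 else 1.0
    restored = []
    for start in range(0, len(values), BLOCK):
        block = [v / global_scale for v in values[start:start + BLOCK]]
        # Level 2: a per-block scale maps the block's largest value to FP4_MAX.
        block_amax = max(abs(v) for v in block)
        block_scale = block_amax / FP4_MAX if block_amax > 0 else 1.0
        for v in block:
            q = nearest_fp4(v / block_scale)
            restored.append(q * block_scale * global_scale)
    return restored


# One block of small values next to one block of large values.
small = [0.01 * i for i in range(BLOCK)]
large = [10.0 * i for i in range(BLOCK)]
restored = fake_quantize(small + large)
worst_small = max(abs(a - b) for a, b in zip(small, restored[:BLOCK]))
print(f"worst error in the small block: {worst_small:.4f}")
```

With a single scale for the whole tensor, every value in the small block would round to zero next to the large block. The per-block scale is what keeps them distinguishable.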

Random Hadamard transforms smooth tensor spectrums and reduce outliers that would otherwise cause training instability. Stochastic rounding for gradients eliminates systematic quantization bias.
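
Both ideas can be shown in a few lines of Python. The sketch below is a toy illustration, not NVIDIA's implementation. It rounds to integers rather than to a 4-bit grid, applies the transform to a plain list, and all function names are invented for the example. The first part shows that stochastic rounding is unbiased on average. The second shows a random Hadamard transform (random sign flips followed by a normalized Walsh-Hadamard transform) spreading one large outlier evenly across a 16-element block.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x up with probability equal to its fractional part, else down."""
    lower = math.floor(x)
    return lower + (1 if rng.random() < x - lower else 0)


def hadamard_transform(vec: list[float]) -> list[float]:
    """Normalized fast Walsh-Hadamard transform; length must be a power of 2."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # keeps the transform orthonormal
    return [v * scale for v in out]


def random_hadamard(vec: list[float], signs: list[float]) -> list[float]:
    """Flip signs with a fixed random +/-1 vector, then apply the transform."""
    return hadamard_transform([s * v for s, v in zip(signs, vec)])


rng = random.Random(0)

# Stochastic rounding: the average of many rounded samples approaches 0.3,
# whereas round-to-nearest would return 0 every time.
samples = [stochastic_round(0.3, rng) for _ in range(20000)]
print(f"mean of stochastic rounds: {sum(samples) / len(samples):.3f}")

# Random Hadamard transform: a single outlier becomes 16 equal-magnitude values.
spike = [0.0] * 16
spike[3] = 16.0
signs = [rng.choice((-1.0, 1.0)) for _ in range(16)]
spread = random_hadamard(spike, signs)
print(f"max magnitude before: {max(abs(v) for v in spike)}, "
      f"after: {max(abs(v) for v in spread)}")
```

Because the normalized transform is orthonormal, the vector's total energy is unchanged. Only the peak value shrinks, which is what makes the block easier to fit into a narrow 4-bit range.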

Comparison With Other Low-Precision Formats

NVFP4 isn’t the only option. FP8 with current scaling (FP8-CS) achieved 1.33x speedup over BF16, while MXFP8—a block-level scaling variant optimized for Blackwell—hit 1.32x. Both formats showed slightly better convergence tracking than NVFP4 during training, though final accuracy metrics remained comparable across all approaches.

MXFP8 demonstrated marginally better performance than standard FP8, likely due to finer-grained scaling that better captures local dynamic range within tensors.

Production Deployment

The techniques are available now through NeMo Megatron Bridge, NVIDIA’s open PyTorch-native library. Switching between precision formats requires changing a single configuration flag—no model code or optimizer logic modifications needed.

For teams running large-scale training workloads on Blackwell hardware, the throughput gains translate directly to reduced training time and compute costs. A run that takes 10 days in BF16 could finish in under 7 days with NVFP4, provided the 1.59x throughput gain holds for the whole run.
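
The estimate is a simple division. This short Python sketch assumes wall-clock time scales inversely with the measured throughput ratio, which ignores data loading, checkpointing, and other overheads, so it is a best case:

```python
def projected_days(baseline_days: float, speedup: float) -> float:
    """Estimate run time if the whole job speeds up by `speedup`."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_days / speedup


days = projected_days(10.0, 1.59)
print(f"{days:.2f} days")  # prints "6.29 days"
```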

The recommended recipe for NVFP4: AdamW optimizer with epsilon=1e-8, learning rate decaying from 6e-4 to 6e-6, and global batch size of 768. These parameters represent the empirical sweet spot from NVIDIA’s extensive testing across multiple architectures and datasets.
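
For illustration, the recipe can be written down as plain Python constants with a decay schedule. The article gives only the start and end learning rates, so the cosine shape below is an assumption, as are the total step count and the data-parallel size of 8. Pairing the micro-batch size of 4 with this global batch size is also an assumption drawn from the pretraining note earlier in the article.

```python
import math

# Values quoted in the article.
PEAK_LR = 6e-4
FINAL_LR = 6e-6
ADAM_EPS = 1e-8
GLOBAL_BATCH = 768
MICRO_BATCH = 4  # NVFP4 micro-batch size mentioned earlier in the article


def cosine_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step`, decaying from PEAK_LR to FINAL_LR.

    The cosine shape is an assumption; the article does not name a schedule.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1.0 + math.cos(math.pi * progress))


def accumulation_steps(data_parallel_size: int) -> int:
    """Micro-batches each replica must process per optimizer step."""
    per_pass = MICRO_BATCH * data_parallel_size
    if GLOBAL_BATCH % per_pass != 0:
        raise ValueError("global batch must divide evenly across replicas")
    return GLOBAL_BATCH // per_pass


TOTAL_STEPS = 1000  # hypothetical run length
for step in (0, 500, 1000):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")

# Hypothetical setup with 8 data-parallel replicas.
print("accumulation steps:", accumulation_steps(8))
```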


© Copyright 2024 InvestInCryptoNews.com
