Invest In Crypto News

NVIDIA NVFP4 Training Delivers 1.59x Speed Boost Without Accuracy Loss

by CryptoExpert
February 23, 2026
in Blockchain News

Rongchai Wang
Feb 23, 2026 18:39

NVIDIA’s NVFP4 4-bit training format achieves 59% faster AI model training than BF16 while matching accuracy on Llama 3 8B benchmarks, per new research.





NVIDIA’s NVFP4 low-precision training format delivers up to 1.59x the throughput of standard BF16 training while maintaining equivalent model accuracy, according to new benchmarks published by the company’s research team on February 23, 2026.

The results mark a significant milestone for 4-bit AI training, demonstrating that aggressive numerical compression doesn’t require sacrificing model quality when proper techniques are applied.

The Numbers That Matter

Testing on Llama 3 8B models trained across 1 trillion tokens, NVIDIA’s team measured throughput at 1,850 TFLOP/s per GPU with NVFP4 versus 1,165 TFLOP/s for the BF16 baseline, a 59% improvement (1,850 / 1,165 ≈ 1.59). The tests ran on GB200 NVL72 hardware using the company’s Blackwell architecture.

Downstream benchmark scores tell the real story. On MMLU, NVFP4-trained Llama 3 8B scored 45.64% compared to 45.98% for BF16. HellaSwag showed 75.59% versus 76.44%. NVFP4 trails by 0.34 and 0.85 percentage points respectively, differences small enough to fall within noise margins for practical applications.

Memory efficiency gains enabled doubling the micro-batch size from 2 to 4 during pretraining, directly improving scalability for large-scale training runs.

Why 4-Bit Training Works Now

Previous attempts at ultra-low-precision training often resulted in model divergence or significant accuracy degradation. NVIDIA’s approach sidesteps these issues through a specific recipe that’s emerged from extensive testing.

The critical insight: keeping approximately 15% of the network in higher precision prevents training collapse. Specifically, the final four transformer layers must remain in BF16. Ablation studies confirmed that fully NVFP4 models diverge during training.
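The layer-selection rule can be sketched in a few lines. This is a minimal illustration, not NVIDIA's implementation: the function name `precision_plan` and the string labels are hypothetical, and the 32-layer count is the standard depth of Llama 3 8B rather than a figure from the benchmarks above. Four of 32 layers is 12.5% of the transformer blocks, so the roughly 15% figure presumably counts the network differently; the article does not say how.

```python
def precision_plan(num_layers, high_precision_tail=4):
    """Assign a precision label to each transformer layer.

    The last `high_precision_tail` layers stay in BF16; all earlier
    layers use NVFP4. Labels are illustrative strings only.
    """
    if not 0 <= high_precision_tail <= num_layers:
        raise ValueError("tail must be between 0 and num_layers")
    first_bf16 = num_layers - high_precision_tail
    return ["bf16" if i >= first_bf16 else "nvfp4" for i in range(num_layers)]


plan = precision_plan(num_layers=32)   # 32 layers assumed for Llama 3 8B
bf16_share = plan.count("bf16") / len(plan)
print(plan[-5:], f"{bf16_share:.1%} of layers in BF16")
```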

The format uses a two-level scaling strategy—micro-block scaling for groups of 16 elements combined with global FP32 scaling across full tensors. This hierarchical approach manages the limited dynamic range inherent in 4-bit representations.
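A rough pure-Python sketch of the two-level idea follows. Several details are assumptions not stated in the article: the 4-bit elements are taken to be E2M1 floats (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), the per-block scale is assumed to live in an FP8 E4M3 range (maximum 448), and the block scale is kept as an ordinary Python float rather than actually rounded to FP8. The function names are hypothetical.

```python
import random

# Magnitudes representable by a 4-bit E2M1 float (assumed element format).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
FP8_E4M3_MAX = 448.0   # assumed range available to each block scale
BLOCK = 16             # micro-block size from the article


def round_to_grid(value):
    """Round to the nearest signed FP4 grid value."""
    magnitude = min(FP4_GRID, key=lambda g: abs(abs(value) - g))
    return magnitude if value >= 0 else -magnitude


def quantize_two_level(tensor):
    """Return (global_scale, [(block_scale, codes), ...])."""
    amax = max(abs(v) for v in tensor) or 1.0
    # Global scale: chosen so the largest block scale fits the FP8 range.
    global_scale = amax / (FP4_MAX * FP8_E4M3_MAX)
    blocks = []
    for start in range(0, len(tensor), BLOCK):
        block = tensor[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        # Block scale: maps this block's largest value onto FP4_MAX.
        block_scale = block_amax / FP4_MAX / global_scale if block_amax else 1.0
        step = block_scale * global_scale
        blocks.append((block_scale, [round_to_grid(v / step) for v in block]))
    return global_scale, blocks


def dequantize_two_level(global_scale, blocks):
    return [c * s * global_scale for s, codes in blocks for c in codes]


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(64)]
data[3] = 50.0                         # one outlier in the first block
gs, qblocks = quantize_two_level(data)
restored = dequantize_two_level(gs, qblocks)
```

Because each block of 16 carries its own scale, the outlier only coarsens the grid for its own block; the other three blocks keep a step size matched to their own range.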

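Random Hadamard transforms smooth tensor spectra and reduce outliers that would otherwise cause training instability. Stochastic rounding for gradients rounds each value up or down with probability proportional to its distance from the two nearest representable values, which eliminates systematic quantization bias.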
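Both techniques are standard and can be illustrated in plain Python. The sketch below is a toy demonstration under stated assumptions, not NVIDIA's kernels: it uses a length-16 orthonormal Walsh-Hadamard transform with random sign flips, and it rounds to integers rather than to the FP4 grid. All function names are hypothetical.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of 2)."""
    x = list(values)
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = n ** -0.5
    return [v * scale for v in x]


def random_hadamard(values, signs):
    """Flip signs randomly, then apply the Hadamard transform."""
    return fwht([v * s for v, s in zip(values, signs)])


def inverse_random_hadamard(values, signs):
    """Undo random_hadamard: the orthonormal transform is its own inverse."""
    return [v * s for v, s in zip(fwht(values), signs)]


def stochastic_round(value, rng):
    """Round up with probability equal to the fractional part, else down."""
    lower = math.floor(value)
    return lower + (1 if rng.random() < value - lower else 0)


rng = random.Random(0)
signs = [rng.choice((-1.0, 1.0)) for _ in range(16)]
spiky = [0.0] * 16
spiky[5] = 8.0                                   # a single large outlier
smoothed = random_hadamard(spiky, signs)         # energy spread over all 16 slots
restored = inverse_random_hadamard(smoothed, signs)
samples = [stochastic_round(0.3, rng) for _ in range(20000)]
mean_rounded = sum(samples) / len(samples)       # close to 0.3 on average
```

The transform spreads the single outlier evenly across the block, so the quantizer sees a much smaller peak value. Round-to-nearest would map 0.3 to 0 every time; stochastic rounding returns 1 about 30% of the time, so the average stays near the true value.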

Comparison With Other Low-Precision Formats

NVFP4 isn’t the only option. FP8 with current scaling (FP8-CS) achieved 1.33x speedup over BF16, while MXFP8—a block-level scaling variant optimized for Blackwell—hit 1.32x. Both formats showed slightly better convergence tracking than NVFP4 during training, though final accuracy metrics remained comparable across all approaches.

MXFP8 demonstrated marginally better performance than standard FP8, likely due to finer-grained scaling that better captures local dynamic range within tensors.

Production Deployment

The techniques are available now through NeMo Megatron Bridge, NVIDIA’s open PyTorch-native library. Switching between precision formats requires changing a single configuration flag—no model code or optimizer logic modifications needed.

For teams running large-scale training workloads on Blackwell hardware, the throughput gains translate directly to reduced training time and compute costs. If the 1.59x speedup holds end to end, a model that previously required 10 days of training could potentially complete in about 6.3 days with NVFP4.
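As a back-of-the-envelope check, the speedups quoted in this article can be turned into estimated training times. The sketch assumes the throughput gain applies to the entire run with no other bottlenecks, which real workloads may not satisfy; the helper name `estimated_days` is hypothetical.

```python
# Speedups over BF16 as quoted in the article.
SPEEDUPS = {"BF16": 1.00, "MXFP8": 1.32, "FP8-CS": 1.33, "NVFP4": 1.59}


def estimated_days(baseline_days, speedup):
    """Training time if throughput scales by `speedup` for the whole run."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_days / speedup


baseline = 10.0  # days on BF16, the article's example
for name, speedup in SPEEDUPS.items():
    days = estimated_days(baseline, speedup)
    saved = 1.0 - days / baseline
    print(f"{name:7s} {days:5.2f} days ({saved:.0%} less time)")
```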

The recommended recipe for NVFP4: AdamW optimizer with epsilon=1e-8, learning rate decaying from 6e-4 to 6e-6, and global batch size of 768. These parameters represent the empirical sweet spot from NVIDIA’s extensive testing across multiple architectures and datasets.
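The article gives the learning rate endpoints but not the shape of the decay. The sketch below assumes a cosine schedule, a common choice, purely for illustration; the step count is a placeholder, and the constant and function names are hypothetical. The epsilon and batch size are recorded as constants so the whole recipe sits in one place.

```python
import math

# Values from the recommended recipe.
PEAK_LR = 6e-4
FINAL_LR = 6e-6
ADAMW_EPS = 1e-8
GLOBAL_BATCH_SIZE = 768


def cosine_lr(step, total_steps):
    """Learning rate at `step`, decaying from PEAK_LR to FINAL_LR.

    Cosine shape is an assumption; the article only states the endpoints.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


total_steps = 1000  # placeholder; the real value depends on tokens and sequence length
schedule = [cosine_lr(s, total_steps) for s in range(total_steps + 1)]
print(f"start={schedule[0]:.1e} mid={schedule[total_steps // 2]:.1e} end={schedule[-1]:.1e}")
```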

Image source: Shutterstock



© Copyright 2024 InvestInCryptoNews.com
