Problem
I'm trying to run vLLM-Omni (v0.11.0rc1) with the Qwen2.5-Omni-7B model on an NVIDIA A100 GPU, but initialization fails with two critical errors in spawned worker processes:
NVML Invalid Argument Error in one worker:
vllm.third_party.pynvml.NVMLError_InvalidArgument: Invalid Argument
This occurs at:
handle = pynvml.nvmlDeviceGetHandleByIndex(physical_device_id)
V1 Engine Mismatch Error in the other workers:
ValueError: Using V1 LLMEngine, but envs.VLLM_USE_V1=False. This should not happen. As a workaround, try using LLMEngine.from_vllm_config(...) or explicitly set VLLM_USE_V1=0 or 1 and report this issue on Github.
All three spawned processes fail, so the orchestrator times out:
WARNING: [Orchestrator] Initialization timeout: only 0/3 stages are ready; not ready: [0, 1, 2]
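To narrow down the NVML side, here is a minimal spawn-only probe I could run, independent of vLLM-Omni (it assumes the standalone pynvml bindings, pip install nvidia-ml-py, are available): it prints what CUDA_VISIBLE_DEVICES actually looks like inside the child and whether a raw handle lookup for device 0 succeeds there.

# Minimal check, independent of vLLM-Omni: can a spawn-started child get an
# NVML handle for device 0, and what does CUDA_VISIBLE_DEVICES look like there?
import multiprocessing as mp
import os

def _probe():
    import pynvml
    print("child CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        print("child NVML device 0:", pynvml.nvmlDeviceGetName(handle))
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=_probe)
    p.start()
    p.join()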
Environment
GPU: NVIDIA A100-SXM4-40GB
vLLM: 0.11.0
vLLM-Omni: 0.11.0rc1
Python: 3.10
PyTorch: CUDA available in main process
Multiprocessing: spawn method
Environment variables set in bash:
CUDA_VISIBLE_DEVICES=0
VLLM_USE_V1=0
VLLM_WORKER_MULTIPROC_METHOD=spawn
Code
import os
import soundfile as sf
import torch

def main():
    from vllm_omni.entrypoints.omni_llm import OmniLLM
    from vllm.sampling_params import SamplingParams

    print("=== Starting vLLM-Omni Test ===")
    print(f"Environment: VLLM_USE_V1={os.environ.get('VLLM_USE_V1', 'NOT SET')}")
    print(f"PyTorch CUDA: {torch.cuda.is_available()}, Devices: {torch.cuda.device_count()}")

    audio_path = "/scratch/users/ntu/es0001an/dataset_generated/001_input.wav"
    os.makedirs(os.path.dirname(audio_path), exist_ok=True)
    if not os.path.exists(audio_path):
        sf.write(audio_path, torch.zeros(16000).numpy(), 16000)
        print(f"Created dummy audio at {audio_path}")

    print("\n=== Initializing OmniLLM ===")
    engine = OmniLLM(
        model="Qwen/Qwen2.5-Omni-7B",
        trust_remote_code=True,
        dtype="bfloat16",
        runtime={"devices": [[0], [0], [0]]},
        init_sleep_seconds=180,
        max_model_len=2048,
        disable_custom_all_reduce=True,
        enforce_eager=True,
    )

    prompt = {
        "prompt": (
            "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
            "<|im_start|>user\n<|audio_bos|><|AUDIO|><|audio_eos|>\n"
            "Describe this audio in detail.<|im_end|>\n<|im_start|>assistant\n"
        ),
        "multi_modal_data": {"audio": [audio_path]},
    }

    sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
    sampling_params_list = [sampling_params, sampling_params, sampling_params]

    print("\n=== Generating Response ===")
    try:
        results = engine.generate([prompt], sampling_params_list)
        if results and len(results) > 0:
            result = results[0]
            print(f"\n{'='*60}")
            print("SUCCESS!")
            print(f"{'='*60}")
            print(result)
            if hasattr(result, 'outputs') and result.outputs:
                for idx, output in enumerate(result.outputs):
                    if hasattr(output, 'text') and output.text:
                        print(f"\nText: {output.text}")
                    if hasattr(output, 'audio') and output.audio is not None:
                        audio_file = f'output_{idx}.wav'
                        sf.write(audio_file, output.audio, 24000)
                        print(f"Audio saved to: {audio_file}")
        else:
            print("No results returned")
    except Exception as e:
        print(f"Error: {e}")
        import traceback
        traceback.print_exc()

if __name__ == '__main__':
    main()

What I've Tried
Setting VLLM_USE_V1=0 in bash script (not Python) - still fails
Using a single GPU with runtime={"devices": [[0], [0], [0]]}
Verified PyTorch can access GPU in main process
Added enforce_eager=True and disable_custom_all_reduce=True
Setting environment variables in Python with os.environ - doesn't propagate to the spawned children (the import ordering I'd still like to rule out is sketched below)
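One ordering I have not fully ruled out (an assumption on my part, not verified against the vLLM-Omni source) is pinning the variables at the very top of the script, before torch / vllm / vllm_omni are imported, in case some of them are read at import time rather than at engine construction; spawned children should inherit the parent's os.environ, so if this still fails the values are presumably being consumed or rewritten somewhere else.

# Sketch (unverified assumption): pin the environment before any heavy imports,
# in case VLLM_USE_V1 / CUDA_VISIBLE_DEVICES are read at import time.
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
os.environ.setdefault("VLLM_USE_V1", "0")
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")

import torch  # noqa: E402  (imported only after the environment is pinned)
from vllm_omni.entrypoints.omni_llm import OmniLLM  # noqa: E402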
Questions
Why does NVML fail to get a GPU handle in the spawned processes when CUDA_VISIBLE_DEVICES=0 is set and the main process can access the GPU fine?
Why does vLLM-Omni use V1 LLMEngine despite VLLM_USE_V1=0 being explicitly set in the shell environment?
Is this a known bug in vLLM-Omni 0.11.0rc1, or is there a correct way to configure multi-stage initialization?
Should I try:
Setting VLLM_USE_V1=1 instead?
Using fork instead of spawn? (rough sketches of both options follow this list)
Any insights on resolving these multiprocessing/GPU initialization issues would be greatly appreciated!
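For reference, the two options above would look roughly like this (unverified; note that with fork the parent must not initialize CUDA before the workers start, so the torch.cuda.is_available() check in my script would have to be dropped or moved):

import os

# Variant (a): stop forcing V0 and let the V1 engine run.
os.environ["VLLM_USE_V1"] = "1"

# Variant (b): keep V0 but start workers with fork instead of spawn.
# With fork, CUDA must not be initialized in the parent process first.
# os.environ["VLLM_USE_V1"] = "0"
# os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "fork"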
