Problem
I'm trying to run vLLM-Omni (v0.11.0rc1) with the Qwen2.5-Omni-7B model on an NVIDIA A100 GPU, but initialization fails with two critical errors in spawned worker processes:
NVML Invalid Argument Error in one worker:
vllm.third_party.pynvml.NVMLError_InvalidArgument: Invalid Argument
This occurs at:
handle = pynvml.nvmlDeviceGetHandleByIndex(physical_device_id)
V1 Engine Mismatch Error in the other workers:
ValueError: Using V1 LLMEngine, but envs.VLLM_USE_V1=False. This should not happen. As a workaround, try using LLMEngine.from_vllm_config(...) or explicitly set VLLM_USE_V1=0 or 1 and report this issue on Github.
All three spawned processes fail, so the orchestrator times out:
WARNING: [Orchestrator] Initialization timeout: only 0/3 stages are ready; not ready: [0, 1, 2]
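To narrow down the NVML side, here is a minimal spawn-only probe I could run, independent of vLLM-Omni (it assumes the standalone pynvml bindings, pip install nvidia-ml-py, are available): it prints what CUDA_VISIBLE_DEVICES actually looks like inside the child and whether a raw handle lookup for device 0 succeeds there.

# Minimal check, independent of vLLM-Omni: can a spawn-started child get an
# NVML handle for device 0, and what does CUDA_VISIBLE_DEVICES look like there?
import multiprocessing as mp
import os

def _probe():
    import pynvml
    print("child CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        print("child NVML device 0:", pynvml.nvmlDeviceGetName(handle))
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=_probe)
    p.start()
    p.join()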
Environment
GPU: NVIDIA A100-SXM4-40GB
vLLM: 0.11.0
vLLM-Omni: 0.11.0rc1
Python: 3.10
PyTorch: CUDA available in main process
Multiprocessing: spawn method
Environment variables set in bash:
CUDA_VISIBLE_DEVICES=0
VLLM_USE_V1=0
VLLM_WORKER_MULTIPROC_METHOD=spawn
Code
import os
import soundfile as sf
import torch

def main():
    from vllm_omni.entrypoints.omni_llm import OmniLLM
    from vllm.sampling_params import SamplingParams

    print("=== Starting vLLM-Omni Test ===")
    print(f"Environment: VLLM_USE_V1={os.environ.get('VLLM_USE_V1', 'NOT SET')}")
    print(f"PyTorch CUDA: {torch.cuda.is_available()}, Devices: {torch.cuda.device_count()}")

    audio_path = "/scratch/users/ntu/es0001an/dataset_generated/001_input.wav"
    os.makedirs(os.path.dirname(audio_path), exist_ok=True)
    if not os.path.exists(audio_path):
        sf.write(audio_path, torch.zeros(16000).numpy(), 16000)
        print(f"Created dummy audio at {audio_path}")

    print("\n=== Initializing OmniLLM ===")
    engine = OmniLLM(
        model="Qwen/Qwen2.5-Omni-7B",
        trust_remote_code=True,
        dtype="bfloat16",
        runtime={"devices": [[0], [0], [0]]},
        init_sleep_seconds=180,
        max_model_len=2048,
        disable_custom_all_reduce=True,
        enforce_eager=True,
    )

    prompt = {
        "prompt": (
            "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
            "<|im_start|>user\n<|audio_bos|><|AUDIO|><|audio_eos|>\n"
            "Describe this audio in detail.<|im_end|>\n<|im_start|>assistant\n"
        ),
        "multi_modal_data": {"audio": [audio_path]},
    }

    sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
    sampling_params_list = [sampling_params, sampling_params, sampling_params]

    print("\n=== Generating Response ===")
    try:
        results = engine.generate([prompt], sampling_params_list)
        if results and len(results) > 0:
            result = results[0]
            print(f"\n{'='*60}")
            print("SUCCESS!")
            print(f"{'='*60}")
            print(result)
            if hasattr(result, 'outputs') and result.outputs:
                for idx, output in enumerate(result.outputs):
                    if hasattr(output, 'text') and output.text:
                        print(f"\nText: {output.text}")
                    if hasattr(output, 'audio') and output.audio is not None:
                        audio_file = f'output_{idx}.wav'
                        sf.write(audio_file, output.audio, 24000)
                        print(f"Audio saved to: {audio_file}")
        else:
            print("No results returned")
    except Exception as e:
        print(f"Error: {e}")
        import traceback
        traceback.print_exc()

if __name__ == '__main__':
    main()

What I've Tried
Setting VLLM_USE_V1=0 in bash script (not Python) - still fails
Using a single GPU with runtime={"devices": [[0], [0], [0]]}
Verified PyTorch can access GPU in main process
Added enforce_eager=True and disable_custom_all_reduce=True
Setting environment variables in Python with os.environ - doesn't propagate to the spawned children (the import ordering I'd still like to rule out is sketched below)
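One ordering I have not fully ruled out (an assumption on my part, not verified against the vLLM-Omni source) is pinning the variables at the very top of the script, before torch / vllm / vllm_omni are imported, in case some of them are read at import time rather than at engine construction; spawned children should inherit the parent's os.environ, so if this still fails the values are presumably being consumed or rewritten somewhere else.

# Sketch (unverified assumption): pin the environment before any heavy imports,
# in case VLLM_USE_V1 / CUDA_VISIBLE_DEVICES are read at import time.
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
os.environ.setdefault("VLLM_USE_V1", "0")
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")

import torch  # noqa: E402  (imported only after the environment is pinned)
from vllm_omni.entrypoints.omni_llm import OmniLLM  # noqa: E402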
Questions
Why does NVML fail to get a GPU handle in the spawned processes when CUDA_VISIBLE_DEVICES=0 is set and the main process can access the GPU fine?
Why does vLLM-Omni use V1 LLMEngine despite VLLM_USE_V1=0 being explicitly set in the shell environment?
Is this a known bug in vLLM-Omni 0.11.0rc1, or is there a correct way to configure multi-stage initialization?
Should I try:
Setting VLLM_USE_V1=1 instead?
Using fork instead of spawn? (rough sketches of both options follow this list)
Any insights on resolving these multiprocessing/GPU initialization issues would be greatly appreciated!
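For reference, the two options above would look roughly like this (unverified; note that with fork the parent must not initialize CUDA before the workers start, so the torch.cuda.is_available() check in my script would have to be dropped or moved):

import os

# Variant (a): stop forcing V0 and let the V1 engine run.
os.environ["VLLM_USE_V1"] = "1"

# Variant (b): keep V0 but start workers with fork instead of spawn.
# With fork, CUDA must not be initialized in the parent process first.
# os.environ["VLLM_USE_V1"] = "0"
# os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "fork"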
