not ok 26 - attach: --stdin-ranks works #6552

Open
chu11 opened this issue Jan 13, 2025 · 1 comment

Comments

@chu11
Member

chu11 commented Jan 13, 2025

While churning through CI runs in #6544, I hit this failure several times:

expecting success: 
  	id=$(flux submit -N4 -o exit-timeout=none -t60s cat) &&
  	echo hello from 0 \
  		| flux job attach -vEX --label-io -i0 $id >stdin-ranks.out &&
  	flux job eventlog -p guest.input $id &&
  	cat <<-EOF >stdin-ranks.expected &&
  	0: hello from 0
  	EOF
  	test_cmp stdin-ranks.expected stdin-ranks.out

  0.000s: job.submit {"userid":1001,"urgency":16,"flags":0,"version":1}
  0.015s: job.validate
  0.027s: job.depend
  0.027s: job.priority {"priority":16}
  0.029s: job.alloc {"annotations":{"sched":{"resource_summary":"rank[0-3]/core[0-1]"}}}
  0.051s: job.start
  0.032s: exec.init
  0.035s: exec.starting
  0.091s: exec.shell.init {"service":"1001-shell-f8AVminB","leader-rank":0,"size":4}
  0.100s: exec.shell.start {"taskmap":{"version":1,"map":[[0,4,1,1]]}}
  0.107s: exec.shell.task-exit {"localid":0,"rank":0,"state":"Exited","pid":172646,"wait_status":0,"signaled":0,"exitcode":0}
  60.052s: job.exception f8AVminB type=timeout severity=0 resource allocation expired
  60.071s: exec.complete {"status":36352}
  60.071s: exec.done
  flux-job: task(s) Alarm clock
  60.071s: job.finish {"status":36352}
  60.077s: job.release {"ranks":"all","final":true}
  60.077s: job.free
  60.077s: job.clean
  not ok 26 - attach: --stdin-ranks works
@chu11
Member Author

chu11 commented Jan 14, 2025

Huh, I just noticed this:

  0.091s: exec.shell.init {"service":"1001-shell-f8AVminB","leader-rank":0,"size":4}
  0.100s: exec.shell.start {"taskmap":{"version":1,"map":[[0,4,1,1]]}}
  0.107s: exec.shell.task-exit {"localid":0,"rank":0,"state":"Exited","pid":172646,"wait_status":0,"signaled":0,"exitcode":0}
  60.052s: job.exception f8AVminB type=timeout severity=0 resource allocation expired
  60.071s: exec.complete {"status":36352}

It appears the task exited immediately, but somehow the shell lingered until the 60s timeout?

It's unclear whether the stdin "reached" the cat.

Perhaps some very subtle race involving stdin and a shell that isn't ready for it?
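For reference, the "status":36352 reported by exec.complete and job.finish can be decoded as a wait(2)-style status. The sketch below assumes Flux reports a raw wait status there, and that an exit code above 128 follows the usual shell convention of 128+N for a task killed by signal N. Under those assumptions it decodes to exit code 142, i.e. signal 14 (SIGALRM), which is consistent with the "flux-job: task(s) Alarm clock" line printed once the timeout fired.

```shell
#!/bin/sh
# Decode the wait(2)-style status seen in exec.complete / job.finish.
status=36352

# Low 7 bits: terminating signal (0 means the process exited normally).
termsig=$((status & 127))
# Bits 8-15: exit code when the process exited normally.
exitcode=$(((status >> 8) & 255))

echo "termsig=$termsig exitcode=$exitcode"

# Assumption: exit codes above 128 follow the shell convention 128+N,
# where N is the number of the signal that killed the task.
sig=0
if [ "$termsig" -eq 0 ] && [ "$exitcode" -gt 128 ]; then
    sig=$((exitcode - 128))
    echo "signal=$sig ($(kill -l "$sig"))"
fi
```

On Linux this prints termsig=0 exitcode=142 followed by signal=14 (ALRM).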
