Profile subexperiment generation #529

Closed
caleb-johnson opened this issue Apr 3, 2024 · 4 comments · Fixed by #556
Labels
classical performance (Related to computational efficiency of the code that runs on classical hardware); cutting (QPD-based circuit cutting code); good first issue (Good for newcomers)

Comments

@caleb-johnson
Collaborator

With an increased focus on utility-scale experiments, we need to profile our subexperiment generation to ensure there are no easily-correctable inefficiencies. I have noticed that for cutting schemes involving lower-rotation gates, subexperiment generation can take a very long time when doing a full sampling, even for modestly-sized circuits.

@caleb-johnson added the good first issue, classical performance, and cutting labels Apr 3, 2024
@garrison
Member

garrison commented Apr 9, 2024

I noticed that qiskit has benchmark tests based on asv (airspeed velocity) in its test/benchmarks/ directory. We should add a benchmark for this issue in a similar location in our repository, so that we can track performance over time and at least identify the key workflows where performance matters.
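For reference, a minimal sketch of what an asv-style benchmark could look like. asv times methods whose names start with `time_` and calls `setup` before each one, outside the timed region. The workload function, class name, and parameter values are hypothetical stand-ins, not code from this repository; a real benchmark would presumably call the subexperiment generation routine instead.

```python
import copy


def build_stand_in_subexperiments(num_subexperiments, instructions_per_subexperiment):
    """Hypothetical stand-in workload: deep-copy a nested template once per subexperiment."""
    template = [
        {"name": f"gate_{i}", "qubits": (i, i + 1), "params": [0.1 * i]}
        for i in range(instructions_per_subexperiment)
    ]
    return [copy.deepcopy(template) for _ in range(num_subexperiments)]


class SubexperimentGenerationSuite:
    """asv collects methods whose names start with ``time_`` and times them."""

    # asv runs each benchmark once per parameter value.
    params = [10, 100, 1000]
    param_names = ["num_subexperiments"]

    def setup(self, num_subexperiments):
        # Runs before the timed method and is excluded from the measurement.
        self.instructions_per_subexperiment = 25

    def time_generate(self, num_subexperiments):
        # Only the body of this method is timed.
        build_stand_in_subexperiments(
            num_subexperiments, self.instructions_per_subexperiment
        )
```

Parameterizing over the number of subexperiments would make it visible whether the cost grows linearly or worse as the sampling gets larger.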

@caleb-johnson
Collaborator Author

Total subexperiments created: 12960
Total number of instructions in each subexperiment: Approximately 20-25

   344701052 function calls (303964246 primitive calls) in 149.617 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  149.776  149.776 {built-in method builtins.exec}
        1    0.000    0.000  149.776  149.776 <string>:1(<module>)
        1    0.120    0.120  149.775  149.775 cutting_experiments.py:40(generate_cutting_experiments)
  12963/3    0.009    0.000  132.468   44.156 passmanager.py:417(wrapper)
        2    0.000    0.000  132.468   66.234 passmanager.py:127(run)
        2    0.000    0.000  132.468   66.234 passmanager.py:172(run)
        2    0.029    0.014  132.465   66.232 parallel.py:104(parallel_map)
    12960    0.228    0.000  132.434    0.010 passmanager.py:317(_run_workflow_in_new_process)
    12960    0.091    0.000  131.421    0.010 passmanager.py:270(_run_workflow)
25920/12960    0.148    0.000  119.654    0.009 base_tasks.py:202(execute)
   103680    0.490    0.000  119.071    0.001 base_tasks.py:72(execute)
    25920    0.461    0.000  116.043    0.004 dag_fixed_point.py:28(run)
31002016/373680   31.174    0.000   83.540    0.000 copy.py:128(deepcopy)
2382912/329184    6.889    0.000   81.959    0.000 copy.py:259(_reconstruct)
1516320/38880    4.007    0.000   80.666    0.002 copy.py:227(_deepcopy_dict)
5218560/2813120    6.502    0.000   59.745    0.000 copy.py:210(_deepcopy_tuple)
5218560/2813120    2.995    0.000   56.073    0.000 copy.py:211(<listcomp>)
928800/397440    1.667    0.000   56.064    0.000 copy.py:201(_deepcopy_list)
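As a rough reading aid, the figures in the profile output above can be turned into per-subexperiment costs and shares of the total run time. The sketch below only does arithmetic on numbers copied from that output. Note that cumulative times of nested calls overlap, so the shares are not expected to sum to 100%.

```python
# Figures copied from the profile output above.
total_seconds = 149.617
num_subexperiments = 12960
dag_fixed_point_cumtime = 116.043  # dag_fixed_point.py:28(run)
deepcopy_cumtime = 83.540          # copy.py:128(deepcopy)
deepcopy_total_calls = 31002016    # includes recursive calls

# Average wall time spent per generated subexperiment, in milliseconds.
per_subexperiment_ms = 1000 * total_seconds / num_subexperiments

# Fraction of the total run time attributed (cumulatively) to each entry.
dag_fixed_point_share = dag_fixed_point_cumtime / total_seconds
deepcopy_share = deepcopy_cumtime / total_seconds

# Average number of deepcopy invocations per subexperiment.
deepcopy_calls_per_subexperiment = deepcopy_total_calls / num_subexperiments

print(f"time per subexperiment: {per_subexperiment_ms:.1f} ms")
print(f"dag_fixed_point share:  {dag_fixed_point_share:.1%}")
print(f"deepcopy share:         {deepcopy_share:.1%}")
print(f"deepcopy calls per subexperiment: {deepcopy_calls_per_subexperiment:.0f}")
```

This works out to roughly 11.5 ms per subexperiment, with about 78% of the run time under dag_fixed_point and about 56% under deepcopy, at around 2,400 deepcopy calls for each subexperiment of only 20-25 instructions.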

@garrison
Member

@caleb-johnson
Collaborator Author

caleb-johnson commented Apr 12, 2024

> It appears that this is coming from the following lines, which were added in #458. Is that correct?
>
> https://github.com/Qiskit-Extensions/circuit-knitting-toolbox/blob/21237168818ae89d0bb54002e260a06312d04b8f/circuit_knitting/cutting/cutting_experiments.py#L169-L188

I am fairly certain. I am a little puzzled as to why dag_fixed_point is so expensive, and I don't see the other passes explicitly named here. Maybe the other run calls are those passes but with a generic name? I'm not sure right now tbh

This is obviously a truncated output. Maybe the other passes are named further down, but I didn't see them at a quick glance.
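One way to dig past a truncated listing is to filter the stats by a regular expression and to ask for callers, both of which the standard-library `pstats` module supports. The sketch below is only an illustration: `stand_in_workload` is a hypothetical placeholder for the profiled call, and the patterns would need to be swapped for something like `dag_fixed_point` or the pass module names in the real profile.

```python
import cProfile
import copy
import io
import pstats


def stand_in_workload():
    """Hypothetical stand-in for the profiled call; it only deep-copies nested data."""
    template = [{"qubits": (i, i + 1), "params": [0.1 * i]} for i in range(25)]
    return [copy.deepcopy(template) for _ in range(200)]


# Collect a profile of the stand-in workload.
profiler = cProfile.Profile()
profiler.runcall(stand_in_workload)

# Write the report into a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative")

# A string restriction is a regular expression matched against the
# "filename:lineno(function)" column, so every matching row is printed
# regardless of where it falls in the sorted listing.
stats.print_stats(r"copy\.py")

# print_callers lists which functions invoked the matching entries.
stats.print_callers(r"\(deepcopy\)")

report = buffer.getvalue()
print(report)
```

Looking at the callers of `deepcopy` in the real profile should show whether the copies originate in the fixed-point pass itself or in another pass that appears under a generic `run` entry.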
