
Support literal translations (map type !natural_language) #570

Open
BrentBlanckaert opened this issue Dec 4, 2024 · 12 comments

Comments

@BrentBlanckaert (Collaborator) commented Dec 4, 2024

This issue focuses specifically on defining how mappings under return in the DSL should be handled for different programming languages. Currently, TESTed relies on heuristics to determine whether a mapping is intended for translations or for programming languages.

A potential solution is to explicitly specify this in the return statement within a test suite, as illustrated below:

return: !programming_language
  "python": "{1, 2, 3}"
  "javascript": "new Set([1, 2, 3])"
@pdawyndt (Contributor) commented Dec 4, 2024

@niknetniko Does this make sense to you? We want to use explicit YAML tags to make a distinction between different types of maps. For example, as a value of the return attribute we can have these types of maps:

  • map as return value (the default)
  • map defining an oracle for the return value (TESTed already supports the explicit type !oracle)

We would also like to introduce a new type of map for natural language translations (proposed type: !natural_language).
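To make the distinction concrete, a rough sketch of the two map flavours for return (hypothetical syntax; the values and language codes are only examples, not an agreed design):

# untyped map: interpreted as the literal return value (the current default)
return:
  "a": 1
  "b": 2

# proposed: explicit tag marking a map of natural-language variants of the value
return: !natural_language
  "en": "hello"
  "nl": "hallo"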

@pdawyndt changed the title from "support for literal translations (map type !programming_language)" to "Support for literal translations (map type !programming_language)" on Dec 4, 2024
@niknetniko (Member) commented:

To my knowledge, TESTed actually has no heuristic at the moment. I think that if a YAML map is encountered as the value of return:

  • If the map has !oracle, it is an explicit oracle.
  • Else the map is considered a literal map as the return value.

I don't think there is currently support for specifying a return value in multiple languages. If added, I think an explicit tag is probably the way to go. I experimented a bit, and for example, the following:

return:
  "python": "{1, 2, 3}"
  "javascript": "new Set([1, 2, 3])"

is interpreted as a test case where the return value must be a map with the two keys "python" and "javascript".

I do think requiring an explicit tag makes sense when adding other map types.

However, adding support specifically for language literals also requires adding support for actually comparing these values. This would probably involve implementing a new oracle that supports comparisons in a target programming language, similar to how the programming-language-specific oracles work (or at least that is how I would implement it).

@pdawyndt (Contributor) commented Dec 5, 2024

Probably the only place where the TESTed DSL currently allows specifying programming-language-specific values is in statement and expression. For statement and expression we either have a string or a map (differentiated by programming language) as a value, so explicit typing of the map is not needed there to resolve its type (there is only one kind of map). However, explicit typing will become necessary once we also introduce the option to translate the statement and expression into multiple natural languages: to distinguish !programming_language maps from !natural_language maps.
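For example, a sketch of how both explicit tags could look for an expression (the expressions and translated names below are purely illustrative):

# current behaviour for statement/expression, made explicit: differentiate by programming language
expression: !programming_language
  "python": "len(numbers)"
  "javascript": "numbers.length"

# proposed: differentiate by natural language (e.g. translated function names)
expression: !natural_language
  "en": "count_words(text)"
  "nl": "tel_woorden(tekst)"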

I would not introduce programming_language differentiation for return values for now, as there is no direct need for it (I updated my comment above, as I was mistaken that we already supported this). However, translating the return value (and other input and output channels) into multiple natural languages is something we would like to introduce. So resolving YAML maps as literal maps, oracles or natural-language translations will also be needed for return values.

Good to hear that we don't use heuristics for map resolution in TESTed yet. This is good design and we have to make sure we keep it that way.

@pdawyndt (Contributor) commented Dec 5, 2024

@BrentBlanckaert: Is it clear to you that we don't want the !programming_language extension for return? We only want to keep it for statement and expression, but allow making it explicit there to differentiate between !programming_language and !natural_language translations. Since TESTed already treats maps in those places as !programming_language maps, this should remain the default for untyped maps.

@BrentBlanckaert (Collaborator, Author) commented Dec 5, 2024

So !programming_language for return will not be implemented, and it is already present for statement and expression.
The next step would then be to introduce !natural_language for statements, expressions, return values, stdout and stdin?

@bmesuere (Member) commented Dec 5, 2024

I'm a bit confused to see natural language pop up here. We had a meeting where we decided that we would keep natural languages outside of TESTed itself and first solve this in preprocessing.

@jorg-vr (Contributor) commented Dec 5, 2024

As I understood it, the plan was to build support into TESTed for translations using multiple files, e.g. suite.nl.yaml and suite.en.yaml.
This should require minimal changes in TESTed: simply selecting the relevant file based on the provided language.

In a next step we would write a preprocessing step so that these two files can be written as a single file.
For that preprocessing format, the above discussion might still be relevant, as that format should also be unambiguous.

@pdawyndt (Contributor) commented Dec 5, 2024

@bmesuere We learned from our analysis of 916 existing Python exercises with translations that only 53 of these 916 exercises could take advantage of having separate test suites for each individual language (and for some of those this is not even the case, as the separation is only done for a few of the units, not all of them).

Generating separate test suites would only require minor changes to TESTed. In the config.json of a TESTed exercise we currently have the following configuration:

"evaluation": {
  "handler": "TESTed",
  "plan_name": "tested.yaml",
}

We would indeed need to extend this so that we can have a separate test suite for each language supported by the exercise, and then select the appropriate test suite based on the natural language setting passed to TESTed:

"evaluation": {
  "handler": "TESTed",
  "plan_name":  {
    "en": "tested.en.yaml",
    "nl": "tested.nl.yaml"
  }
}

This is a separate issue from this one, and we indeed need to take it up, so it is good that you remind us about it. However, I'm not in favor of converting all Python exercises (or creating new ones) with separate test suites for each language. That would not help the maintainability and ergonomics of supporting automated assessment.

I made a separate issue for this: #571

@jorg-vr (Contributor) commented Dec 5, 2024

This is not about ergonomics. The resulting experience for the end user should be the same.

This is about separation of concerns. TESTed is already a rather complex project. Adding language support as a preprocessing step makes it a separate, maintainable package, without increasing the complexity of the current TESTed project.

@pdawyndt (Contributor) commented Dec 5, 2024

We could actually keep everything out of the hands of TESTed. Since Dodona is calling the judge, we could also have a separate config.en.json and config.nl.json to specify separate locations for an English or a Dutch test suite (with all the other configuration settings shared). This would completely take i18n out of the hands of TESTed, and leave it up to the calling party. Would that be a good user experience (in terms of maintaining config files)?

As an author of multilingual programming exercises, I'm definitely concerned about my experience as a user. Having to maintain separate test suites for each individual language is not a good user experience for me, as in my experience 99% of such a test suite is shared content. The part that is translated is only a thin layer, so why not keep it that way? So yes, for me this is about keeping things simple. If there was a meeting where we decided to keep natural languages outside of TESTed itself, I wasn't in that meeting.

@jorg-vr (Contributor) commented Dec 5, 2024

This would completely take i18n out of the hands of TESTed, and leave it up to the calling party.

I would actually not be against this idea, as it is probably better to call the preprocessor in Dodona when an exercise changes, instead of on every judge run.
This preprocessing step could also be used to support multi-programming-language descriptions and configs for exercises, if desired.
But the preprocessor itself will always be judge-specific, as the desired output format is judge-specific, so we'll have to think about how to properly design this.

If there was a meeting where we decided that we would keep natural languages outside of TESTed itself, I wasn't in that meeting.

This was brought up by Bart and me in the meeting where you showed us the detailed analysis of your 916 exercises to convert.

But since you have a different recollection of that meeting, I'll try to reiterate my takeaways. I think the solution I took away would also meet your needs.

From your presentation:

  • There is a very large number of exercises that support translations; we need to somehow support this in TESTed before they can be converted.
  • These translations often translate a very small set of keys, which are reused a lot (e.g. function names). We would ideally have some kind of templating system to support this and avoid repetition.
  • An ideal templating system would also be able to reduce other repetition often present in TESTed specifications.

Bart and I agreed that such a templating system could be relevant for advanced users. Other university courses could also benefit from automatic translations in exercises. But we also had some hesitations:

  • We also want to keep the current simple config for non-advanced users
  • We do not want to bloat TESTed with non-essential features

This is when the preprocessing step was suggested as a solution.
The solution would be to specify a new format that supports templating and translations.
A preprocessing step could then be written to convert this new format to the existing format.
Ideally this preprocessing step should happen in such a way that the end user doesn't have to deal with it.

We didn't discuss implementation details, leaving those to you. But for clarity, this is how I imagined the next steps:

  1. Write a way for TESTed to accept a different config based on language (I was thinking of #571 (i18n: support separate test suites for different natural languages), but your Dodona suggestion might also work)
  2. Write a preprocessor program that creates a config for each supported language
  3. Integrate this preprocessor in such a way that it is hidden from the user (e.g. running it before TESTed, running it after every pull on Dodona, or creating and publishing a GitHub Action)

I think this solution meets all of our requirements.
If you think it doesn't, we should schedule another meeting to discuss this further.

@bmesuere (Member) commented Dec 5, 2024

You were in that meeting, Peter, but you are misinterpreting the implications. We're explicitly saying you should not have to maintain two separate files.

The conclusion was that we don't want to complicate TESTed itself with natural language support since that would distract from what TESTed does well and because it would add a maintenance burden to already complicated code. The proposed solution was indeed to feed TESTed a specific test suite file based on the natural language. This would allow novice users to easily benefit from translated output without having to learn new syntax and would require minimal changes to our code.

For advanced users who don't want to maintain separate suite files, we proposed using a template system that runs as a preprocessor. You basically write a single test suite file, tests.yaml.template, and the preprocessor outputs a tests.nl.yaml and a tests.en.yaml. The templating system could handle both multiple natural languages and the complex repetitions you explained and suggested during that meeting. This preprocessor would thus be a separate script.

Since you would be the primary user, we suggested that you propose a format and see what works and what doesn't based on the experience of converting the old Python exercises. Initially, you could run the preprocessor yourself when committing, and if the format matures, we could run it on Dodona when processing exercise updates or as part of test execution.
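To make the idea concrete, a minimal sketch of what such a template and its generated per-language suites might look like (a purely hypothetical format with a simplified suite structure, not a proposal):

# tests.yaml.template (hypothetical single-source format)
- expression: "sum([1, 2, 3])"
  return: 6
  stdout: !natural_language
    en: "The sum is 6"
    nl: "De som is 6"

# generated tests.en.yaml
- expression: "sum([1, 2, 3])"
  return: 6
  stdout: "The sum is 6"

# generated tests.nl.yaml
- expression: "sum([1, 2, 3])"
  return: 6
  stdout: "De som is 6"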

@pdawyndt changed the title from "Support for literal translations (map type !programming_language)" to "Support literal translations (map type !natural_language)" on Dec 10, 2024