
When I register tables from HADOOP to NESSIE, there is a com.amazonaws.AmazonClientException #50

Open
sxh-lsc opened this issue Jun 9, 2023 · 6 comments

sxh-lsc commented Jun 9, 2023

My Hadoop warehouse is s3a://XXXXXX, and I added
--source-catalog-hadoop-conf fs.s3a.access.key=$AWS_ACCESS_KEY_ID,fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY,fs.s3a.endpoint=$AWS_S3_ENDPOINT
Then it fails with:

com.amazonaws.AmazonClientException: Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler). Response Code: 200, Response Text: OK
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3480)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:604)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:960)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
        at org.apache.iceberg.hadoop.HadoopCatalog.isDirectory(HadoopCatalog.java:175)
        at org.apache.iceberg.hadoop.HadoopCatalog.isNamespace(HadoopCatalog.java:376)
        at org.apache.iceberg.hadoop.HadoopCatalog.listNamespaces(HadoopCatalog.java:306)
        at org.projectnessie.tools.catalog.migration.api.CatalogMigrator.getAllNamespacesFromSourceCatalog(CatalogMigrator.java:202)
        at org.projectnessie.tools.catalog.migration.api.CatalogMigrator.getMatchingTableIdentifiers(CatalogMigrator.java:97)
        at org.projectnessie.tools.catalog.migration.cli.BaseRegisterCommand.call(BaseRegisterCommand.java:136)
        at org.projectnessie.tools.catalog.migration.cli.BaseRegisterCommand.call(BaseRegisterCommand.java:38)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
        at picocli.CommandLine.execute(CommandLine.java:2170)
        at org.projectnessie.tools.catalog.migration.cli.CatalogMigrationCLI.main(CatalogMigrationCLI.java:48)
Caused by: com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:150)
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:279)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:75)
        at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:72)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
        at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
        at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:712)
        ... 23 more
Caused by: java.lang.RuntimeException: Invalid value for IsTruncated field: 
        true
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler.endElement(XmlResponsesSaxParser.java:647)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
        at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:141)
        ... 29 more
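For reference, the innermost failure can be reproduced without AWS at all. The sketch below (plain Python; the helper name is hypothetical, not SDK code) mimics the strict true/false check that the AWS SDK v1 ListBucketHandler appears to apply to the `<IsTruncated>` element, which rejects a value padded with whitespace and a newline exactly as shown at the bottom of the trace above:

```python
# Sketch (assumption): a strict boolean check like the one the SDK's
# ListBucketHandler appears to perform on the <IsTruncated> element text.
def parse_is_truncated(text: str) -> bool:
    if text == "true":
        return True
    if text == "false":
        return False
    raise RuntimeError(f"Invalid value for IsTruncated field: {text}")

# Element text padded the way it appears in the stack trace above.
padded = "\n        true"

try:
    parse_is_truncated(padded)
except RuntimeError as exc:
    print(exc)  # strict parse rejects the padded value

# A value with the surrounding whitespace trimmed parses fine:
print(parse_is_truncated(padded.strip()))
```

This suggests the S3-compatible endpoint is returning extra whitespace inside the XML element, which the strict client-side parser does not tolerate.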
ajantha-bhat (Contributor) commented:

Is the warehouse path for the source catalog the same as what was configured in the engine (for example Spark) when the tables were created with the Hadoop catalog?

I usually export the AWS credentials as environment variables:

export AWS_ACCESS_KEY_ID=xxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxx
export AWS_S3_ENDPOINT=xxxxxxx

and also configure the file IO in the catalog properties: io-impl=org.apache.iceberg.aws.s3.S3FileIO

sxh-lsc closed this as completed Jun 12, 2023
sxh-lsc reopened this Jun 12, 2023

sxh-lsc commented Jun 12, 2023

Yes, it is the same path as you said, but sorry, I don't quite understand what problem this would cause.
All the env variables you mentioned are exported.

ajantha-bhat (Contributor) commented:

> Yes it is the same path as you said,but sorry I don't quite understand what problem this will cause?
> All the env variables you mentioned are exported.

I am not sure what causes this, but I found a similar issue discussed here; it seems to be AWS specific:
https://knowledge.informatica.com/s/article/517098?language=en_US

Are you sure you have configured the file IO in the catalog properties (io-impl=org.apache.iceberg.aws.s3.S3FileIO)?


sxh-lsc commented Jun 13, 2023

> Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler). Response Code: 200, Response Text: OK at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738) at

Yes, I am sure. Below is my command:

java -jar iceberg-catalog-migrator-cli-0.2.0.jar register --stacktrace \
  --source-catalog-type HADOOP \
  --source-catalog-properties warehouse=s3a://**/***/***,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --source-catalog-hadoop-conf fs.s3a.access.key=$AWS_ACCESS_KEY_ID,fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY,fs.s3a.endpoint=$AWS_S3_ENDPOINT \
  --target-catalog-type NESSIE \
  --target-catalog-properties uri=http://l***:19120/api/v1/,ref=main,warehouse=s3a://***,io-impl=org.apache.iceberg.aws.s3.S3FileIO
The error messages suggest the S3 list-objects call went wrong. I found that some people use .withEncodingType("url") to fix it; maybe it is related to the AWS S3 version?
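For context on that fix: with the URL encoding type, S3 percent-encodes object keys in the ListObjects XML response, so keys containing characters that are awkward in XML survive parsing, and the client decodes them afterwards. A minimal sketch of that round trip (pure urllib.parse, no AWS SDK; the example key is made up):

```python
from urllib.parse import quote, unquote

# Hypothetical object key containing characters that can trip up an
# XML response parser (non-ASCII, a space).
key = "ns1/table\u00e9/metadata file.json"

# With encoding-type=url, the server returns the key percent-encoded,
# keeping the XML payload plain ASCII:
encoded = quote(key, safe="/")
print(encoded)

# The client percent-decodes after XML parsing, recovering the key:
assert unquote(encoded) == key
```

This only illustrates what the encoding does on the wire; whether the migrator tool or S3AFileSystem exposes a switch for it is a separate question.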

ajantha-bhat (Contributor) commented:

Which version of Iceberg are you using? I will also try once locally.

As a workaround, you can pass the list of identifiers via the --identifiers option.


sxh-lsc commented Jun 14, 2023

> Which version of Iceberg are you using? I will also try once locally.
>
> As a workaround, you can pass the list of identifiers in --identifiers option

I use Iceberg v1.2.0. But it is already bundled in this tool, isn't it?
