
In a Spark job, I am using:

.withColumn("year", year(to_timestamp(lit(col("timestamp")))))

This code used to work, but now I get this error:

"cannot resolve 'CAST(`timestamp` AS TIMESTAMP)' due to data type mismatch: cannot cast struct<int:int,long:bigint> to timestamp;"

It looks like Spark is reading my timestamp column as a struct<int:int,long:bigint> instead of an int.

How can I prevent that?

Context: the initial data is in JSON Lines format. I read it using AWS Glue's glueContext.create_dynamic_frame.from_catalog. In the Glue Data Catalog, the timestamp column is typed int.
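
For reference, the read looks roughly like this (a minimal sketch; the database and table names are placeholders):

from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql.functions import col, lit, to_timestamp, year

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the JSON Lines data through the Glue Data Catalog (placeholder names)
gf_raw = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
)

# Converting to a Spark DataFrame and extracting the year is where the cast error appears
df = gf_raw.toDF().withColumn("year", year(to_timestamp(lit(col("timestamp")))))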

  • Can you show the schema with df.printSchema()?
    – pltc
    May 19, 2021 at 20:38
  • I found a way to force the type of timestamp before parsing it.
    – Hugo
    May 19, 2021 at 20:53

2 Answers


Finally, I solved it this way:

from awsglue.transforms import ResolveChoice

# Resolve the ambiguous choice type on the timestamp column to a plain int
GF_resolved = ResolveChoice.apply(
    frame=GF_raw,
    specs=[("timestamp", "cast:int")],
    transformation_ctx="resolve timestamp type",
)

ResolveChoice is a transform available for AWS Glue DynamicFrames.
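
With the choice resolved to a plain int, the original year extraction works again, assuming the column holds Unix epoch seconds. A minimal sketch of the follow-up:

from pyspark.sql.functions import col, to_timestamp, year

# Back to a regular Spark DataFrame now that the column has a single type
df_resolved = GF_resolved.toDF()

# The int (assumed to be epoch seconds) can now be cast to a timestamp
df_resolved = df_resolved.withColumn("year", year(to_timestamp(col("timestamp"))))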


The short answer is that you cannot prevent it when creating a dynamic frame from the catalog because, as the name suggests, the schema is dynamic. See this SO question for more information.

An alternative approach that is a little more compact:

gf_resolved = gf_raw.resolveChoice(specs = [('timestamp','cast:int')])

Official documentation for the ResolveChoice class can be found here: AWS Resolve Choice.
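
Put together with the catalog read, the chained form might look like this (assuming glueContext is the GlueContext from the job setup; the database and table names are placeholders):

# Read, resolve the ambiguous column, and convert to a Spark DataFrame in one chain
df = (
    glueContext.create_dynamic_frame.from_catalog(
        database="my_database",
        table_name="my_table",
    )
    .resolveChoice(specs=[("timestamp", "cast:int")])
    .toDF()
)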
