I am trying to write a Cascading (v1.2) Cascade ( http://docs.cascading.org/cascading/1.2/userguide/htmlsingle/#N20844 ) consisting of two flows:
1) The first flow sinks URLs into a db table (in which each row is automatically assigned an identifier from an auto-incrementing id column). This flow also sinks pairs of URLs into a SequenceFile with the field names "urlTo" and "urlFrom".
2) The second flow reads from both of these sinks and tries to do a CoGroup on "urlTo" (from the SequenceFile) and "url" (from the db source) to get the db record "id" for each "urlTo".
It then does a CoGroup on "urlFrom" and "url" to get the db record "id" for each "urlFrom".
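The dependency between the two flows can be sketched without any Cascading classes. The following is a toy model in plain Java, not Cascading code: each flow is reduced to the identifiers of the taps it reads and writes, and a flow is considered ready once no other pending flow still writes one of its sources. The class name, the tap identifier strings, the flow name `importUrls` and the first flow's source (`raw-links`) are placeholders invented for illustration; only the flow name `urlLink*url*url` and the table names appear in the setup described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class FlowOrderSketch {
    /** A flow reduced to its name and the identifiers of the taps it reads and writes. */
    static final class FlowSpec {
        final String name;
        final Set<String> sources;
        final Set<String> sinks;

        FlowSpec(String name, Set<String> sources, Set<String> sinks) {
            this.name = name;
            this.sources = sources;
            this.sinks = sinks;
        }
    }

    /** Orders flows so that every flow runs after the flows that write its sources. */
    static List<String> executionOrder(List<FlowSpec> flows) {
        List<FlowSpec> pending = new ArrayList<>(flows);
        List<String> order = new ArrayList<>();
        while (!pending.isEmpty()) {
            FlowSpec ready = null;
            for (FlowSpec candidate : pending) {
                boolean blocked = false;
                for (FlowSpec other : pending) {
                    // candidate must wait while another pending flow still writes one of its sources
                    if (other != candidate && !Collections.disjoint(other.sinks, candidate.sources)) {
                        blocked = true;
                        break;
                    }
                }
                if (!blocked) {
                    ready = candidate;
                    break;
                }
            }
            if (ready == null) {
                throw new IllegalStateException("circular dependency between flows");
            }
            pending.remove(ready);
            order.add(ready.name);
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical identifiers; "raw-links" stands in for whatever the first flow reads.
        FlowSpec first = new FlowSpec("importUrls",
                Set.of("raw-links"), Set.of("db:urls", "seqfile:url-pairs"));
        FlowSpec second = new FlowSpec("urlLink*url*url",
                Set.of("db:urls", "seqfile:url-pairs"), Set.of("db:url_link"));
        // Deliberately passed in the wrong order; prints [importUrls, urlLink*url*url]
        System.out.println(executionOrder(List.of(second, first)));
    }
}
```

In this model the second flow always waits for the first, which matches the manual ordering of running the first flow to completion before starting the second.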
The two flows work individually, that is, if I call flow.complete() on the first before starting the second. But if I put both flows in a Cascade object, I get the error:
cascading.cascade.CascadeException: no loops allowed in cascade, flow: urlLink*url*url, source: JDBCTap{connectionUrl='jdbc:mysql://localhost:3306/mydb', driverClassName='com.mysql.jdbc.Driver', tableDesc=TableDesc{tableName='urls', columnNames=null, columnDefs=null, primaryKeys=null}}, sink: JDBCTap{connectionUrl='jdbc:mysql://localhost:3306/mydb', driverClassName='com.mysql.jdbc.Driver', tableDesc=TableDesc{tableName='url_link', columnNames=[urlLinkFrom, urlLinkTo], columnDefs=[bigint(20), bigint(20)], primaryKeys=[urlLinkFrom, urlLinkTo]}}
when trying to configure a cascade.
I can see that this comes from the addEdgeFor function of the CascadeConnector, but I do not understand how to solve the problem.
I have never used Cascade / CascadeConnector before. Am I missing something?
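To make the error message concrete, here is a toy model in plain Java of how a connector could end up reporting a loop for a single flow. It is not Cascading's code, and it is an unverified assumption that CascadeConnector identifies each tap by a string identifier and that two JDBC taps on the same connection URL might share one. The class and both identifier helpers (`idWithTable`, `idUrlOnly`) are hypothetical; the SequenceFile path is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

public class CascadeLoopSketch {
    /**
     * Toy stand-in for a connector: adds one edge per (source, sink) pair of a flow
     * and refuses any edge whose two endpoints carry the same identifier.
     */
    static List<String> edgesFor(String flowName, List<String> sourceIds, List<String> sinkIds) {
        List<String> edges = new ArrayList<>();
        for (String source : sourceIds) {
            for (String sink : sinkIds) {
                if (source.equals(sink)) {
                    throw new IllegalStateException("no loops allowed in cascade, flow: " + flowName);
                }
                edges.add(source + " -> " + sink);
            }
        }
        return edges;
    }

    /** Hypothetical identifier that distinguishes tables on the same database. */
    static String idWithTable(String connectionUrl, String table) {
        return connectionUrl + "/" + table;
    }

    /** Hypothetical identifier that ignores the table name. */
    static String idUrlOnly(String connectionUrl, String table) {
        return connectionUrl;
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/mydb";
        String pairs = "seqfile:url-pairs"; // placeholder for the SequenceFile path

        // Distinct identifiers per table: the second flow yields two ordinary edges.
        System.out.println(edgesFor("urlLink*url*url",
                List.of(idWithTable(url, "urls"), pairs),
                List.of(idWithTable(url, "url_link"))));

        // Shared identifier: source and sink collapse into one node, so the edge is a loop.
        try {
            edgesFor("urlLink*url*url",
                    List.of(idUrlOnly(url, "urls"), pairs),
                    List.of(idUrlOnly(url, "url_link")));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real connector behaves like the second case, the loop would come from the second flow alone (its db source and db sink looking like the same resource) rather than from how the two flows are combined. That would be worth checking against the CascadeConnector and JDBCTap sources.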
hadoop cascading
Katie