...
To train the PySpark model, Spark must be installed first.
TODO: see if we can get the spark model to run. If it works, describe it here.
TODO: done up to here.
Comparison of different machine learning frameworks: PySpark, scikit-learn, TensorFlow. In particular, evaluation of Spark for ML:
...
Reduced data set (to reduce bias): 7000 samples, 7 classes (1000 samples per class).
PySpark implementation: long training times (several hours on a virtual machine with 16 cores and 32 GB RAM). This may be due to a configuration error and indicates that more effort is needed to tune the PySpark model and installation.
scikit-learn implementation: training times on the order of a few minutes.
TensorFlow: offers different methods and is therefore difficult to compare with the other two with respect to performance.
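
The setup above (a balanced subset with a fixed number of samples per class, and a comparison of training times across frameworks) can be sketched in plain Python. This is only an illustration under assumptions: the helper names balanced_subsample and time_training, the toy data, and the fixed seed are hypothetical and not taken from the project code.

```python
import random
import time
from collections import defaultdict


def balanced_subsample(samples, labels, per_class, seed=0):
    """Return up to `per_class` randomly chosen (sample, label) pairs per class."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_class):  # assumes labels are mutually comparable
        items = by_class[label]
        chosen = rng.sample(items, min(per_class, len(items)))
        subset.extend((sample, label) for sample in chosen)
    rng.shuffle(subset)  # avoid class-ordered input to the training step
    return subset


def time_training(fit, *args, **kwargs):
    """Run `fit(*args, **kwargs)` once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fit(*args, **kwargs)
    return result, time.perf_counter() - start


# Toy data: 7 classes of unequal size (hypothetical, not the real data set).
labels = [c for c in range(7) for _ in range(10 + c)]
samples = list(range(len(labels)))

subset = balanced_subsample(samples, labels, per_class=10)

# `sorted` stands in for a real training function such as a model's fit method.
_, seconds = time_training(sorted, subset)
```

With the real data, per_class would be 1000, and the same time_training wrapper could be placed around each framework's training call so that the reported times are measured in the same way.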
Open tasks: code consolidation, further evaluation of results, Spark environment configuration.