With the tremendous growth in the reach of the internet, every aspect of communication in our lives is slowly moving to digital modes. Massive Open Online Courses (MOOCs) allow learners to access quality instruction at their own convenience through digital learning. MOOCs assess the quality of learning among their users through multiple subjective and objective tests, but the results can often mislead teachers, since quizzes can be taken from other accounts or by learners who already know the answers. We propose a network that requires no extra hardware and little computational power, and that can help both learners and teachers estimate a learner's engagement level while the learner is consuming course resources. Our network uses features of the user's eyes and face, combined with salient features extracted from the MOOC video, to estimate attention. This can help learners learn more efficiently, since the system can alert them when their engagement is low, and it can inform teachers so that they can improve the sections of their lectures where learners lose interest.