Jobs
Please send your resume and cover letter to job at internetmemory dot org.
The Internet Memory Foundation thanks all applicants for their interest, but
advises that only those selected for an interview will be contacted.
Development engineer internship, boilerplate detection
Development engineer internship, datasets
Development engineer internship, execution-based crawler
Development engineer internship, crawl patterns detection
Development engineer (May 2010)
Python Web Developer (April 2010)
Crawl Engineer (April 2010)
Distributed Architecture Developer (April 2010)
Development engineer internship, boilerplate detection
The Internet Memory Foundation (formerly European Archive) offers a position
in an innovative and dynamic workplace, within a small and growing team
dedicated to culture.
Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to
improve the quality and completeness of web archives.
Mission
To support full-text search and mining of our archive, we use a complete natural
language processing stack. One fundamental step of web page analysis
is boilerplate elimination: getting rid of advertisements, navigation bars,
footers and the like.
The goal of this internship is to compare existing tools and improve the
quality of the boilerplate removal. The tasks, under the supervision of an
engineer, will include the design, technical specifications, implementation
(code and automatic tests) and documentation.
Profile
- Completing the last year of a master's degree in computer science
- Autonomous, team player
- HTML, JavaScript, DOM
- Knowledge of web protocols
- Python development experience a plus
- Knowledge of Linux a plus
- Erlang or other functional language experience a plus
- Good command of English
- French a plus
Details
- Contract type: internship, full time, 5 months
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
- 1500€ per month
Please mention "boilerplate detection internship" in the subject
of your application.
Development engineer internship, datasets
The Internet Memory Foundation (formerly European Archive) offers a position
in an innovative and dynamic workplace, within a small and growing team
dedicated to culture.
Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to
improve the quality and completeness of web archives.
Mission
The web hosts vast amounts of tabular data. Compiling all this information
can yield interesting results. However, detecting and processing this data is a
challenge.
The main goal of this internship is to study and implement methods to detect
tabular data in our archive, and classify it. The tasks, under the supervision
of an engineer, will include design, technical specifications, implementation
(code and automatic tests) and documentation.
Profile
- Completing the last year of a master's degree in computer science
- Autonomous, team player
- Python development experience a plus
- Knowledge of web protocols a plus
- Knowledge of Linux a plus
- Erlang or other functional language experience a plus
- Good command of English
- French a plus
Details
- Contract type: internship, full time, 5 months
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
- 1500€ per month
Please mention "datasets internship" in the subject of your application e-mail.
Development engineer internship, execution-based crawler
The Internet Memory Foundation (formerly European Archive) offers a position
in an innovative and dynamic workplace, within a small and growing team
dedicated to culture.
Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to
improve the quality and completeness of web archives.
Mission
The web is becoming more and more dynamic: JavaScript-generated content,
including AJAX, and Flash applications are pervasive. This hinders traditional
crawlers that rely on simple regular expression searches to find links on web
pages. New approaches that execute the web pages have
emerged. However, they are usually very resource-intensive, which prevents
large-scale use.
The goal of this internship is to study execution-based techniques and to try
to eliminate graphical rendering in order to save resources. The quality and
performance of the different methods will have to be compared. The tasks,
under the supervision of an engineer, will include the assessment of different
methods, and the design, technical specifications, implementation (code and
automatic tests) and documentation of the software needed to experiment with
headless crawlers and integrate them into our crawl infrastructure.
Profile
- Completing the last year of a master's degree in computer science
- Autonomous, team player
- Knowledge of web protocols
- HTML, JavaScript, DOM
- Python development experience a plus
- Knowledge of Linux a plus
- Erlang or other functional language experience a plus
- Good command of English
- French a plus
Details
- Contract type: internship, full time
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
- 1500€ per month
Please mention "execution-based crawler internship" in the
subject of your application e-mail.
Development engineer internship, crawl patterns detection
The Internet Memory Foundation (formerly European Archive) offers a position
in an innovative and dynamic workplace, within a small and growing team
dedicated to culture.
Internet Memory is involved in European research projects such as LiWA (http://liwa-project.eu/), whose purpose is to
improve the quality and completeness of web archives.
Mission
Web crawlers fetch resources from the web, scanning each one for new
links. This enables the discovery of many parts of web sites starting from a
few entry points. However, legitimate dynamic content or deliberately crafted
crawler traps can cause a crawler to fetch an endless stream of useless
resources.
The goal of this internship is to determine which crawl patterns indicate a
trap, implement a detection module and integrate it into our crawl
infrastructure. The tasks, under the supervision of an engineer, will include
the design, technical specifications, implementation (code and automatic tests)
and documentation of the necessary software.
Profile
- Completing the last year of a master's degree in computer science
- Autonomous, team player
- Knowledge of web protocols
- HTML, JavaScript, DOM
- Python development experience a plus
- Knowledge of Linux a plus
- Erlang or other functional language experience a plus
- Good command of English
- French a plus
Details
- Contract type: internship, full time
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
- 1500€ per month
Please mention "crawl patterns detection internship" in the
subject of your application e-mail.
Development engineer (May 2010)
The European Archive foundation offers a position in an innovative and
dynamic workplace, within a small and growing team dedicated to culture.
The European Archive is involved in European research projects such as LiWA
(http://liwa-project.eu/), whose purpose
is to improve the quality and completeness of web archives.
Mission
Design, technical specifications, implementation (code and automatic tests),
documentation and maintenance of the archival platform.
Profile
- Master's degree in computer science
- 0 to 3 years of experience
- Autonomous, team player
- Knowledge of web protocols required
- Knowledge of Linux
- Python development experience a plus
- Erlang or other functional language experience a plus
- Pylons, Django experience a plus
- Good command of English
- French a plus
Details
- Contract type: CDI (full time)
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
Please send your resume and cover letter to jobs at europarchive dot org with
the subject line "development engineer". The European Archive thanks all
applicants for their interest, but advises that only those selected for an
interview will be contacted.
Python Web Developer (April 2010)
The European Archive foundation offers a position in an innovative and dynamic workplace, within a small and growing team dedicated to culture.
Mission
Design, technical specifications, implementation (code and automatic tests), documentation and maintenance of the user interface to the on-line archival platform (used to launch and monitor crawls, for quality assurance...).
Profile
- Master's degree in computer science
- Autonomous, team player
- Python development experience is required
- Web interface development experience is required (HTTP, HTML, CSS, MVC design, Ajax, SQL)
- Pylons, Django experience is a plus
- Knowledge of Linux
- Good command of English
- French is a plus
Details
- Contract type: Permanent (full time)
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
Please send your resume and cover letter to jobs at europarchive dot org with the subject line "Web Developer". The European Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.
Crawl Engineer (April 2010)
The European Web Archive is looking for a new Web Crawler Engineer to join our Paris-based team and help us archive the Internet and preserve this information for future generations.
Find out more about our organization and web archive at www.europarchive.org
Your responsibilities include:
- Running a set of tools, including several web crawlers, to collect content from the Internet
- Working with the QA team to ensure the collected content is complete and of the highest quality
- Monitoring all production systems using automated tools
- Working directly with our partner National Libraries, Archives and Universities to collect specific content on the Internet for preservation
Experience Needed:
- Excellent knowledge of HTML, JavaScript and Web technologies in general
- Extensive use of Linux shell scripting
- Experience with Internet protocols (HTTP is a must-have)
- Ability to work in a loosely structured start-up environment
Education:
- Computer Science Bachelor's or Master's degree, or equivalent work experience
Details
- Contract type: Permanent (full time)
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
Please send your resume and cover letter to jobs at europarchive dot org with the subject line "Web Crawl Engineer". The European Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.
Distributed Architecture Developer (April 2010)
The European Archive is looking for an experienced developer to join our Paris-based engineering team to participate in the development of our distributed web archiving infrastructure. Find out more about our organization and web archive at www.europarchive.org
Your responsibilities include:
- Participating in the specification of our evolving distributed web archiving platform
- Developing and integrating modules of the platform
Experience Needed:
- Excellent knowledge of distributed platform development
- Fluency in Python and Erlang
- Experience with Internet protocols (HTTP is a must-have)
- Ability to work in a loosely structured start-up environment
Education:
- Computer Science PhD or Master's degree, or equivalent work experience
Details
- Contract type: Permanent (full time)
- Location: Montreuil (metro line 9, Robespierre, or RER A, Vincennes)
- No telecommuting
Please send your resume and cover letter to jobs at europarchive dot org with the subject line "Platform Developer". The European Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted.