duplicate-document-image-finder

A JavaScript library to find duplicate document images.

How does it work?

It extracts the text from each image using OCR, then uses the Levenshtein distance to measure how similar the texts of two images are.
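The comparison step described above can be sketched as follows. This is a minimal illustration, not the library's actual code: the function names levenshtein and similarity are hypothetical, and normalizing the distance by the length of the longer text is an assumption about how a distance could be turned into a similarity score.

```typescript
// Hypothetical helper: classic dynamic-programming Levenshtein distance,
// keeping only the previous row of the edit-distance table in memory.
function levenshtein(a: string, b: string): number {
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: map the distance to a score in [0, 1],
// where 1 means identical texts. The normalization is an assumption.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  return 1 - levenshtein(a, b) / longest;
}
```

Two images whose OCR texts score above some chosen threshold would then be treated as duplicates; the threshold the library uses is not stated here.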

Methods and Interfaces

find. Finds the duplicate images among the given images. You can optionally pass your own OCR results as textLinesOfImages, one TextLine array per image.

async find(images: HTMLImageElement[], textLinesOfImages?: TextLine[][], progressCallback?: any): Promise<HTMLImageElement[]>

TextLine

export interface TextLine{
  x:number;
  y:number;
  width:number;
  height:number;
  text:string;
}
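As an illustration of the TextLine shape, the sketch below builds a textLinesOfImages value by hand and flattens one image's lines into a single string. The interface is copied from above; the linesToText helper, the top-to-bottom then left-to-right ordering, and the sample values are assumptions made for the example and are not part of the library.

```typescript
// Local copy of the TextLine interface documented above.
interface TextLine {
  x: number;
  y: number;
  width: number;
  height: number;
  text: string;
}

// Hypothetical helper: join the lines of one image into a single text,
// ordered top-to-bottom and then left-to-right (an assumed reading order).
function linesToText(lines: TextLine[]): string {
  return [...lines] // copy so the caller's array is not reordered
    .sort((p, q) => p.y - q.y || p.x - q.x)
    .map((line) => line.text.trim())
    .filter((text) => text.length > 0)
    .join("\n");
}

// Made-up OCR results for two images: one inner array per image.
const textLinesOfImages: TextLine[][] = [
  [
    { x: 10, y: 40, width: 200, height: 20, text: "Total: 42.00" },
    { x: 10, y: 10, width: 200, height: 20, text: "Invoice #1001" },
  ],
  [{ x: 12, y: 11, width: 198, height: 20, text: "Invoice #1001" }],
];

console.log(linesToText(textLinesOfImages[0]));
```

An array shaped like textLinesOfImages is what the second parameter of find accepts, so results from any OCR engine can be mapped into this form first.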

Install

Via NPM:

npm install duplicate-document-image-finder

Via CDN:

<script type="module">
  import { DuplicateDocumentImageFinder } from 'https://cdn.jsdelivr.net/npm/duplicate-document-image-finder/dist/duplicate-document-image-finder.js';
</script>

License

MIT
